Gelato Pro 2.0 and Gelato 2.0 GPU-Accelerated Final-Frame Renderer

Speed Without Sacrificing Quality

NVIDIA® Gelato® is built on the principle of never compromising on the quality of the rendered image. Gelato uses NVIDIA graphics processing units (GPUs) as general-purpose processors, not just as a way to fill pixels on the display. It renders images quickly and is robust and powerful enough to render the most complex scenes while avoiding the quality limitations of hardware-generated images. Gelato seamlessly leverages the advances found in the latest GPU hardware while protecting your investment in production tools and digital assets by insulating them from hardware changes.

Gelato is designed for easy integration into any workflow and production pipeline and can be used by both the technically sophisticated and the artistically inclined. It ships with simple yet powerful C++ and Python APIs, as well as plug-ins for Autodesk® Maya® and 3ds Max®.

NVIDIA Sorbetto™ is a feature set that allows you to change any lighting parameter you wish and re-render the frame in record time. Best of all, Sorbetto performs this relighting on the final pixels. The image you work with in Sorbetto is always identical to the final rendered image, and you do not need to alter or specially prepare your scene to use Sorbetto. Sorbetto uses the same geometry, shaders, procedurals, and hardware as a regular Gelato render. Sorbetto functionality is exposed in the Maya plug-in (and soon in the 3ds Max plug-in) and in the C++ API, so it can be used with any existing lighting tool in your pipeline.

Best of all, Gelato Pro is backed by the top rendering engineers in the industry. NVIDIA's High-Quality Rendering Team has decades of experience in film production and in developing renderers used to create stunning images for movies and TV. Gelato Pro comes with a support package that includes all product updates and upgrades.

Selected Feature Comparison (Gelato / Gelato Pro)
Features in both Gelato and Gelato Pro:
• GPU acceleration
• Highest quality images
• Raytracing, incl. global illumination and ambient occlusion
• High-order geometry support
• Fully programmable shading
Gelato Pro only:
• Sorbetto interactive relighting
• DSO shadeops
• Multithreading
• Network parallel rendering
• Native 64-bit support
• Comprehensive support package

Render Everywhere

NVIDIA Gelato is available in two versions, NVIDIA Gelato and NVIDIA Gelato Pro. NVIDIA Gelato 2.0 is available at no charge. Simply download the software and you will have one of the most high-powered and sophisticated renderers available. Geared toward professionals with larger-scale pipelines or projects, Gelato Pro 2.0 is available for purchase. It includes the core Gelato renderer, plus Sorbetto™ interactive relighting, DSO shadeops, multithreading, and network parallel rendering.
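This overview does not say how Sorbetto caches or recomputes its results. As a rough illustration of the general idea of relighting on final pixels, the Python sketch below assumes that each light's contribution to every pixel can be cached separately and summed, so that changing one light (or only a crop window of it) re-shades just that light. The class `RelightCache`, the `shade` callback, and the crop-window convention are hypothetical and are not part of the Gelato or Sorbetto API.

```python
"""Hypothetical per-light relighting cache (illustrative only; not Sorbetto's design)."""
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Image = List[List[float]]                    # grayscale image indexed as image[y][x]
ShadeFn = Callable[[int, int, dict], float]  # (x, y, light params) -> that light's contribution
Window = Tuple[int, int, int, int]           # crop window (x0, y0, x1, y1), end-exclusive


class RelightCache:
    """Caches each light's contribution per pixel so one light can be re-shaded alone."""

    def __init__(self, width: int, height: int, shade: ShadeFn) -> None:
        self.width = width
        self.height = height
        self.shade = shade
        self.lights: Dict[str, dict] = {}
        self.contrib: Dict[str, Image] = {}

    def _pixels(self, window: Optional[Window]) -> Iterator[Tuple[int, int]]:
        # Clip the crop window to the image; None means the whole frame.
        x0, y0, x1, y1 = window if window is not None else (0, 0, self.width, self.height)
        for y in range(max(0, y0), min(self.height, y1)):
            for x in range(max(0, x0), min(self.width, x1)):
                yield x, y

    def set_light(self, name: str, params: dict, window: Optional[Window] = None) -> None:
        """Add or change a light and re-shade only that light.

        With a crop window, only pixels inside it are recomputed; pixels outside
        keep their previous value for this light until a full pass is run.
        """
        if name not in self.contrib:
            self.contrib[name] = [[0.0] * self.width for _ in range(self.height)]
            window = None  # a newly added light always gets a full pass
        self.lights[name] = dict(params)
        buf = self.contrib[name]
        for x, y in self._pixels(window):
            buf[y][x] = self.shade(x, y, self.lights[name])

    def delete_light(self, name: str) -> None:
        """Remove a light; its cached contribution simply drops out of the sum."""
        self.lights.pop(name, None)
        self.contrib.pop(name, None)

    def composite(self) -> Image:
        """Final pixel = sum of every light's cached contribution."""
        out = [[0.0] * self.width for _ in range(self.height)]
        for buf in self.contrib.values():
            for y in range(self.height):
                for x in range(self.width):
                    out[y][x] += buf[y][x]
        return out
```

For example, calling `set_light("key", {...}, window=(0, 0, 64, 64))` re-shades only the top-left 64 by 64 pixels of the key light, while every other light's cached buffer is reused as-is. Summing cached buffers is only valid when lights contribute additively (for instance, direct illumination); the automatic recomputation of reflections and shadows that Sorbetto advertises would need more than this sketch.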
NVIDIA GELATO PRO | PRODUCT OVERVIEW | APRIL06

NVIDIA Gelato 2.0 Features and Specifications

Superb Image Quality
• Unlimited image resolution
• Unlimited antialiasing
• Volume shadows
• High-quality motion blur
• Depth of field
• Automatic adaptive tessellation
• Fast, high-quality raytracing
• Image output: 8-bit, 16-bit, and float
• Output image channels for any value computed in shader

Geometry Support
• NURBS, bicubic, and bilinear patches
• Polygon meshes
• Subdivision surfaces
• Curves (hair)
• Wide curves (feathers and ribbons)
• Particles
• Procedural geometry plug-ins
• Arbitrary user-specified vertex variables

Programmable Shading and Lighting
• Caustics
• Subsurface scattering
• True displacement on all primitive types
• Layered surface and light shaders
• Antialiased texture, environment, and shadow maps
• Atmospheric effects
• Unlimited number of lights
• Global illumination
• Ambient occlusion
• Ray-traced reflections, refractions, and shadows
• Average-Z shadow maps
• Cube-face shadow maps
• Sample shaders and shader function library

High Performance
• GPU-accelerated
• Efficient handling of complex scenes
• Efficient memory use

Production-ready
• Fully selective lighting
• Preview mode
• Holdout matte objects
• Fast stereo rendering
• No eyesplits
• User-priority bucket order (spiral/crop window)
• Interleave utility (field rendering)
• Open, documented, royalty-free formats

Components
• Gelato rendering engine
• C++ API
• Plug-ins for scene file formats:
  ° Python scene description
  ° Gelato byte-stream scene description
  ° RIB scene file I/O plug-in (available as 3rd-party freeware)
• Mango™ plug-in for Autodesk Maya
• Frantic Films' Amaretto™ plug-in for Autodesk 3ds Max
• Wide variety of image and texture I/O plug-ins
• Tools:
  ° Image viewer
  ° Shader compiler
  ° Image files to textures converter
  ° Shader developer libraries

System Requirements
• NVIDIA GPU, one of the following:
  ° NVIDIA Quadro FX
  ° NVIDIA GeForce® 5200 or higher
• Microsoft® Windows® XP or Linux 2.4 kernel or later
• 1 GB RAM (recommended)

NVIDIA Gelato Pro 2.0 Features and Specifications

All Gelato 2.0 Features and Specifications, Plus:

Programmable Shading and Lighting
• DSO Shadeops

NVIDIA Sorbetto Interactive Relighting
• Rapidly recompute changes to lighting
• All Sorbetto functions exposed in the Gelato API; not dependent on any particular modeling or animation software
• Relighting on "final pixels," including full antialiasing, motion blur, transparency, displacement, and production shaders. Always identical to the final rendered image.
• Adjust lighting:
  ° Add/delete lights
  ° Move/reorient lights
  ° Change any light shader parameter
  ° Change light linking
• Recompute reflections and shadows automatically
• Selective relighting: recompute lighting for a crop window or specified object for even faster results
• Interruptible: make changes on the fly before the last render is finished
• Supported by Mango™ plug-in for Maya and soon by Amaretto™ plug-in for 3ds Max

Production-ready
• Network parallel rendering
• Floating or node-locked licenses
• Cross-platform licensing
• Unparalleled support from NVIDIA's High-Quality Rendering Team

High Performance
• Multithreaded
• Native 64-bit support

System Requirements
• NVIDIA Quadro FX graphics board
• 2 GB RAM (recommended)

Where to buy NVIDIA Gelato
To inquire about Gelato Pro 2.0 or download Gelato 2.0 at no charge, please visit www.nvidia.com/get_gelato

NVIDIA Corporation | www.nvidia.com
© 2006 NVIDIA Corporation. All rights reserved. NVIDIA, the NVIDIA logo, NVIDIA Quadro, Gelato, GeForce, Sorbetto, and Mango are trademarks and/or registered trademarks of NVIDIA. All company and product names are trademarks or registered trademarks of the respective owners with which they are associated. Features, pricing, availability, and specifications are all subject to change without notice.