Linear Algebra Package on the NVIDIA G80 Processor

Robert Liao and Tracy Wang
Computer Science 252 Project
University of California, Berkeley
Berkeley, CA 94720
liao_r [at] berkeley.edu, tracyx [at] berkeley.edu

Abstract

The race for performance improvement previously depended on running one application or sequence of code really quickly. Now, the processor industry and universities focus on running many things really quickly at the same time, an idea collectively known as parallelism. Traditional microprocessors achieve this by building more cores onto the architecture. A single processor may have as many as 2 to 8 cores that can execute independently of one another. However, many of these processor systems include an underutilized graphics processing unit. NVIDIA’s G80 Processor represents the first step to improve performance of an application through the use of the inherent parallel structure of a graphics processing unit.

This paper explores the performance of a subset of the Linear Algebra Package running on the NVIDIA G80 Processor. Results from the exploration show that, if utilized properly, the performance of linear algebra operations is improved by a factor of 70 with a suitable input size. Additionally, the paper discusses the issues involved with running general programs on the GPU, a relatively new ability provided by the G80 Processor. Finally, the paper discusses limitations in terms of the use of a GPU.

1 Introduction

The effort to improve performance on microprocessors for much of the 1990s focused primarily on running instructions faster. This brought forth the frequency race between microprocessor manufacturers. Programmers could write code, wait 6 months, and their code would suddenly run faster. Additionally, many techniques for increasing instruction level parallelism were introduced to improve performance. Today, the next performance obstacle is not clear, but the general consensus is that the next big thing includes putting many processing cores on a processor. This will enable applications to take advantage of some higher form of parallelism beyond instruction parallelism.

During all of this time, Graphics Processing Units (GPUs) have been doing many operations in parallel due to the relatively independent nature of their computations. Graphics scenes can be decomposed into objects, which can be decomposed into various rendering steps that are independent from one another. However, this led to a very specialized processor that was optimized for rendering.

Performing computations on a GPU is a relatively new field full of opportunities thanks to the NVIDIA G80 GPU. This processor represents one of the first processors to expose as many as 128 computation cores to the programmer. As a result, a comparison with the CPU to see how the GPU performs is appropriate to determine if this is the right direction for parallelism.

Though any program can be loaded onto the GPU for this exploration, we decided to benchmark linear algebra operations as a starting point to evaluate the GPU for other research areas like the Berkeley View. The Berkeley View has compiled many kernels called dwarves that its authors think should perform well on parallel platforms. These include dense and sparse matrix operations. Many programs can be reduced purely to these dwarves. As a starting point for examining the efficacy of using the GPU in this capacity, we benchmark the performance of general linear algebra operations.

This paper is organized as follows. Section 2 provides a general introduction to traditional GPUs and how they worked prior to the NVIDIA G80 processor. Section 3 discusses details about the NVIDIA G80 processor and outlines its capabilities. Section 4 discusses the benchmarks used to profile the performance of the G80 with respect to two CPUs used in this paper. Section 5 provides results, discussion, and speculation on the performance runs on the GPU. Section 6 brings forth the issues associated with GPU computing along with a discussion of issues in running applications on the G80 platform. Finally, the paper concludes in Section 7 with a summary of the results, future directions for research, and related works on GPUs.

2 Traditional GPUs

Background

Many modern personal computing and workstation architectures include a GPU to off-load the task of rendering graphical objects from the Central Processing Unit (CPU). The relationship between the CPU and GPU is closely intertwined, as most of the output from a computer is received through video. In the past, architects optimized the CPU and GPU bus routes because of the high demand to display video to an output device like a monitor or LCD.

GPU manufacturers have kept up with the demand by offering more advanced capabilities in their GPUs beyond putting text and windows on a screen. GPUs today are typically capable of taking geometric information in the form of polygons from an application like a game and performing many different transformations to provide some sort of realistic or artistic output.

This video processing is embarrassingly parallel. The representation of a pixel on the screen can often be rendered independently of other pixels. As a result, GPU manufacturers have provided many superscalar features in their processors to take advantage of this parallelism. This push for parallelism has come to a point where a GPU is basically a specialized vector processor.

Motivation for Change

A fixed pipeline characterizes the traditional GPU. Many have a fixed number of special shaders such as vertex shaders and pixel shaders. NVIDIA noticed that during certain rendering scenarios, many of the specialized shaders remain dormant. For instance, a scene with many geometric features will use many vertex shaders, but not very many pixel shaders. As a result, NVIDIA began to look for a reconfigurable solution.

Figure 1 shows a typical high-level pipeline of a GPU. Data flows forward from the CPU through the GPU and ultimately on to the display. GPUs typically contain many of these pipelines to process scenes in parallel. Additionally, the pipeline is designed to flow forward, and as a result, certain stages of the pipeline have features such as write-only registers to avoid hazards like the read-after-write hazards found in typical CPU pipelines.

Figure 1: The Traditional GPU Pipeline

Additionally, the vector processor right next to the CPU sits idle during heavy computations performed on the CPU. Most developers do not send parallelizable computations to the GPU because the APIs make it too difficult to do so. The typical interfaces like OpenGL and DirectX are designed for graphics, not computation. As a result, the programmer cannot tap into the GPU’s vast vector resources.

3 The NVIDIA G80 GPU

The G80 GPU is found in NVIDIA’s GeForce 8 Series graphics cards as well as the NVIDIA Quadro FX 4600 and 5600. The NVIDIA Quadro FX 5600 is the card used in this exploration.

Architecture

The G80 GPU is NVIDIA’s answer to many of the aforementioned concerns and issues. It represents a large departure from traditional GPU architectures. A block diagram of the architecture is shown in Figure 2. The GPU contains 8 blocks of 16 stream processors, for a total of 128 stream processors. Each stream processor can execute floating point instructions. In the block diagram, each group of 16 shares an L1 cache. From there, each block has access to 6 L2 caches. This arrangement also allows one processor to feed results directly into another processor for continued stream processing. Each processor can be configured to be a part of some shader unit in the traditional GPU sense. This reconfigurability also means that the processors can be dedicated to performing general computations. This capability is exposed in NVIDIA’s Compute Unified Device Architecture.

Figure 2: The NVIDIA G80 Graphics Processor Architecture

Each processor also has local memory as well as memory shared with other processors. According to the NVIDIA guide, accessing local and shared memory on-chip is as fast as accessing registers.

Compute Unified Device Architecture

The Compute Unified Device Architecture (CUDA) is NVIDIA’s API for exposing the processing features of the G80 GPU. This C language API provides services ranging from common GPU operations in the CUDA Library to traditional C memory management semantics in the CUDA runtime and device driver layers. Additionally, NVIDIA provides a specialized C compiler to build programs targeted for the GPU.

Code compiled for the GPU is executed on the GPU. Likewise, memory allocated on the GPU resides on the GPU. This introduces complications in interfacing programs running in CPU space with programs running in GPU space. The programmer must keep track of the pointers used in each processor. Many programs, including the ones used to benchmark the GPU in this paper, reasonably assume that all pointers and executable code reside in one memory space and on one execution unit. Porting programs written in this style to separate CPU and GPU memory spaces is a non-trivial task.

Figure 3: Organization of the modules.

A Note on Scarce Specifications

Due to the secretive nature of the industry, NVIDIA has not released much information about the G80 processor beyond a high level overview. As a result, we can only speculate on specifics such as L1 and L2 cache sizes in this benchmark.

4 Benchmarking

... quickly LAPACK can perform this factorization with respect to matrix size in both the CPU and GPU.

BLAS and CUBLAS

The LAPACK tools rely on the Basic Linear Algebra Subprograms (BLAS) library. These subprograms are a set of primitive operations that operate on matrices. The original BLAS can be run on the CPU. NVIDIA provides its own version called CUBLAS (Compute Unified BLAS). CUBLAS is designed to run on the G80 GPU, and abstracts much of the CUDA programming API in a succinct mathematical package. The only major change is the inclusion of allocation and freeing functions to deal with the separation of CPU and GPU memory.
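
To make concrete what a BLAS primitive computes, the following is a minimal sketch of a general matrix multiply, C = alpha * A * B + beta * C, on column-major single-precision matrices. It is a naive CPU reference written for illustration only: the names `sgemm_ref` and `idx` are hypothetical, the argument list is simplified (no transpose options), and it is neither the BLAS nor the CUBLAS implementation benchmarked in the paper.

```c
#include <stddef.h>

/* Offset of element (row i, column j) in a column-major matrix whose
 * leading dimension is ld. Column-major storage is the BLAS convention. */
static size_t idx(size_t i, size_t j, size_t ld) { return i + j * ld; }

/* Hypothetical naive reference for a single-precision matrix multiply:
 *     C = alpha * A * B + beta * C
 * A is m x k, B is k x n, C is m x n; none of them is transposed.
 * No blocking or vectorization is attempted, so this shows the operation,
 * not the performance, of an optimized library routine. */
void sgemm_ref(size_t m, size_t n, size_t k,
               float alpha, const float *A, size_t lda,
               const float *B, size_t ldb,
               float beta, float *C, size_t ldc)
{
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            /* Dot product of row i of A with column j of B. */
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p)
                acc += A[idx(i, p, lda)] * B[idx(p, j, ldb)];

            /* When beta is zero, C is treated as write-only so that
             * uninitialized input values cannot leak into the result. */
            float *c = &C[idx(i, j, ldc)];
            *c = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * (*c);
        }
    }
}
```

Calling `sgemm_ref(2, 2, 2, 1.0f, A, 2, B, 2, 0.0f, C, 2)` with A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]] stored column-major fills C with [[19, 22], [43, 50]]. Every output element is computed independently of the others, which is the property that makes this kind of kernel a natural fit for many stream processors. On the GPU, the same operation would additionally require the matrices to be allocated in, and copied to, GPU memory before the call.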