Languages, APIs and Development Tools for GPU Computing San Jose, California | 30 September 2009

Will Ramey
Product Manager, GPU Computing
NVIDIA Confidential

GPU Computing Overview

Broad Adoption
• Over 150,000,000 installed CUDA-Architecture GPUs
• Over 90,000 GPU Computing Developers (9/09)
• Windows, Linux and MacOS Platforms supported
• GPU Computing spans HPC to Consumer
• 250+ Universities teaching GPU Computing on the CUDA Architecture

GPU Computing Applications
• CUDA C/C++: Over 90,000 developers; Running in Production since 2008; SDK + Libs + Visual Profiler and Debugger
• OpenCL: 1st GPU demo; Shipped 1st OpenCL Conformant Driver; Public Availability
• Direct Compute: Microsoft API for GPU Computing; Supports all CUDA-Architecture GPUs (DX10 and DX11)
• Fortran: PGI Accelerator; PGI CUDA Fortran; NOAA Fortran bindings; FLAGON
• Python, Java, .NET, …: PyCUDA; jCUDA; CUDA.NET; OpenCL.NET

NVIDIA GPU with the CUDA Parallel Computing Architecture

OpenCL is a trademark of Apple Inc. used under license to the Khronos Group Inc.
© 2009 NVIDIA Corporation

GPU Computing Application Development

Software stack, top to bottom:
• Your GPU Computing Application
• Application Acceleration Engines (AXEs): Middleware, Modules & Plug-ins
• Foundation Libraries: Low-level Functional Libraries
• Development Environment: Languages, Device APIs, Compilers, Debuggers, Profilers, etc.
• CUDA Architecture

Languages & APIs

CUDA C/C++ Update

Timeline (2007 to 2009): July 07, Nov 07, April 08, Aug 08, July 09, Nov 09

CUDA Toolkit 1.0
• C Compiler
• C Extensions
• Single Precision
• BLAS
• FFT
• SDK 40 examples

CUDA Toolkit 1.1
• Win XP 64
• Atomics support
• Multi-GPU support

CUDA Toolkit 2.0
• Double Precision
• Compiler Optimizations
• Vista 32/64
• Mac OSX support
• 3D Textures
• HW Interpolation

CUDA Visual Profiler 2.2 and cuda-gdb HW Debugger

CUDA Toolkit 2.3
• DP FFT
• 16-32 Conversion intrinsics
• Performance enhancements

“Nexus” Beta

CUDA Toolkit 3.0 Beta
• C++ Functionality
• Fermi Arch
• Tools
• Driver and RT

CUDA C/C++ Update (CUDA Toolkit 3.0 detail)

Tools (Productivity)
• “Nexus” Visual Studio IDE
• Debugger support for CUDA Driver API
• ELF support (cubin format deprecated)
• CUDA Memory Checker (cuda-gdb feature)
• Fermi: Fermi HW Debugger support in cuda-gdb
• Fermi: Fermi HW Profiler support in CUDA Visual Profiler

CUDA C++ Language Functionality
• C++ Class Inheritance
• C++ Template Inheritance

Fermi Architecture Support
• Native 64-bit GPU support
• Generic Address space
• Dual DMA support
• ECC reporting
• Concurrent Kernel Execution

CUDA Driver & CUDA C Runtime
• CUDA Driver / Runtime Buffer interoperability (Stateless CUDART)
• Separate emulation runtime
• Unified interoperability API for Direct3D and OpenGL
• OpenGL texture interoperability
• Fermi: Direct3D 11 interoperability

Architecture supported in SW now

Fortran Language Solutions

PGI Accelerators
• High-level implicit programming model, similar to OpenMP
• Auto-parallelizing compiler

PGI CUDA Fortran Compiler
• High-level explicit programming model, similar to CUDA C Runtime

NOAA F2C-ACC
• Converts Fortran codes to CUDA C
• Some hand-optimization expected

FLAGON
• Fortran 95 Library for GPU Numerics
• Includes support for cuBLAS, cuFFT, CUDPP, etc.

OpenCL

Cross-vendor open standard
• Managed by the Khronos Group
• http://www.khronos.org/opencl

Low-level API for device management and launching kernels
• Close-to-the-metal programming interface
• JIT compilation of kernel programs

C-based language for compute kernels
• Kernels must be optimized for each processor architecture

NVIDIA released the first OpenCL v1.0 conformant driver for Windows and Linux to thousands of developers in June 2009

NVIDIA OpenCL Support

Driver releases shown: NVIDIA R190 (released June 2009) and NVIDIA R195
• OpenCL ICD
• OpenGL Interoperability
• OpenCL Documentation
• Double Precision
• NVIDIA Compiler Flags
• NVIDIA Query for Compute Capability
• OpenCL Visual Profiler
• Byte Addressable Stores
• 32-bit Atomics
• OpenCL Code Samples
• Images
• OpenCL 1.0 Driver Min. Spec.

DirectCompute

Microsoft standard for all GPU vendors
• Released with DirectX® 11 / Windows 7
• Runs on all 150M+ CUDA-enabled DirectX 10 class GPUs and later

Low-level API for device management and launching kernels
• Good integration with other DirectX APIs

Defines HLSL-based language for compute shaders
• Kernels must be optimized for each processor architecture

Language & APIs for GPU Computing

Solution: Approach
• CUDA C / C++: Language Integration, Device-Level API
• PGI Accelerator: Auto Parallelizing Compiler
• PGI CUDA Fortran: Language Integration
• CAPS HMPP: Auto Parallelizing Compiler
• OpenCL: Device-Level API
• DirectCompute: Device-Level API
• PyCUDA: API Bindings
• CUDA.NET, OpenCL.NET, jCUDA: API Bindings
• …: …

Massively Parallel Development Language / API Support (Linux)

Languages / APIs: CUDA C/C++, OpenCL, Fortran
Tools:
• CAPS HMPP
• PGI Accelerators
• PGI CUDA Fortran
• NV cuda-gdb
• Allinea DDT
• TotalView Debugger
• NV Visual Profiler
• TAU CUDA
• Mcore Platform Analyzer

cuda-gdb will be extended to support OpenCL debugging in a future release, with solutions from Allinea, TotalView and others expected to follow.

NVIDIA SDKs

Hundreds of code samples for CUDA C/C++, DirectCompute, and OpenCL
• Finance
• Oil & Gas
• Video/Image Processing
• 3D Volume Rendering
• Particle Simulations
• Fluid Simulations
• Math Functions

Development Tools

Massively Parallel Development Tools for Linux

Columns: Debugging, Profiling, Analysis, Cluster, Lib’s, Support
Rows: CUDA C/C++, OpenCL, Fortran
OpenCL: three entries marked “Coming Soon”
cuda-gdb

CUDA debugging integrated into GDB on Linux
• Supported on 32-bit and 64-bit systems
• Seamlessly debug both the host/CPU and device/GPU code
• Set breakpoints on any source line or symbol name
• Access and print all CUDA memory allocs, local, global, constant and shared vars
• Kernel Source Debugging

Included in the CUDA Toolkit

CUDA Visual Profiler

• Analyze GPU HW performance signals, kernel occupancy, instruction throughput, and more
• Highly configurable tables and graphical views
• Save/load profiler sessions or export to CSV for later analysis
• Compare results visually across multiple sessions to see improvements
• Windows, Linux and Mac OS X supported
• OpenCL Visual Profiler for Windows and Linux

Included in the CUDA Toolkit

“Nexus” 1.0 Beta

Parallel Debugger
• GPU source code debugging
• Variable & memory inspection

System Analyzer
• Platform-level Analysis for the CPU and GPU
• Visualize Compute Kernels, Driver API Calls, and Memory Transfers

Graphics Inspector
• Visualize and debug graphics content

Massively Parallel Development Tools for Windows

Columns: Visual Studio Integration, Parallel Debugging (Compute / Gfx), Parallel Profiling (Compute / Gfx), System Analysis/Trace (CPU/GPU) (Compute / Gfx), Premium Support, Early Access, Multi-Vendor GPU Support
Tools:
• “Nexus”* Pro
• “Nexus”*
• NV Visual Profiler
• Mcore Platform Analyzer
• PerfKit/PerfHUD
• gDEBugger

* “Nexus” is NVIDIA’s code name

Massively Parallel Development Tools for Linux

Columns: Compilation, Debugging, Profiling, Analysis, Premium Support
Tools:
• CAPS HMPP
• PGI Accelerators
• PGI CUDA Fortran
• NV cuda-gdb
• Allinea DDT
• TotalView Debugger
• NV Visual Profiler
• TAU CUDA
• Mcore Platform Analyzer
Thank You

PyCUDA
Slide courtesy of Andreas Klöckner, Brown University

CAPS HMPP
http://www.caps-entreprise.com/hmpp.html

jCUDA
http://www.hoopoe-cloud.com/Solutions/jCUDA/Default.aspx
Courtesy of Company for Advanced Supercomputing Solutions, Ltd.

CUDA.NET
http://www.hoopoe-cloud.com/Solutions/CUDA.NET/Default.aspx
Courtesy of Company for Advanced Supercomputing Solutions, Ltd.

OPENCL.NET
http://www.hoopoe-cloud.com/Solutions/CUDA.NET/Default.aspx
Courtesy of Company for Advanced Supercomputing Solutions, Ltd.
