Languages, APIs and Development Tools for GPU Computing San Jose, California | 30 September 2009

Will Ramey Product Manager, GPU Computing

NVIDIA Confidential GPU Computing Overview

Broad Adoption

Over 150,000,000 installed CUDA- Architecture GPUs GPU Computing Applications

Over 90,000 GPU Computing Developers (9/09)

Windows, Linux and Direct CUDA OpenCL Fortran Python, MacOS Platforms Compute supported C/C++ Java, .NET, … 1st GPU demo Microsoft API for PGI Accelerator PyCUDA GPU Computing spans Over 90,000 developers Shipped 1st OpenCL GPU Computing PGI CUDA Fortran jCUDA HPC to Consumer Running in Production since 2008 Conformant Driver Supports all CUDA- NOAA Fortran bindings CUDA.NET Architecture GPUs Public Availability OpenCL.NET SDK + Libs + Visual (DX10 and DX11) FLAGON 250+ Universities Profiler and Debugger teaching GPU Computing on the CUDA Architecture NVIDIA GPU with the CUDA Parallel Computing Architecture

OpenCL is a trademark of Apple Inc. used under license to the Khronos Group Inc. © 2009 NVIDIA Corporation GPU Computing Application Development

Your GPU Computing Application

Application Acceleration Engines (AXEs) Middleware, Modules & Plug-ins

Foundation Libraries Low-level Functional Libraries

Development Environment Languages, Device APIs, Compilers, Debuggers, Profilers, etc.

CUDA Architecture

© 2009 NVIDIA Corporation Languages & APIs

© 2009 NVIDIA Corporation CUDA C/C++ Update

2007 2008 2009 July 07 Nov 07 April 08 Aug 08 July 09 Nov 09

CUDA “Nexus” CUDA Toolkit 1.0 CUDA Toolkit 1.1 CUDA Toolkit 2.0 CUDA Toolkit 2.3 Visual Profiler 2.2 Beta

• C Compiler • Win XP 64 -gdb • DP FFT CUDA Toolkit 3.0 • C Extensions • Double Precision HW Debugger Beta • Atomics support • 16-32 Conversion • Single Precision • Compiler Optimizations intrinsics • BLAS • Multi-GPU • FFT support • C++ Functionality • Vista 32/64 • Performance • SDK enhancements 40 examples • Fermi Arch • Mac OSX support

• 3D Textures • Tools

• HW Interpolation • Driver and RT

© 2009 NVIDIA Corporation CUDA C/C++ Update

2007 Tools 2008 2009 CUDA C++ July 07 Nov 07 April 08 Aug 08 July 09 Nov 09 Language Functionality • “Nexus” Visual Studio IDE • Debugger support for CUDA Driver API • C++ Class Inheritance • ELF support (cubin format deprecated) • C++ Template Inheritance • CUDA Memory Checker (cuda-gdb feature) • Fermi:CUDA Fermi HW Debugger support “Nexus” CUDA Toolkit(Productivity) 1.0 CUDA Toolkit 1.1 CUDA Toolkit 2.0 CUDA Toolkit 2.3 Visualin Profiler cuda-gdb 2.2 Beta • Fermi: Fermi HW Profiler support in CUDA Visual Profiler • C Compiler • Win XP 64 Cuda-gdb • DP FFT CUDA Toolkit 3.0 • C Extensions • Double Precision HW Debugger Beta • Atomics support • 16-32 Conversion • Single PrecisionFermi Architecture Support CUDA Driver & CUDA• Compiler C Runtime Optimizations intrinsics • BLAS • Multi-GPU • FFT • Native 64-bit GPUsupport support • CUDA Driver / Runtime Buffer • C++ Functionality • Vista 32/64 • Performance • SDK • Generic Address space interoperability (Stateless CUDART) enhancements 40 examples • Fermi Arch • Dual DMA support • Separate emulation •runtimeMac OSX support • ECC reporting • Unified interoperability API for • 3D Textures • Concurrent Kernel Execution Direct3D and OpenGL • Tools • OpenGL texture interoperability• HW Interpolation • Driver and RT Architecture supported in SW now • Fermi: Direct3D 11 interoperability

© 2009 NVIDIA Corporation Fortran Language Solutions

PGI Accelerators High-level implicit programming model, similar to OpenMP Auto-parallelizing compiler

PGI CUDA Fortran Compiler High-level explicit programming model, similar to CUDA C Runtime

NOAA F2C-ACC Converts Fortran codes to CUDA C Some hand-optimization expected

FLAGON Fortran 95 Library for GPU Numerics Includes support for cuBLAS, cuFFT, CUDPP, etc.

© 2009 NVIDIA Corporation OpenCL

Cross-vendor open standard Managed by the Khronos Group

Low-level API for device management http://www.khronos.org/opencl and launching kernels Close-to-the-metal programming interface JIT compilation of kernel programs

C-based language for compute kernels Kernels must be optimized for each processor architecture

NVIDIA released the first OpenCL v1.0 conformant driver for Windows and Linux to thousands of developers in June 2009 © 2009 NVIDIA Corporation NVIDIA OpenCL Support

OpenCL ICD NVIDIA R195 OpenGL Interoperability OpenCL Documentation

Double Precision

NVIDIA Compiler Flags NVIDIA Query for Compute Capability OpenCL Visual Profiler Byte Addressable Stores NVIDIA R190 32-bit Atomics OpenCL Code Samples Images

OpenCL 1.0 Driver Min. Spec. NV R190 released June 2009

© 2009 NVIDIA Corporation DirectCompute

Microsoft standard for all GPU vendors Released with DirectX® 11 / Windows 7 Runs on all 150M+ CUDA-enabled DirectX 10 class GPUs and later

Low-level API for device management and launching kernels Good integration with other DirectX APIs

Defines HLSL-based language for compute shaders Kernels must be optimized for each processor architecture

© 2009 NVIDIA Corporation Language & APIs for GPU Computing Solution Approach Language Integration CUDA C / C++ Device-Level API PGI Accelerator Auto Parallelizing Compiler PGI CUDA Fortran Language Integration CAPS HMPP Auto Parallelizing Compiler

OpenCL Device-Level API

DirectCompute Device-Level API

PyCUDA API Bindings

CUDA.NET, API Bindings OpenCL.NET, jCUDA

… …

© 2009 NVIDIA Corporation Massively Parallel Development Language / API Support (Linux)

CUDA C/C++ OpenCL Fortran

CAPS HMPP   PGI Accelerators   PGI CUDA Fortran   NV cuda-gdb  Allinea DDT  TotalView Debugger  NV Visual Profiler   TAU CUDA   Mcore Platform Analyzer  

cuda-gdb will be extended to support OpenCL debugging in a future release, with solutions from Allinea, TotalView and others expected to follow. © 2009 NVIDIA Corporation NVIDIA SDKs

Hundreds of code samples for CUDA C/C++, DirectCompute, and OpenCL

Finance Oil & Gas Video/Image Processing 3D Volume Rendering Particle Simulations Fluid Simulations Math Functions

© 2009 NVIDIA Corporation Development Tools

© 2009 NVIDIA Corporation Massively Parallel Development Tools for Linux

Debugging Profiling Analysis Cluster Lib’s Support

CUDA C/C++     

Coming Coming Coming OpenCL Soon  Soon  Soon

Fortran    

cuda-gdb will be extended to support OpenCL debugging in a future release, with solutions from Allinea, TotalView and others expected to follow. © 2009 NVIDIA Corporation cuda-gdb

CUDA debugging integrated into GDB on Linux

Supported on 32bit and 64bit systems Seamlessly debug both the host/CPU and device/GPU code Set breakpoints on any source line or symbol name Access and print all CUDA memory allocs, local, global, constant and shared vars Kernel Source Debugging

Included in the CUDA Toolkit

© 2009 NVIDIA Corporation CUDA Visual Profiler

Analyze GPU HW performance signals, kernel occupancy, instruction throughput, and more Highly configurable tables and graphical views Save/load profiler sessions or export to CSV for later analysis Compare results visually across multiple sessions to see improvements Windows, Linux and Mac OS X supported OpenCL Visual Profiler for Windows and Linux

Included in the CUDA Toolkit

© 2009 NVIDIA Corporation “Nexus” 1.0 Beta

Parallel Debugger System Analyzer Graphics Inspector GPU source code debugging Platform-level Analysis Visualize and debug Variable & memory For the CPU and GPU graphics content inspection Visualize Compute Kernels, Driver API Calls, and Memory Transfers © 2009 NVIDIA Corporation Massively Parallel Development Tools for Windows

Visual Parallel Parallel System Premium Early Multi- Studio Debugging Profiling Analysis/ Support Access Vendor Integration Trace GPU (CPU/GPU) Support Compute Gfx Compute Gfx Compute Gfx “Nexus”* Pro           “Nexus”*      NV Visual Profiler  Mcore Platform Analyzer   PerfKit/PerfHUD    gDEBugger     

* “Nexus” is NVIDIA’s code name

© 2009 NVIDIA Corporation Massively Parallel Development Tools for Linux

Compilation Debugging Profiling Analysis Premium Support CAPS HMPP   PGI Accelerators   PGI CUDA Fortran   NV cuda-gdb  Allinea DDT   TotalView Debugger   NV Visual Profiler  TAU CUDA   Mcore Platform Analyzer   

cuda-gdb will be extended to support OpenCL debugging in a future release, with solutions from Allinea, TotalView and others expected to follow. © 2009 NVIDIA Corporation Thank You

NVIDIA Confidential PyCUDA

Slide courtesy of Andreas Klöckner, Brown University

© 2009 NVIDIA Corporation CAPS HMPP

http://www.caps-entreprise.com/hmpp.html © 2009 NVIDIA Corporation jCUDA

http://www.hoopoe-cloud.com/Solutions/ jCUDA/Default.aspx

Courtesy of Company for Advanced Supercomputing Solutions, Ltd.

© 2009 NVIDIA Corporation CUDA.NET

http://www.hoopoe-cloud.com/Solutions/ CUDA.NET/Default.aspx

Courtesy of Company for Advanced Supercomputing Solutions, Ltd.

© 2009 NVIDIA Corporation OPENCL.NET

http://www.hoopoe-cloud.com/Solutions/ CUDA.NET/Default.aspx

Courtesy of Company for Advanced Supercomputing Solutions, Ltd.

© 2009 NVIDIA Corporation