Introducing GeForce GTX TITAN
The Ultimate CUDA Development GPU

2688 CUDA Cores
4.5 Teraflops Single Precision
1.27 Teraflops Double Precision
288 GB/s Memory Bandwidth

GTX TITAN
Personal Supercomputer on Your Desktop
1 Teraflop < $1000
Develop Anywhere

Develop on GTX Titan
Deploy on Cluster

Ease of Programming with New Kepler Architecture
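
The headline figures on the title slide (2688 CUDA cores, 4.5 teraflops single precision, 288 GB/s) can be roughly reproduced from the 837 MHz core clock and 3 GHz memory clock quoted in the GTX TITAN and Tesla K20X comparison later in the deck. The sketch below is a back-of-envelope check, not a published NVIDIA method. It assumes 2 floating-point operations per core per clock (one fused multiply-add), a 384-bit memory bus, and two transfers per quoted memory clock; none of these are stated in the slides, and the function names are illustrative only.

```python
# Back-of-envelope peak figures from the core count and clocks on the slides.
# ASSUMPTIONS (not stated in the deck): 2 FLOPs per core per clock (one fused
# multiply-add), a 384-bit memory bus, and 2 transfers per quoted memory clock.

def peak_tflops(cores: int, core_clock_ghz: float, flops_per_clock: int = 2) -> float:
    """Theoretical peak throughput in TFLOPS."""
    return cores * core_clock_ghz * flops_per_clock / 1000.0


def memory_bandwidth_gbs(mem_clock_ghz: float,
                         bus_width_bits: int = 384,
                         transfers_per_clock: int = 2) -> float:
    """Theoretical peak memory bandwidth in GB/s."""
    return mem_clock_ghz * transfers_per_clock * bus_width_bits / 8


# GTX TITAN values taken from the slides: 2688 cores, 837 MHz core, 3 GHz memory.
sp_tflops = peak_tflops(cores=2688, core_clock_ghz=0.837)
bw_gbs = memory_bandwidth_gbs(mem_clock_ghz=3.0)

print(f"Single precision: {sp_tflops:.2f} TFLOPS")  # about 4.50
print(f"Memory bandwidth: {bw_gbs:.0f} GB/s")       # 288
```

Under the same assumptions, the Tesla K20X memory clock of 2.6 GHz gives about 249.6 GB/s, in line with the 250 GB/s listed for that board. The double-precision figure is not reproduced here because the slides mark it as an estimate and note that clocks may vary when double precision is on.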
The Best of Kepler in a PC

Peak Double Precision: Boosts PC with 8x More Performance
[Chart: peak double-precision GFLOPS, axis 0 to 1500, for Core i7-3970X, GTX 680, and GTX TITAN]

Dynamic Parallelism: More Science, Less Coding

Dynamic Parallelism Makes Parallel Programming Easier

[Diagram: Quicksort, Before Kepler vs. With Kepler, labeled 2x]
• No complex CPU & GPU interaction
• Easier code in half the lines
• Easier porting for existing codes

More Applications, More Customers
[Diagram: workloads placed along two axes, Structured Work to Irregular Work and Data Parallel to Task Parallel, with regions labeled Fermi and Kepler]
Workloads shown:
• Adaptive Mesh (CFD)
• N-Body Tree Codes (Astrophysics)
• Sparse Matrix
• Video Transcoding
• Algebraic Multigrid
• Divide and Conquer (Big Data)
• Direct N-body
• Burrows-Wheeler Aligner (Bioinformatics)
• Monte Carlo
• Reverse Time Migration

Comparing GTX TITAN and Tesla K20X
Features | GeForce GTX TITAN | Tesla K20X
Core/Mem clock | 837 MHz / 3 GHz (clocks may vary when double precision is on) | 732 MHz / 2.6 GHz
Peak Single Precision | ~4.5 TFLOPS | 3.95 TFLOPS
Peak Double Precision | ~1.27 TFLOPS (estimate) | 1.32 TFLOPS
Memory size | 6 GB | 6 GB
Memory BW (ECC off) | 288 GB/s | 250 GB/s
PCIe | Gen 3 only on Ivy Bridge; Gen 2 on Sandy Bridge | Gen 2
CUDA Features | Dynamic Parallelism, Hyper-Q for CUDA Streams, GPUDirect Peer to Peer | Dynamic Parallelism, Hyper-Q Proxy for MPI and CUDA Streams, GPUDirect Peer to Peer, and RDMA
GPU monitoring | None | NVML/NVSMI, OOB, InfoROM, NVHealthmon, TCC
Cluster monitoring | None | Bright Computing, Ganglia
ECC Features | No | ECC DRAM, Internal Caches & Reg Files
Total Board Power | 250 W | 235 W

Tesla Advantage: Built for Deployment
Performance, Reliability, Built for HPC

• Integrated solutions & support from OEMs and channel
• ECC protection
• Tools for GPU Management (Nvhealthmon, nvsmi/nvml)
• Tested to run 24/7 with real-world workloads
• Fastest DP of 1.31 TFLOPS on Tesla K20X
• Tools for Cluster Management
• 3-year warranty and support for bugs/feature requests
• Optimized for InfiniBand with NVIDIA GPUDirect™
• Enterprise OS support
• ISVs certify only on Tesla
• Solution expertise provided by CUDA engineers and technical staff
• Hyper-Q for accelerating MPI-based workloads
• NVIDIA technical support
• Tuning and optimization support from NVIDIA experts
• Longer life cycle for continuity and cluster expansion
• Designed, tested and optimized for cluster deployment

Develop with GeForce, Deploy with Tesla

Call to Action
• Promote GTX Titan with OpenACC to new developers
  • Increases the number of GPU-enabled workloads
• Continue recommending Tesla for workloads in production environments
  • Tesla Advantage: Performance + Reliability + Built for HPC
• Refer to GTX Titan and Tesla FAQ for differentiation
  • GPUDirect, Hyper-Q, GPU Boost, GPU monitoring
Thank You