Introducing GeForce GTX TITAN: The Ultimate CUDA Development GPU
• 2688 CUDA Cores
• 4.5 Teraflops Single Precision
• 1.27 Teraflops Double Precision
• 288 GB/s Memory Bandwidth

GTX TITAN: Personal Supercomputer on Your Desktop

• 1 Teraflop < $1000
• Develop Anywhere: Develop on GTX Titan, Deploy on Cluster
• Ease of Programming with New Kepler Architecture

The Best of Kepler in a PC

[Chart: Peak Double Precision GFLOPS, axis 0 to 1500, for Core i7-3970X, GTX 680, and GTX TITAN]

• Boosts PC with 8x More Performance
• Dynamic Parallelism: More Science, Less Coding

Dynamic Parallelism Makes Parallel Programming Easier

[Figure: Quicksort, Before Kepler vs. With Kepler, labelled "2x"]

• No complex CPU & GPU interaction
• Easier code in half the lines
• Easier porting for existing codes

More Applications, More Customers

[Diagram: workloads plotted from Structured Work to Irregular Work and from Data Parallel to Task Parallel, with regions labelled Fermi and Kepler]

Workloads shown:
• Adaptive Mesh (CFD)
• N-Body Tree Codes (Astrophysics)
• Sparse Matrix
• Video Transcoding
• Algebraic Multigrid
• Divide and Conquer (Big Data)
• Direct N-body
• Burrows-Wheeler Aligner (Bioinformatics)
• Monte Carlo
• Reverse Time Migration

Comparing GTX TITAN and Tesla K20X

Features | GeForce GTX TITAN | Tesla K20X
Core/Mem clock | 837 MHz / 3 GHz (clocks may vary when double precision is on) | 732 MHz / 2.6 GHz
Peak Single Precision | ~4.5 TFLOPS | 3.95 TFLOPS
Peak Double Precision | ~1.27 TFLOPS (estimate) | 1.32 TFLOPS
Memory size | 6 GB | 6 GB
Memory BW (ECC off) | 288 GB/s | 250 GB/s
PCIe | Gen 3 only on Ivy Bridge; Gen 2 on Sandy Bridge | Gen 2
CUDA Features | Dynamic Parallelism, Hyper-Q for CUDA Streams, GPUDirect Peer to Peer | Dynamic Parallelism, Hyper-Q Proxy for MPI and CUDA Streams, GPUDirect Peer to Peer, and RDMA
GPU monitoring | None | NVML/NVSMI, OOB, InfoROM, NVHealthmon, TCC
Cluster monitoring | None | Bright Computing, Ganglia
ECC Features | No ECC | DRAM, Internal Caches & Reg Files
Total Board Power | 250 W | 235 W

Tesla Advantage: Built for Deployment

Built for HPC, Reliability, Performance

• Integrated solutions & support from OEMs and channel
• ECC protection
• Tools for GPU Management (NVHealthmon, nvsmi/nvml)
• Tested to run 24/7 with real-world workloads
• Fastest DP of 1.31 TFLOPS on Tesla K20X
• Tools for Cluster Management
• 3 year warranty and support for bugs/feature requests
• Optimized for InfiniBand with NVIDIA GPUDirect™
• Enterprise OS support
• ISVs certify only on Tesla
• Solution expertise provided by CUDA engineers and technical staff
• Hyper-Q for accelerating MPI based workloads
• NVIDIA technical support
• Tuning and optimization support from NVIDIA experts
• Longer life cycle for continuity and cluster expansion
• Designed, tested and optimized for cluster deployment

Develop with GeForce, Deploy with Tesla

Call to Action

• Promote GTX Titan with OpenACC to new developers
  • Increases number of GPU enabled workloads
• Continue recommending Tesla for workloads in production environment
  • Tesla Advantage: Performance + Reliability + Built for HPC
• Refer to GTX Titan and Tesla FAQ for differentiation
  • GPUDirect, Hyper-Q, GPU Boost, GPU Monitoring

Thank You
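
Note on the peak figures

The peak throughput and memory bandwidth numbers quoted in the GTX TITAN and Tesla K20X comparison can be roughly reproduced from the listed core counts and clocks. The Python sketch below is a back-of-the-envelope check, not an NVIDIA formula. It rests on several assumptions that do not appear in the slides: each CUDA core retires one fused multiply-add (2 floating-point operations) per cycle, both boards have 896 double-precision units (one per three CUDA cores) and the same 2688-core count, the memory bus is 384 bits wide, and the listed memory clock transfers data twice per cycle. The function names are hypothetical.

```python
# Back-of-the-envelope check of the peak figures quoted in the slides.
# ASSUMPTIONS (not stated in the slides): 2 FLOPs per unit per cycle (FMA),
# 896 double-precision units on both boards, a 384-bit memory bus, and
# double-data-rate transfers on the listed memory clock.

FLOPS_PER_CYCLE = 2      # assumed: one fused multiply-add per cycle
DP_UNITS = 896           # assumed: one DP unit per three CUDA cores
BUS_WIDTH_BITS = 384     # assumed memory bus width
TRANSFERS_PER_CLOCK = 2  # assumed double data rate


def peak_tflops(units: int, clock_mhz: float) -> float:
    """Peak throughput in TFLOPS for `units` execution units at `clock_mhz`."""
    return units * clock_mhz * 1e6 * FLOPS_PER_CYCLE / 1e12


def memory_bandwidth_gbs(mem_clock_ghz: float) -> float:
    """Peak memory bandwidth in GB/s for the listed memory clock."""
    bytes_per_transfer = BUS_WIDTH_BITS / 8
    return mem_clock_ghz * TRANSFERS_PER_CLOCK * bytes_per_transfer


def implied_clock_mhz(tflops: float, units: int) -> float:
    """Clock (MHz) that would be needed to reach `tflops` on `units` units."""
    return tflops * 1e12 / (units * FLOPS_PER_CYCLE) / 1e6


if __name__ == "__main__":
    # Core counts and clocks are taken from the slides.
    print(f"GTX TITAN SP:  {peak_tflops(2688, 837):.2f} TFLOPS")
    print(f"Tesla K20X SP: {peak_tflops(2688, 732):.2f} TFLOPS")
    print(f"Tesla K20X DP: {peak_tflops(DP_UNITS, 732):.2f} TFLOPS")
    print(f"GTX TITAN BW:  {memory_bandwidth_gbs(3.0):.1f} GB/s")
    print(f"Tesla K20X BW: {memory_bandwidth_gbs(2.6):.1f} GB/s")
    print(f"GTX TITAN DP clock implied by 1.27 TFLOPS: "
          f"{implied_clock_mhz(1.27, DP_UNITS):.0f} MHz")
```

Under these assumptions the sketch gives about 4.50 and 3.94 TFLOPS single precision, 1.31 TFLOPS double precision for the K20X, and 288.0 and 249.6 GB/s, all close to the quoted values. The GTX TITAN double-precision estimate of 1.27 TFLOPS would correspond to a clock below the listed 837 MHz, which is consistent with the remark that clocks may vary when double precision is on.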