NVIDIA® TESLA® P100 GPU ACCELERATOR

World's most advanced data center accelerator for PCIe-based servers

HPC data centers need to support the ever-growing demands of scientists and researchers while staying within a tight budget. The old approach of deploying lots of commodity compute nodes requires huge interconnect overhead that substantially increases costs without proportionally increasing performance. NVIDIA Tesla P100 GPU accelerators are the most advanced ever built, powered by the breakthrough NVIDIA Pascal™ architecture and designed to boost throughput and save money for HPC and hyperscale data centers. The newest addition to this family, Tesla P100 for PCIe, enables a single node to replace half a rack of commodity CPU nodes by delivering lightning-fast performance in a broad range of HPC applications.

SPECIFICATIONS
GPU Architecture: NVIDIA Pascal™
NVIDIA CUDA® Cores: 3584
Double-Precision Performance: 4.7 TeraFLOPS
Single-Precision Performance: 9.3 TeraFLOPS
Half-Precision Performance: 18.7 TeraFLOPS
GPU Memory: 16 GB CoWoS HBM2 at 732 GB/s, or 12 GB CoWoS HBM2 at 549 GB/s
System Interface: PCIe Gen3
Max Power Consumption: 250 W
ECC: Yes
Thermal Solution: Passive
Form Factor: PCIe Full Height/Length
Compute APIs: CUDA, DirectCompute, OpenCL™, OpenACC

MASSIVE LEAP IN PERFORMANCE

[Chart: NVIDIA Tesla P100 for PCIe Performance — application speed-up (0X to 30X) for NAMD, VASP, MILC, HOOMD-Blue, AMBER, and Caffe/AlexNet, comparing 2X K80, 2X P100 (PCIe), and 4X P100 (PCIe) configurations. TeraFLOPS measurements with NVIDIA GPU Boost™ technology. Baseline: dual-CPU server, Intel E5-2698 v3 @ 2.3 GHz, 256 GB system memory; pre-production Tesla P100.]

Tesla P100 PCIe | Data Sheet | Oct16

A GIANT LEAP IN PERFORMANCE

Tesla P100 for PCIe is reimagined from silicon to software, crafted with innovation at every level. Each groundbreaking technology delivers a dramatic jump in performance to substantially boost data center throughput.

PASCAL ARCHITECTURE
More than 18.7 TeraFLOPS of FP16, 4.7 TeraFLOPS of double-precision, and 9.3 TeraFLOPS of single-precision performance powers new possibilities in deep learning and HPC workloads.

CoWoS HBM2
Compute and data are integrated on the same package using Chip-on-Wafer-on-Substrate with HBM2 technology for 3X memory performance over the previous-generation architecture.

PAGE MIGRATION ENGINE
Simpler programming and computing performance tuning with Unified Memory means that applications can now scale beyond the GPU's physical memory size to virtually limitless levels.
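On Pascal, the Page Migration Engine backs CUDA Unified Memory with hardware page faulting: a managed allocation can be larger than the GPU's physical memory, and pages migrate between host and device on demand. A minimal sketch, assuming CUDA 8.0 or later on a Pascal-class GPU (error checking omitted for brevity):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *data, float factor, size_t n) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    size_t n = 1 << 28;   // 1 GiB of floats; illustrative size only
    float *data = nullptr;

    // One allocation visible to both CPU and GPU; no explicit cudaMemcpy needed.
    // On Pascal the allocation may even exceed the GPU's physical memory.
    cudaMallocManaged(&data, n * sizeof(float));

    for (size_t i = 0; i < n; ++i) data[i] = 1.0f;  // pages resident on the host

    // Touched pages fault over to the GPU as the kernel accesses them.
    scale<<<(unsigned)((n + 255) / 256), 256>>>(data, 2.0f, n);
    cudaDeviceSynchronize();

    printf("data[0] = %f\n", data[0]);  // accessed pages migrate back to the host
    cudaFree(data);
    return 0;
}
```

The design point here is that the single managed pointer replaces separate host and device buffers plus explicit copies, which is the "simpler programming" the Page Migration Engine column refers to.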

[Charts: Exponential HPC and hyperscale performance — Teraflops (FP32/FP16), scale 0 to 25, for K40, M40, P100 (FP32), and P100 (FP16). 3X memory boost — bi-directional bandwidth (GB/s), scale 0 to 800, for K40, M40, and P100. Virtually limitless memory — addressable memory (GB), log scale to 10,000, for K40, M40, and P100.]
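The half-precision figures above come from Pascal's paired FP16 arithmetic: two 16-bit values packed into a `half2` are processed by a single instruction, roughly doubling FP32 throughput. A hedged sketch of a kernel using this path (the `__hfma2` intrinsic and `half2` type are from CUDA's `cuda_fp16.h`; the kernel itself is illustrative, not taken from the datasheet):

```cuda
#include <cuda_fp16.h>

// y = alpha * x + y, operating on packed pairs of FP16 values.
__global__ void haxpy2(const half2 *x, half2 *y, half2 alpha, size_t n) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // __hfma2 issues one fused multiply-add across two FP16 lanes at once,
        // which is how P100 reaches ~2X its single-precision rate in FP16.
        y[i] = __hfma2(alpha, x[i], y[i]);
    }
}
```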

To learn more about the Tesla P100 for PCIe visit www.nvidia.com/tesla

© 2016 NVIDIA Corporation. All rights reserved. NVIDIA, the NVIDIA logo, Tesla, NVIDIA GPU Boost, CUDA, and NVIDIA Pascal are trademarks and/or registered trademarks of NVIDIA Corporation in the U.S. and other countries. OpenCL is a trademark of Apple Inc. used under license to the Khronos Group Inc. All other trademarks and copyrights are the property of their respective owners. OCT16