NVIDIA® TESLA® P100 GPU ACCELERATOR

Infinite compute power for the modern data center

Artificial intelligence for self-driving cars. Predicting our climate's future. A new drug to treat cancer. The world's most important challenges require tremendous amounts of computing to become reality. But today's data centers rely on many interconnected commodity compute nodes, limiting the performance needed to drive important HPC and hyperscale workloads.

The Tesla P100 is the most advanced data center accelerator ever built, leveraging the groundbreaking NVIDIA Pascal™ GPU architecture to deliver the world's fastest compute node. It's powered by four innovative technologies with huge jumps in performance for HPC and deep learning workloads.

The Tesla P100 also features NVIDIA NVLink™ technology that enables superior strong-scaling performance for HPC and hyperscale applications. Up to eight Tesla P100 GPUs interconnected in a single node can deliver the performance of racks of commodity CPU servers.

SPECIFICATIONS
GPU Architecture: NVIDIA Pascal
NVIDIA CUDA® Cores: 3584
Double-Precision Performance: 5.3 TeraFLOPS
Single-Precision Performance: 10.6 TeraFLOPS
Half-Precision Performance: 21.2 TeraFLOPS
GPU Memory: 16 GB CoWoS HBM2
Memory Bandwidth: 732 GB/s
Interconnect: NVIDIA NVLink
Max Power Consumption: 300 W
ECC: Native support with no capacity or performance overhead
Thermal Solution: Passive
Form Factor: SXM2
Compute APIs: NVIDIA CUDA, DirectCompute, OpenCL™, OpenACC
TeraFLOPS measurements with NVIDIA GPU Boost™ technology.

TESLA P100 AND NVLINK DELIVER UP TO 50X PERFORMANCE BOOST FOR DATA CENTER APPLICATIONS

[Figure: NVIDIA Tesla P100 performance. Application speed-up over a dual-socket Haswell CPU server (0X to 50X) for AlexNet (with Caffe), VASP, HOOMD-Blue, COSMO, MILC, Amber, and HACC, comparing 2x K80 (M40 for AlexNet), 2X P100, 4X P100, and 8X P100 configurations. CPU: 16 cores, E5-2698v3 @ 2.30 GHz, 256 GB system memory; Tesla K80: 2x dual-GPU K80s; pre-production Tesla P100.]
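The three TeraFLOPS figures are self-consistent: FP32 is twice FP64, FP16 is twice FP32, and the FP32 number follows from the CUDA core count times two FLOPs per fused multiply-add per clock. A quick sketch of the arithmetic (the ~1480 MHz GPU Boost clock is an assumption; it is not stated on this sheet):

```python
# Peak-throughput arithmetic behind the Tesla P100 spec figures.
CUDA_CORES = 3584          # from the specifications table
BOOST_CLOCK_HZ = 1480e6    # assumed GPU Boost clock (not on this sheet)

# Each CUDA core retires one FMA (2 FLOPs) per clock at FP32.
fp32_tflops = CUDA_CORES * 2 * BOOST_CLOCK_HZ / 1e12
fp64_tflops = fp32_tflops / 2   # GP100 runs FP64 at a 1:2 ratio to FP32
fp16_tflops = fp32_tflops * 2   # paired half-precision ops double the rate

print(f"FP64 {fp64_tflops:.1f} | FP32 {fp32_tflops:.1f} | "
      f"FP16 {fp16_tflops:.1f} TeraFLOPS")
# -> FP64 5.3 | FP32 10.6 | FP16 21.2 TeraFLOPS
```

Under that assumed clock, the arithmetic reproduces all three numbers in the table above.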

Tesla P100 | Data Sheet | Oct16

EXPERIENCE A GIANT LEAP IN EVERYTHING.

The Tesla P100 is reimagined from silicon to software, crafted with innovation at every level. Each groundbreaking technology delivers a dramatic jump in performance to inspire the creation of the world's fastest compute node.

PASCAL ARCHITECTURE
More than 21 TeraFLOPS of FP16, 10 TeraFLOPS of FP32, and 5 TeraFLOPS of FP64 performance powers new possibilities in deep learning and HPC workloads.

COWOS HBM2
Compute and data are integrated on the same package using Chip-on-Wafer-on-Substrate with HBM2 technology for 3X memory performance over the previous-generation architecture.

[Figure: Exponential HPC and hyperscale performance. TeraFLOPS (FP32/FP16), 0 to 25, for Tesla K40, M40, P100 (FP32), and P100 (FP16).]

[Figure: 3X memory boost. Memory bandwidth (GB/sec), 0 to 800, for Tesla K40, M40, and P100.]
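The 732 GB/s figure follows from HBM2's very wide interface: multiple stacks, each with a 1024-bit bus, run at a modest per-pin rate. A sketch of the arithmetic (the four-stack layout and the 1.43 Gb/s effective pin rate are assumptions; only the 732 GB/s total appears on this sheet):

```python
# Memory-bandwidth arithmetic for the P100's CoWoS HBM2 subsystem.
STACKS = 4                 # assumed: four HBM2 stacks on the interposer
BUS_BITS_PER_STACK = 1024  # HBM2 interface width per stack
PIN_RATE_GBPS = 1.43       # assumed effective data rate per pin (Gb/s)

bandwidth_gbs = STACKS * BUS_BITS_PER_STACK * PIN_RATE_GBPS / 8
print(f"{bandwidth_gbs:.0f} GB/s")  # -> 732 GB/s
```

The wide-but-slow design is what lets HBM2 deliver this bandwidth at far lower power per bit than off-package GDDR5.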

NVLINK INTERCONNECT
This high-speed bidirectional interconnect scales applications across multiple GPUs for 5X higher performance than current best-in-class technology.

PAGE MIGRATION ENGINE
Simpler programming and computing performance tuning means that Unified Memory applications can now scale beyond the GPU's physical memory size to virtually limitless levels.

[Figure: 5X improvement in interconnect performance. Bi-directional bandwidth (GB/sec), 0 to 160, for Tesla K40, M40, and P100.]

[Figure: Virtually limitless memory scalability. Addressable memory (GB, log scale to 10,000) for Tesla K40, M40, and P100.]
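The 5X interconnect claim compares NVLink's aggregate bandwidth against a PCIe 3.0 x16 slot. A sketch of the arithmetic (the link count and per-direction rates are assumptions about the NVLink generation shipped with P100; only the 5X ratio appears on this sheet):

```python
# Interconnect arithmetic behind the 5X claim (assumed link parameters).
NVLINK_LINKS = 4              # assumed: P100 exposes four NVLink links
NVLINK_GBS_PER_DIR = 20.0     # assumed: 20 GB/s each direction per link
PCIE3_X16_GBS_PER_DIR = 16.0  # ~16 GB/s each direction for PCIe 3.0 x16

nvlink_bidir = NVLINK_LINKS * NVLINK_GBS_PER_DIR * 2  # aggregate GB/s
pcie_bidir = PCIE3_X16_GBS_PER_DIR * 2
print(f"NVLink {nvlink_bidir:.0f} GB/s vs PCIe {pcie_bidir:.0f} GB/s "
      f"-> {nvlink_bidir / pcie_bidir:.0f}X")
# -> NVLink 160 GB/s vs PCIe 32 GB/s -> 5X
```

The 160 GB/s aggregate also matches the top of the interconnect chart's axis above.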

To learn more about the NVIDIA Tesla P100, visit www.nvidia.com/tesla

© 2016 NVIDIA Corporation. All rights reserved. NVIDIA, the NVIDIA logo, Tesla, NVIDIA GPU Boost, CUDA, and NVIDIA Pascal are trademarks and/or registered trademarks of NVIDIA Corporation in the U.S. and other countries. OpenCL is a trademark of Apple Inc. used under license to the Khronos Group Inc. All other trademarks and copyrights are the property of their respective owners. Oct16