Latest Trends in Computing and Communication
Gil Bloch
© 2019 Mellanox Technologies | Confidential

Moore's Law
Moore's Law: What is it?
Moore's Law: Where is it going?
▪ In April 2005, Gordon Moore stated in an interview that the projection cannot be sustained indefinitely: "It can't continue forever. The nature of exponentials is that you push them out and eventually disaster happens."
▪ By the late 2010s, the industry no longer centered its research and development plans on Moore's law.
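Moore's observation, transistor count doubling roughly every two years, is just an exponential; a minimal sketch (the base year, base count, and doubling period below are illustrative, using the Intel 4004's roughly 2,300 transistors in 1971):

```python
def transistors(year, base_year=1971, base_count=2300, doubling_years=2):
    # Moore's law as an exponential: the count doubles every `doubling_years`.
    # Base values are illustrative (Intel 4004, 1971, ~2300 transistors).
    return base_count * 2 ** ((year - base_year) / doubling_years)

# One doubling period later, the count has doubled.
assert transistors(1973) == 4600
# Fifty years of doubling every two years is a factor of 2**25 (~33 million).
assert round(transistors(2021) / transistors(1971)) == 2 ** 25
```

The point of the slide is that this growth rate cannot hold indefinitely, which is why the industry turns to accelerators and faster interconnects.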
Moore's Law
▪ GPU Accelerated Computing
▪ Google TPU
▪ Cloud
Exponential Data Growth
Exponential Data Growth Everywhere
▪ Cloud ▪ HPC ▪ Big Data
▪ Security ▪ Internet of Things ▪ Enterprise
▪ Storage ▪ Machine Learning ▪ Business Intelligence
Riding The Data Wave
It is not a wave, it is a Tsunami
Did you know that 90% of the world's data has been created in just the last two years?
From Oil and Banking to Data
Top 10 Companies in the World (Market Cap)

1998              | 2009              | 2019
Microsoft         | Exxon Mobil       | Apple
General Electric  | Petrochina        | Amazon
Exxon Mobil       | Walmart           | Google (Alphabet)
Royal Dutch Shell | ICBC              | Microsoft
Merck             | China Mobile      | Facebook
Pfizer            | Microsoft         | Tencent
Intel             | AT&T              | Alibaba
Coca Cola         | Johnson & Johnson | Berkshire Hathaway
Walmart           | Royal Dutch Shell | JPMorgan Chase
IBM               | Procter & Gamble  | Exxon Mobil
▪ Oil and Gas
▪ Pharmaceutical / Medical device companies
▪ Data-Driven Revenues
Cloud Computing
The Hyper-Scalers (Whales)
▪ How many servers does Google have?
  ▪ We do not know; they never expose the numbers
  ▪ There are guesstimates:
    ▪ In 2011: ~900,000 servers
    ▪ In 2018: ~2,500,000 servers (Source: Gartner)
▪ "As of 2018, Google has invested over $10.5 billion equipping its US data centers to deliver state-of-the-art services." (Source: Google)

(Image: Mayes County, Oklahoma data center. Source: Google)
(Image: Google data center in The Dalles, Oregon, 200. Photo by Craig Mitchelldyer/Getty Images)
Artificial Intelligence
Neural Networks Complexity Growth
▪ Image recognition: ~350X complexity growth from AlexNet through GoogleNet, Inception-V2, ResNet, and Inception-V4 (2012–2016)
▪ Speech recognition: ~30X complexity growth from DeepSpeech through DeepSpeech-2 and DeepSpeech-3 (2014–2017)
▪ Complexity = GOPS × Bandwidth
Enabling World-Leading Artificial Intelligence Solutions
Mellanox Unleashes the Power of Artificial Intelligence
▪ More Data ▪ Better Models ▪ Faster Interconnect
▪ GPUs, CPUs, ASICs, FPGAs, Storage
The Need for Intelligent and Faster Interconnect
Faster Data Speeds and In-Network Computing Enable Higher Performance and Scale
▪ CPU-Centric (Onload)
  ▪ Must wait for the data
  ▪ Creates performance bottlenecks
▪ Data-Centric (Offload) with In-Network Computing
  ▪ Analyze data as it moves!
  ▪ Higher performance and scale
An Application Example – Pizza Processing
CPU 1 – Pizza Generation, CPU 2 – Pizza Consumption
▪ Order Pizza
  ▪ Call (or use a Pizza application)
▪ CPU 1 – prepare the Pizza
  ▪ Tomato sauce, Cheese, Pepperoni…
▪ CPU 1 – Put it in the oven
  ▪ And now we wait…
▪ CPU 1 – Pack and send
▪ Network (Pizza Delivery)

In the CPU-centric (onload) model, the CPU must wait for the data, which creates performance bottlenecks.
What if…
OK, So What Should I Look For?
The Need for Speed
Mellanox Accelerates TensorFlow 1.5
▪ High bandwidth is a must for faster training of large-scale models
▪ Up to 6.5X speedup with higher bandwidth (benchmark callouts: 6.5X, 2.5X)
PeerDirect and GPUDirect
Just Before We Start
This is what a (GPU) server looks like
10X Higher Performance with GPUDirect™ RDMA
GPUDirect™ RDMA
▪ Accelerates HPC and Deep Learning performance
▪ Lowest communication latency for GPUs
Remote Direct Memory Access (RDMA)
Remote Direct Memory Access
▪ Remote
  ▪ Data transfer between nodes connected by an interconnect
▪ Direct
  ▪ No operating system kernel involvement in the transfer
  ▪ All transfer operations are offloaded to the network card
▪ Memory
  ▪ Transfers go directly between user-space application virtual memory
  ▪ No extra copying or buffering
▪ Access
  ▪ Send / Receive
  ▪ Read / Write
  ▪ Atomic
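The one-sided access operations above (Read, Write, Atomic) can be illustrated with a minimal simulation. This is plain Python, not the ibverbs API; `MemoryRegion` and its methods are illustrative names, and real RDMA would identify the buffer by an address and rkey on actual hardware:

```python
# Illustrative simulation of one-sided RDMA semantics (not the ibverbs API).
# A registered memory region on the "remote" node can be read, written, or
# atomically updated by the initiator without remote CPU involvement.

class MemoryRegion:
    """A registered buffer on the remote node (illustrative, not ibverbs)."""
    def __init__(self, size):
        self.buf = bytearray(size)

    def read(self, offset, length):          # models RDMA Read
        return bytes(self.buf[offset:offset + length])

    def write(self, offset, data):           # models RDMA Write
        self.buf[offset:offset + len(data)] = data

    def fetch_and_add(self, offset, value):  # models RDMA Atomic (64-bit FAA)
        old = int.from_bytes(self.buf[offset:offset + 8], "little")
        self.buf[offset:offset + 8] = (old + value).to_bytes(8, "little")
        return old

remote = MemoryRegion(64)              # memory registered on the remote node
remote.write(0, b"hello")              # one-sided write: remote CPU not involved
assert remote.read(0, 5) == b"hello"   # one-sided read
assert remote.fetch_and_add(8, 5) == 0 # atomic returns the old value
assert remote.fetch_and_add(8, 5) == 5
```

The key property being modeled: the remote CPU never executes a receive or copy; the initiator addresses the remote memory directly.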
RDMA vs. TCP
RDMA access model:
▪ Message based – preserves the user's message boundaries
▪ Asynchronous – no blocking during transfer; a transfer starts when work is added to the work queue and finishes when a status is available in the completion queue
▪ Supports paired (two-sided) and unpaired (one-sided) transfers
▪ No data copying into system buffers; memory involved in a transfer is untouchable between start and completion

TCP/IP socket access model:
▪ Byte stream – the application must recover message boundaries
▪ Synchronous – blocks until data is sent / received
▪ send() / recv() are paired; both sides must participate in the transfer
▪ Requires data copies through system buffers; user memory is accessible immediately before and after send() / recv() operations
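The byte-stream point is worth making concrete: unlike RDMA's message semantics, a TCP-style stream gives the receiver no message boundaries, so applications add their own framing. A minimal sketch using a 4-byte length prefix (`socketpair` stands in for a real TCP connection; the helper names are illustrative):

```python
import socket
import struct

# Over a byte stream the receiver must recover message boundaries itself;
# a common scheme is a 4-byte length prefix before each message.

def send_msg(sock, payload: bytes):
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_exact(sock, n: int) -> bytes:
    # recv() may return fewer bytes than requested, so loop until done.
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("stream closed")
        data += chunk
    return data

def recv_msg(sock) -> bytes:
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)

a, b = socket.socketpair()  # stream socket pair, standing in for a TCP connection
send_msg(a, b"first message")
send_msg(a, b"second")
assert recv_msg(b) == b"first message"  # boundaries recovered despite
assert recv_msg(b) == b"second"         # both messages sharing one stream
```

With RDMA's message-based model this framing code is unnecessary: each completed receive corresponds to exactly one sent message.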
Unbeatable Performance with RDMA
▪ Main features
  ▪ Remote memory read/write semantics in addition to send/receive
  ▪ Kernel bypass / direct user-space access
  ▪ Full hardware offload of the network stack
  ▪ Secure, channel-based IO
▪ Application advantages
  ▪ Lowest latency
  ▪ Highest bandwidth
  ▪ Lowest CPU consumption
  ▪ Direct memory access, no unnecessary data copies
▪ RoCE: RDMA over Converged Ethernet
  ▪ Available for all Ethernet speeds, 10–400G
RDMA Accelerates TensorFlow
Unmatched Linear Scalability
50% Better Performance at No Additional Cost
Thank You