Latest Trends in Computing and Communication

Gil Bloch

Moore’s Law

Moore’s Law: What is it?

Moore’s Law: Where is it going?

▪ April 2005: Gordon Moore stated in an interview that the projection cannot be sustained indefinitely: "It can't continue forever. …"
▪ In 2016 the International Technology Roadmap for Semiconductors produced its final roadmap; it no longer centered its research and development plan on Moore's law.

Moore’s Law

▪ GPU Accelerated Computing ▪ TPU ▪ Cloud

Exponential Data Growth

Exponential Data Growth Everywhere

▪ Cloud ▪ HPC ▪ Big Data ▪ Security ▪ Internet of Things ▪ Enterprise ▪ Storage ▪ Machine Learning ▪ Business Intelligence

Riding The Data Wave: It is not a wave, it is a tsunami

Did you know that 90% of the world’s data has been created in the last two years alone?

From Oil and Banking to Data
Top 10 Companies in the World (Market Cap)

▪ 1998: …, …, Exxon Mobil, Royal Dutch Shell, Merck, Pfizer, …, Coca Cola, Walmart, IBM
▪ 2009: Exxon Mobil, PetroChina, Walmart, ICBC, China Mobile, Microsoft, AT&T, Johnson & Johnson, Royal Dutch Shell, Procter & Gamble
▪ 2019: Apple, Amazon, Google (Alphabet), Microsoft, Facebook, Tencent, Alibaba, Berkshire Hathaway, JPMorgan Chase, Exxon Mobil

▪ Oil and Gas ▪ Pharmaceutical / Medical devices ▪ Data-driven revenues


The Hyper-Scalers (Whales)

▪ How many servers does Google have?
▪ We do not know; they never expose the numbers
▪ There are guesstimates:
▪ In 2011: 900,000 servers
▪ In 2018: 2,500,000 servers (source: Gartner)
▪ “As of 2018, Google has invested over $10.5 billion equipping its US data centers to deliver state-of-the-art services.” (source: Google)
[Photos: Google data center in Mayes County, Oklahoma (source: Google); Google data centers in The Dalles, Oregon (photo by Craig Mitchelldyer/Getty Images)]

Artificial Intelligence

Neural Network Complexity Growth

▪ Image recognition: ~350X complexity growth across AlexNet, GoogleNet, Inception-V2, Inception-V4, and ResNet (2012–2016)
▪ Speech recognition: ~30X complexity growth across DeepSpeech, DeepSpeech-2, and DeepSpeech-3 (2014–2017)
▪ Complexity = GOPS × Bandwidth

Enabling World-Leading Artificial Intelligence Solutions
Mellanox Unleashes the Power of Artificial Intelligence

▪ More Data ▪ Better Models ▪ Faster Interconnect (GPUs, CPUs, ASICs, FPGAs, Storage)

The Need for Intelligent and Faster Interconnect
Faster Data Speeds and In-Network Computing Enable Higher Performance and Scale

CPU-Centric (Onload) vs. Data-Centric (Offload)
▪ Onload network: the CPU must wait for the data, which creates performance bottlenecks
▪ In-Network Computing: analyze data as it moves, for higher performance and scale

An Application Example: Pizza Processing (CPU-Centric / Onload)
CPU 1: Pizza Generation, CPU 2: Pizza Consumption
▪ Order Pizza: call (or use a Pizza application)
▪ CPU 1: prepare the Pizza (tomato sauce, cheese, pepperoni…)
▪ CPU 1: put it in the oven
▪ And now we wait…
▪ CPU 1: pack and send
▪ Network: Pizza delivery
▪ Must wait for the data, which creates performance bottlenecks

What if…

OK, So What Should I Look For?

The Need for Speed

Mellanox Accelerates TensorFlow 1.5

▪ High bandwidth is a must for faster training of large-scale models
▪ Up to 6.5X speedup with higher bandwidth
[Chart: training speedups of 2.5X and 6.5X with higher-bandwidth interconnect]

PeerDirect / GPUDirect

Just Before We Start: This is what a (GPU) server looks like

10X Higher Performance with GPUDirect™ RDMA

GPUDirect™ RDMA
▪ Accelerates HPC and Deep Learning performance

▪ Lowest communication latency for GPUs
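To make the GPUDirect path concrete, here is a minimal sketch (not Mellanox reference code) of registering a CUDA device buffer directly with the RDMA NIC so it can DMA to and from GPU memory without staging through the host. It assumes an already-opened verbs protection domain (pd) and a system where a GPUDirect RDMA kernel module (e.g. nv_peer_mem) is loaded.

```c
/* GPUDirect RDMA sketch: register GPU memory with the NIC.
 * Assumes an existing ibv_pd and a loaded GPUDirect RDMA kernel module
 * (e.g. nv_peer_mem); not Mellanox reference code. */
#include <infiniband/verbs.h>
#include <cuda_runtime.h>
#include <stdio.h>

struct ibv_mr *register_gpu_buffer(struct ibv_pd *pd, size_t len, void **gpu_buf)
{
    /* Allocate the buffer in GPU memory instead of host memory. */
    if (cudaMalloc(gpu_buf, len) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return NULL;
    }

    /* Register the GPU pointer with the HCA. With GPUDirect RDMA the NIC
     * can then read/write this memory directly, with no host-side copy. */
    struct ibv_mr *mr = ibv_reg_mr(pd, *gpu_buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr) {
        fprintf(stderr, "ibv_reg_mr on GPU memory failed "
                        "(is the GPUDirect RDMA module loaded?)\n");
        cudaFree(*gpu_buf);
        return NULL;
    }
    return mr;  /* mr->lkey / mr->rkey are used in work requests as usual */
}
```

Once registered, the memory region is used in work requests exactly like a host-memory region, which is what removes the extra copy from the GPU communication path.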

Remote Direct Memory Access (RDMA)

Remote Direct Memory Access

▪ Remote ▪ Data transfer between nodes connected by an interconnect

▪ Direct ▪ No operating system kernel involvement in the transfer ▪ All transfer operations offloaded to the network card

▪ Memory ▪ Transfer between user-space application virtual memory ▪ No extra copying or buffering

▪ Access ▪ Send / Receive ▪ Read / Write ▪ Atomic
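As an illustration of the one-sided access operations, the hedged sketch below posts an RDMA Write with the verbs API. It assumes an already-connected queue pair (qp), its completion queue (cq), a registered local buffer (mr), and a remote virtual address and rkey exchanged out of band; error handling is trimmed for brevity.

```c
/* One-sided RDMA Write sketch: the remote CPU is not involved in the transfer.
 * Assumes qp/cq are set up and remote_addr/rkey were exchanged beforehand. */
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

int rdma_write(struct ibv_qp *qp, struct ibv_cq *cq,
               struct ibv_mr *mr, void *local_buf, size_t len,
               uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)local_buf,  /* registered local memory */
        .length = (uint32_t)len,
        .lkey   = mr->lkey,
    };

    struct ibv_send_wr wr, *bad_wr = NULL;
    memset(&wr, 0, sizeof(wr));
    wr.opcode     = IBV_WR_RDMA_WRITE;   /* could also be RDMA_READ / SEND / atomics */
    wr.sg_list    = &sge;
    wr.num_sge    = 1;
    wr.send_flags = IBV_SEND_SIGNALED;   /* request a completion entry */
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;

    /* Asynchronous: posting returns immediately, the HCA performs the transfer. */
    if (ibv_post_send(qp, &wr, &bad_wr))
        return -1;

    /* The completion appears in the completion queue when the transfer is done. */
    struct ibv_wc wc;
    while (ibv_poll_cq(cq, 1, &wc) == 0)
        ;                                /* busy-poll for brevity */
    return wc.status == IBV_WC_SUCCESS ? 0 : -1;
}
```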

RDMA vs. TCP

RDMA access model
▪ Message based: preserves the user’s message boundaries
▪ Asynchronous: no blocking during the transfer; a transfer starts when a work request is added to the work queue and finishes when its status is available in the completion queue
▪ Supports paired (two-sided) and unpaired (one-sided) transfers
▪ No data copying into system buffers; the memory involved in a transfer must not be touched between the start and the completion of the transfer

TCP/IP socket access model
▪ Byte stream: the application must recover message boundaries
▪ Synchronous: blocks until data is sent / received
▪ send() / recv() are paired; both sides must participate in the transfer
▪ Requires data copies through system buffers; user memory is accessible immediately after send() / recv() operations
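The contrast above can be seen directly in code. The sketch below (placeholder function names, no error handling) puts the synchronous socket pattern next to the asynchronous verbs pattern: send() may block and copies data into kernel buffers, while the verbs side only enqueues a work request and later reaps its completion from the completion queue, during which time the buffer must not be reused.

```c
/* Contrast sketch: blocking TCP send vs. asynchronous RDMA work/completion queues. */
#include <infiniband/verbs.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <stdint.h>
#include <string.h>

/* TCP/IP model: synchronous, byte stream, data copied into kernel buffers. */
void tcp_side(int sockfd, const char *buf, size_t len)
{
    size_t sent = 0;
    while (sent < len) {
        /* send() may block; the kernel copies from buf into socket buffers,
         * so buf is reusable as soon as the call returns. */
        ssize_t n = send(sockfd, buf + sent, len - sent, 0);
        if (n <= 0)
            break;
        sent += (size_t)n;
    }
}

/* RDMA model: asynchronous, message based, zero copy.
 * The buffer must stay untouched until its completion is reaped. */
void rdma_side(struct ibv_qp *qp, struct ibv_cq *cq, struct ibv_mr *mr,
               void *buf, uint32_t len)
{
    struct ibv_sge sge = { .addr = (uintptr_t)buf, .length = len, .lkey = mr->lkey };
    struct ibv_send_wr wr, *bad = NULL;
    memset(&wr, 0, sizeof(wr));
    wr.opcode     = IBV_WR_SEND;        /* two-sided: remote must have posted a recv */
    wr.sg_list    = &sge;
    wr.num_sge    = 1;
    wr.send_flags = IBV_SEND_SIGNALED;

    if (ibv_post_send(qp, &wr, &bad))   /* returns immediately, no blocking */
        return;

    /* ... the application keeps computing here ... */

    struct ibv_wc wc;
    while (ibv_poll_cq(cq, 1, &wc) == 0)
        ;                               /* only now may buf be reused */
}
```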

Unbeatable Performance with RDMA

▪ Main features
▪ Remote memory read/write semantics in addition to send/receive
▪ Kernel bypass / direct user-space access
▪ Full hardware offload of the network stack
▪ Secure, channel-based IO

▪ Application advantages
▪ Lowest latency
▪ Highest bandwidth
▪ Lowest CPU consumption
▪ Direct memory access, no unnecessary data copies

▪ RoCE: RDMA over Converged Ethernet
▪ Available for all Ethernet speeds, 10–400G
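Because RoCE carries RDMA over ordinary Ethernet, connections are typically established with the librdmacm connection manager using IP addresses, after which the verbs calls shown earlier apply unchanged. Below is a minimal client-side sketch under that assumption; the address and port are placeholders, and QP creation and error handling are elided.

```c
/* RoCE connection setup sketch using librdmacm (client side); the IP address
 * and port are placeholders, and QP creation / error handling are elided. */
#include <rdma/rdma_cma.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>

int connect_roce(struct rdma_cm_id **out_id)
{
    struct rdma_cm_id *id;
    /* With a NULL event channel, librdmacm runs the id in synchronous mode:
     * each call below blocks until the underlying CM event arrives. */
    if (rdma_create_id(NULL, &id, NULL, RDMA_PS_TCP))
        return -1;

    struct sockaddr_in dst;
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port   = htons(7471);                     /* placeholder port */
    inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr);   /* placeholder address */

    /* Map the IP address to a RoCE-capable RDMA device, then resolve the route. */
    if (rdma_resolve_addr(id, NULL, (struct sockaddr *)&dst, 2000 /* ms */) ||
        rdma_resolve_route(id, 2000))
        return -1;

    /* ... create PD/CQ and call rdma_create_qp(id, pd, &qp_init_attr) here ... */

    struct rdma_conn_param param;
    memset(&param, 0, sizeof(param));
    if (rdma_connect(id, &param))
        return -1;

    *out_id = id;   /* id->verbs and id->qp are then used with the verbs API */
    return 0;
}
```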

RDMA Accelerates TensorFlow

▪ Unmatched linear scalability ▪ 50% better performance at no additional cost

Thank You
