
IBM’s Open-Source-Based AI Developer Tools

Sumit Gupta, VP, AI & HPC, IBM Cognitive Systems | @SumitGup | guptasum@us.ibm.com

March 2019

AI Portfolio Strategy: Deliver a comprehensive platform that enables data science at all skill levels

• Prepare Data
• Build AI Models (Machine / Deep Learning)
• Train AI Models: Interactive or Batch, with GPU acceleration
• Deploy & Manage Model Lifecycle: Inference on CPU, GPU, FPGA
• AI Model Performance Monitoring

Scale to Enterprise-wide Deployment
• Multiple data scientists
• Shared Cluster / Hardware Infrastructure
• Hybrid Cloud: Common experience on-premise and in public cloud

IBM Open-Source-Based AI Stack

Auto-AI software: PowerAI Vision, IBM Auto-AI

• Watson Studio: Data Preparation, Model Development Environment
• Watson Machine Learning: Train, Deploy, Manage Models
• Watson OpenScale: Model Metrics, Bias, and Fairness Monitoring
• Watson ML CE / Watson ML Accelerator: Runtime Environment (includes Snap ML)
• Accelerated AC922 Power9 Servers; Storage (Spectrum Scale ESS) & other storage too

Previous names: WML Accelerator = PowerAI Enterprise; WML Community Ed. = PowerAI (base)
Available on Private Cloud or Public Cloud

Our Focus: Ease of Use & Faster Model Training Times

• Watson ML Accelerator: Distributed Deep Learning (DDL), Auto Hyper-Parameter Optimization (HPO), Elastic Distributed Training (EDT) & Elastic Distributed Inference (EDI)
• IBM Spectrum Conductor: Apache Spark, Cluster & Job Orchestration
• Watson ML: Model Management & Execution, Model Life Cycle Management
• WML CE (Watson ML Community Edition): Open Source ML Frameworks, Snap ML, Large Model Support (LMS), DDL
• Infrastructure Designed for AI: Power9 or x86 Servers with GPU Accelerators, Storage (ESS)

Snap ML: Distributed High-Performance Machine Learning

Snap Machine Learning (Snap ML): Library for Popular ML Frameworks
• GPU Accelerated: Logistic Regression, Linear Regression, Ridge / Lasso Regression, Support Vector Machines (new)
• Multi-Core CPU: Decision Trees, Random Forests, more coming…
• Multi-Core, Multi-Socket & GPU Acceleration
• CPU-GPU Memory Management
• Distributed Training: Multi-CPU & Multi-GPU

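Because Snap ML follows scikit-learn's fit/predict conventions, adopting it is largely an import change. A minimal sketch, assuming the snapml Python package (the library ships inside WML CE; the use_gpu/device_ids parameter names reflect snapml's published API and may vary by release):

```python
# Minimal Snap ML sketch: scikit-learn-style API with optional GPU offload.
# Assumes the `snapml` package; parameter names may differ across releases.
import numpy as np
from snapml import LogisticRegression

# Toy dense data standing in for a large dataset such as Criteo.
X = np.random.rand(10000, 100).astype(np.float32)
y = np.random.randint(0, 2, size=10000)

# use_gpu offloads the solver to the listed GPU(s); set False for CPU-only.
clf = LogisticRegression(max_iter=100, use_gpu=True, device_ids=[0])
clf.fit(X, y)
print("training accuracy:", (clf.predict(X) == y).mean())
```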
Most Popular Data Science Methods / Why Do We Need High-Performance ML?

The most popular data science methods (see the Kaggle survey cited below) are supported by Snap ML; Deep Learning support arrives in 2H19.

Performance matters for:
• Online re-training of models
• Model selection & hyper-parameter tuning
• Fast adaptability to changes
• Scalability to large datasets

Useful for:
• Recommendation engines, Advertising, Credit Fraud
• Space Exploration, Weather

Source: Kaggle Data Science Survey 2017

Logistic Regression: 46x Faster — Snap ML (GPU) vs TensorFlow (CPU)
[Chart: Training time for advertising click-through-rate prediction on the Criteo dataset (4 billion examples, 1 million features): 90 x86 servers (CPU-only) running TensorFlow take 1.1 hours; 4 Power9 servers with GPUs running Snap ML take 1.53 minutes — 46x faster.]

Ridge Regression: 3x Faster — Power-NVLink-GPUs vs x86-PCIe-GPUs
[Chart: Runtime (s) over 5 runs predicting stock-price volatility from 10-K textual financial reports (482,610 examples x 4.27M features): x86 server with 1 GPU, 86–107 s; Power server with 1 GPU, ~30 s.]

Snap ML is 2-4x Faster than scikit-learn (CPU-only)

Decision Trees: up to 3.8x faster — Snap ML on Power vs sklearn on x86
Random Forests: up to 4.2x faster — Snap ML on Power vs sklearn on x86
[Charts: Runtime (sec) on the creditcard, susy, higgs, and epsilon datasets; per-dataset speedups of SnapML (P9) over sklearn (x86) range from 2.0x to 3.8x for Decision Trees and 2.0x to 4.2x for Random Forests.]

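The measurement pattern behind charts like these is a like-for-like timing: train the same model, on the same data, in both libraries. A hedged sketch (the dataset shape and tree parameters here are illustrative, not the benchmark's actual configuration, and the snapml class name is assumed to mirror scikit-learn's):

```python
# Illustrative snapml-vs-scikit-learn timing; not the charted benchmark setup.
import time
import numpy as np
from sklearn.ensemble import RandomForestClassifier as SkRF
from snapml import RandomForestClassifier as SnapRF  # assumed class name

# Synthetic stand-in for a binary-classification dataset like susy or higgs.
X = np.random.rand(50000, 28).astype(np.float32)
y = np.random.randint(0, 2, size=50000)

for name, Model in [("sklearn", SkRF), ("snapml", SnapRF)]:
    model = Model(n_estimators=100, max_depth=8, n_jobs=4)  # 4 CPU threads
    t0 = time.time()
    model.fit(X, y)
    print(f"{name}: trained in {time.time() - t0:.1f}s")
```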
Summary of Performance Results for Snap ML

Comparison               | Workload                                | Speedup
GPU vs CPU               | Snap ML vs scikit-learn: Linear Models  | 20-40x
Power vs x86 with GPUs   | Snap ML: Linear Models                  | 3x
CPU Only: Power vs x86   | Snap ML vs scikit-learn: Tree Models    | 2-4x

Large Model Support (LMS)
• Store Large Models & Datasets in System Memory
• Enables Higher Accuracy via Larger Models
• Transfer One Layer at a Time to GPU

[Chart: TensorFlow with LMS, Images/sec: Power9 + 4 GPUs is 4.7x faster than x86 + 4 GPUs. 500 iterations of an enlarged GoogleNet model on an enlarged ImageNet dataset (2240x2240), mini-batch size = 15; both servers with 4 V100 GPUs.]
[Diagram: IBM AC922 Power9 server: CPU memory bandwidth 170 GB/s; CPU-GPU NVLink at 150 GB/s, 5x faster than x86 PCIe Gen3.]

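LMS is switched on from training code in WML CE's TensorFlow build. A hedged sketch of the Keras path, assuming the PowerAI-era tensorflow.contrib.lms module and its LMSKerasCallback class (neither exists in stock TensorFlow; names and defaults may differ in your WML CE release):

```python
# Hedged sketch of TensorFlow Large Model Support (LMS) in WML CE's
# TensorFlow build. Module/class names are an assumption based on the
# PowerAI-era API (tensorflow.contrib.lms); they are not in stock TensorFlow.
import numpy as np
import tensorflow as tf
from tensorflow.contrib.lms import LMSKerasCallback  # WML CE only (assumed)

# A model/input large enough to exceed GPU memory is where LMS pays off;
# this toy model just shows where the callback plugs in.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(64, 3, activation="relu", input_shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")

x = np.random.rand(64, 224, 224, 3).astype("float32")
y = np.random.randint(0, 10, size=(64,))

# The callback rewrites the graph so tensors are swapped to system memory
# and copied back one layer at a time over NVLink.
model.fit(x, y, batch_size=8, epochs=1, callbacks=[LMSKerasCallback()])
```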
Distributed Deep Learning (DDL)

• Deep learning training takes days to weeks on a single system; with DDL the same jobs run within hours
• DDL in WML CE extends TensorFlow & enables scaling to 100s of servers
• Automatically distribute and train on large datasets across 100s of GPUs
• Near-ideal (95%) scaling to 256 GPUs

[Chart: Speedup vs number of GPUs (4 to 256); DDL actual scaling closely tracks ideal scaling. ResNet-50 on ImageNet-1K, Caffe with PowerAI DDL, running on S822LC Power Systems.]

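DDL's own integration isn't shown in the deck, so as a conceptual stand-in here is the same synchronous data-parallel pattern expressed with stock TensorFlow's tf.distribute; DDL plugs into training in an analogous way but handles communication across 100s of servers and is launched with WML CE's ddlrun utility:

```python
# Conceptual sketch of synchronous data-parallel training, the pattern DDL
# implements. Stock tf.distribute is used here purely as a stand-in; DDL
# itself integrates with WML CE's TensorFlow and is launched via ddlrun.
import numpy as np
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # replicate across local GPUs
with strategy.scope():
    model = tf.keras.applications.ResNet50(weights=None, classes=1000)
    model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")

# Tiny random batch standing in for ImageNet-1K.
x = np.random.rand(16, 224, 224, 3).astype("float32")
y = np.random.randint(0, 1000, size=(16,))

# Each replica processes a shard of every batch; gradients are all-reduced.
model.fit(x, y, batch_size=8, epochs=1)
```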
Auto Hyper-Parameter Optimization (HPO) in WML Accelerator

Manual process: train the model, manually choose hyperparameters, change them, and re-train — running model training 100s of times. There are lots of hyperparameters: learning rate, decay rate, batch size, optimizers (gradient descent, momentum, ...).

WML Accelerator's Auto-Hyperparameter Optimizer (Auto-HPO) launches training jobs 1 through n on IBM Spectrum Conductor running Spark, monitors & prunes them, and selects the best hyperparameters. Auto-HPO has 3 search approaches: Random, Tree-based Parzen Estimator (TPE), and Bayesian.

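Auto-HPO drives this search inside WML Accelerator; its API isn't shown in the deck. As a plain-Python illustration of the simplest of the three approaches (random search), with a hypothetical train_and_score function standing in for a submitted training job:

```python
# Illustration of random hyperparameter search, the simplest of the three
# Auto-HPO strategies. train_and_score is a hypothetical stand-in for a
# full training job submitted to WML Accelerator.
import random

def train_and_score(lr, batch_size, momentum):
    # In practice: launch a training job and return validation accuracy.
    return random.random()  # placeholder metric

best = None
for trial in range(20):
    params = {
        "lr": 10 ** random.uniform(-4, -1),            # log-uniform sampling
        "batch_size": random.choice([32, 64, 128, 256]),
        "momentum": random.uniform(0.8, 0.99),
    }
    score = train_and_score(**params)
    if best is None or score > best[0]:
        best = (score, params)

print("best score/params:", best)
```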
Elastic Distributed Training (EDT)
• Dynamically reallocates GPUs within milliseconds
• Increases job throughput and server / GPU utilization
• Works with Spark & AI jobs
• Works with hybrid x86 & Power clusters
• Available policies: Fair Share, Preemption, Priority

[Chart: GPU slots (0–8) over time (8:09–8:21) for two jobs on 2 servers with 4 GPUs each (8 GPUs total). T0: Job 1 starts, uses all available GPUs; T1: Job 2 starts, Job 1 gives up 4 GPUs; T2: Job 2 gets higher priority, Job 1 gives up GPUs; T3: Job 1 finishes, Job 2 uses all GPUs.]

PowerAI Vision: “Point-and-Click” AI for Images & Video

Label Image or Video Data → Auto-Train AI Model → Package & Deploy AI Model

Core use cases: Image Classification, Object Detection, Image Segmentation

Automatic Labeling using PowerAI Vision

Manually Label Some Image / Video Frames → Train DL Model on Some Data → Auto-Label Full Dataset with Trained DL Model → Manually Correct Labels

Repeat Till Labels Achieve Desired Accuracy

Retail Analytics: Track how customers navigate the store, identify fraudulent actions, detect low inventory
Worker Safety Compliance: Zone monitoring, heat maps, detection of loitering, ensure worker safety compliance
Remote Inspection & Asset Management: Identify faulty or worn-out equipment in remote & hard-to-reach locations

Quality Inspection Use Cases
Semiconductor Manufacturing, Travel & Transportation, Oil & Gas, Manufacturing, Utilities, Robotic Inspection, Steel Manufacturing, Aerospace & Defense

AI Developer & AI Starter Kit

Power AI DevBox
• Pre-installed PowerAI Vision & WML Accelerator (free to Academia)
• Power9 + GPU Desktop PC: $3,449
• Order from: https://raptorcs.com/POWERAI/

AI Starter Kit
• Free 30-Day Licenses for WML Accelerator (formerly called PowerAI Enterprise)
• 2 AC922 Accelerated Servers + 1 P9 Storage Server

500+ Clients using AI on Power Systems

Power AI Clients at THINK 2019

IBM AI Meetups Community Grew 10x in 9 Months

From 6K to 85K Members

[Chart: Meetup members (0–100,000, left axis) and groups (0–80, right axis) by month, Jun-18 through Feb-19.]

https://www.meetup.com/topics/powerai/

Summary

Watson ML: Machine / Deep Learning Toolkit

Snap ML: Fast Machine Learning Framework

Power AI DevBox & AI Starter Kit

Build a Data Science Team: Your Developers Can Learn — http://cognitiveclass.ai
• Identify a Low-Hanging Use Case
• Figure Out Data Strategy
• Get Started Today with Machine & Deep Learning
• Consider Pre-Built AI APIs
• Hire Consulting Services

Get Started Today at www.ibm.biz/poweraideveloper

Additional Details

Why are Linear & Tree Models Useful?

Fast Training: GLMs can scale to datasets with billions of examples and/or features & still train in minutes to hours.

Need Less Data: Machine learning models can train to “good-enough” accuracy with much less data than deep learning requires.

Interpretability: Linear models explicitly assign an importance to each input feature; tree models explicitly illustrate the path to a decision.

Less Tuning: Linear models involve far fewer parameters than more complex models (GBMs, Neural Nets).

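A minimal illustration of the interpretability point, using scikit-learn for brevity (any sklearn-compatible library, including Snap ML, exposes the same attributes):

```python
# Linear model coefficients act as per-feature importances, and a tree's
# structure shows the path to each decision.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True)

linear = LogisticRegression(max_iter=5000).fit(X, y)
print("per-feature weights:", linear.coef_[0][:5])  # importance of inputs

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(export_text(tree))  # human-readable decision path
```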