IBM’s Open-Source Based AI Developer Tools
Sumit Gupta VP, AI, Machine Learning & HPC IBM Cognitive Systems @SumitGup guptasum@us.ibm.com
March 2019

AI Software Portfolio Strategy
Deliver a comprehensive platform that enables data science at all skill levels
Prepare Data → Build AI Models (Machine / Deep Learning) → Train AI Models (interactive or batch, with GPU acceleration) → Deploy & Manage Model Lifecycle (inference on CPU, GPU, FPGA) → AI Model Performance Monitoring
Scale to Enterprise-wide Deployment
• Multiple data scientists
• Shared cluster / hardware infrastructure
Hybrid Cloud: common experience on-premise and in public cloud
IBM Open Source Based AI Stack
Auto-AI software: PowerAI Vision, IBM Auto-AI
Watson Studio: data preparation, model development environment
Watson Machine Learning: train, deploy, and manage models
Watson OpenScale: model metrics, bias, and fairness monitoring
Runtime environment: WML CE (Watson ML Community Edition), Watson ML Accelerator, Snap ML
Infrastructure: accelerated Power9 servers (AC922) and storage (Spectrum Scale ESS); runs on x86 and other storage too; available on private or public cloud
Previous names: WML Accelerator = PowerAI Enterprise; WML Community Edition = PowerAI-base

Our Focus: Ease of Use & Faster Model Training Times
Watson ML Accelerator: Distributed Deep Learning (DDL), Auto Hyper-Parameter Optimization (HPO), Elastic Distributed Training (EDT) & Elastic Distributed Inference (EDI)
IBM Spectrum Conductor: Apache Spark, cluster virtualization, job orchestration
Watson ML: model management, model life cycle & execution
WML CE (Watson ML Community Edition): open-source ML frameworks, Snap ML, Large Model Support (LMS), DDL
Infrastructure designed for AI: Power9 or x86 servers with GPU accelerators, storage (ESS)

Snap ML: Distributed High-Performance Machine Learning Library
Snap Machine Learning (ML) Library
• APIs for popular ML frameworks
• GPU accelerated: Logistic Regression, Linear Regression, Ridge / Lasso Regression, Support Vector Machines (new)
• Multi-core CPU: Decision Trees, Random Forests; more coming
• Multi-core, multi-socket & GPU acceleration; CPU-GPU memory management
• Distributed training: multi-CPU & multi-GPU
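Snap ML exposes scikit-learn-style estimators, so existing pipelines can swap estimators with minimal changes. A minimal sketch of that pattern, using scikit-learn itself so it runs anywhere; the `snapml` import named in the comment is an assumption about an installed Snap ML package, so check its documentation for exact class names and parameters.

```python
# scikit-learn is used here so the example runs without Snap ML installed;
# on a system with Snap ML, the estimator below could be replaced by its
# GPU-accelerated equivalent (e.g. `from snapml import LogisticRegression`,
# an assumption about your installed version).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000)  # swap in the Snap ML estimator here
clf.fit(X_tr, y_tr)
print(round(clf.score(X_te, y_te), 2))
```

Because both libraries follow the same `fit` / `predict` / `score` contract, moving a workload onto the accelerated library is a one-line change.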
Why Do We Need High-Performance ML?
Performance matters for:
• Model selection & hyper-parameter tuning
• Online re-training of models
• Fast adaptability to changes
• Scalability to large datasets
Useful for: recommendation engines, advertising, credit fraud, space exploration, weather

The most popular data science methods (Source: Kaggle Data Science Survey 2017) are supported by Snap ML; deep learning support is planned for 2H 2019.
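The "online re-training" bullet above is the incremental-update pattern: rather than refitting from scratch when data drifts, the model is updated batch by batch. A minimal sketch using scikit-learn's `partial_fit`; the synthetic stream below is illustrative.

```python
# Online re-training: an SGD-based linear model updated incrementally
# as new mini-batches arrive, instead of retraining from scratch.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
model = SGDClassifier(random_state=0)

classes = np.array([0, 1])
for step in range(10):                        # ten incoming mini-batches
    X = rng.normal(size=(200, 5))
    y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)
    model.partial_fit(X, y, classes=classes)  # update, don't retrain

X_new = rng.normal(size=(500, 5))
y_new = (X_new[:, 0] > 0).astype(int)
print(round(model.score(X_new, y_new), 2))
```

Each `partial_fit` call costs only one pass over the new batch, which is what makes fast adaptability to changing data practical.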
Logistic Regression: 46x Faster (Snap ML on GPU vs TensorFlow on CPU)
[Chart: Google TensorFlow on 90 x86 CPU-only servers trains in 1.1 hours; Snap ML on 4 Power9 servers with GPUs trains in 1.53 minutes.]
Workload: advertising click-through-rate prediction, Criteo dataset (4 billion examples, 1 million features).

Ridge Regression: 3x Faster (Power-NVLink GPUs vs x86-PCIe GPUs)
[Chart: over 5 runs, an x86 server with 1 GPU takes ~86–107 s per run; a Power server with 1 GPU takes ~30 s.]
Workload: predicting stock-price volatility from 10-K textual financial reports (482,610 examples x 4.27M features).

Snap ML is 2–4x Faster than scikit-learn (CPU-only)
[Chart: Snap ML on Power9 vs scikit-learn on x86, CPU-only, on the creditcard, susy, higgs, and epsilon datasets. Decision Trees: 2.0–3.8x faster; Random Forests: 2.4–4.2x faster.]

Summary of Performance Results for Snap ML
• GPU vs CPU (Snap ML vs scikit-learn, linear models): 20–40x
• Power vs x86, with GPUs (Snap ML, linear models): 3x
• CPU only, Power vs x86 (Snap ML vs scikit-learn, tree models): 2–4x
Large Model Support (LMS)
• Stores large models & datasets in system memory; transfers one layer at a time to the GPU
• Enables higher accuracy via larger models
[Diagram: CPU memory (170 GB/s) connected to GPUs via NVLink at 150 GB/s; CPU-GPU NVLink is 5x faster than Intel x86 PCIe Gen3.]
[Chart: TensorFlow with LMS. An IBM AC922 Power9 server with 4 GPUs sustains 4.7x more images/sec than an x86 server with 4 GPUs. 500 iterations of an enlarged GoogleNet model on an enlarged ImageNet dataset (2240x2240), mini-batch size = 15; both servers with 4 NVIDIA V100 GPUs.]

Distributed Deep Learning (DDL)
• Near-ideal (95%) scaling to 256 GPUs
• Deep learning training that runs for days to weeks completes within hours
• DDL in WML CE extends TensorFlow and enables scaling to 100s of servers
[Chart: speedup vs number of GPUs (4 to 256), with DDL's actual scaling tracking close to ideal scaling.]
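A quick back-of-envelope check of what the 95% scaling-efficiency figure implies; the 10-day single-GPU baseline below is illustrative, not a number from the benchmark.

```python
# Scaling efficiency = speedup / number of GPUs, so at 95% efficiency
# a 256-GPU run delivers roughly 0.95 * 256 = ~243x over a single GPU.
def speedup(n_gpus, efficiency):
    return n_gpus * efficiency

def days_to_hours(single_gpu_days, n_gpus, efficiency):
    # Wall-clock hours once the job is spread across the cluster.
    return single_gpu_days * 24 / speedup(n_gpus, efficiency)

print(speedup(256, 0.95))                      # ~243x
print(round(days_to_hours(10, 256, 0.95), 1))  # an illustrative 10-day job -> ~1 hour
```

This is the arithmetic behind "days to weeks" shrinking to "hours" on the chart above.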
• Automatically distributes training over large datasets to 100s of GPUs
Benchmark: ResNet-50 on ImageNet-1K, Caffe with PowerAI DDL, running on S822LC Power Systems.

Auto Hyper-Parameter Optimization (HPO) in WML Accelerator
Manual process: manually choose hyperparameters, train the model, change hyperparameters, repeat, running model training 100s of times. Typical hyperparameters: learning rate, decay rate, batch size, optimizer (gradient descent, momentum, ...).

WML Accelerator's Auto-Hyperparameter Optimizer (Auto-HPO) launches training jobs 1…n on IBM Spectrum Conductor running Spark, monitors and prunes them, and selects the best hyperparameters. Auto-HPO has 3 search approaches: Random, Tree-based Parzen Estimator (TPE), and Bayesian.

Elastic Distributed Training (EDT)
• Dynamically reallocates GPUs within slots in milliseconds
• Increases job throughput and server / GPU utilization
• Works with Spark & AI jobs
• Works with hybrid x86 & Power clusters
• Available policies: fair share, preemption, priority
[Chart: GPU allocation over time on 2 servers with 4 GPUs each (8 GPUs total). T0: Job 1 starts and uses all available GPUs. T1: Job 2 starts; Job 1 gives up 4 GPUs. T2: Job 2 gets higher priority; Job 1 gives up more GPUs. T3: Job 1 finishes; Job 2 uses all GPUs.]
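The simplest of the three search approaches listed above, random search, can be sketched in a few lines; TPE and Bayesian search are smarter variants of the same loop. The "training job" below is a toy objective standing in for a real run that returns a validation loss; nothing here is WML Accelerator's actual API.

```python
# Random-search HPO sketch: sample hyperparameters, run a "training job",
# keep the best result. The toy objective pretends the optimum is
# lr = 1e-2 and batch_size = 128.
import math
import random

rng = random.Random(0)

def toy_training_job(lr, batch_size):
    # Stand-in for a real training run returning a validation loss.
    return (math.log10(lr) + 2) ** 2 + (math.log2(batch_size) - 7) ** 2

best = None
for trial in range(50):                       # launch 50 "training jobs"
    params = {
        "lr": 10 ** rng.uniform(-5, -1),      # sample lr on a log scale
        "batch_size": rng.choice([32, 64, 128, 256]),
    }
    loss = toy_training_job(**params)
    if best is None or loss < best[0]:        # keep the best hyperparameters
        best = (loss, params)

print(best[1])
```

Auto-HPO adds what the sketch omits: running the trials in parallel on the cluster and pruning poor trials early instead of letting them finish.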
PowerAI Vision: “Point-and-Click” AI for Images & Video
Label image or video data → Auto-train AI model → Package & deploy AI model

Core use cases: Image Classification, Object Detection, Image Segmentation
Automatic Labeling using PowerAI Vision
1. Manually label some image / video frames
2. Train a DL model
3. Auto-label the full dataset with the trained DL model
4. Manually correct labels on some data
Repeat until the labels achieve the desired accuracy.
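The loop above can be sketched as control flow. The train and auto-label functions are hypothetical stand-ins for PowerAI Vision's point-and-click steps; the point is the structure: label a little, train, auto-label, correct, and repeat until quality is good enough.

```python
# Auto-labeling loop sketch. All functions are illustrative stand-ins,
# not PowerAI Vision APIs.
import random

rng = random.Random(0)

def train_model(labeled):
    # Stand-in "model": accuracy grows with the amount of labeled data.
    return min(0.99, 0.5 + 0.05 * len(labeled) ** 0.5)

def auto_label(model_acc, unlabeled):
    # Auto-label everything; a fraction (1 - model_acc) comes back wrong.
    return [(x, rng.random() < model_acc) for x in unlabeled]

dataset = list(range(1000))
labeled = dataset[:20]                 # step 1: manually label a small sample
target_accuracy = 0.95

while True:
    acc = train_model(labeled)                  # step 2: train a DL model
    predictions = auto_label(acc, dataset)      # step 3: auto-label the full set
    # step 4: manually correct a slice, growing the labeled pool
    labeled = dataset[: len(labeled) + 50]
    if acc >= target_accuracy:                  # repeat until good enough
        break

print(round(acc, 2))
```

The payoff is that manual effort is spent only on a seed set plus corrections, while the model does the bulk labeling.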
Use Cases
• Retail Analytics: track how customers navigate the store, identify fraudulent actions, detect low inventory
• Worker Safety Compliance: zone monitoring, heat maps, detection of loitering, ensure worker safety compliance
• Remote Inspection & Asset Management: identify faulty or worn-out equipment in remote and hard-to-reach locations
Quality Inspection Use Cases: semiconductor manufacturing, electronics manufacturing, travel & transportation, oil & gas, utilities, robotic inspection, steel manufacturing, aerospace & defense
AI Developer Box & AI Starter Kit

PowerAI DevBox
• Pre-installed PowerAI Vision & WML Accelerator (formerly called PowerAI Enterprise)
• Power9 + GPU desktop PC: $3,449
• Order from: https://raptorcs.com/POWERAI/

AI Starter Kit
• Free 30-day licenses for WML Accelerator (free to academia)
• 2 AC922 accelerated servers + 1 P9 Linux storage server
500+ Clients using AI on Power Systems
Power AI Clients at THINK 2019
IBM AI Meetups Community Grew 10x in 9 Months
From 6K to 85K members, June 2018 to February 2019, with the number of meetup groups growing alongside.
https://www.meetup.com/topics/powerai/
Summary
• Watson ML: machine / deep learning toolkit
• Snap ML: fast machine learning framework
• PowerAI DevBox & AI Starter Kit
Get Started Today with Machine & Deep Learning
• Identify a low-hanging use case
• Build a data science team (your developers can learn: http://cognitiveclass.ai)
• Figure out your data strategy
• Consider pre-built AI APIs
• Hire consulting services
Get started today at www.ibm.biz/poweraideveloper
Additional Details
Why Are Linear & Tree Models Useful?
• Fast training: GLMs can scale to datasets with billions of examples and/or features and still train in minutes to hours
• Need less data: machine learning models can train to “good-enough” accuracy with much less data than deep learning requires
• Interpretability: linear models explicitly assign an importance to each input feature; tree models explicitly illustrate the path to a decision
• Less tuning: linear models involve far fewer parameters than more complex models (GBMs, neural nets)
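The interpretability point can be made concrete: after fitting a linear model, each input feature's weight is readable directly from the coefficients. A small sketch with scikit-learn; the feature names and the generating relationship are made up for the example.

```python
# A linear model's coefficients give each feature an explicit importance.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
# Made-up ground truth: the target is driven by "size" and, less so, "age".
y = 3.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=500)

model = Ridge(alpha=1.0).fit(X, y)
for name, coef in zip(["size", "age", "noise"], model.coef_):
    print(f"{name}: {coef:+.2f}")
```

The fitted coefficients recover the generating weights (roughly +3 for "size", -1 for "age", near 0 for the irrelevant feature), which is the kind of direct explanation deep models do not offer out of the box.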