
IBM’s Open-Source-Based AI Developer Tools

Sumit Gupta, VP, AI & HPC, IBM Cognitive Systems | @SumitGup | guptasum@us.ibm.com

March 2019

AI Portfolio Strategy: Deliver a comprehensive platform that enables data science at all skill levels

• Prepare Data
• Build AI Models (Machine / Deep Learning)
• Train AI Models: Interactive or Batch, with GPU acceleration
• Deploy & Manage Model Lifecycle: Inference on CPU, GPU, FPGA
• AI Model Performance Monitoring

Scale to Enterprise-wide Deployment
• Multiple data scientists
• Shared Cluster / Hardware Infrastructure
• Hybrid Cloud: Common experience on-premise and in public cloud

IBM Open-Source-Based AI Stack

Auto-AI software: PowerAI Vision, IBM Auto-AI

• Watson Studio: Data Preparation, Model Development Environment
• Watson Machine Learning: Train, Deploy, Manage Models
• Watson OpenScale: Model Metrics, Bias, and Fairness Monitoring
• Watson ML CE / Watson ML Accelerator: Runtime Environment (includes Snap ML)
• Accelerated AC922 Power9 Servers; Storage (Spectrum Scale ESS) & other storage too

Previous names: WML Accelerator = PowerAI Enterprise; WML Community Ed. = PowerAI (base)
Available on Private Cloud or Public Cloud

Our Focus: Ease of Use & Faster Model Training Times

• Watson ML Accelerator: Distributed Deep Learning (DDL), Auto Hyper-Parameter Optimization (HPO), Elastic Distributed Training (EDT) & Elastic Distributed Inference (EDI)
• IBM Spectrum Conductor: Apache Spark, Cluster & Job Orchestration
• Watson ML: Model Management & Execution, Model Life Cycle Management
• WML CE (Watson ML Community Edition): Open Source ML Frameworks, Snap ML, Large Model Support (LMS), DDL
• Infrastructure Designed for AI: Power9 or x86 Servers with GPU Accelerators, Storage (ESS)

Snap ML: Distributed High-Performance Machine Learning

Snap Machine Learning (Snap ML): Library for Popular ML Frameworks
• GPU Accelerated: Logistic Regression, Linear Regression, Ridge / Lasso Regression, Support Vector Machines (new)
• Multi-Core CPU: Decision Trees, Random Forests, more coming…
• Multi-Core, Multi-Socket & GPU Acceleration
• CPU-GPU Memory Management
• Distributed Training: Multi-CPU & Multi-GPU

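Because Snap ML follows scikit-learn's fit/predict conventions, adopting it is largely an import change. A minimal sketch, assuming the snapml Python package (the library ships inside WML CE; the use_gpu/device_ids parameter names reflect snapml's published API and may vary by release):

```python
# Minimal Snap ML sketch: scikit-learn-style API with optional GPU offload.
# Assumes the `snapml` package; parameter names may differ across releases.
import numpy as np
from snapml import LogisticRegression

# Toy dense data standing in for a large dataset such as Criteo.
X = np.random.rand(10000, 100).astype(np.float32)
y = np.random.randint(0, 2, size=10000)

# use_gpu offloads the solver to the listed GPU(s); set False for CPU-only.
clf = LogisticRegression(max_iter=100, use_gpu=True, device_ids=[0])
clf.fit(X, y)
print("training accuracy:", (clf.predict(X) == y).mean())
```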
Most Popular Data Science Methods / Why Do We Need High-Performance ML?

The most popular data science methods (see the Kaggle survey cited below) are supported by Snap ML; Deep Learning support arrives in 2H19.

Performance matters for:
• Online re-training of models
• Model selection & hyper-parameter tuning
• Fast adaptability to changes
• Scalability to large datasets

Useful for:
• Recommendation engines, Advertising, Credit Fraud
• Space Exploration, Weather

Source: Kaggle Data Science Survey 2017

Logistic Regression: 46x Faster — Snap ML (GPU) vs TensorFlow (CPU)
[Chart: Training time for advertising click-through-rate prediction on the Criteo dataset (4 billion examples, 1 million features): 90 x86 servers (CPU-only) running TensorFlow take 1.1 hours; 4 Power9 servers with GPUs running Snap ML take 1.53 minutes — 46x faster.]

Ridge Regression: 3x Faster — Power-NVLink-GPUs vs x86-PCIe-GPUs
[Chart: Runtime (s) over 5 runs predicting stock-price volatility from 10-K textual financial reports (482,610 examples x 4.27M features): x86 server with 1 GPU, 86–107 s; Power server with 1 GPU, ~30 s.]

Snap ML is 2-4x Faster than scikit-learn (CPU-only)

Decision Trees: up to 3.8x faster — Snap ML on Power vs sklearn on x86
Random Forests: up to 4.2x faster — Snap ML on Power vs sklearn on x86
[Charts: Runtime (sec) on the creditcard, susy, higgs, and epsilon datasets; per-dataset speedups of SnapML (P9) over sklearn (x86) range from 2.0x to 3.8x for Decision Trees and 2.0x to 4.2x for Random Forests.]

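The measurement pattern behind charts like these is a like-for-like timing: train the same model, on the same data, in both libraries. A hedged sketch (the dataset shape and tree parameters here are illustrative, not the benchmark's actual configuration, and the snapml class name is assumed to mirror scikit-learn's):

```python
# Illustrative snapml-vs-scikit-learn timing; not the charted benchmark setup.
import time
import numpy as np
from sklearn.ensemble import RandomForestClassifier as SkRF
from snapml import RandomForestClassifier as SnapRF  # assumed class name

# Synthetic stand-in for a binary-classification dataset like susy or higgs.
X = np.random.rand(50000, 28).astype(np.float32)
y = np.random.randint(0, 2, size=50000)

for name, Model in [("sklearn", SkRF), ("snapml", SnapRF)]:
    model = Model(n_estimators=100, max_depth=8, n_jobs=4)  # 4 CPU threads
    t0 = time.time()
    model.fit(X, y)
    print(f"{name}: trained in {time.time() - t0:.1f}s")
```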
Summary of Performance Results for Snap ML

Comparison               | Workload                                | Speedup
GPU vs CPU               | Snap ML vs scikit-learn: Linear Models  | 20-40x
Power vs x86 with GPUs   | Snap ML: Linear Models                  | 3x
CPU Only: Power vs x86   | Snap ML vs scikit-learn: Tree Models    | 2-4x

Large Model Support (LMS)
• Store Large Models & Datasets in System Memory
• Enables Higher Accuracy via Larger Models
• Transfer One Layer at a Time to GPU

[Chart: TensorFlow with LMS, Images/sec: Power9 + 4 GPUs is 4.7x faster than x86 + 4 GPUs. 500 iterations of an enlarged GoogleNet model on an enlarged ImageNet dataset (2240x2240), mini-batch size = 15; both servers with 4 V100 GPUs.]
[Diagram: IBM AC922 Power9 server: CPU memory bandwidth 170 GB/s; CPU-GPU NVLink at 150 GB/s, 5x faster than x86 PCIe Gen3.]

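LMS is switched on from training code in WML CE's TensorFlow build. A hedged sketch of the Keras path, assuming the PowerAI-era tensorflow.contrib.lms module and its LMSKerasCallback class (neither exists in stock TensorFlow; names and defaults may differ in your WML CE release):

```python
# Hedged sketch of TensorFlow Large Model Support (LMS) in WML CE's
# TensorFlow build. Module/class names are an assumption based on the
# PowerAI-era API (tensorflow.contrib.lms); they are not in stock TensorFlow.
import numpy as np
import tensorflow as tf
from tensorflow.contrib.lms import LMSKerasCallback  # WML CE only (assumed)

# A model/input large enough to exceed GPU memory is where LMS pays off;
# this toy model just shows where the callback plugs in.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(64, 3, activation="relu", input_shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")

x = np.random.rand(64, 224, 224, 3).astype("float32")
y = np.random.randint(0, 10, size=(64,))

# The callback rewrites the graph so tensors are swapped to system memory
# and copied back one layer at a time over NVLink.
model.fit(x, y, batch_size=8, epochs=1, callbacks=[LMSKerasCallback()])
```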
Distributed Deep Learning (DDL)

• Deep learning training takes days to weeks on a single system; with DDL the same jobs run within hours
• DDL in WML CE extends TensorFlow & enables scaling to 100s of servers
• Automatically distribute and train on large datasets across 100s of GPUs
• Near-ideal (95%) scaling to 256 GPUs

[Chart: Speedup vs number of GPUs (4 to 256); DDL actual scaling closely tracks ideal scaling. ResNet-50 on ImageNet-1K, Caffe with PowerAI DDL, running on S822LC Power Systems.]

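DDL's own integration isn't shown in the deck, so as a conceptual stand-in here is the same synchronous data-parallel pattern expressed with stock TensorFlow's tf.distribute; DDL plugs into training in an analogous way but handles communication across 100s of servers and is launched with WML CE's ddlrun utility:

```python
# Conceptual sketch of synchronous data-parallel training, the pattern DDL
# implements. Stock tf.distribute is used here purely as a stand-in; DDL
# itself integrates with WML CE's TensorFlow and is launched via ddlrun.
import numpy as np
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # replicate across local GPUs
with strategy.scope():
    model = tf.keras.applications.ResNet50(weights=None, classes=1000)
    model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")

# Tiny random batch standing in for ImageNet-1K.
x = np.random.rand(16, 224, 224, 3).astype("float32")
y = np.random.randint(0, 1000, size=(16,))

# Each replica processes a shard of every batch; gradients are all-reduced.
model.fit(x, y, batch_size=8, epochs=1)
```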
Auto Hyper-Parameter Optimization (HPO) in WML Accelerator

Manual process: train the model, manually choose hyperparameters, change them, and re-train — running model training 100s of times. There are lots of hyperparameters: learning rate, decay rate, batch size, optimizers (gradient descent, momentum, ...).

WML Accelerator's Auto-Hyperparameter Optimizer (Auto-HPO) launches training jobs 1 through n on IBM Spectrum Conductor running Spark, monitors & prunes them, and selects the best hyperparameters. Auto-HPO has 3 search approaches: Random, Tree-based Parzen Estimator (TPE), and Bayesian.

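Auto-HPO drives this search inside WML Accelerator; its API isn't shown in the deck. As a plain-Python illustration of the simplest of the three approaches (random search), with a hypothetical train_and_score function standing in for a submitted training job:

```python
# Illustration of random hyperparameter search, the simplest of the three
# Auto-HPO strategies. train_and_score is a hypothetical stand-in for a
# full training job submitted to WML Accelerator.
import random

def train_and_score(lr, batch_size, momentum):
    # In practice: launch a training job and return validation accuracy.
    return random.random()  # placeholder metric

best = None
for trial in range(20):
    params = {
        "lr": 10 ** random.uniform(-4, -1),            # log-uniform sampling
        "batch_size": random.choice([32, 64, 128, 256]),
        "momentum": random.uniform(0.8, 0.99),
    }
    score = train_and_score(**params)
    if best is None or score > best[0]:
        best = (score, params)

print("best score/params:", best)
```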
Elastic Distributed Training (EDT)
• Dynamically reallocates GPUs within milliseconds
• Increases job throughput and server / GPU utilization
• Works with Spark & AI jobs
• Works with hybrid x86 & Power clusters
• Available policies: Fair Share, Preemption, Priority

[Chart: GPU slots (0–8) over time (8:09–8:21) for two jobs on 2 servers with 4 GPUs each (8 GPUs total). T0: Job 1 starts, uses all available GPUs; T1: Job 2 starts, Job 1 gives up 4 GPUs; T2: Job 2 gets higher priority, Job 1 gives up GPUs; T3: Job 1 finishes, Job 2 uses all GPUs.]

PowerAI Vision: “Point-and-Click” AI for Images & Video

Label Image or Video Data → Auto-Train AI Model → Package & Deploy AI Model

Core use cases: Image Classification, Object Detection, Image Segmentation

Automatic Labeling using PowerAI Vision

Manually Label Some Image / Video Frames → Train DL Model on Some Data → Auto-Label Full Dataset with Trained DL Model → Manually Correct Labels

Repeat Till Labels Achieve Desired Accuracy

Retail Analytics: Track how customers navigate the store, identify fraudulent actions, detect low inventory
Worker Safety Compliance: Zone monitoring, heat maps, detection of loitering, ensure worker safety compliance
Remote Inspection & Asset Management: Identify faulty or worn-out equipment in remote & hard-to-reach locations

Quality Inspection Use Cases
Semiconductor Manufacturing, Travel & Transportation, Oil & Gas, Manufacturing, Utilities, Robotic Inspection, Steel Manufacturing, Aerospace & Defense

AI Developer & AI Starter Kit

Power AI DevBox
• Pre-installed PowerAI Vision & WML Accelerator (free to Academia)
• Power9 + GPU Desktop PC: $3,449
• Order from: https://raptorcs.com/POWERAI/

AI Starter Kit
• Free 30-Day Licenses for WML Accelerator (formerly called PowerAI Enterprise)
• 2 AC922 Accelerated Servers + 1 P9 Storage Server

500+ Clients using AI on Power Systems

Power AI Clients at THINK 2019

IBM AI Meetups Community Grew 10x in 9 Months

From 6K to 85K Members

[Chart: Meetup members (0–100,000, left axis) and groups (0–80, right axis) by month, Jun-18 through Feb-19.]

https://www.meetup.com/topics/powerai/

Summary

Watson ML: Machine / Deep Learning Toolkit

Snap ML: Fast Machine Learning Framework

Power AI DevBox & AI Starter Kit

Build a Data Science Team: Your Developers Can Learn — http://cognitiveclass.ai
• Identify a Low-Hanging Use Case
• Figure Out Data Strategy
• Get Started Today with Machine & Deep Learning
• Consider Pre-Built AI APIs
• Hire Consulting Services

Get Started Today at www.ibm.biz/poweraideveloper

Additional Details

Why are Linear & Tree Models Useful?

Fast Training: GLMs can scale to datasets with billions of examples and/or features & still train in minutes to hours.

Need Less Data: Machine learning models can train to “good-enough” accuracy with much less data than deep learning requires.

Interpretability: Linear models explicitly assign an importance to each input feature; tree models explicitly illustrate the path to a decision.

Less Tuning: Linear models involve far fewer parameters than more complex models (GBMs, Neural Nets).

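A minimal illustration of the interpretability point, using scikit-learn for brevity (any sklearn-compatible library, including Snap ML, exposes the same attributes):

```python
# Linear model coefficients act as per-feature importances, and a tree's
# structure shows the path to each decision.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True)

linear = LogisticRegression(max_iter=5000).fit(X, y)
print("per-feature weights:", linear.coef_[0][:5])  # importance of inputs

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(export_text(tree))  # human-readable decision path
```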