S91030 - Hybrid with Kubeflow Pipelines and RAPIDS

Sina Chavoshi

Cloud AI Strategy: The right approach for the right problem
Building blocks, Platform, Solutions

Building Blocks: Sight, Language, Conversation

Solutions / Contact Center

Google Cloud Contact Center AI
[Architecture diagram: Phone, Contact Center Provider Interface, Virtual Agent, Backend Fulfillment, Customer Knowledge Base (PDF/HTML), Agent Chat Assist, human Agent]

Cloud AI Platform
Data pipeline: Cloud Dataprep, BigQuery, Cloud Dataflow, Cloud Dataproc
Model development: Cloud ML Engine, Kubeflow
Model deployment and management: Cloud ML Engine, Kubeflow
Tools, Services, Community: Jupyter Notebooks, ASL

Building & deploying real-life ML applications is hard and costly because of a lack of tooling that covers end-to-end ML development & deployment. In addition to the actual ML code, you have to worry about so much more.

[Diagram: the ML Code box is only a small part of a real system, surrounded by Configuration, Data Collection, Feature Extraction, Data Verification, Machine Resource Management, Analysis Tools, Process Management Tools, Serving Infrastructure, and Monitoring.]
Source: Sculley et al., "Hidden Technical Debt in Machine Learning Systems"

AI problems today

Problems and solutions:
01 Deployment: brittle, opinionated infrastructure that is hard to productionize and breaks between cloud and on-prem. Solution: Kubeflow ML microservices.
02 Talent: machine learning expertise is scarce. Solution: reusable pipelines.
03 Collaboration: difficult to find and leverage existing solutions. Solution: AI Hub.

01: Kubeflow ML microservices
Scalable ML services on Kubernetes

Easy to get started
• Out-of-box support for top frameworks, including Caffe, TensorFlow, and XGBoost
• Kubernetes manages dependencies and resources

Swappable & scalable
• Library of ML services (Training and Predict)
• GPU support
• Massive scale

Meet customers where they are
• GCP
• On-prem
• On-prem with Cisco
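This is not code from the talk, but a minimal sketch of what one of these scalable ML services on Kubernetes can look like: a GPU-backed prediction Deployment created with the official Kubernetes Python client. The image name, namespace, port, and labels are hypothetical placeholders.

```python
from kubernetes import client, config

# Load cluster credentials (use load_incluster_config() when running inside the cluster).
config.load_kube_config()

# A prediction microservice: the container image encapsulates the model and its dependencies,
# and Kubernetes schedules it onto a node with a GPU.
container = client.V1Container(
    name="predict",
    image="gcr.io/my-project/xgboost-predict:latest",  # hypothetical serving image
    ports=[client.V1ContainerPort(container_port=8080)],
    resources=client.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
)

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="predict"),
    spec=client.V1DeploymentSpec(
        replicas=2,  # scale out by raising the replica count
        selector=client.V1LabelSelector(match_labels={"app": "predict"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "predict"}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="kubeflow", body=deployment)
```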

RAPIDS Product Overview

THE BIG PROBLEM IN DATA SCIENCE
Slow training times for data scientists
[Workflow diagram: Manage Data (all data, structured data store, ETL, data preparation), Training (model training), Evaluate (visualization), Deploy (scoring)]

RAPIDS: OPEN GPU DATA SCIENCE
Software Stack (Python)

Data Preparation: cuDF; Model Training: cuML; Graph Analytics: cuGraph

[Stack diagram, top to bottom: Python; Dask/Spark; RAPIDS (cuDF, cuML, cuGraph) alongside deep learning frameworks (cuDNN); CUDA; all built on Apache Arrow in GPU memory]
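As a concrete, hedged illustration of the cuDF and cuML layers (file names and column names below are made up for the sketch), the familiar pandas/scikit-learn-style workflow runs entirely in GPU memory:

```python
import cudf
from cuml.cluster import KMeans

# Load CSVs directly into GPU memory as cuDF DataFrames (Apache Arrow layout).
transactions = cudf.read_csv("transactions.csv")   # hypothetical input file
stores = cudf.read_csv("stores.csv")               # hypothetical lookup table

# Data preparation on the GPU: filter, derive a feature, join a lookup table.
transactions = transactions[transactions["amount"] > 0]
transactions["amount_usd"] = transactions["amount"] / 100.0
joined = transactions.merge(stores, on="store_id", how="left")

# Model training on the GPU with cuML, without copying data back to host memory.
model = KMeans(n_clusters=8)
model.fit(joined[["amount_usd", "store_size"]])
print(model.cluster_centers_)
```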

BENCHMARKS: cuIO/cuDF (Load and Data Preparation), cuML (XGBoost), End-to-End
Time in seconds; shorter is better. Stages measured: cuIO/cuDF load and data preparation, data conversion, XGBoost training.

Benchmark: 200GB CSV dataset; data preparation includes joins and variable transformations
CPU cluster configuration: CPU nodes (61 GiB of memory, 8 vCPUs, 64-bit platform), Apache Spark
DGX cluster configuration: 5x DGX-1 on an InfiniBand network
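A single-GPU sketch of the benchmarked workflow (the actual benchmark ran on multi-node Spark and DGX clusters): load CSVs with cuDF, do the joins and variable transformations on the GPU, then hand the GPU DataFrame straight to XGBoost. Paths, column names, and parameters are placeholders.

```python
import cudf
import xgboost as xgb

# ETL on the GPU, mirroring the benchmark description: load, join, transform.
loans = cudf.read_csv("loans.csv")              # hypothetical input
performance = cudf.read_csv("performance.csv")  # hypothetical input
data = loans.merge(performance, on="loan_id", how="inner")
data["ltv_bucket"] = (data["ltv"] / 10).astype("int32")  # example variable transformation

# Train XGBoost on the GPU; cuDF DataFrames are accepted directly by DMatrix.
label = data["default_flag"]
features = data.drop(columns=["default_flag"])
dtrain = xgb.DMatrix(features, label=label)

params = {
    "objective": "binary:logistic",
    "tree_method": "gpu_hist",  # GPU histogram algorithm (device="cuda" on newer XGBoost)
    "max_depth": 8,
}
model = xgb.train(params, dtrain, num_boost_round=100)
```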

AI Hub & Pipelines: Fast & simple adoption of AI

The Flywheel of AI Adoption
1. Search & Discover: find best-of-breed solutions on the AI Hub which leverage Cloud AI solutions.
2. Deploy: quick 1-click implementation of ML pipelines onto Cloud Platform.
3. Customize: experiment and adjust out-of-the-box pipelines to custom use cases.
4. Run in production: deploy customized pipelines in production.
5. Publish: upload & share running pipelines within your org or publicly (network effect).

02: Reusable Pipelines

Enable developers to build custom ML applications by easily “stitching” and connecting various components.

• Reuse instead of reimplement or reinvent
• Discover, learn, and replicate successful pipelines

What constitutes a Kubeflow Pipeline

● Containerized implementations of ML tasks
  ○ Containers provide portability, repeatability and encapsulation
  ○ A task can be single node or *distributed*
  ○ A containerized task can invoke other services
● Specification of the sequence of steps
  ○ Specified via the Python SDK (see the sketch below)
● Input parameters
  ○ A "Job" = a pipeline invoked with specific parameters
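As a minimal sketch (not the exact pipeline shown in the talk; image names, arguments, and defaults are placeholders), the three ingredients above look like this with the Kubeflow Pipelines Python SDK: containerized steps, a Python-defined sequence, and pipeline parameters that become the inputs of each job.

```python
from kfp import dsl, compiler

# Containerized implementation of an ML task: the image encapsulates the step's dependencies.
def preprocess_op(data_path):
    return dsl.ContainerOp(
        name="preprocess",
        image="gcr.io/my-project/preprocess:latest",   # hypothetical image
        arguments=["--data-path", data_path],
        file_outputs={"features": "/tmp/features.txt"},
    )

def train_op(features, epochs):
    return dsl.ContainerOp(
        name="train",
        image="gcr.io/my-project/train:latest",        # hypothetical image
        arguments=["--features", features, "--epochs", epochs],
    )

# The sequence of steps is specified via the Python SDK; the function arguments are the
# pipeline's input parameters, so a "job" is this pipeline invoked with specific values.
@dsl.pipeline(name="training-pipeline", description="Preprocess then train")
def training_pipeline(data_path="gs://my-bucket/data.csv", epochs=10):
    preprocess = preprocess_op(data_path)
    train = train_op(preprocess.outputs["features"], epochs)  # output -> input creates the dependency

if __name__ == "__main__":
    compiler.Compiler().compile(training_pipeline, "training_pipeline.tar.gz")
```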

03: AI Hub at a glance

1. All AI content in one place: quick discovery of plug & play AI pipelines & other content built by teams across Google and by partners and customers.

2. Fast & simple implementation of AI on GCP: one-click deployment of AI pipelines via Kubeflow, with GCP as the go-to platform for AI plus hybrid & on-premise (see the sketch after this list).

3. Enterprise-grade internal & external sharing: foster reuse by sharing deployable AI pipelines & other content privately within organizations & publicly.
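On AI Hub itself deployment is a one-click action; the rough equivalent with the Kubeflow Pipelines SDK client is sketched below, where the endpoint URL, experiment name, package file, and parameters are placeholders.

```python
import kfp

# Connect to the cluster's Kubeflow Pipelines endpoint (URL is a placeholder).
client = kfp.Client(host="https://my-kfp-endpoint.example.com")

# Upload a pipeline package (for example, one downloaded from AI Hub) and start a run.
pipeline = client.upload_pipeline("training_pipeline.tar.gz", pipeline_name="training-pipeline")
experiment = client.create_experiment("rapids-demo")
run = client.run_pipeline(
    experiment_id=experiment.id,
    job_name="training-run",
    pipeline_id=pipeline.id,
    params={"data_path": "gs://my-bucket/data.csv", "epochs": 20},
)
```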

Mission
The one place for everything AI, from experimentation to production.

Public and private AI Hub
Public content + Private content
• By Google: unique AI assets by Google (AutoML, TPUs, Cloud AI Platform, etc.)
• By partners: created, shared & monetized by anyone
• By customers: content shared securely within and with other organizations

Kubeflow Pipelines Workflow
Rapid, reliable experimentation; share, re-use & compose; enable orchestration

Demo
• Visual depiction of pipeline topology
• View all current and historical runs, grouped as "Experiments"
• Rich visualizations of metrics
• Clone an existing pipeline
• Access to all config params, inputs and outputs for each run
• Update parameters and submit
• Easy comparison of runs

That’s a wrap.