Michelle Casbon
Strata Data New York, September 13, 2018
Kubeflow Explained: Portable on

A curated set of compatible tools and artifacts that lays a foundation for running production ML apps

Enables consistency across deployments by providing Kubernetes object templates that bring together disparate components

@texasmichelle whoami

Agenda

1. Problems
2. Goals
3. What's inside
4. Demo
5. Future Direction

ML decision tree

● Is this a clearly defined problem?
  ○ No: Move along
  ○ Yes: Can it be solved in a deterministic way?
    ■ Yes: Do that
    ■ No: Dive in

Credit: David Andrzejewski
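The decision flow on this slide can be read as two nested checks: first whether the problem is clearly defined, then whether a deterministic solution exists. A minimal Python sketch of that reading follows; the function name `ml_decision` and its boolean parameters are illustrative assumptions, not part of the talk.

```python
def ml_decision(clearly_defined: bool, deterministic_solution: bool) -> str:
    """Walk the slide's two questions and return the suggested action.

    The names here are hypothetical; only the three outcomes
    ("Move along", "Do that", "Dive in") come from the slide.
    """
    # Question 1: is this a clearly defined problem?
    if not clearly_defined:
        return "Move along"
    # Question 2: can it be solved in a deterministic way?
    if deterministic_solution:
        return "Do that"
    # Clearly defined, but no deterministic solution: consider ML.
    return "Dive in"


if __name__ == "__main__":
    print(ml_decision(clearly_defined=True, deterministic_solution=False))
```

Under this reading, ML is the last resort: it is reached only when both cheaper answers have been ruled out.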

MACHINE LEARNING

Counting things is still really hard.

Production code

Moving from local to production

Credit: Jörg Wagner and Stefan Prehn

Complexity

Perception

Credit: Hidden Technical Debt of Machine Learning Systems, D. Sculley, et al.

Reality

Credit: Hidden Technical Debt of Machine Learning Systems, D. Sculley, et al.

Data, Featurization, Training, Application, Platform

● Data Ingestion
● Data Exploration
● Data Validation
● Data Analysis
● Feature Extraction
● Data Transformation
● Training Data Segmentation
● Model Building
● Model Validation
● Model Versioning
● Model Auditing
● Distributed Training
● Continuous Training
● Serving Infrastructure
● Business Logic
● UI
● Load Balancing
● Logging
● Continuous Delivery
● Configuration
● Process Management
● Resource Management
● Monitoring
● Authentication/Authorization

Maintainability

● Error resolution, recovery, & prevention
● Speed of iteration
● Versioning

Make it easy for everyone to develop, deploy, and manage portable, scalable ML everywhere

Kubeflow

Portability, Scalability, Composability

● Full product lifecycle
● Entire stack
● Native to k8s
● Single, unified tool for common processes
● Support specialized hardware, like GPUs & TPUs
● Reduce variability between services & environments
● Reduce costs
● Improve model performance

Kubeflow

Who
● Data scientists
● ML researchers
● Software engineers
● Product managers

What
● Portable ML products on k8s
● v0.2.5 release

Why
● Because building a platform is too big of a problem to tackle alone

https://github.com/kubeflow/kubeflow

Kubeflow

Kubernetes-native platform for ML
● Run wherever k8s runs
● Use k8s to manage ML tasks
● CRDs for distributed training

Adopt k8s patterns
● Microservices
● Manage infra declaratively

Package infrastructure components together
● Ksonnet
● Move between local -> dev -> test -> prod -> onprem

Support multiple ML frameworks
● TensorFlow
● PyTorch
● scikit-learn
● XGBoost
● Et al.

But what is it?

A curated set of compatible tools and artifacts that lays a foundation for running production ML apps

Enables consistency across deployments by providing Kubernetes object templates that bring together disparate components

What's Inside?

● Ambassador reverse HTTP proxy
● Central Dashboard
  ○ JupyterHub
  ○ TF Job dashboard
● TF Job operator

[Architecture diagram. Components shown: IAP, GKE, Ambassador Svc, three Ambassador instances, Central Dashboard, TF Job Operator, JupyterHub, TF Job Dashboard]

Yelp Restaurant Reviews

1. Create a minikube cluster
2. Install Kubeflow locally
3. Run training locally
4. Create a GKE cluster
5. Install Kubeflow on GKE
6. Run training on CPUs
7. Run training on TPUs
8. Create serving and UI
9. Run a notebook on GPUs

https://github.com/kubeflow/examples/demos

[Demo architecture diagram. Components shown: IAP, GKE, Ambassador Svc, three Ambassador instances, TF Serving, Application UI, Central Dashboard, TF Job Operator, TF Job Dashboard, JupyterHub, Notebook, TensorFlow Master, Parameter Servers 1 … n, Workers 1 … n]
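The demo diagram shows a TensorFlow job made up of one master, n parameter servers, and n workers. In distributed TensorFlow, each replica usually learns this layout from a `TF_CONFIG` environment variable containing a `cluster` map and the replica's own `task`. The sketch below builds such a value in Python. The helper name `build_tf_config`, the host naming scheme, and port 2222 are assumptions made for illustration; they are not taken from the talk or from any specific Kubeflow release.

```python
import json


def build_tf_config(job_name, num_ps, num_workers, task_type, task_index, port=2222):
    """Return a TF_CONFIG-style JSON string for one replica.

    Hypothetical helper: host names follow an assumed
    "<job>-<role>-<index>:<port>" pattern chosen for this example.
    """
    def hosts(role, count):
        return [f"{job_name}-{role}-{i}:{port}" for i in range(count)]

    # One master, plus the requested parameter servers and workers.
    cluster = {
        "master": hosts("master", 1),
        "ps": hosts("ps", num_ps),
        "worker": hosts("worker", num_workers),
    }
    # Reject a task that does not exist in the cluster layout.
    if task_type not in cluster or not 0 <= task_index < len(cluster[task_type]):
        raise ValueError(f"no such task: {task_type}/{task_index}")

    return json.dumps({
        "cluster": cluster,
        "task": {"type": task_type, "index": task_index},
    })


if __name__ == "__main__":
    # Layout as seen by worker 1 of a job with 2 parameter servers and 3 workers.
    print(build_tf_config("yelp", num_ps=2, num_workers=3,
                          task_type="worker", task_index=1))
```

In a Kubeflow deployment this bookkeeping is normally done for you by the TF Job operator rather than written by hand; the sketch only makes the master / parameter server / worker split in the diagram concrete.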

Try it Yourself

● codelabs.developers..com
  ○ Intro to Kubeflow on Google Kubernetes Engine: https://goo.gl/192bs7
  ○ Kubeflow End-to-End: GitHub Issue Summarization: https://goo.gl/qLXUTG
● Qwiklabs: https://qwiklabs.com
● Katacoda: https://www.katacoda.com/kubeflow
● GitHub: https://github.com/kubeflow/examples/tree/master/github_issue_summarization
● http://gh-demo.kubeflow.org

Just the Beginning

● Easier setup
● Utilize more k8s features
● Add support for packages, frameworks, libraries, and example models
● You tell us!

Get involved
  ○ github.com/kubeflow
  ○ kubeflow.slack.com
  ○ @kubeflow
  ○ [email protected]
  ○ Community call Tuesdays alternating 8:30am and 5:30pm Pacific
  ○ Kubeflow Contributor Summit
    ■ Sept. 25 in Sunnyvale, CA

Future Direction

● Release 0.3.0 end of September
● Getting started: single command
● Monitoring with Prometheus
● Codeless hyperparameter tuning with Katib
● API consistency between TFJob and PyTorch
● TF Serving follows k8s style patterns
● JupyterHub
  ○ More configurable
  ○ Run distributed TFJobs from a notebook
● Batch prediction
● Benchmarking
● Chainer support

Questions?

@texasmichelle