Michelle Casbon Strata Data New York September 13, 2018 Kubeflow Explained: Portable Machine Learning on Kubernetes A curated set of compatible tools and artifacts that lays a foundation for running production ML apps
Enables consistency across deployments by providing Kubernetes object templates that bring together disparate components
@texasmichelle whoami
@texasmichelle Agenda
1 2 3 4 5
Problems Goals What's inside Demo Future Direction
@texasmichelle Is this a clearly No Move along defined problem?
ML Yes
decision Can it be solved in a Yes Do that deterministic tree way?
No
Dive in Credit: David Andrzejewski
@texasmichelle MACHINE LEARNING
Counting things is still really hard.
@texasmichelle Agenda
1 2 3 4 5
Problems Goals What's inside Demo Future Direction
@texasmichelle Production code
@texasmichelle Moving from local to production
Credit: Jörg Wagner and Stefan Prehn @texasmichelle
Complexity
@texasmichelle Perception
Credit: Hidden Technical Debt of Machine Learning Systems, D. Sculley, et al.
@texasmichelle Reality
Credit: Hidden Technical Debt of Machine Learning Systems, D. Sculley, et al.
@texasmichelle Data Featurization Training Application Platform Feature Serving Data Ingestion Model Building Extraction Infrastructure Configuration
Data Exploration Model Validation Business Logic Process Management Data Model Versioning UI Resource Transformation Management
Data Validation Model Auditing Load Balancing Monitoring
Data Analysis Distributed Training Logging Training Data Segmentation Continuous Continuous Training Delivery
Authentication/ Authorization @texasmichelle Maintainability
● Error resolution, recovery, & prevention ● Speed of iteration ● Versioning
@texasmichelle Agenda
1 2 3 4 5
Problems Goals What's inside Demo Future Direction
@texasmichelle Make it easy for everyone to develop, deploy, and manage portable, scalable ML everywhere
@texasmichelle Kubeflow
Portability Scalability Composability Full product Support lifecycle specialized Entire stack Native to k8s Single, unified tool hardware, like for common Reduce variability GPUs & TPUs processes between services & Reduce costs environments Improve model performance
@texasmichelle Kubeflow
Who What Why
Data scientists Portable ML products on k8s Because building a platform is too big of a problem to tackle ML researchers v0.2.5 release alone Software engineers
Product managers
https://github.com/kubeflow/kubeflow
@texasmichelle Kubeflow
Kubernetes-native Adopt k8s patterns Package infrastructure Support multiple ML platform for ML components together frameworks Microservices Run wherever k8s runs Ksonnet Tensorflow Manage infra Use k8s to manage ML declaratively Move between local -> Pytorch tasks dev -> test -> prod -> Scikit onprem CRDs for distributed Xgboost training Et al.
@texasmichelle Agenda
1 2 3 4 5
Problems Goals What's inside Demo Future Direction
@texasmichelle But what is it?
@texasmichelle @texasmichelle A curated set of compatible tools and artifacts that lays a foundation for running production ML apps
Enables consistency across deployments by providing Kubernetes object templates that bring together disparate components
@texasmichelle IAP
GKE What's Inside? Ambassador Svc
Ambassador Ambassador Ambassador ● Ambassador reverse HTTP proxy ● Central Dashboard
○ JupyterHub Central TF Job ○ Tf-job dashboard Dashboard Operator ● Tf-job operator
Jupyterhub TF Job Dashboard
@texasmichelle Agenda
1 2 3 4 5
Problems Goals What's inside Demo Future Direction
@texasmichelle Yelp Restaurant Reviews
1. Create a minikube cluster 6. Run training on CPUs 2. Install Kubeflow locally 7. Run training on TPUs 3. Run training locally 8. Create serving and UI 4. Create a GKE cluster 9. Run a notebook on GPUs 5. Install Kubeflow on GKE https://github.com/kubeflow/examples/demos
@texasmichelle IAP
GKE
Ambassador Svc
TF Serving
Ambassador Ambassador Ambassador
Application UI
Central TF Job Operator TF Job Dashboard TensorFlow Master
TensorFlow
Jupyterhub TF Job Dashboard Parameter Servers
Notebook 1 2 3 … n
Workers
1 2 3 … n
@texasmichelle Try it Yourself
● codelabs.developers.google.com ○ Intro to Kubeflow on Google Kubernetes Engine: https://goo.gl/192bs7 ○ Kubeflow End-to-End: GitHub Issue Summarization: https://goo.gl/qLXUTG ● Qwiklabs: https://qwiklabs.com ● Katacoda: https://www.katacoda.com/kubeflow ● GitHub: https://github.com/kubeflow/examples/tree/master/github_issue_summarization ● http://gh-demo.kubeflow.org
@texasmichelle Just the Beginning
● Easier setup ● Utilize more k8s features ● Add support for packages, frameworks, libraries, and example models ● You tell us! Get involved ○ github.com/kubeflow ○ kubeflow.slack.com ○ @kubeflow ○ [email protected] ○ Community call Tuesdays alternating 8:30am and 5:30pm Pacific ○ Kubeflow Contributor Summit ■ Sept. 25 in Sunnyvale, CA
@texasmichelle Agenda
1 2 3 4 5
Problems Goals What's inside Demo Future Direction
@texasmichelle Future Direction
● Release 0.3.0 end of September ● Getting started: single command ● Monitoring with Prometheus ● Codeless hyperparameter tuning with Katib ● API consistency between TFJob and PyTorch ● TFServing follows k8s style patterns ● Jupyterhub ○ More configurable ○ Run distributed TFJobs from a notebook ● Batch prediction ● Benchmarking ● Chainer support
@texasmichelle Questions?
@texasmichelle