Kubeflow Explained: Portable Machine Learning on Kubernetes a Curated Set of Compatible Tools and Artifacts That Lays a Foundation for Running Production ML Apps
Total Page:16
File Type:pdf, Size:1020Kb
Michelle Casbon Strata Data New York September 13, 2018 Kubeflow Explained: Portable Machine Learning on Kubernetes A curated set of compatible tools and artifacts that lays a foundation for running production ML apps Enables consistency across deployments by providing Kubernetes object templates that bring together disparate components @texasmichelle whoami @texasmichelle Agenda 1 2 3 4 5 Problems Goals What's inside Demo Future Direction @texasmichelle Is this a clearly No Move along defined problem? ML Yes decision Can it be solved in a Yes Do that deterministic tree way? No Dive in Credit: David Andrzejewski @texasmichelle MACHINE LEARNING Counting things is still really hard. @texasmichelle Agenda 1 2 3 4 5 Problems Goals What's inside Demo Future Direction @texasmichelle Production code @texasmichelle Moving from local to production Credit: Jörg Wagner and Stefan Prehn @texasmichelle Complexity @texasmichelle Perception Credit: Hidden Technical Debt of Machine Learning Systems, D. Sculley, et al. @texasmichelle Reality Credit: Hidden Technical Debt of Machine Learning Systems, D. Sculley, et al. @texasmichelle Data Featurization Training Application Platform Feature Serving Data Ingestion Model Building Extraction Infrastructure Configuration Data Exploration Model Validation Business Logic Process Management Data Model Versioning UI Resource Transformation Management Data Validation Model Auditing Load Balancing Monitoring Data Analysis Distributed Training Logging Training Data Segmentation Continuous Continuous Training Delivery Authentication/ Authorization @texasmichelle Maintainability ● Error resolution, recovery, & prevention ● Speed of iteration ● Versioning @texasmichelle Agenda 1 2 3 4 5 Problems Goals What's inside Demo Future Direction @texasmichelle Make it easy for everyone to develop, deploy, and manage portable, scalable ML everywhere @texasmichelle Kubeflow Portability Scalability Composability Full product Support lifecycle specialized Entire stack Native to k8s Single, unified tool hardware, like for common Reduce variability GPUs & TPUs processes between services & Reduce costs environments Improve model performance @texasmichelle Kubeflow Who What Why Data scientists Portable ML products on k8s Because building a platform is too big of a problem to tackle ML researchers v0.2.5 release alone Software engineers Product managers https://github.com/kubeflow/kubeflow @texasmichelle Kubeflow Kubernetes-native Adopt k8s patterns Package infrastructure Support multiple ML platform for ML components together frameworks Microservices Run wherever k8s runs Ksonnet Tensorflow Manage infra Use k8s to manage ML declaratively Move between local -> Pytorch tasks dev -> test -> prod -> Scikit onprem CRDs for distributed Xgboost training Et al. @texasmichelle Agenda 1 2 3 4 5 Problems Goals What's inside Demo Future Direction @texasmichelle But what is it? @texasmichelle @texasmichelle A curated set of compatible tools and artifacts that lays a foundation for running production ML apps Enables consistency across deployments by providing Kubernetes object templates that bring together disparate components @texasmichelle IAP GKE What's Inside? Ambassador Svc Ambassador Ambassador Ambassador ● Ambassador reverse HTTP proxy ● Central Dashboard ○ JupyterHub Central TF Job ○ Tf-job dashboard Dashboard Operator ● Tf-job operator Jupyterhub TF Job Dashboard @texasmichelle Agenda 1 2 3 4 5 Problems Goals What's inside Demo Future Direction @texasmichelle Yelp Restaurant Reviews 1. Create a minikube cluster 6. Run training on CPUs 2. Install Kubeflow locally 7. Run training on TPUs 3. Run training locally 8. Create serving and UI 4. Create a GKE cluster 9. Run a notebook on GPUs 5. Install Kubeflow on GKE https://github.com/kubeflow/examples/demos @texasmichelle IAP GKE Ambassador Svc TF Serving Ambassador Ambassador Ambassador Application UI Central TF Job Operator TF Job Dashboard TensorFlow Master TensorFlow Jupyterhub TF Job Dashboard Parameter Servers Notebook 1 2 3 … n Workers 1 2 3 … n @texasmichelle Try it Yourself ● codelabs.developers.google.com ○ Intro to Kubeflow on Google Kubernetes Engine: https://goo.gl/192bs7 ○ Kubeflow End-to-End: GitHub Issue Summarization: https://goo.gl/qLXUTG ● Qwiklabs: https://qwiklabs.com ● Katacoda: https://www.katacoda.com/kubeflow ● GitHub: https://github.com/kubeflow/examples/tree/master/github_issue_summarization ● http://gh-demo.kubeflow.org @texasmichelle Just the Beginning ● Easier setup ● Utilize more k8s features ● Add support for packages, frameworks, libraries, and example models ● You tell us! Get involved ○ github.com/kubeflow ○ kubeflow.slack.com ○ @kubeflow ○ [email protected] ○ Community call Tuesdays alternating 8:30am and 5:30pm Pacific ○ Kubeflow Contributor Summit ■ Sept. 25 in Sunnyvale, CA @texasmichelle Agenda 1 2 3 4 5 Problems Goals What's inside Demo Future Direction @texasmichelle Future Direction ● Release 0.3.0 end of September ● Getting started: single command ● Monitoring with Prometheus ● Codeless hyperparameter tuning with Katib ● API consistency between TFJob and PyTorch ● TFServing follows k8s style patterns ● Jupyterhub ○ More configurable ○ Run distributed TFJobs from a notebook ● Batch prediction ● Benchmarking ● Chainer support @texasmichelle Questions? @texasmichelle.