Accelerate AI/ML with Kubeflow

Han Yang, PhD, Senior Product Manager
September 2019

Data Pipeline for Multiple Data Sources

[Diagram: a Collect → Clean → Correlate → Train Model pipeline repeated for each data source (e.g., social, video), with results correlated across sources as more data arrives]
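The diagram runs the same Collect → Clean → Correlate → Train stages once per data source. A minimal Python sketch of that idea might look like the following. It assumes hypothetical stage functions, toy records, and a toy "model" that only averages labels; none of these names are Kubeflow APIs.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List

Record = dict  # hypothetical record shape, e.g. {"id": "a1", "label": 1.0}
Stage = Callable[[List[Record]], List[Record]]


@dataclass
class Model:
    """Toy stand-in for a trained model: it only remembers the mean label."""
    source: str
    mean_label: float


def clean(records: List[Record]) -> List[Record]:
    """Clean: drop records whose label is missing."""
    return [r for r in records if r.get("label") is not None]


def make_correlate(context: Dict[str, str]) -> Stage:
    """Correlate: join each record with shared context data keyed by "id"."""
    def correlate(records: List[Record]) -> List[Record]:
        return [{**r, "context": context.get(r["id"], "unknown")} for r in records]
    return correlate


def train(source: str, records: List[Record]) -> Model:
    """Train: fit the toy model (here, just the mean of the labels)."""
    return Model(source=source, mean_label=mean(r["label"] for r in records))


def run_pipeline(source: str, raw: List[Record], stages: List[Stage]) -> Model:
    """Collect the raw records, apply each reusable stage in order, then train."""
    records = list(raw)  # Collect: a real pipeline would read from storage here
    for stage in stages:
        records = stage(records)
    return train(source, records)


# The same stage list is reused for every data source.
shared_context = {"a1": "store-1", "b7": "store-2"}
stages = [clean, make_correlate(shared_context)]

sources = {
    "social": [{"id": "a1", "label": 1.0}, {"id": "a2", "label": None}],
    "video": [{"id": "b7", "label": 0.0}, {"id": "b8", "label": 1.0}],
}
models = {name: run_pipeline(name, raw, stages) for name, raw in sources.items()}
for name, model in models.items():
    print(name, model.mean_label)
```

Because every stage takes and returns a list of records, a new data source reuses the same stage list unchanged. In Kubeflow Pipelines, each stage would typically be packaged as its own containerized component rather than a plain Python function.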

© 2018 Cisco and/or its affiliates. All rights reserved. Cisco Confidential

Many Verticals: Where Is Your Data?
[Diagram: "You Are Here" marker]

Complete Data Pipeline: Data Center and Remote

[Diagram: Collect → Clean → Correlate → Train pipeline spanning remote sites and the data center, producing models in both locations]

Problem: Consistent AI/ML Across Hybrid Cloud

• After a data scientist runs experiments in the cloud, how can that work be brought back on-premises with consistent data tools?

• How can a single data pipeline span the cloud and on-premises environments?

• How can models be deployed wherever they are needed?

Solution: Kubeflow Pipelines on UCS and Cloud

• Kubeflow: Integrates ML tools into reusable data pipeline software components

• For IT:
   – Cisco HX / Google Anthos integration
   – Intersight to configure UCS/HX

• For data scientists (DS):
   – Consistent ML tools on-prem and in the cloud
   – Data pipeline extending between on-prem and cloud
   – Intersight to support remote pipelines

Recognizing Bolts Based on Inches vs. Centimeters

• Bolts sized in inches and in centimeters are hard to tell apart, and the wrong bolt can ruin equipment
• Use machine learning image classification to identify the different types of bolts
• Kubeflow workflow for training, model evaluation, and inference
• Runs on Cisco UCS and Google Cloud
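A quick calculation illustrates why the two bolt families are easy to confuse. The sketch below converts some common fractional-inch diameters to millimetres (1 in = 25.4 mm exactly) and finds the nearest metric size. The size lists are illustrative assumptions, not taken from the original demo.

```python
from fractions import Fraction
from typing import Tuple

MM_PER_INCH = 25.4  # exact by definition

# Illustrative nominal diameters (assumed for this example only).
IMPERIAL_IN = [Fraction(1, 4), Fraction(5, 16), Fraction(3, 8), Fraction(1, 2)]
METRIC_MM = [5, 6, 8, 10, 12]  # M5, M6, M8, M10, M12


def nearest_metric(diameter_in: Fraction) -> Tuple[float, int, float]:
    """Return (imperial size in mm, closest metric size in mm, absolute gap in mm)."""
    mm = float(diameter_in) * MM_PER_INCH
    closest = min(METRIC_MM, key=lambda m: abs(m - mm))
    return mm, closest, abs(closest - mm)


for size in IMPERIAL_IN:
    mm, closest, gap = nearest_metric(size)
    print(f'{size}" = {mm:.3f} mm, closest M{closest}, gap {gap:.3f} mm')
```

For these sizes every gap is at most 0.7 mm (5/16 in, for example, is 7.9375 mm, within 0.07 mm of M8), which is hard to judge by eye.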

Kubeflow Data Pipeline
[Diagram: pipeline stages]
• Train: train a new model
• TensorBoard: monitor training progress
• Model Quality: feed test data to the new model to evaluate its quality
• Deploy On-Prem: put the model into an inferencing server
• Deploy to Cloud: put the model into the cloud and serve inference at a URL
• Predict: call the model with its URL
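The Evaluate → Deploy → Predict steps can be sketched as a quality gate: score the new model on held-out test data, deploy only if it meets a threshold, then call it by URL. This is a minimal sketch under assumptions: the threshold, host name, and model name are hypothetical, and the request follows the TensorFlow Serving / KFServing v1 REST convention (`POST /v1/models/<name>:predict` with an `instances` list), which the original slides do not specify.

```python
import json
import urllib.request
from typing import Callable, List, Sequence

ACCURACY_THRESHOLD = 0.9  # hypothetical quality bar for deployment


def accuracy(predict: Callable[[Sequence[float]], int],
             test_x: List[Sequence[float]], test_y: List[int]) -> float:
    """Model Quality: feed test data to the new model and measure accuracy."""
    correct = sum(1 for x, y in zip(test_x, test_y) if predict(x) == y)
    return correct / len(test_y)


def should_deploy(score: float, threshold: float = ACCURACY_THRESHOLD) -> bool:
    """Gate: only promote the new model if it meets the quality bar."""
    return score >= threshold


def build_predict_request(base_url: str, model_name: str,
                          instances: List[List[float]]) -> urllib.request.Request:
    """Predict: build (but do not send) a REST call to a served model."""
    url = f"{base_url}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST")


def toy_model(x: Sequence[float]) -> int:
    """Toy classifier: label 1 when the first feature exceeds 0.5."""
    return int(x[0] > 0.5)


test_x = [[0.9], [0.2], [0.7], [0.4]]
test_y = [1, 0, 1, 0]

score = accuracy(toy_model, test_x, test_y)
if should_deploy(score):
    # Hypothetical endpoint; 8501 is TensorFlow Serving's default REST port.
    req = build_predict_request("http://inference.example.local:8501",
                                "bolt-classifier", [[0.8]])
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would send it once a server is actually running.
```

Keeping the gate as a separate function makes it easy to run the same check before both the on-prem and the cloud deployment.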

PSODCN-2877

Cisco: One of the Leading Contributors to Kubeflow
Over 2.8M Lines of Code with 3 Major Proposals

• Kubebench: Originated and implemented the benchmarking tool for Kubeflow

• PyTorch Operator: Continuous improvement and maintenance
• Katib:
   – Hyperparameter search
   – AutoML with Neural Architecture Search

Google Technology Partner of the Year

https://blogs.cisco.com/cloud/consistent-ai-the-journey-together

Hyperparameter Tuning with Katib

Background
• Hyperparameters: Parameters that determine the model, for example:
   – Width & depth of the neural network
   – Learning rate

Katib
• Hyperparameter tuning: Finding the best combination of parameters to produce the best trained models
• Neural architecture search: Figures out the best neural network architecture through automated exploration of the hyperparameter space
• Cancel a run if learning is not progressing well

Hyperparameter Tuning Requires a LOT of Compute Cycles

Activating Data with the Power of UCS
Cisco Computing Solutions for AI/ML

[Slide graphic, themes: Demystifying AI/ML; Unified Architecture Accelerating Insight and Action; Powering the Full Data Lifecycle; Adaptable AI Stacks; Cloud-Managed Systems for Distributed IT; Validated Solutions with Industry Leaders]

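The Katib slide's ideas (searching the hyperparameter space, and cancelling runs whose learning is not progressing) and its warning about compute cost can be illustrated together with a small random search plus a simplified median-stopping rule. This is a toy sketch, not Katib's implementation or API: the search space, the simulated learning curve, the grace period, and the stopping rule are all assumptions chosen so the example runs without training a real network.

```python
import math
import random
from statistics import median
from typing import Dict, List, Optional, Tuple

# Hypothetical search space mirroring the slide's examples.
SEARCH_SPACE: Dict[str, List[float]] = {
    "width": [32, 64, 128, 256],
    "depth": [2, 4, 8],
    "learning_rate": [0.3, 0.1, 0.03, 0.01, 0.001],
}
NUM_TRIALS = 12
MAX_EPOCHS = 20
GRACE_EPOCHS = 5  # hypothetical: never cancel a trial before this many epochs


def sample_params(rng: random.Random) -> Dict[str, float]:
    """Random search: pick one value per hyperparameter."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def simulated_accuracy(params: Dict[str, float], epoch: int) -> float:
    """Toy learning curve standing in for real training (pure assumption)."""
    lr_gap = abs(math.log10(params["learning_rate"]) - math.log10(0.03))
    depth_gap = abs(params["depth"] - 4) / 4
    ceiling = 0.95 - 0.15 * lr_gap - 0.05 * depth_gap - 0.01 * (256 / params["width"])
    return ceiling * (1 - math.exp(-epoch / 4))


def run_search(seed: int = 0) -> Tuple[Optional[Dict[str, float]], float, int]:
    """Run random search with a simplified median-stopping rule."""
    rng = random.Random(seed)
    curves: List[List[float]] = []  # per-epoch accuracy of every earlier trial
    best_params: Optional[Dict[str, float]] = None
    best_acc = -1.0
    epochs_used = 0
    for _ in range(NUM_TRIALS):
        params = sample_params(rng)
        curve: List[float] = []
        for epoch in range(1, MAX_EPOCHS + 1):
            curve.append(simulated_accuracy(params, epoch))
            epochs_used += 1
            # Earlier trials that reached this epoch serve as the comparison group.
            peers = [c[epoch - 1] for c in curves if len(c) >= epoch]
            if epoch >= GRACE_EPOCHS and peers and max(curve) < median(peers):
                break  # cancel: this run is below the median at the same epoch
        curves.append(curve)
        if curve[-1] > best_acc:
            best_params, best_acc = params, curve[-1]
    return best_params, best_acc, epochs_used


# Exhaustive grid: 4 widths x 3 depths x 5 learning rates = 60 combinations.
full_grid_epochs = math.prod(len(v) for v in SEARCH_SPACE.values()) * MAX_EPOCHS  # 1200
best_params, best_acc, epochs_used = run_search(seed=0)
print("exhaustive grid:", full_grid_epochs, "epochs")
print("random search with early stopping:", epochs_used, "epochs")
print("best trial:", best_params, round(best_acc, 3))
```

Even this tiny three-parameter space costs 1,200 training epochs to search exhaustively, which is the compute concern the slide raises; sampling a subset of trials and cancelling laggards early are two common ways to spend fewer of those cycles.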