Hacking AI on the Cloud
Lisa Amini, PhD Lab Director, IBM Research Cambridge
© 2018 IBM Corporation. We are in a period of rapid innovation and emerging adoption of data science, machine learning and AI Enterprise AI presents unique and challenging requirements Challenges of Building and deploying AI Models Today
• “Roll your own” home-brew environments • Stateful, compute-intensive execution at odds with cloud-native design AI Models • Stresses cloud networking, storage, and hardware Training • Open source components evolving at different rates and speed
• Testing and debugging neural nets is an active research topic • Model evolution based on feedback is unavailable • Enterprise readiness for compliance and traceability is not well Deployment understood
• Must handle streaming data • Near-real-time response required though inferencing on large deep learning networks is compute intensive Inference • Must be able to run in the cloud and at the edge Data Scientists Business Owners
Want to: Want to: • Use DL frameworks they love • Train large models with big data • Provide cognitive capabilities to my users • Rapidly experiment with different models • Rapidly innovate product features • Focus on business logic • Access cutting edge AI algorithms • Get the best accuracy • Be adaptive and flexible
Don’t want to: • Configure machines with GPUs • Manage the frameworks • Handle failure and recovery • Manage load balancing, storage, security and logging • Manage distributed processing 3 approaches for building AI
1 pre-trained AI 2 transfer learning 3 custom AI
+ + + + +
your your pre-trained app developer pre-trained app developer data scientist custom domain domain model or SME model or SME model data data
Watson Visual Recognition Watson Visual Recognition Watson Studio Natural Language Understanding Natural Language Classifier Watson Machine Learning Watson Speech to Text Watson Speech to Text Deep Learning as a Service Watson Text to Speech … … … Watson Developer Cloud APIs Pretrained AI Language – * Watson Assistant – Personality Insights – Natural Language Classifier – Tone Analyzer Data – Language Translator – Natural Language Understanding / Insights AlchemyLanguage • Includes Emotion analysis
Data Insights – * Discovery – Discovery News Language Speech
Speech – Speech to Text – Text to Speech Vision Vision – Visual Recognition
https://www.ibm.com/watson/products-services/ * Higher level APIs that include tooling to simplify implementation https://console.bluemix.net/catalog/?category=watson 7 IBM For the latest view of Watson API’s available, go to: www/ibm.com/watsondevelopercloud Watson Studio, with Deep Learning Create a Watson Studio account: https://www.ibm.com/cloud/watson-studio
9 IBM Cloud / Watson and Cloud Platform / © 2018 IBM Corporation Several ways to build custom AI
SPSS Modeler Data Refinery “non-coder/ clicker” Neural Network Modeler data scientists interactive Programming frameworks
Notebooks
“coder” data scientists vi researchers PyCharm terminal Group Name / DOC ID / Month XX, 2017 / © 2017 IBM Corporation 11 Deep Learning on the Cloud (DLaaS): Overview DLaaS
– A “serverless” user experience: focus on Neural Networks and data
– Adaptive deep learning as a service: cloud-native support for popular frameworks; elastic, resilient, and SLA driven
– Enabling quick time-to-value for a fast evolving field by architecting for continuous experimentation.
AI Studio/ Data Science Watson Services Experience User experience and tools Experiment Studio (visualization, multiple parallel jobs, autonn, model governance and management) API (WML): train/manage/watch Innovation in neural net design frameworks Advances in DL frameworks DLaaS supports multiple popular DL frameworks
Improvement in training techniques High performance distribution
DLaaS container and resource management Advances in the cloud stack GPU enabled kubernetes (Armada), Armada advanced scheduling (GPU enabled) Infrastructure (Softlayer)
CPUs NVIDIA GPUs New hardware New Accelerators https://dlaas-api.stage1.mybluemix.net https://dlaas-guide.stage1.mybluemix.net Deep Learning on the Cloud: Goal Where data scientists run deep learning experiments Run experiments, provide model tuning and comparison tools to evaluate models across 100-1000’s of hyperparameter configurations
experiments parameter permutation, data science model adjustments teams model parameter definition space model hyperparameters
optimizer hyperparameters chosen model
training artifacts models snapshots losses DL runtime Experimentation accuracies Parallel Model Tuner Studio logs Model training distributed across containers Multi-tenant container orchestration
training runs
containers
Kubernetes container orchestration
server cluster
NVIDIA GPUs dataset
Cloud Object Storage
1 4 Key Concepts
Watson Machine Learning Service Cloud Object Storage (COS) (WML) Object storage is the primary data IBM WML is a comprehensive solution resource used in the cloud, and it is that is your gateway to various machine also increasingly used for on-premise and deep learning technologies. solutions.
Provides the following capabilities Inexpensive, scalable, self-healing - Automated model building storage for massive amounts of - Model training and optimization unstructured data. - Model deployment IBM Cloud Object Storage can support To interact with WML, you need to single objects as large as 10TB. create a WML instance
IBM Research AI 1 5 Getting Started: Online Docs, Tutorials, Blogs, …
https://console.bluemix.net/docs/tutorials/index.html#tutorials IBM Cloud / Watson and Cloud Platform / © 2018 IBM Corporation 16 Questions?
IBM Research AI | Confidential | Copyright 2017 17 Watson Studio Supporting the end-to-end Enterprise AI workflow
Connect & Search and Find Prepare Data Build and Train Monitor, Analyze Deploy Models Access Data Relevant Data for Analysis ML/DL Models and Manage
Connect and Find data (structured, Clean and prepare Democratize the Deploy your models Monitor the discover content unstructured) and AI your data with Data creation of ML and DL easily and have them performance of the from multiple data assets (e.g., ML/DL Refinery, a tool to models. Design your AI scale automatically for models in production sources in the cloud models, notebooks, create data preparation models online, batch or and trigger automatic or on premises. Watson Data Kits) in pipelines visually. programmatically or streaming use cases retraining and Bring structured and the Knowledge Catalog Use popular open visually with the most redeployment of unstructured data to with intelligent search source libraries to popular open source models. Build one toolkit. and giving the right prepare unstructured and IBM ML/DL Enterprise Trust with access to the right data. frameworks or leverage Bias Detection, users. transfer learning on Mitigation Model pre-trained models Robustness and Testing using Watson tools to Service Model Security. adapt to your business domain. Train at scale on GPUs and distributed compute Cloud Object Storage
• Object storage is the primary data resource used in the cloud, and it is also increasingly used for on-premise solutions. • Inexpensive, scalable, self-healing storage for massive amounts of unstructured data. • IBM® Cloud Object Storage can support single objects as large as 10TB when using multipart uploads. Compare Training Runs
2 0 DLaaS – Basic Flow Model 1. Create model in DL framework Spec supported by DLaaS (Caffe, Tensorflow, Torch, 4. Start a training job Theano …) 5. Query training status and retrieve logs Object 6. Get trained model storage 2. Store training data in object storage Training Training Trained logs, Request Model status
3. Specify model metadata manifest.yml (framework, …); resource requirements DLaaS API: /v1/models (GPUs, mem, num learners, …); pointer to training data (object store, s3, …) Kubernetes cluster runs training jobs How to access TensorBoard?
https://www.tensorflow.org/programmers_guide/summaries_and_tensorboard
> python download_training_run_files.py TRAINING_RUN > tensorboard --logdir=path/to/log-directory
View Tensorboard at http://localhost:6006 MIT-IBM Watson AI Lab $240M 10 year commitment to jointly create the future of artificial intelligence
AI algorithms Physics of AI Applications of AI Advancing to industries shared prosperity • Learning and • Analog AI through AI Reasoning • Healthcare, Life • AI & Quantum Sciences • Ethics of AI, fair, • Continuous, multi- unbiased 23 task, small data, • Cybersecurity explanations, … http://mitibmwatsonailab.mit.edu/ • AI for Social Good