Deep Learning and Reinforcement Learning Workflows in A.I.
Emmanuel Blanchard
© 2015 The MathWorks, Inc.
A.I. with MATLAB and Simulink: Atlas Robot
Why should you care about Reinforcement Learning?
▪ It enables the use of deep learning for controls and decision-making applications
– Robotics
– Controls
– Autonomous driving
– Game play
What is Reinforcement Learning?
▪ What is Reinforcement Learning?
– Type of machine learning that trains an ‘agent’ through repeated interactions with an environment
▪ How does it work?
– Through a trial & error process that uses a reward system to maximize success
Agenda
Background: Reinforcement Learning vs Machine Learning vs Deep Learning
Deep Learning Workflows and Challenges
Reinforcement Learning (MATLAB + Simulink)
Conclusion
What is Machine Learning?
Machine Learning vs Deep Learning
Machine Learning
– Unsupervised Learning [No Labeled Data]: Clustering
– Supervised Learning [Labeled Data]: Classification, Regression
Supervised learning typically involves feature extraction
Deep Learning is a subset of machine learning with automatic feature extraction
• Learns features and tasks directly from data
• More Data = better model
https://www.youtube.com/watch?v=xr5LeWKbVnY
Deep Learning Uses a Neural Network Architecture
Input Layer → Hidden Layers (n) → Output Layer
Deep Learning Datatypes
Image Signal
Text Numeric
Reinforcement Learning vs Machine Learning vs Deep Learning
Machine Learning
– Unsupervised Learning [No Labeled Data]: Clustering
– Supervised Learning [Labeled Data]: Classification, Regression
– Reinforcement Learning [Interaction Data]: Decision Making, Controls
Deep Learning
Reinforcement learning:
▪ Learning through trial & error [interaction]
▪ Complex problems typically need deep learning [Deep Reinforcement Learning]
▪ It’s about learning a behavior or accomplishing a task
▪ Examples:
o Financial trading, calibration
o Lane-keep assist, adaptive cruise control, robotics, etc.
Agenda
Background: Reinforcement Learning vs Machine Learning vs Deep Learning
Deep Learning Workflows and Challenges
Reinforcement Learning (MATLAB + Simulink)
Conclusion
Deep Learning Challenges
Not a deep learning expert
Data
▪ Handling large amounts of data
▪ Labeling thousands of images & videos
Training and Testing Deep Neural Networks
▪ Accessing reference models from research
▪ Optimizing hyperparameters
▪ Training takes hours to days
Rapid and Optimized Deployment
▪ Desktop, web, cloud, and embedded hardware
Deep Learning Inference in 4 Lines of Code
>> net = alexnet;
>> I = imread('peacock.jpg');
>> I1 = imresize(I,[227 227]);
>> classify(net,I1)
ans =
  categorical
     peacock
Labeling for deep learning is repetitive, tedious, and time-consuming…
but necessary
Deep Learning Made Easy with Apps
▪ Automate ground-truth labeling using Image Labeler app
▪ Automate ground-truth labeling using Audio Labeler app
▪ Deep Network Designer app
▪ Network Analyzer app
Accelerating Code: GPU Coder, Parallel Server, MATLAB Coder
▪ Generate CUDA code
– Integrates with external CUDA code
▪ MATLAB Parallel Server
– Dynamic licensing
▪ Generate C/C++ code
– C/C++ code is royalty-free: deploy to your customers at no charge
– Package generated code as a MEX-function for use in MATLAB
Deploy MATLAB Data Analytics into the Cloud
▪ Use algorithms developed in different versions of MATLAB
▪ Deploy encrypted MATLAB code to protect IP
[Diagram: MATLAB Production Server architecture. Client apps (web, enterprise, mobile; Java, .NET, C/C++, Python) send REST calls over HTTP(s) on ports 9910/9920 to a request broker, which dispatches to a pool of up to 24 workers running different MATLAB runtimes (2015a, 2016b, 2017a). Deployed .ctf archives are auto-scanned and hot-deployed.]
App Designer: Create Desktop and Web Apps in MATLAB
▪ Try this on your phone: https://deeplearning.mwlab.io/
MATLAB supports the Entire Deep Learning Workflow
ACCESS AND EXPLORE DATA
– Files
– Databases
– Sensors
LABEL AND PREPROCESS DATA
– Data Augmentation/Transformation
– Labeling Automation
– Import Reference Models
DEVELOP PREDICTIVE MODELS
– Hardware-Accelerated Training
– Hyperparameter Tuning
– Network Visualization
INTEGRATE MODELS WITH SYSTEMS
– Desktop Apps
– Enterprise Scale Systems
– Embedded Devices and Hardware
Interoperability with Deep Learning Frameworks
▪ Import and export models using the Open Neural Network Exchange (ONNX) format
▪ Model importers (Caffe, TensorFlow-Keras)
▪ Access pretrained models with a single line of code
– AlexNet, VGG-19, VGG-16, GoogLeNet, ResNet, …
Agenda
Background: Reinforcement Learning vs Machine Learning vs Deep Learning
Deep Learning Workflows and Challenges
Reinforcement Learning (MATLAB + Simulink)
Conclusion
Glossary of Common Terms in Reinforcement Learning
▪ Agent: Red circle that learns how to navigate the grid to reach the blue square by trial and error
▪ Environment: 5x5 grid that is being navigated
▪ State: The current square the red circle is in
▪ Action: One of the 4 possible actions the red circle can take at each time step
▪ Reward: Points the red circle gets for taking an action
– -1 for any move, except:
– +5 when you land on teleportation square [4,4]
– +10 when you land on [5,1]
The red circle does not know in advance what the possible reward values are
[Figure: 5x5 grid world with the 4 possible actions and the +5 and +10 reward squares]
Glossary of Common Terms in Reinforcement Learning
▪ Trained Agent: Red circle that has learned how to navigate the grid by taking the best possible actions
▪ Final Reward: +11 points
▪ Policy: The logic that is learned by the red circle to implement the best possible actions. E.g.
– If red circle is in [1,4], move right
– If red circle is in [2,4], move down
▪ Reinforcement Learning Algorithm: The trial-and-error algorithm that developed this policy
The best action to take depends on the state
[Figure: 5x5 grid world with the 4 possible actions and the +5 and +10 reward squares]
In This Sample Trajectory, We Luckily Receive Two Rewards
[Figure: 5x5 grid world with +5 and +10 reward squares]
And Now, the Agent Remembers Which Two Actions Led to the Reward
[Figure: 5x5 grid world with +5 and +10 reward squares]
Eventually, We Find the Best Path Possible Based On Our Initial State
[Figure: 5x5 grid world with the +5 reward square]
But What If We Had a Different Initial State? Would the Same Path Be the Best Choice?
[Figure: 5x5 grid world with +5 and +10 reward squares]
Clearly, the Best Action to Take Depends on the State We Are In
In this case, we only have 21 possible states
In this case, we can run a small and finite number of simulations to find the best possible path irrespective of our initial state
[Figure: 5x5 grid world with +5 and +10 reward squares]
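The idea of finding the best path from a small, finite number of simulations can be sketched with tabular Q-learning in plain MATLAB. This is a minimal sketch under stated assumptions, not the shipped BasicGridWorldExample: it uses a simplified 5x5 grid with no obstacles and no teleportation square, reads the +10 terminal square [5,1] as [row, column], and picks the learning rate, discount factor, exploration rate, and episode count arbitrarily.

```matlab
% Tabular Q-learning on a simplified 5x5 grid world.
% Assumptions: no obstacles, no teleport square, goal at [row col] = [5 1].
rng(0);                          % fix the random seed for repeatability
n = 5;                           % grid is n-by-n
goal = [5 1];                    % terminal square (assumed layout)
moves = [-1 0; 1 0; 0 -1; 0 1];  % 4 actions as [row col] steps
alpha = 0.5;                     % learning rate (assumed)
gamma = 0.9;                     % discount factor (assumed)
epsilon = 0.2;                   % exploration probability (assumed)
Q = zeros(n*n, 4);               % one row per state, one column per action

for episode = 1:2000
    pos = [randi(n) randi(n)];   % random initial state each episode
    for step = 1:100
        if isequal(pos, goal), break; end
        s = sub2ind([n n], pos(1), pos(2));
        if rand < epsilon
            a = randi(4);                 % explore: random action
        else
            [~, a] = max(Q(s, :));        % exploit: best known action
        end
        next = min(max(pos + moves(a, :), 1), n);  % grid edges block movement
        if isequal(next, goal)
            r = 10;                       % reward for landing on the goal
        else
            r = -1;                       % cost of any other move
        end
        s2 = sub2ind([n n], next(1), next(2));
        % Q-learning update toward reward plus discounted best next value
        Q(s, a) = Q(s, a) + alpha * (r + gamma * max(Q(s2, :)) - Q(s, a));
        pos = next;
    end
end

% Follow the learned greedy policy from one corner of the grid
pos = [1 5];
pathLength = 0;
while ~isequal(pos, goal) && pathLength < 50
    s = sub2ind([n n], pos(1), pos(2));
    [~, a] = max(Q(s, :));
    pos = min(max(pos + moves(a, :), 1), n);
    pathLength = pathLength + 1;
end
fprintf('Greedy path length from [1 5]: %d moves\n', pathLength);
```

Because every episode starts in a random square, the table ends up holding a best action for each state rather than for a single starting point. If training has converged, the greedy path from [1,5] should take as many moves as the Manhattan distance to the goal, which is 8 on this grid.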
https://www.mathworks.com/help/reinforcement-learning/ug/train-q-learning-agent-to-solve-basic-grid-world.html
>> openExample('rl/BasicGridWorldExample')
But What If We Have Many States?
Applications that Engineers and Scientists Care About Can Have Huge State Spaces
Robot Arm for Grasping Objects
6 Servo Motors – Assume 180 degrees range of motion, discretized in 1-degree steps
Possible states: 180 x 180 x 180 x 180 x 180 x 180
About 34 trillion states (3.4 x 10^13)
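To get a feel for this scale, the state count and the memory a lookup table would need can be computed directly. This sketch assumes each servo angle is discretized in 1-degree steps (180 positions per motor) and that a table stores a single 8-byte double per state; both are illustrative assumptions rather than figures from the slides.

```matlab
% State count for the 6-servo arm and the size of a one-value-per-state table.
positionsPerMotor = 180;   % assumed: 1-degree steps over a 180-degree range
numMotors = 6;
numStates = positionsPerMotor ^ numMotors;     % 180^6 joint configurations
bytesPerEntry = 8;         % assumed: one double per state
tableTB = numStates * bytesPerEntry / 1e12;    % table size in decimal terabytes
fprintf('States: %.3g, table size: %.1f TB\n', numStates, tableTB);
```

Even one stored value per state comes to roughly 272 TB, so the table-based approach that works for the 5x5 grid is not practical here.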
Deep Networks are commonly found in the agent, because they can model complex problems.
[Diagram: Current State (Image, Radar, Sensor, etc.) → AGENT → Next Action]
• Turn left
• Turn right
• Brake
• Accelerate
By representing policies using deep neural networks, we can solve problems for complex, non-linear systems (continuous or discrete) by directly using data that traditional approaches cannot use easily
Teach a robot to follow a straight line using camera data
Let’s try to solve this problem the traditional way
[Diagram: Observations (Camera Data, Sensors) → Feature Extraction → State Estimation → Controller → Motor Commands. Controller detail: Observations → Balance, Leg & Trunk Trajectories → Motor Control → Motor Commands]
What is the alternative approach?
[Diagram: the traditional pipeline (Camera Data, Sensors → Feature Extraction → State Estimation → Controller → Motor Commands) compared with a single block: Camera Data, Sensors → Black Box Controller → Motor Commands]
How Does Reinforcement Learning Work?
[Diagram: the AGENT sends an ACTION to the ENVIRONMENT; the ENVIRONMENT returns the STATE and a REWARD to the AGENT]
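The state, action, and reward loop can be sketched in a few lines of plain MATLAB. The task here is hypothetical: a 1-D "stay on the line" toy problem where the state is an integer offset from the line, and the policy is hand-written rather than learned, so the sketch only illustrates how the pieces exchange data at each time step.

```matlab
% Agent-environment interaction loop on a hypothetical 1-D toy task.
state = 5;                  % initial offset from the line (assumed units)
totalReward = 0;
for t = 1:20
    % AGENT: the policy maps the observed state to an action
    action = -sign(state);  % hand-written policy: step toward zero offset
    % ENVIRONMENT: applies the action, returns the next state and a reward
    state = state + action;
    reward = -abs(state);   % being closer to the line earns a higher reward
    totalReward = totalReward + reward;
end
fprintf('Final offset: %d, total reward: %d\n', state, totalReward);
```

In reinforcement learning, the line that computes `action` would be replaced by a policy that a training algorithm adjusts using the observed rewards.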
A Practical Example of Reinforcement Learning: Training a Self-Driving Car
▪ Vehicle’s computer learns how to drive… (agent)
▪ using sensor readings from LIDAR, cameras,… (state)
▪ that represent road conditions, vehicle position,… (environment)
▪ by generating steering, braking, throttle commands,… (action)
▪ based on an internal state-to-action mapping… (policy)
▪ that tries to optimize driver comfort & fuel efficiency… (reward).
[Diagram: the AGENT contains the Policy and the Reinforcement Learning Algorithm, which issues the policy update; the AGENT exchanges STATE, ACTION, and REWARD with the ENVIRONMENT]
The policy is updated through repeated trial-and-error by a reinforcement learning algorithm
Reinforcement Learning vs Controls
[Diagram: feedback control loop. REFERENCE and MEASUREMENT are compared to form the ERROR, which drives the CONTROLLER; the CONTROLLER sends the MANIPULATED VARIABLE to the PLANT]
Control system | Reinforcement learning system
Adaptation mechanism | RL Algorithm
Error/Cost function | Reward
Manipulated variable | Action
Measurement | Observation
Plant | Environment
Controller | Policy
Reinforcement learning has parallels to control system design
When would you use Reinforcement Learning?
Controller | Capability | Computational Cost in Training/Tuning | Computational Cost in Deployment
PID | Low | Low | Low
Model Predictive Control | High | Low | High
Reinforcement Learning | High | High | Medium
Reinforcement learning might be a good fit if
▪ An environment model is available (trial & error on hardware can be expensive), and
▪ Training/tuning time is not critical for the application, and
▪ The environment is uncertain or nonlinear
Everyone is excited about reinforcement learning because it appears to be a silver bullet for all problems.
▪ However, there are challenges:
A lot of simulation trials required
Reward signal design, network layer structure & hyperparameter tuning can be challenging
Training may not converge
No performance guarantees
Further training might be necessary after deployment on real hardware
Simulation and Virtual Models are a Key Aspect of Reinforcement Learning
▪ Reinforcement learning needs a lot of data (sample inefficient)
– Training on hardware can be prohibitively expensive and dangerous
▪ Virtual models allow you to simulate conditions hard to emulate in the real world
– This can help develop a more robust solution
▪ Many of you have already developed MATLAB and Simulink models that can be reused
Why MATLAB and Simulink for Reinforcement Learning?
Virtual models allow you to simulate conditions hard to emulate in the real world.
Deep Learning Workflow
Data Preparation
– Data access and preprocessing
– Ground truth labeling
AI Modeling (Deep learning)
– Model design, tuning training options
– Importing reference models
– Model exchange across frameworks
– Hardware-accelerated training
Deployment
– Multiplatform code generation (CPU, GPU)
– Edge deployment
– Enterprise deployment
Reinforcement Learning Workflow
Data Preparation
– Scenario modeling
– Simulation-based data generation
– Simulink: generate data for dynamic systems (planes, cars, robots, etc.)
AI Modeling (Reinforcement learning)
– Training agent to perform task
– Developing reward system to optimize performance
Deployment
– Multiplatform code generation (CPU, GPU)
– Edge deployment
– Enterprise deployment
Implement reinforcement learning agents to train policies
▪ Train agents using built-in and custom reinforcement learning algorithms
▪ Import deep neural network policies from Keras and the ONNX model format
▪ Train agents directly in Simulink models using the RL Agent block
MATLAB doc - Walking robot
>> openExample('rl/RLWalkingBipedRobotExample')
Webinar - Walking robot
Resources
▪ Reference Examples – Design controllers for robots, self-driving cars, and other systems
▪ Documentation written for engineers and domain experts
▪ Tech Talk video series on reinforcement learning concepts for engineers
Agenda
Background: Reinforcement Learning vs Machine Learning vs Deep Learning
Deep Learning Workflows and Challenges
Reinforcement Learning (MATLAB + Simulink)
Conclusion
Why MATLAB / Simulink for A.I. Tasks?
Increased productivity with interactive tools
Generate simulation data for complex models and systems
Ease of deployment and scaling to various platforms
Full A.I. workflows that cannot be easily replicated by other toolchains
MATLAB supports the ENTIRE workflow
▪ Gartner recognizes MathWorks as a Visionary in its January 2019 Magic Quadrant for Data Science and Machine Learning Platforms
MathWorks can help you do Deep Learning & RL
Getting Started
▪ Guided evaluations with a MathWorks deep learning engineer
▪ Proof-of-concept projects
▪ Deep learning hands-on workshop (includes Reinforcement learning)
▪ Seminars and technical deep dives
▪ Deep learning onramp course
More options
▪ Consulting services
▪ Training courses
▪ Technical support
▪ Advanced customer support
▪ Installation, enterprise, and cloud deployment
▪ Deep Learning Paid Training
Before you leave, please complete our evaluation form. Your feedback is important to us! bit.ly/matlabexpoanz