Deep Learning and Reinforcement Learning Workflows in AI

Emmanuel Blanchard
© 2015 The MathWorks, Inc.

A.I. with MATLAB and Simulink: Atlas Robot

Why should you care about Reinforcement Learning?
▪ It enables the use of deep learning for controls and decision-making applications
  – Robotics
  – Controls
  – Autonomous driving
  – Game play

What is Reinforcement Learning?
▪ What is Reinforcement Learning?
  – A type of machine learning that trains an ‘agent’ through repeated interactions with an environment
▪ How does it work?
  – Through a trial-and-error process that uses a reward system to maximize success

Agenda
▪ Background: Reinforcement Learning vs Machine Learning vs Deep Learning
▪ Deep Learning Workflows and Challenges
▪ Reinforcement Learning (MATLAB + Simulink)
▪ Conclusion

What is Machine Learning?

Machine Learning vs Deep Learning
▪ Machine Learning
  – Supervised Learning [Labeled Data]: Classification, Regression
  – Unsupervised Learning [No Labeled Data]: Clustering
▪ Supervised learning typically involves feature extraction
▪ Deep learning is a subset of machine learning
▪ Deep learning with automatic feature extraction
  – Learns features and tasks directly from data
  – More data = better model
https://www.youtube.com/watch?v=xr5LeWKbVnY

Deep Learning Uses a Neural Network Architecture
Input Layer → Hidden Layers (n) → Output Layer

Deep Learning Datatypes
Image, Signal, Text, Numeric

Reinforcement Learning vs Machine Learning vs Deep Learning
▪ Machine Learning
  – Supervised Learning [Labeled Data]: Classification, Regression
  – Unsupervised Learning [No Labeled Data]: Clustering
  – Reinforcement Learning [Interaction Data]: Decision Making, Controls
Reinforcement learning:
▪ Learning through trial & error [interaction]
▪ Complex problems typically need deep learning [Deep Reinforcement Learning]
▪ It’s about learning a behavior or accomplishing a task
▪ Examples:
o Financial trading, calibration.
o Lane-keep assist, adaptive cruise control, robotics, etc.

Deep Learning Challenges
Not a deep learning expert
Data
▪ Handling large amounts of data
▪ Labeling thousands of images & videos
Training and Testing Deep Neural Networks
▪ Accessing reference models from research
▪ Optimizing hyperparameters
▪ Training takes hours to days
Rapid and Optimized Deployment
▪ Desktop, web, cloud, and embedded hardware

Deep Learning Inference in 4 Lines of Code
>> net = alexnet;
>> I = imread('peacock.jpg');
>> I1 = imresize(I,[227 227]);
>> classify(net,I1)
ans =
  categorical
     peacock

Labeling for deep learning is repetitive, tedious, and time-consuming… but necessary

Deep Learning Made Easy with Apps
▪ Automate ground-truth labeling using Image Labeler app
▪ Deep Network Designer app
▪ Automate ground-truth labeling using Audio Labeler app
▪ Network Analyzer app

Accelerating Code: GPU Coder, Parallel Server, MATLAB Coder
▪ Generate CUDA code
  – Integrates with external CUDA code
▪ MATLAB Parallel Server
  – Dynamic licensing
▪ Generate C/C++ code
  – C/C++ code is royalty-free: deploy to your customers at no charge
  – Package generated code as a MEX-function for use in MATLAB

Deploy MATLAB Data Analytics into the Cloud
▪ Use algorithms developed in different versions of MATLAB
▪ Deploy encrypted MATLAB code to protect IP
[Diagram: MATLAB Production Server architecture. Web apps, mobile apps, and enterprise data sources reach the server through REST calls; a manager starts and stops the workers, with up to 24 workers in the pool.]
[Diagram: Java, .NET, C/C++, and Python apps reach the request broker through language-specific client libraries over HTTP(S) on ports 9910/9920; workers run the MATLAB 2015a, 2016b, and 2017a runtimes; .ctf archives are hot deployed. The legend distinguishes IT developed or deployed resources from MathWorks components.]

App Designer: Create Desktop and Web Apps in MATLAB
▪ Try this on your phone: https://deeplearning.mwlab.io/

MATLAB supports the Entire Deep Learning Workflow
ACCESS AND EXPLORE DATA
▪ Files
▪ Databases
▪ Sensors
LABEL AND PREPROCESS DATA
▪ Data Augmentation/Transformation
▪ Labeling Automation
▪ Import Reference Models
DEVELOP PREDICTIVE MODELS
▪ Hardware-Accelerated Training
▪ Hyperparameter Tuning
▪ Network Visualization
INTEGRATE MODELS WITH SYSTEMS
▪ Desktop Apps
▪ Enterprise Scale Systems
▪ Embedded Devices and Hardware

Interoperability with Deep Learning Frameworks
▪ Import and export models using the Open Neural Network Exchange (ONNX) format
▪ Model importers (Caffe, TensorFlow-Keras)
▪ Access pretrained models with a single line of code
  – AlexNet, VGG-16, VGG-19, GoogLeNet, ResNet, …
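
The single-line access to pretrained models can be sketched as follows. This is only an illustration: it assumes Deep Learning Toolbox, the GoogLeNet support package, and a release that still provides the `googlenet` and `classify` functions. The image `peppers.png` is a sample file that ships with MATLAB, chosen here for convenience. Reading the input size from the network, rather than hard-coding it, lets the same lines work when a different pretrained model is swapped in.

```matlab
% Sketch: load a pretrained network in one line and classify an image.
% Assumes Deep Learning Toolbox and the GoogLeNet support package are installed.
net = googlenet;                        % pretrained model in a single line
inputSize = net.Layers(1).InputSize;    % image size the network expects
I = imread('peppers.png');              % sample image shipped with MATLAB
I = imresize(I, inputSize(1:2));        % match the network's input height and width
[label, scores] = classify(net, I);     % predicted class and per-class scores
fprintf('%s (score %.2f)\n', char(label), max(scores));
```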
Glossary of Common Terms in Reinforcement Learning
▪ Agent: Red circle that learns how to navigate the grid to reach the blue square by trial and error
▪ Environment: 5x5 grid that is being navigated
▪ State: The current square the red circle is in
▪ Action: One of the 4 possible actions the red circle can take at each time step
▪ Reward: Points the red circle gets for taking an action
  – A reward of -1 for any move, except:
  – +5 when you land on teleportation square [4,4]
  – +10 when you land on [5,1]
Red circle does not know what the possible reward values are
[Figure: 5x5 grid with rows and columns numbered 1 to 5, showing the 4 possible actions and the +5 and +10 squares]
▪ Trained Agent: Red circle that has learned how to navigate the grid by taking the best possible actions
▪ Final Reward: +11 points
▪ Policy: The logic that is learned by the red circle to implement the best possible actions. E.g.
  – If red circle is in [1,4], move right
  – If red circle is in [2,4], move down
▪ Reinforcement Learning Algorithm: The trial-and-error algorithm that developed this policy
The best action to take depends on the state

In This Sample Trajectory, We Luckily Receive Two Rewards

And Now, the Agent Remembers Which Two Actions Led to the Reward

Eventually, We Find the Best Path Possible Based On Our Initial State

But What If We Had a Different Initial State? Would the Same Path Be the Best Choice?
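
One way to explore that question is a minimal tabular Q-learning sketch in plain MATLAB. It is not the toolbox example and makes several simplifying assumptions that are not in the slides: no obstacles, no teleportation square, a goal at [x,y] = [5,1] that ends the episode, moves off the grid leaving the agent at the edge, and hand-picked hyperparameters (`learnRate`, `discount`, `epsilon`). Each episode starts from a random square, so the learned table covers every initial state.

```matlab
% Minimal tabular Q-learning sketch for a simplified 5x5 grid world.
% Assumptions (not from the slides): no obstacles, no teleportation square,
% goal at [x,y] = [5,1], and hand-picked hyperparameters.
rng(0);                                % reproducible random numbers
n = 5;                                 % grid is n-by-n
goal = [5 1];                          % landing here gives +10 and ends the episode
moves = [0 1; 0 -1; -1 0; 1 0];        % actions as [dx dy]: up, down, left, right
names = {'up', 'down', 'left', 'right'};
Q = zeros(n, n, 4);                    % Q(x,y,a): estimated value of action a in state [x,y]
learnRate = 0.5; discount = 0.95; epsilon = 0.1;

for episode = 1:2000
    s = [randi(n) randi(n)];           % random initial state
    for step = 1:100
        if isequal(s, goal), break; end
        % Epsilon-greedy: explore occasionally, otherwise use the table
        if rand < epsilon
            a = randi(4);
        else
            [~, a] = max(Q(s(1), s(2), :));
        end
        sNext = min(max(s + moves(a, :), 1), n);   % clamp to the grid
        if isequal(sNext, goal)
            target = 10;               % terminal state: no future value
        else
            target = -1 + discount * max(Q(sNext(1), sNext(2), :));
        end
        % Move the estimate toward reward plus discounted best next value
        Q(s(1), s(2), a) = Q(s(1), s(2), a) + learnRate * (target - Q(s(1), s(2), a));
        s = sNext;
    end
end

% Greedy policy: index of the best action for every state
[~, policy] = max(Q, [], 3);
fprintf('Best action at [1,1]: %s\n', names{policy(1, 1)});
fprintf('Best action at [5,5]: %s\n', names{policy(5, 5)});
```

The `policy` matrix holds one action per square, which is the lookup-table form of the "if red circle is in [1,4], move right" rules in the glossary.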
Clearly, the Best Action to Take Depends on the State We Are In
▪ In this case, we only have 21 possible states
▪ In this case, we can run a small and finite number of simulations to find the best possible path irrespective of our initial state
https://www.mathworks.com/help/reinforcement-learning/ug/train-q-learning-agent-to-solve-basic-grid-world.html
>> openExample('rl/BasicGridWorldExample')

But What If We Have Many States?

Applications that Engineers and Scientists Care About Can Have Huge State Spaces
Robot Arm for Grasping Objects
▪ 6 servo motors
  – Assume 180 degrees range of motion
▪ Possible states: 180 x 180 x 180 x 180 x 180 x 180
▪ More than 34 trillion states (3.4 x 10^13)

Deep Networks are commonly found in the agent, because they can model complex problems.
[Diagram: Current State (Image, Radar, Sensor, etc.) → AGENT → Next Action (Turn left, Turn right, Brake, Accelerate)]
By representing policies using deep neural networks, we can solve problems for complex, non-linear systems (continuous or discrete) by directly using data that traditional approaches cannot use easily

Teach a robot to follow a straight line using camera data

Let’s try to solve this problem the traditional way
[Diagram: Camera Data → Feature Extraction → State Estimation → Controller → Motor Commands, with Sensors supplying Observations; the Controller combines Leg & Trunk Trajectories, Balance, and Motor Control]

What is the alternative approach?
[Diagram: the traditional pipeline (Camera Data → Feature Extraction → State Estimation → Controller → Motor Commands) is replaced by Camera Data and Sensors → Black Box Controller → Motor Commands]

How Does Reinforcement Learning Work?
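
The cycle of state, action, and reward can be sketched as a plain loop. The environment below is a toy invented for illustration: a 1-D line-following task in which the state is the robot's lateral offset from the line. The dynamics, the reward, and the fixed steering rule are all assumptions. The rule stands in for a policy, and a reinforcement learning algorithm would adjust it from the rewards instead of having it written by hand.

```matlab
% Agent-environment loop sketch: state -> action -> reward -> next state.
% Assumption: a toy 1-D line-following environment invented for illustration;
% the fixed rule below stands in for a policy that RL would learn.
offset = 3;                  % state: lateral distance from the line
totalReward = 0;
for t = 1:10
    % AGENT: the policy maps the observed state to an action
    if offset > 0
        action = -1;         % steer back toward the line
    elseif offset < 0
        action = 1;
    else
        action = 0;          % already on the line
    end
    % ENVIRONMENT: applies the action, returns the next state and a reward
    offset = offset + 0.5 * action;     % hypothetical dynamics
    reward = -abs(offset);              % closer to the line is better
    totalReward = totalReward + reward;
end
fprintf('Final offset: %.1f, total reward: %.1f\n', offset, totalReward);
```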
[Diagram: the AGENT sends an ACTION to the ENVIRONMENT; the ENVIRONMENT returns a STATE and a REWARD to the AGENT]

A Practical Example of Reinforcement Learning: Training a Self-Driving Car
▪ Vehicle’s computer learns how to drive… (agent)
▪ using sensor readings from LIDAR, cameras,… (state)
▪ that represent road conditions, vehicle position,… (environment)
▪ by generating steering, braking, throttle commands,… (action)
▪ based on an internal state-to-action mapping… (policy)
▪ that tries to optimize driver comfort & fuel efficiency… (reward).
The policy is updated through repeated trial-and-error by a reinforcement learning algorithm
[Diagram: the AGENT contains the Policy and the Reinforcement Learning Algorithm, which performs the policy update; STATE and REWARD flow from the ENVIRONMENT to the AGENT, and ACTION flows from the AGENT to the ENVIRONMENT]

Reinforcement Learning vs Controls
[Diagram: control loop in which REFERENCE minus MEASUREMENT gives the ERROR, the CONTROLLER turns the ERROR into the MANIPULATED VARIABLE, and the PLANT produces the MEASUREMENT]
Control system          Reinforcement learning system
Adaptation mechanism    RL Algorithm
Error/Cost function     Reward
Manipulated variable    Action
Measurement             Observation
Plant                   Environment
Controller              Policy
Reinforcement learning has parallels to control system design

When would you use Reinforcement Learning?
                      Controller Capability   Computational Cost in Training/Tuning   Computational Cost in Deployment
PID                   Low                     Low                                     Low
Model Pred Control    High                    Low
