Application of Deep Learning & Reinforcement Learning in Control Systems

Hassan Teimoori, PhD.

May 2019

Agenda

• Analytics ecosystem
• Review of deep learning and reinforcement learning
• Applications in control systems

Analytics ecosystem

Artificial intelligence (AI) is no longer on the horizon. It's here now, and it's already having a profound impact on how we live, work, and do business.

Artificial intelligence

Machine learning
Deep learning

Prescriptive analytics (Action)
• What do we need to do?
• Identify measures to improve the outcome
• Automation
• Optimization

Predictive analytics (Decision)
• What is likely to happen?
• Predict patterns and near-future events

Diagnostic analytics (Insights)
• Focus on why it is happening; examine and find the root cause
• Isolate confounding information

Descriptive analytics (Information)
• Focus on what happened; comprehensive, accurate, effective visualization
• Capture product conditions, environments & operations

Enablers (Knowledge)
• Big data: connected products, historical data, enterprise data, external data
• Processing power: standalone (CPU, GPU, TPU), distributed, cloud, ambient computing (IoT)
• Robotics
• New algorithms

Control systems

Deep, broad and strong base of foundational knowledge with major emphasis on decision making under uncertainty.

• Dynamic systems modeling
• Structural properties
• Model reduction
• Identification
• Stability
• Feedback
• Fault tolerance
• Optimality
• Robustness
• Adaptation
• Architecture

• Variety of settings

Linear Nonlinear Stochastic Hybrid Distributed Supervisory

• Open challenges: control of large, complex, distributed dynamical systems under rapid changes in the environment and high levels of uncertainty.

Control systems

Current approaches to control fall into two camps: classic control and optimization-based control.

Classic control
• Less biased toward artificial-intelligence-style decisions
• Requires extensive knowledge from an expert with relevant domain knowledge
• The knowledge is transferred to the controller via the control law and other mathematical derivation
• Challenges:
  • Careful analysis of the process dynamics
  • Requires a model of the process, either derived from first principles or empirical
  • Model maintenance is very difficult or rarely achieved

Optimization-based controllers
• Look ahead into the future and take action considering future errors
• Suffer from the fact that the optimization step takes time to return the optimal control input, especially for complex, high-dimensional systems
• Challenges:
  • Hard to handle uncertainty about the true system dynamics and noise in the observation data
  • Burden of predicting hidden states
  • Requires an accurate model of the system to start with
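To make the "classic control" column concrete, here is a minimal discrete-time PID loop in Python. The first-order plant model, gains, and time step are illustrative assumptions, not taken from the slides.

```python
# A minimal discrete-time PID loop, as a concrete instance of "classic control".
# The first-order plant model and the gains below are illustrative assumptions.
import numpy as np

def simulate_pid(kp=2.0, ki=1.0, kd=0.1, setpoint=1.0, dt=0.01, steps=500):
    y, integral, prev_error = 0.0, 0.0, 0.0
    history = []
    for _ in range(steps):
        error = setpoint - y
        integral += error * dt
        derivative = (error - prev_error) / dt
        u = kp * error + ki * integral + kd * derivative   # control action
        # Assumed plant: first-order lag dy/dt = (-y + u) / tau, Euler-integrated
        tau = 0.5
        y += dt * (-y + u) / tau
        prev_error = error
        history.append(y)
    return np.array(history)

if __name__ == "__main__":
    response = simulate_pid()
    print(f"final output = {response[-1]:.3f} (setpoint 1.0)")
```

An optimization-based controller would instead solve a small optimization problem over a prediction horizon at every step, which is where the runtime concern quoted above comes from.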

Intelligent control is a class of control techniques that use various artificial intelligence computing approaches such as neural networks, Bayesian probability, fuzzy logic, reinforcement learning, evolutionary computation, and genetic algorithms.

Deep learning review (1 of 3)

Inspired by the human brain, a neural network consists of highly connected networks of neurons that relate the inputs to the desired outputs.

Machine learning: Input → Feature extractor → ML algorithm → Output
Deep learning: Input → Deep learning → Output

Neuron

Each neuron
• receives input from many other neurons,
• changes its internal state (activation) based on the current input,
• sends one output signal to many other neurons.
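A minimal NumPy sketch of that description; the weights, bias, and sigmoid activation are illustrative assumptions.

```python
# Minimal sketch of a single neuron: it receives inputs from other neurons,
# updates its activation from the weighted sum, and emits one output signal.
# Weights, bias, and the sigmoid activation are illustrative assumptions.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(inputs, weights, bias):
    z = np.dot(weights, inputs) + bias   # combine the incoming signals
    return sigmoid(z)                    # internal state -> single output signal

x = np.array([0.5, -1.2, 3.0])           # signals from three upstream neurons
w = np.array([0.4, 0.1, -0.6])
print(neuron(x, w, bias=0.2))
```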

Deep learning review (2 of 3)

Deep learning architectures, algorithms and techniques have created powerful tools to learn representations of large volumes of data in multiple layers of representation.

Criteria:
• Quantity and form of input data
• How must the weights be modified to allow fast and reliable learning?
• Success measure?
• Number of iterations (epochs)
• Stability?
• Order of pattern representation?
• When do we stop learning?
• Etc.

Training cycle:
1. Enter the training data
2. Forward propagation
3. Output assessment and error calculation
4. If error > threshold: the correction of the network is calculated (back-propagation) and applied (neuron activations); repeat
5. If error < threshold: stop

Considerations:
• Large state space
• Subject matter expertise
• Data fidelity
• Regulations
• Cost of failure
• Technology limits
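The training cycle above can be sketched end to end for a tiny network. The XOR data, layer sizes, learning rate, and stopping threshold below are illustrative assumptions.

```python
# Minimal sketch of the training cycle: forward propagation, error calculation,
# back-propagation, repeated until the error drops below a threshold.
# The toy XOR data, layer sizes, and learning rate are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr, threshold = 1.0, 0.01

for epoch in range(10000):
    # forward propagation
    h = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)
    error = np.mean((y_hat - Y) ** 2)
    if error < threshold:                     # stop once the error is small enough
        break
    # back-propagation: the correction of the network is calculated and applied
    d_out = (y_hat - Y) * y_hat * (1 - y_hat)
    d_hid = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_hid
    b1 -= lr * d_hid.sum(axis=0)

print(f"stopped after {epoch} epochs, error = {error:.4f}")
```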

Pattern recognition

Inspired by the human brain, a neural network consists of highly connected networks of neurons that relate the inputs to the desired outputs.

Best use cases:
• For modeling a highly nonlinear system
• When the model is supposed to be constantly updated
• When model interpretability is not a key concern

Memory-based architecture

Inspired by the human brain, a neural network consists of highly connected networks of neurons that relate the inputs to the desired outputs.

LSTM cell: the input x_t and the previous state s_{t-1} produce the output y_t and the updated state s_t.
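A minimal PyTorch sketch of that cell in use; the feature sizes, sequence length, and the linear output head are illustrative assumptions.

```python
# Minimal PyTorch sketch of the memory-based (LSTM) cell: the input x_t and the
# previous state produce an output y_t and an updated state.
# Sizes are illustrative assumptions; torch is assumed to be installed.
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
head = nn.Linear(16, 1)                  # maps the hidden state to y_t

x = torch.randn(4, 20, 8)                # batch of 4 sequences, 20 steps, 8 features
outputs, (h_n, c_n) = lstm(x)            # outputs: hidden state at every time step
y_t = head(outputs[:, -1, :])            # prediction from the final time step
print(y_t.shape)                         # torch.Size([4, 1])
```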

Vanilla neural networks | Image captioning | Sentiment classification | Machine translation | Video processing

Deep learning generic pattern

Deep learning frameworks work through computation over dataflow graphs. They provide interfaces that make it simple for developers to construct computation graphs, and runtimes that process the graphs in an optimized way. The graph is conducive to optimization and to translation to run on specific devices (CPU, GPU, TPU, FPGA, etc.).

Deep neural network frameworks
• Caffe / Caffe2
• CNTK
• DL4j
• Lasagne
• mxnet
• PaddlePaddle
• TensorFlow
• Torch / PyTorch
• …
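A small sketch of the dataflow-graph idea using one of the listed frameworks (PyTorch); the toy values are assumptions, and the same pattern applies to TensorFlow and the other frameworks above.

```python
# Sketch of the dataflow-graph pattern: PyTorch records the operations below as
# a graph and back-propagates through it automatically. Values are illustrative.
import torch

w = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(0.5, requires_grad=True)
x = torch.tensor(3.0)

y = w * x + b                 # nodes added to the computation graph
loss = (y - 10.0) ** 2
loss.backward()               # the graph is traversed in reverse to get gradients

print(w.grad, b.grad)         # dloss/dw = 2*(y-10)*x, dloss/db = 2*(y-10)
```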

Transfer learning

Transfer learning means starting with a previously trained model that achieved good results and then training it further on a specific image dataset.

Approaches
• Fine-tune an existing model
• Use an existing convolutional model as a feature extractor

When to consider trying transfer learning
• The training dataset is small
• The training dataset shares visual features with the base dataset

Transfer learning for robotics

Learning in simulation and transferring the knowledge to a real-world robot alleviates a slow and expensive training process.

A. A. Rusu et al., 2018

• Boost in initial performance
• Increased learning speed
• More accurate final performance

Transfer learning

Inspired by the human brain, a neural network consists of highly connected networks of neurons that relate the inputs to the desired outputs.

Pretrained-model strategy by data set size and data similarity:

• Large data set, low data similarity: train the model from scratch. The predictions made using pretrained models would not be effective, so it is best to train the neural network from scratch on your data.

• Large data set, high data similarity: fine-tune the pretrained model (the ideal case). Retain the architecture of the model and its initial weights.

• Small data set, low data similarity: fine-tune the lower layers. Freeze the initial (say k) layers of the pretrained model and train just the remaining (n-k) layers again. The small size of the data set is compensated by the fact that the initial layers are kept pretrained.

• Small data set, high data similarity: fine-tune the output layers, using the pretrained model as a feature extractor. Customize and modify the output layers according to the problem. Example: an ImageNet model with 1000 classes is simplified to two classes (cat & dog).
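As a sketch of the "small data set, high data similarity" case above, the snippet below freezes a pretrained backbone and retrains only a two-class output layer. The choice of torchvision's ResNet-18 and the older pretrained=True API are assumptions, not part of the slides.

```python
# Sketch of the "small data, high similarity" quadrant: keep the pretrained
# backbone frozen and retrain only the output layer for two classes (cat & dog).
# torchvision's ResNet-18 is an assumed example backbone (older torchvision API).
import torch.nn as nn
import torchvision.models as models

model = models.resnet18(pretrained=True)        # ImageNet weights

for param in model.parameters():                # freeze the initial, pretrained layers
    param.requires_grad = False

model.fc = nn.Linear(model.fc.in_features, 2)   # new, trainable two-class output layer

# Only the new head's parameters would be passed to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
print(sum(p.numel() for p in trainable), "trainable parameters")
```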

Reinforcement learning (RL)

RL is the subfield of machine learning that studies how to use past data to enhance the future manipulation of a dynamical system.

Agent: created by programming such that it is able to sense the environment, perform actions, receive feedback, and try to maximize rewards.
Environment: the world where the agent resides. It can be real or simulated.
State: the perception or configuration of the environment that the agent senses. State spaces can be finite or infinite.
Rewards: feedback the agent receives after any action it has taken. The goal of the agent is to maximize the overall reward, that is, the immediate and the future reward.
Actions: anything that the agent is capable of doing in the given environment. Action spaces can be finite or infinite.
Episode: one complete run of the whole task.

Reinforcement learning (RL)

RL is the subfield of machine learning that studies how to use past data to enhance the future manipulation of a dynamical system.

Observations
• The agent learns from its own experience.
• The reward may be delayed and/or stochastic.
• There is no clear model of how the world responds to actions.
• Choices improve with experience.
• Problems can have a finite or infinite time horizon.

Constraints
• The outcome of actions may be uncertain.
• The actions change the state.
• The effect of an action cannot be completely predicted.
• The environment may change while we are trying to learn it.

RL algorithm

The agent contains two components: a policy and a learning algorithm.

1. Formulate problem: define the task for the agent in terms of interaction with the environment and the goals the agent must achieve.
2. Create environment: define the environment within which the agent operates, including the interfaces and the environment dynamic model.
3. Define reward: specify the reward signal that the agent uses to measure its performance against the task goals.
4. Create agent: create the agent, which includes defining a policy representation and configuring the agent learning algorithm.
5. Train agent: train the agent policy representation using the defined environment, reward, and agent learning algorithm.
6. Validate agent: evaluate the performance of the trained agent by simulating the agent and environment together.
7. Deploy policy: deploy the trained policy representation using, for example, generated GPU code.

OpenAI Gym example

• NChain-v0 from OpenAI Gym is a simple 5-state environment.
• There are two possible actions in each state: move forward (action 0) and move backwards (action 1).
• There is also a random chance that the agent's action is "flipped" by the environment (i.e. an action 0 is flipped to an action 1 and vice versa).

https://gym.openai.com/envs/NChain-v0/
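A tabular Q-learning sketch for this environment. It assumes an older gym release in which NChain-v0 is still registered, and the hyperparameters are illustrative assumptions.

```python
# Tabular Q-learning sketch for the NChain-v0 environment described above.
# Assumes an older `gym` release in which NChain-v0 is still registered;
# hyperparameters are illustrative assumptions.
import numpy as np
import gym

env = gym.make("NChain-v0")
q = np.zeros((env.observation_space.n, env.action_space.n))   # 5 states x 2 actions
alpha, gamma, epsilon = 0.1, 0.95, 0.1

for episode in range(500):
    state = env.reset()
    done = False
    while not done:
        # epsilon-greedy action selection
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q[state]))
        next_state, reward, done, _ = env.step(action)
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        q[state, action] += alpha * (reward + gamma * np.max(q[next_state]) - q[state, action])
        state = next_state

print(q)   # learned state-action values
```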

Deep learning to complete reinforcement learning

The decision maker may have no idea what the environment really looks like, but through trial and error the agent learns to make decisions that extract the most cumulative reward from the environment.

Pipeline (diagram): Environment → Sensor → Data collection → Data exploration → Feature extraction → Knowledge → Reasoning → Planning → Action, with machine learning, deep learning, reinforcement learning, and deep reinforcement learning each spanning successively larger portions of this pipeline.

Reinforcement learning

Reinforcement learning (RL) offers powerful algorithms to search for optimal controllers of systems with nonlinear, possibly stochastic dynamics that are unknown or highly uncertain. RL is a model-free framework for solving optimal control problems stated as Markov decision processes (MDPs).

RL algorithms in control context

They have been mainly used to solve:
1) optimal regulation and optimal tracking of single-agent systems
2) optimal coordination of multi-agent systems

Reinforcement learning term → control systems equivalent:

• Policy → Controller
• Environment → Everything that is not the controller, for example the plant, the reference signal, and the calculation of the error. In general, the environment can also include additional elements, such as:
  • Measurement noise
  • Disturbance signals
  • Filters
  • Analog-to-digital and digital-to-analog converters
• Observation → Any measurable value from the environment that is visible to the agent. In the preceding diagram, the controller can see the error signal from the environment. You can also create agents that observe, for example, the reference signal, measurement signal, and measurement signal rate of change.
• Action → Manipulated variables or control actions.
• Reward → A function of the measurement, error signal, or some other performance metric. For example, you can implement reward functions that minimize the steady-state error while minimizing control effort.

• Learning algorithm → Adaptation mechanism of an adaptive controller.
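A sketch of this mapping as code: a gym-style environment wrapping a simple first-order plant, where the observation is the tracking error, the action is the control input, and the reward penalizes both error and control effort. The plant model, actuator limits, and constants are illustrative assumptions.

```python
# Sketch of the mapping above: a gym-style environment that wraps a simple plant.
# Observation = tracking error, action = control input, reward penalizes error
# and control effort. The first-order plant, reference, and constants are
# illustrative assumptions.
import numpy as np

class FirstOrderPlantEnv:
    def __init__(self, dt=0.05, tau=0.5, reference=1.0, horizon=200):
        self.dt, self.tau, self.reference, self.horizon = dt, tau, reference, horizon

    def reset(self):
        self.y, self.t = 0.0, 0
        return np.array([self.reference - self.y])        # observation: error signal

    def step(self, u):
        u = float(np.clip(u, -5.0, 5.0))                   # actuator limits
        self.y += self.dt * (-self.y + u) / self.tau       # plant dynamics
        self.t += 1
        error = self.reference - self.y
        reward = -(error ** 2 + 0.01 * u ** 2)             # penalize error and effort
        done = self.t >= self.horizon
        return np.array([error]), reward, done, {}

# A random "agent" just to exercise the interface; an RL agent would replace it.
env = FirstOrderPlantEnv()
obs, done, total = env.reset(), False, 0.0
while not done:
    obs, reward, done, _ = env.step(np.random.uniform(-1, 1))
    total += reward
print(f"episode return: {total:.2f}")
```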

Control systems that are a good fit for RL

Deep, broad and strong base of foundational knowledge with major emphasis on decision making under uncertainty.

Control systems
• Robotics
• Wind turbine control
• Autonomous vehicles
• Factory automation
• Smart grids
• Machine tuning

Monitor and maintain
• Quality control
• Fault detection and isolation
• Predictive maintenance
• Inventory monitoring
• Supply chain risk management

Optimization
• Process planning
• Job shop scheduling
• Yield management
• Supply chain
• Demand forecasting
• Production coordination
• Network optimization

Challenges in RL for Engineering Applications

Deep, broad and strong base of foundational knowledge with major emphasis on decision making under uncertainty.

• The states and actions are inherently continuous, and the dimensionality of both can be high

• Physical world uncertainty

• Simulation environment

• Reward function design

• Professional knowledge requirements

• Lack of standard benchmarks

Interpretable deep learning systems (IDLS)

It is often challenging to intuitively and quantitatively understand how a deep neural network arrives at a particular decision for a specific input, due to its high non-linearity and nested structure.

The success of this goal is tied to the cognition, knowledge, and biases of the user: for a system to be interpretable, it must produce descriptions that are simple enough for a person to understand.

An explanation can be evaluated in two ways:
• according to its interpretability, and
• according to its completeness.

X. Zhang et al. 2018

Methods for explaining neural networks generally fall within two broad categories:
(1) Saliency methods: which weights are being activated given some inputs.
(2) Feature attribution: attempts to fit structural models on a subset of data in such a way as to find out the explanatory power each variable has on the output variable.
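A minimal gradient-based saliency sketch of the first category; the tiny network and random "image" are placeholders for illustration, not any specific published method.

```python
# Minimal gradient-based saliency sketch: how strongly each input pixel
# influences the chosen output. The tiny network and random input are
# placeholder assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 10))
model.eval()

image = torch.rand(1, 1, 28, 28, requires_grad=True)   # placeholder input
logits = model(image)
target = logits.argmax()                                # class the network picked

# Back-propagate the chosen class score to the input; the gradient magnitude
# serves as the saliency map.
logits[0, target].backward()
saliency = image.grad.abs().squeeze()
print(saliency.shape)   # torch.Size([28, 28])
```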

Thank you.

Deloitte, one of Canada's leading professional services firms, provides audit, tax, consulting, and financial advisory services. Deloitte LLP, an Ontario limited liability partnership, is the Canadian member firm of Deloitte Touche Tohmatsu Limited. Deloitte refers to one or more of Deloitte Touche Tohmatsu Limited, a UK private company limited by guarantee, and its network of member firms, each of which is a legally separate and independent entity. Please see www.deloitte.com/about for a detailed description of the legal structure of Deloitte Touche Tohmatsu Limited and its member firms. The information contained herein is not intended to substitute for competent professional advice.

© Deloitte LLP and affiliated entities.

Appendix

Learning methodologies

Policy-based
• Directly search for and learn a policy function (maximizing future reward)
• Learn the stochastic policy function that maps the state to the action

Model-based
• Dynamic programming (memory intensive)
• Update the model and re-plan often
• Example: chess
• Extremely efficient

Value-based
• Learn the state or state-action value
• Estimate the optimal value function Q*(s,a), i.e. the maximum value achievable under any policy
• Act by choosing the best action in each state
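As an illustration of the policy-based approach, here is a minimal REINFORCE sketch on CartPole-v1. It assumes the pre-0.26 gym step/reset API, and the network and hyperparameters are illustrative assumptions.

```python
# Minimal REINFORCE sketch (policy-based): the policy network is learned
# directly by weighting action log-probabilities with the discounted return.
# Assumes the pre-0.26 gym API; hyperparameters are illustrative assumptions.
import gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
policy = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)
gamma = 0.99

for episode in range(300):
    obs = env.reset()
    log_probs, rewards, done = [], [], False
    while not done:
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        obs, reward, done, _ = env.step(int(action))
        log_probs.append(dist.log_prob(action))
        rewards.append(reward)

    # discounted returns, computed backwards from the end of the episode
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    loss = -(torch.stack(log_probs) * returns).sum()   # policy-gradient objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```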

CRISP-DM Laws

1. Business objectives are the origin of every algorithm.
2. Business knowledge is central to every step of the data mining process.
3. Data preparation is more than half of every data mining process.
4. The right model for a given application can only be discovered by experiment, or "There is no free lunch for the data miner."
5. There are always patterns.
6. Data mining amplifies perception in the business domain.
7. Prediction increases information locally by generalization.
8. The value of data mining results is not determined by the accuracy or stability of predictive models.
9. All patterns are subject to change.

http://khabaza.codimension.net/index_files/9laws.htm
