
Deep Reinforcement Learning
Course 6.S191: Intro to Deep Learning
Lex Fridman ([email protected]), January 2018
For the full updated list of references visit: https://selfdrivingcars.mit.edu/references

The Stack of Artificial Intelligence
• Environment → Sensors → Sensor Data → Feature Extraction → Representation → Machine Learning → Knowledge → Reasoning → Planning → Action → Effector → back into the Environment
• Open question: what can be learned from data?

Sensors
• Camera (visible, infrared), stereo camera, lidar, radar, GPS, IMU, microphone, networking (wired, wireless) [132]

Recognition
• Image recognition: if it looks like a duck
• Audio recognition: quacks like a duck
• Activity recognition: swims like a duck

Reasoning
Andrew Wiles describing the final breakthrough in his proof of Fermat's Last Theorem, 358 years after its conjecture:
"It was so indescribably beautiful; it was so simple and so elegant. I couldn't understand how I'd missed it and I just stared at it in disbelief for twenty minutes. Then during the day I walked around the department, and I'd keep coming back to my desk looking to see if it was still there. It was still there. I couldn't contain myself, I was so excited. It was the most important moment of my working life. Nothing I ever do again will mean as much." [133]

The Promise of Deep Learning and Deep Reinforcement Learning
Within the stack above:
• The promise of Deep Learning: Feature Extraction, Representation, Machine Learning, Knowledge
• The promise of Deep Reinforcement Learning: Reasoning, Planning, Action

Types of Deep Learning
• Supervised Learning
• Semi-Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
[81, 165]

Philosophical Motivation for Reinforcement Learning
• Takeaway from supervised learning: neural networks are great at memorization and not (yet) great at reasoning.
• Hope for reinforcement learning: brute-force propagation of outcomes to knowledge about states and actions. This is a kind of brute-force "reasoning".
Agent and Environment
• At each step the agent:
  • executes an action
  • receives an observation (the new state)
  • receives a reward
• The environment:
  • receives the action
  • emits an observation (the new state)
  • emits a reward
(A minimal code sketch of this loop appears just before the Markov decision process section below.) [80]

Examples of Reinforcement Learning
Reinforcement learning is a general-purpose framework for decision-making:
• An agent operates in an environment (e.g., Atari Breakout)
• An agent has the capacity to act
• Each action influences the agent's future state
• Success is measured by a reward signal
• The goal is to select actions that maximize future reward
[85]

Cart-Pole Balancing [166]
• Goal: balance the pole on top of a moving cart
• State: pole angle and angular speed; cart position and horizontal velocity
• Actions: horizontal force applied to the cart
• Reward: +1 at each time step the pole remains upright

Doom [166]
• Goal: eliminate all opponents
• State: raw pixels of the game
• Actions: up, down, left, right, etc.
• Reward: positive when eliminating an opponent, negative when the agent is eliminated

Bin Packing [166]
• Goal: pick a device from a box and put it into a container
• State: raw pixels of the real world
• Actions: the possible motions of the robot
• Reward: positive when a device is placed successfully, negative otherwise

Human Life
• Goal: survival? Happiness?
• State: sight, hearing, taste, smell, touch
• Actions: think, move
• Reward: homeostasis?

Key Takeaways for Real-World Impact
• Deep Learning:
  • Fun part: good algorithms that learn from data.
  • Hard part: huge amounts of representative data.
• Deep Reinforcement Learning:
  • Fun part: good algorithms that learn from data.
  • Hard part: defining a useful state space, action space, and reward.
  • Hardest part: getting meaningful data for the above formalization.
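Before formalizing this as a Markov decision process, here is the agent-environment loop above as a minimal code sketch. It assumes the classic OpenAI Gym API (env.reset / env.step) and the CartPole-v0 environment from the cart-pole example; the random policy is a placeholder, not a learning algorithm.

```python
# Minimal agent-environment loop, assuming the classic (pre-2021)
# OpenAI Gym API and the CartPole-v0 environment.
import gym

env = gym.make("CartPole-v0")
state = env.reset()                      # agent receives the initial observation

total_reward, done = 0.0, False
while not done:
    action = env.action_space.sample()   # agent executes a (random) action
    state, reward, done, info = env.step(action)  # env emits new state and reward
    total_reward += reward               # +1 for every step the pole stays upright

print("Episode return:", total_reward)
```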
Markov Decision Process
An episode is a sequence of states, actions, and rewards that ends in a terminal state:

$s_0, a_0, r_1, s_1, a_1, r_2, \dots, s_{n-1}, a_{n-1}, r_n, s_n$

where $s_t$ is a state, $a_t$ an action, $r_t$ a reward, and $s_n$ the terminal state. [84]

Major Components of an RL Agent
An RL agent may include one or more of these components:
• Policy: the agent's behavior function
• Value function: how good each state and/or action is
• Model: the agent's representation of the environment

Robot in a Room
• Actions: UP, DOWN, LEFT, RIGHT
• Rewards: +1 at cell [4,3], −1 at cell [4,2], −0.04 for each step
• Actions are stochastic: choosing UP moves UP 80% of the time, LEFT 10%, and RIGHT 10%
• What is the strategy that achieves maximum reward?
• What if the actions were deterministic?

Is this a solution?
• Only if the actions are deterministic; not in this case, where actions are stochastic
• A solution (policy) is a mapping from each state to an action

Optimal Policy
[Figures in the original slides: the optimal policy under per-step rewards of −2, −0.1, −0.04, −0.01, and +0.01.] The more negative the per-step reward, the shorter and riskier the paths the optimal policy takes past the −1 cell; with a positive per-step reward, the agent prefers never to reach a terminal state at all.

Value Function
• Future reward: $R = r_1 + r_2 + r_3 + \dots + r_n$, and from time step $t$: $R_t = r_t + r_{t+1} + r_{t+2} + \dots + r_n$
• Discounted future reward (the environment is stochastic):

$R_t = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \dots + \gamma^{n-t} r_n = r_t + \gamma (r_{t+1} + \gamma (r_{t+2} + \dots)) = r_t + \gamma R_{t+1}$

• A good strategy for an agent is to always choose an action that maximizes the (discounted) future reward
[84]

Q-Learning
• State-action value function $Q^\pi(s, a)$: the expected return when starting in state $s$, performing action $a$, and following policy $\pi$ thereafter (transition: $s \xrightarrow{a} r, s'$)
• Q-Learning: use any policy to estimate a Q-function that maximizes future reward, via the update

$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left( r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right)$
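First, a quick numeric check of the recursion $R_t = r_t + \gamma R_{t+1}$ from the Value Function slide; the reward sequence and discount factor are made-up illustrative values.

```python
# Verify that the direct sum and the recursive form of the
# discounted return agree on a small, made-up reward sequence.
gamma = 0.9
rewards = [1.0, 0.0, -0.04, 2.0]         # r_t, r_{t+1}, ..., r_n

# Direct form: R_t = r_t + gamma*r_{t+1} + gamma^2*r_{t+2} + ...
direct = sum(gamma**k * r for k, r in enumerate(rewards))

# Recursive form, computed back to front: R_t = r_t + gamma * R_{t+1}
R = 0.0
for r in reversed(rewards):
    R = r + gamma * R

print(direct, R)                         # both print the same value (2.4256)
```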
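Next, a minimal tabular Q-learning sketch for the robot-in-a-room grid above. The 3x4 layout, the +1/−1 terminal cells, the wall, the −0.04 step reward, and the 80/10/10 action noise follow the slides; alpha, gamma, epsilon, and the episode count are illustrative assumptions, not values from the lecture.

```python
# Tabular Q-learning on the robot-in-a-room grid world.
# Hyperparameters (alpha, gamma, epsilon, episodes) are assumed.
import random

ROWS, COLS = 3, 4
WALL = (1, 1)                            # slide cell [2,2], as (row, col), row 0 on top
START = (2, 0)                           # slide cell [1,1]
TERMINAL = {(0, 3): +1.0, (1, 3): -1.0}  # +1 at [4,3], -1 at [4,2]
STEP_REWARD = -0.04
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # UP, DOWN, LEFT, RIGHT
alpha, gamma, epsilon = 0.1, 0.9, 0.1

Q = {(r, c): [0.0] * 4 for r in range(ROWS) for c in range(COLS)}

def move(state, a):
    """Apply action a with 80/10/10 noise; bumping a wall or edge stays put."""
    dr, dc = ACTIONS[a]
    p = random.random()
    if p < 0.1:
        dr, dc = -dc, -dr                # veer to one perpendicular side
    elif p < 0.2:
        dr, dc = dc, dr                  # veer to the other side
    r, c = state[0] + dr, state[1] + dc
    if not (0 <= r < ROWS and 0 <= c < COLS) or (r, c) == WALL:
        return state
    return (r, c)

for episode in range(5000):
    s = START
    while s not in TERMINAL:
        # Epsilon-greedy behavior policy; Q-learning is off-policy, so any
        # sufficiently exploratory policy works for collecting experience.
        if random.random() < epsilon:
            a = random.randrange(4)
        else:
            a = max(range(4), key=lambda i: Q[s][i])
        s2 = move(s, a)
        reward = TERMINAL.get(s2, STEP_REWARD)
        target = reward + (0.0 if s2 in TERMINAL else gamma * max(Q[s2]))
        Q[s][a] += alpha * (target - Q[s][a])  # the Q-learning update above
        s = s2

# Print the greedy policy read off the learned Q-values.
arrows = "↑↓←→"
for r in range(ROWS):
    print(" ".join(
        "#" if (r, c) == WALL
        else "+" if TERMINAL.get((r, c), 0) > 0
        else "-" if (r, c) in TERMINAL
        else arrows[max(range(4), key=lambda i: Q[(r, c)][i])]
        for c in range(COLS)))
```

With enough episodes, the printed greedy policy typically matches the slide's optimal policy for a −0.04 step reward: head toward the +1 cell while steering clear of the −1 cell.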