
Deep Reinforcement Learning
Course 6.S191: Intro to Deep Learning
Lex Fridman ([email protected]), January 2018
For the full updated list of references visit: https://selfdrivingcars.mit.edu/references

The Stack of Artificial Intelligence
• Environment → Sensors → Sensor Data → Feature Extraction → Representation → Machine Learning → Knowledge → Reasoning → Planning → Action → Effector → back into the Environment
• Open question: what can be learned from data?

Sensors
• Camera (visible, infrared), stereo camera, lidar, radar, GPS, IMU, microphone, networking (wired, wireless) [132]

Recognition
• Image recognition: if it looks like a duck
• Audio recognition: quacks like a duck
• Activity recognition: swims like a duck

Reasoning
Andrew Wiles describing the final breakthrough in his proof of Fermat's Last Theorem, 358 years after its conjecture:
"It was so indescribably beautiful; it was so simple and so elegant. I couldn't understand how I'd missed it and I just stared at it in disbelief for twenty minutes. Then during the day I walked around the department, and I'd keep coming back to my desk looking to see if it was still there. It was still there. I couldn't contain myself, I was so excited. It was the most important moment of my working life. Nothing I ever do again will mean as much." [133]

The Promise of Deep Learning and Deep Reinforcement Learning
Within the stack above:
• The promise of Deep Learning: Feature Extraction, Representation, Machine Learning, Knowledge
• The promise of Deep Reinforcement Learning: Reasoning, Planning, Action

Types of Deep Learning
• Supervised Learning
• Semi-Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
[81, 165]

Philosophical Motivation for Reinforcement Learning
• Takeaway from supervised learning: neural networks are great at memorization and not (yet) great at reasoning.
• Hope for reinforcement learning: brute-force propagation of outcomes to knowledge about states and actions. This is a kind of brute-force "reasoning".
Agent and Environment
• At each step the agent:
  • executes an action
  • receives an observation (the new state)
  • receives a reward
• The environment:
  • receives the action
  • emits an observation (the new state)
  • emits a reward
(A minimal code sketch of this loop appears just before the Markov decision process section below.) [80]

Examples of Reinforcement Learning
Reinforcement learning is a general-purpose framework for decision-making:
• An agent operates in an environment (e.g., Atari Breakout)
• An agent has the capacity to act
• Each action influences the agent's future state
• Success is measured by a reward signal
• The goal is to select actions that maximize future reward
[85]

Cart-Pole Balancing [166]
• Goal: balance the pole on top of a moving cart
• State: pole angle and angular speed; cart position and horizontal velocity
• Actions: horizontal force applied to the cart
• Reward: +1 at each time step the pole remains upright

Doom [166]
• Goal: eliminate all opponents
• State: raw pixels of the game
• Actions: up, down, left, right, etc.
• Reward: positive when eliminating an opponent, negative when the agent is eliminated

Bin Packing [166]
• Goal: pick a device from a box and put it into a container
• State: raw pixels of the real world
• Actions: the possible motions of the robot
• Reward: positive when a device is placed successfully, negative otherwise

Human Life
• Goal: survival? Happiness?
• State: sight, hearing, taste, smell, touch
• Actions: think, move
• Reward: homeostasis?

Key Takeaways for Real-World Impact
• Deep Learning:
  • Fun part: good algorithms that learn from data.
  • Hard part: huge amounts of representative data.
• Deep Reinforcement Learning:
  • Fun part: good algorithms that learn from data.
  • Hard part: defining a useful state space, action space, and reward.
  • Hardest part: getting meaningful data for the above formalization.
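Before formalizing this as a Markov decision process, here is the agent-environment loop above as a minimal code sketch. It assumes the classic OpenAI Gym API (env.reset / env.step) and the CartPole-v0 environment from the cart-pole example; the random policy is a placeholder, not a learning algorithm.

```python
# Minimal agent-environment loop, assuming the classic (pre-2021)
# OpenAI Gym API and the CartPole-v0 environment.
import gym

env = gym.make("CartPole-v0")
state = env.reset()                      # agent receives the initial observation

total_reward, done = 0.0, False
while not done:
    action = env.action_space.sample()   # agent executes a (random) action
    state, reward, done, info = env.step(action)  # env emits new state and reward
    total_reward += reward               # +1 for every step the pole stays upright

print("Episode return:", total_reward)
```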
Markov Decision Process
An episode is a sequence of states, actions, and rewards that ends in a terminal state:

$s_0, a_0, r_1, s_1, a_1, r_2, \dots, s_{n-1}, a_{n-1}, r_n, s_n$

where $s_t$ is a state, $a_t$ an action, $r_t$ a reward, and $s_n$ the terminal state. [84]

Major Components of an RL Agent
An RL agent may include one or more of these components:
• Policy: the agent's behavior function
• Value function: how good each state and/or action is
• Model: the agent's representation of the environment

Robot in a Room
• Actions: UP, DOWN, LEFT, RIGHT
• Rewards: +1 at cell [4,3], −1 at cell [4,2], −0.04 for each step
• Actions are stochastic: choosing UP moves UP 80% of the time, LEFT 10%, and RIGHT 10%
• What is the strategy that achieves maximum reward?
• What if the actions were deterministic?

Is this a solution?
• Only if the actions are deterministic; not in this case, where actions are stochastic
• A solution (policy) is a mapping from each state to an action

Optimal Policy
[Figures in the original slides: the optimal policy under per-step rewards of −2, −0.1, −0.04, −0.01, and +0.01.] The more negative the per-step reward, the shorter and riskier the paths the optimal policy takes past the −1 cell; with a positive per-step reward, the agent prefers never to reach a terminal state at all.

Value Function
• Future reward: $R = r_1 + r_2 + r_3 + \dots + r_n$, and from time step $t$: $R_t = r_t + r_{t+1} + r_{t+2} + \dots + r_n$
• Discounted future reward (the environment is stochastic):

$R_t = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \dots + \gamma^{n-t} r_n = r_t + \gamma (r_{t+1} + \gamma (r_{t+2} + \dots)) = r_t + \gamma R_{t+1}$

• A good strategy for an agent is to always choose an action that maximizes the (discounted) future reward
[84]

Q-Learning
• State-action value function $Q^\pi(s, a)$: the expected return when starting in state $s$, performing action $a$, and following policy $\pi$ thereafter (transition: $s \xrightarrow{a} r, s'$)
• Q-Learning: use any policy to estimate a Q-function that maximizes future reward, via the update

$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left( r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right)$
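First, a quick numeric check of the recursion $R_t = r_t + \gamma R_{t+1}$ from the Value Function slide; the reward sequence and discount factor are made-up illustrative values.

```python
# Verify that the direct sum and the recursive form of the
# discounted return agree on a small, made-up reward sequence.
gamma = 0.9
rewards = [1.0, 0.0, -0.04, 2.0]         # r_t, r_{t+1}, ..., r_n

# Direct form: R_t = r_t + gamma*r_{t+1} + gamma^2*r_{t+2} + ...
direct = sum(gamma**k * r for k, r in enumerate(rewards))

# Recursive form, computed back to front: R_t = r_t + gamma * R_{t+1}
R = 0.0
for r in reversed(rewards):
    R = r + gamma * R

print(direct, R)                         # both print the same value (2.4256)
```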
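Next, a minimal tabular Q-learning sketch for the robot-in-a-room grid above. The 3x4 layout, the +1/−1 terminal cells, the wall, the −0.04 step reward, and the 80/10/10 action noise follow the slides; alpha, gamma, epsilon, and the episode count are illustrative assumptions, not values from the lecture.

```python
# Tabular Q-learning on the robot-in-a-room grid world.
# Hyperparameters (alpha, gamma, epsilon, episodes) are assumed.
import random

ROWS, COLS = 3, 4
WALL = (1, 1)                            # slide cell [2,2], as (row, col), row 0 on top
START = (2, 0)                           # slide cell [1,1]
TERMINAL = {(0, 3): +1.0, (1, 3): -1.0}  # +1 at [4,3], -1 at [4,2]
STEP_REWARD = -0.04
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # UP, DOWN, LEFT, RIGHT
alpha, gamma, epsilon = 0.1, 0.9, 0.1

Q = {(r, c): [0.0] * 4 for r in range(ROWS) for c in range(COLS)}

def move(state, a):
    """Apply action a with 80/10/10 noise; bumping a wall or edge stays put."""
    dr, dc = ACTIONS[a]
    p = random.random()
    if p < 0.1:
        dr, dc = -dc, -dr                # veer to one perpendicular side
    elif p < 0.2:
        dr, dc = dc, dr                  # veer to the other side
    r, c = state[0] + dr, state[1] + dc
    if not (0 <= r < ROWS and 0 <= c < COLS) or (r, c) == WALL:
        return state
    return (r, c)

for episode in range(5000):
    s = START
    while s not in TERMINAL:
        # Epsilon-greedy behavior policy; Q-learning is off-policy, so any
        # sufficiently exploratory policy works for collecting experience.
        if random.random() < epsilon:
            a = random.randrange(4)
        else:
            a = max(range(4), key=lambda i: Q[s][i])
        s2 = move(s, a)
        reward = TERMINAL.get(s2, STEP_REWARD)
        target = reward + (0.0 if s2 in TERMINAL else gamma * max(Q[s2]))
        Q[s][a] += alpha * (target - Q[s][a])  # the Q-learning update above
        s = s2

# Print the greedy policy read off the learned Q-values.
arrows = "↑↓←→"
for r in range(ROWS):
    print(" ".join(
        "#" if (r, c) == WALL
        else "+" if TERMINAL.get((r, c), 0) > 0
        else "-" if (r, c) in TERMINAL
        else arrows[max(range(4), key=lambda i: Q[(r, c)][i])]
        for c in range(COLS)))
```

With enough episodes, the printed greedy policy typically matches the slide's optimal policy for a −0.04 step reward: head toward the +1 cell while steering clear of the −1 cell.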