Robot Deep Reinforcement Learning: Tensor State-Action Spaces and Auxiliary Task Learning with Multiple State Representations
Devin Schwab
CMU-RI-TR-20-46

Submitted in partial fulfillment of the requirements
for the degree of Doctor of Philosophy in Robotics.

The Robotics Institute
Carnegie Mellon University
Pittsburgh, PA 15213

August 19, 2020

Thesis Committee:
Manuela Veloso, Chair
Katerina Fragkiadaki
David Held
Martin Riedmiller, Google DeepMind

Copyright © 2020 Devin Schwab

Keywords: robotics, reinforcement learning, auxiliary task, multi-task learning, transfer learning, deep learning

To my friends and family.

Abstract

A long-standing goal of robotics research is to create algorithms that can automatically learn complex control strategies from scratch. Part of the challenge of applying such algorithms to robots is the choice of representation. Reinforcement Learning (RL) algorithms have been successfully applied to many robotic tasks, such as the Ball-in-a-Cup task with a robot arm and various domains inspired by RoboCup robot soccer. However, RL algorithms still suffer from long training times and large training data requirements. Choosing appropriate representations for the state space, action space, and policy can go a long way towards reducing both.

This thesis focuses on robot deep reinforcement learning: specifically, how choices of representation for state spaces, action spaces, and policies can reduce training time and sample complexity for robot learning tasks. The focus is on two main areas:

1. Transferrable Representations via Tensor State-Action Spaces
2. Auxiliary Task Learning with Multiple State Representations

The first area explores methods for improving transfer of robot policies across environment changes. Learning a policy can be expensive, but if the policy can be transferred and reused across similar environments, the training cost can be amortized. Transfer learning is a well-studied area with many techniques; in this thesis we focus on designing a representation that makes transfer easy. Our method maps state spaces and action spaces to multi-dimensional tensors designed to keep a fixed dimensionality as the number of robots and other objects in the environment varies. We also present the Fully Convolutional Q-Network (FCQN) policy representation, a specialized network architecture that, combined with the tensor representation, allows zero-shot transfer across environment sizes. We demonstrate this approach on simulated single- and multi-agent tasks inspired by the RoboCup Small Size League (SSL) and a modified version of Atari Breakout. We also show that such a representation and simulation-trained policies can be used with real-world sensor data and robots.
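To make the first area concrete, the minimal sketch below illustrates why an all-convolutional Q-function over a grid-shaped state tensor can be reused across environment sizes: every layer is convolutional, so the same weights produce a Q-value map whose spatial size matches whatever field it is given. This sketch is written for this summary only; it is not the Chapter 3 network, and the channel counts, layer sizes, and field dimensions are assumptions.

# Minimal, hypothetical sketch of a fully convolutional Q-network (FCQN-style).
# The state is a C x H x W stack of feature planes (e.g., robot, teammate, and
# ball positions rendered into a grid); the output is an H x W map with one
# Q-value per grid-cell action.
import torch
import torch.nn as nn

class FullyConvQNet(nn.Module):
    def __init__(self, in_channels: int = 4, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, hidden, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, 1, kernel_size=1),  # one Q-value per grid cell
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # state: (batch, C, H, W) -> Q-values: (batch, H, W)
        return self.net(state).squeeze(1)

# Because there are no fully connected layers, the same trained weights accept
# a small field and a larger one without modification (zero-shot size transfer).
q_net = FullyConvQNet(in_channels=4)
for state in (torch.zeros(1, 4, 20, 30), torch.zeros(1, 4, 40, 60)):
    q = q_net(state)                               # (1, H, W) map of Q-values
    flat = int(q.view(-1).argmax())                # greedy action = best cell
    row, col = divmod(flat, state.shape[-1])
    print(tuple(state.shape[-2:]), "-> greedy action cell:", (row, col))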
The second area examines how strengths in one robot Deep RL state representation can make up for weaknesses in another. For example, we would often like to learn tasks using the robot's available sensors, which include high-dimensional sensors such as cameras. Recent Deep RL algorithms can learn from images, but the amount of data required can be prohibitive for real robots. Alternatively, one can construct a state from a minimal set of features necessary for task completion. This has the advantages of 1) reducing the number of policy parameters and 2) removing irrelevant information. However, extracting these features often has a significant cost in terms of engineering, additional hardware, calibration, and fragility outside the lab. We demonstrate this on multiple robot platforms and tasks in both simulation and the real world. We show that the approach works on simulated RoboCup Small Size League (SSL) robots. We also demonstrate that such techniques allow learning from scratch on real hardware via the Ball-in-a-Cup task performed by a robot arm.
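As a purely illustrative picture of how two such representations can be paired, the sketch below gives an actor that consumes only the robot's onboard camera image alongside a critic trained on a cheap, low-dimensional feature state available during training (for example, from a simulator or external instrumentation). The class names, network shapes, and dimensions are assumptions invented for this example; this is not the SAC-X-based method developed in Chapter 4.

# Hypothetical sketch of an asymmetric actor-critic pairing of two state
# representations. Only the actor (and the camera) is needed at deployment;
# the feature-based critic is used only during training.
import torch
import torch.nn as nn

class ImageActor(nn.Module):
    """Policy network: onboard image observation -> action."""
    def __init__(self, action_dim: int = 7):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(nn.Linear(32, 64), nn.ReLU(),
                                  nn.Linear(64, action_dim), nn.Tanh())

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(image))

class FeatureCritic(nn.Module):
    """Q-network: low-dimensional feature state + action -> value."""
    def __init__(self, feature_dim: int = 12, action_dim: int = 7):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feature_dim + action_dim, 64),
                                 nn.ReLU(), nn.Linear(64, 1))

    def forward(self, features: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([features, action], dim=-1))

# During training, both views of the same state are stored together so the
# critic can evaluate the actor's action from the privileged feature state.
actor, critic = ImageActor(), FeatureCritic()
image = torch.zeros(1, 3, 64, 64)      # onboard camera observation
features = torch.zeros(1, 12)          # instrumented / simulator state
action = actor(image)
q_value = critic(features, action)
print(action.shape, q_value.shape)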
Acknowledgments

I would like to thank my family. From an early age they instilled in me the value of a good education. I would never have pursued, let alone finished, a PhD without their upbringing and continued support throughout my life.

My husband, Tommy Hu, has also played an important role in my success. He has been a rock who has helped me get through difficult patches. Without his insights, love, and support I would not have been able to persevere and complete this thesis.

My advisor, Manuela Veloso, has also played a large role in my thesis. I could not have completed this thesis without the work with the RoboCup Small Size League soccer robots that she helped to start. I would also like to thank her for giving me the freedom throughout my PhD to work on what interested me most.

I want to thank Martin Riedmiller, a member of my committee and my supervisor during the work I did at DeepMind. Without his support and the idea to work on the Ball-in-a-Cup task, a large portion of this thesis might not have happened. I would also like to thank the others I worked with at DeepMind, both on the controls team and in the robotics lab. It was incredibly helpful bouncing ideas off of them and learning from their experiences.

I also would like to thank the other two members of my committee: David Held and Katerina Fragkiadaki. I enjoyed my research talks with David in the last few years of the PhD; his insights were extremely valuable. Katerina gave me the opportunity to TA for the new Reinforcement Learning class she was co-teaching. That was an amazing experience for me. I thoroughly enjoyed planning the curriculum and assignments with her and getting to introduce others at CMU to the field of Reinforcement Learning.

I would like to thank my master's degree advisor, Soumya Ray. Without him I would not have been involved in Reinforcement Learning at all. His undergraduate AI class caught my imagination and focused my ambition towards AI and Reinforcement Learning. He then guided me through my master's degree and my first major publication. His support was invaluable when applying to PhD programs; without his recommendation letters and advice crafting the applications, I doubt I would have been able to complete a CMU PhD.

I would also like to thank The University of Hong Kong and my colleagues who worked on the DARPA Robotics Challenge. It was an extremely useful experience for me, giving me my first taste of real robotics research. It was also incredibly rewarding to get to personally work with the Atlas robot.

Finally, I would like to thank my CORAL lab mates and other RI students, both past and present. Those who came before me in the CORAL lab and the RI PhD program had invaluable advice about how to navigate a PhD, and those who went through the process with me were always quick to help when I was stuck on a problem or just needed some support. I want to give special thanks to past members of the CMDragons SSL team; my thesis was built on the shoulders of their previous work.

Contents

1 Introduction
  1.1 Motivation
  1.2 Thesis Question
  1.3 Approach
    1.3.1 Transferrable Representations
    1.3.2 Auxiliary Task Learning with Multiple State Representations
  1.4 Robots and Tasks
  1.5 Simulation vs. Reality
  1.6 Contributions
  1.7 Reading Guide to the Thesis

2 Assessing Shortcomings of Current Reinforcement Learning for Robotics
  2.1 Shortcomings of Existing Deep RL Approaches
    2.1.1 Sample Complexity and Training Time
    2.1.2 Adaptability
    2.1.3 Safety
  2.2 Applying RL to RoboCup Domains
    2.2.1 RoboCup Description
    2.2.2 SSL Tasks
    2.2.3 Problem Description
    2.2.4 Deep Reinforcement Learning of Soccer Skills
    2.2.5 Environment Setup
    2.2.6 Empirical Results
    2.2.7 Discussion of Shortcomings in the RoboCup Domain
    2.2.8 Adaptability of Policies
    2.2.9 Safety while Training and Executing
  2.3 How this Thesis Addresses the Shortcomings
    2.3.1 Improving Policy Adaptability
    2.3.2 Reducing Required Training Time and Data
    2.3.3 Safety

3 Tensor State-Action Spaces as Transferrable Representations
  3.1 Problem Description
  3.2 Tensor Representations
    3.2.1 Background and Notation
    3.2.2 Tensor-based States and Actions
    3.2.3 Optimality of Policies with Tensor Representation
  3.3 Position Based Mappings
    3.3.1 Environments
    3.3.2 Position based φ and Mappings
  3.4 Policy Learning with Tensor Representations
    3.4.1 Single Agent Tensor State-Action Training Algorithm
    3.4.2 Masking Channels
    3.4.3 Fully Convolutional Q-Network (FCQN)
  3.5 Multi-agent Policies
    3.5.1 Multi-Agent Tensor State-Action Training Algorithm
  3.6 Simulated Empirical Results
    3.6.1 Environment Descriptions
    3.6.2 Hyper-Parameters and Network Architecture
    3.6.3 Learning Performance
    3.6.4 Zero-shot Transfer Across Team Sizes
    3.6.5 Performance of Transfer vs Trained from Scratch
    3.6.6 Zero-shot Transfer Across Environment Sizes
  3.7 Applications to Real Hardware
    3.7.1 Real-world State Action Mappings
    3.7.2 Empirical Results
  3.8 Summary

4 Auxiliary Task Learning with Multiple State Representations
  4.1 Problem Description
  4.2 Training and Network Approach
    4.2.1 Background and Notation
    4.2.2 SAC-X with Different State Spaces
    4.2.3 Policy Evaluation
    4.2.4 Policy Improvement
    4.2.5 Asymmetric Actor-Critic with Different State Spaces