Robot Deep Reinforcement Learning: Tensor State-Action Spaces and Auxiliary Task Learning with Multiple State Representations



Devin Schwab
CMU-RI-TR-20-46

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Robotics

The Robotics Institute
Carnegie Mellon University
Pittsburgh, PA 15213
August 19, 2020

Thesis Committee:
Manuela Veloso, Chair
Katerina Fragkiadaki
David Held
Martin Riedmiller, Google DeepMind

Copyright © 2020 Devin Schwab

Keywords: robotics, reinforcement learning, auxiliary task, multi-task learning, transfer learning, deep learning

To my friends and family.

Abstract

A long-standing goal of robotics research is to create algorithms that can automatically learn complex control strategies from scratch. Part of the challenge of applying such algorithms to robots is the choice of representation. Reinforcement Learning (RL) algorithms have been successfully applied to many different robotic tasks such as the Ball-in-a-Cup task with a robot arm and various RoboCup robot soccer inspired domains. However, RL algorithms still suffer from long training times and large amounts of required training data. Choosing appropriate representations for the state space, action space, and policy can go a long way towards reducing the required training time and required training data.

This thesis focuses on robot deep reinforcement learning. Specifically, it examines how choices of representation for state spaces, action spaces, and policies can reduce training time and sample complexity for robot learning tasks. In particular, the focus is on two main areas:

1. Transferrable Representations via Tensor State-Action Spaces
2. Auxiliary Task Learning with Multiple State Representations

The first area explores methods for improving transfer of robot policies across environment changes.
Learning a policy can be expensive, but if the policy can be transferred and reused across similar environments, the training costs can be amortized. Transfer learning is a well-studied area with multiple techniques. In this thesis we focus on designing a representation that makes for easy transfer. Our method maps state spaces and action spaces to multi-dimensional tensors designed to remain a fixed dimension as the number of robots and other objects in an environment varies. We also present the Fully Convolutional Q-Network (FCQN) policy representation, a specialized network architecture that, combined with the tensor representation, allows for zero-shot transfer across environment sizes. We demonstrate such an approach on simulated single and multi-agent tasks inspired by RoboCup Small Size League (SSL) and a modified version of Atari Breakout. We also show that it is possible to use such a representation and simulation-trained policies with real-world sensor data and robots.

The second area examines how strengths in one robot Deep RL state representation can make up for weaknesses in another. For example, we would often like to learn tasks using the robot's available sensors, which include high-dimensional sensors such as cameras. Recent Deep RL algorithms can learn with images, but the amount of data required can be prohibitive for real robots. Alternatively, one can create a state using a minimal set of features necessary for task completion. This has the advantages of 1) reducing the number of policy parameters and 2) removing irrelevant information. However, extracting these features often has a significant cost in terms of engineering, additional hardware, calibration, and fragility outside the lab. We demonstrate this on multiple robot platforms and tasks in both simulation and the real world. We show that it works on simulated RoboCup Small Size League (SSL) robots.
We also demonstrate that such techniques allow for learning from scratch on real hardware via the Ball-in-a-Cup task performed by a robot arm.

Acknowledgments

I would like to thank my family. From an early age they instilled in me the value of a good education. I would never have pursued, let alone finished, a PhD without their upbringing and continued support throughout my life.

My husband, Tommy Hu, has also played an important role in my success. He has been a rock who has helped me get through difficult patches. Without his insights, love, and support I would not have been able to persevere and complete this thesis.

My advisor, Manuela Veloso, has also played a large role in my thesis. I could not have completed my thesis without the work with the RoboCup Small Size League soccer robots she helped to start. I would also like to thank her for giving me the freedom throughout my PhD to work on what interested me most.

I want to thank Martin Riedmiller, a member of my committee and my supervisor during the work I did at DeepMind. Without his support and the idea to work on the Ball-in-a-Cup task, a large portion of this thesis might not have happened. I would also like to thank the others I worked with at DeepMind, both on the controls team and in the robotics lab. It was incredibly helpful bouncing ideas off of them and learning from their experiences.

I also would like to thank the other two members of my committee: David Held and Katerina Fragkiadaki. I enjoyed my research talks with David in the last few years of the PhD. His insights were extremely valuable. Katerina gave me the opportunity to TA for the new Reinforcement Learning class she was co-teaching. That was an amazing experience for me. I thoroughly enjoyed planning the curriculum and assignments with her and getting to introduce others at CMU to the field of Reinforcement Learning.

I would like to thank my master's degree advisor, Soumya Ray.
Without him I would not have been involved in Reinforcement Learning at all. His undergraduate AI class caught my imagination and focused my ambition towards AI and Reinforcement Learning. He then guided me through my master's degree and my first major publication. His support was invaluable when applying to PhD programs. Without his recommendation letters and advice crafting the applications, I doubt I would have been able to complete a CMU PhD.

I would also like to thank The University of Hong Kong and my colleagues that worked on the DARPA Robotics Challenge. It was an extremely useful experience for me, giving me my first taste of real robotics research. It was also incredibly rewarding to get to personally work with the Atlas robot.

Finally, I would like to thank my CORAL lab mates and other RI students, both past and present. Those who came before me in the CORAL lab and the RI PhD program had invaluable advice about how to navigate a PhD. And those who went through the process with me were always quick to help when I was stuck on a problem or just needed some support. I want to give special thanks to past members of the CMDragons SSL team; my thesis was built on the shoulders of their previous work.

Contents

1 Introduction
  1.1 Motivation
  1.2 Thesis Question
  1.3 Approach
    1.3.1 Transferrable Representations
    1.3.2 Auxiliary Task Learning with Multiple State Representations
  1.4 Robots and Tasks
  1.5 Simulation vs. Reality
  1.6 Contributions
  1.7 Reading Guide to the Thesis
2 Assessing Shortcomings of Current Reinforcement Learning for Robotics
  2.1 Shortcomings of Existing Deep RL Approaches
    2.1.1 Sample Complexity and Training Time
    2.1.2 Adaptability
    2.1.3 Safety
  2.2 Applying RL to RoboCup Domains
    2.2.1 RoboCup Description
    2.2.2 SSL Tasks
    2.2.3 Problem Description
    2.2.4 Deep Reinforcement Learning of Soccer Skills
    2.2.5 Environment Setup
    2.2.6 Empirical Results
    2.2.7 Discussion of Shortcomings in the RoboCup Domain
    2.2.8 Adaptability of Policies
    2.2.9 Safety while Training and Executing
  2.3 How this Thesis Addresses the Shortcomings
    2.3.1 Improving Policy Adaptability
    2.3.2 Reducing Required Training Time and Data
    2.3.3 Safety
3 Tensor State-Action Spaces as Transferrable Representations
  3.1 Problem Description
  3.2 Tensor Representations
    3.2.1 Background and Notation
    3.2.2 Tensor-based States and Actions
    3.2.3 Optimality of Policies with Tensor Representation
  3.3 Position Based Mappings
    3.3.1 Environments
    3.3.2 Position based φ and Mappings
  3.4 Policy Learning with Tensor Representations
    3.4.1 Single Agent Tensor State-Action Training Algorithm
    3.4.2 Masking Channels
    3.4.3 Fully Convolutional Q-Network (FCQN)
  3.5 Multi-agent Policies
    3.5.1 Multi-Agent Tensor State-Action Training Algorithm
  3.6 Simulated Empirical Results
    3.6.1 Environment Descriptions
    3.6.2 Hyper-Parameters and Network Architecture
    3.6.3 Learning Performance
    3.6.4 Zero-shot Transfer Across Team Sizes
    3.6.5 Performance of Transfer vs Trained from Scratch
    3.6.6 Zero-shot Transfer Across Environment Sizes
  3.7 Applications to Real Hardware
    3.7.1 Real-world State Action Mappings
    3.7.2 Empirical Results
  3.8 Summary
4 Auxiliary Task Learning with Multiple State Representations
  4.1 Problem Description
  4.2 Training and Network Approach
    4.2.1 Background and Notation
    4.2.2 SAC-X with Different State Spaces
    4.2.3 Policy Evaluation
    4.2.4 Policy Improvement
    4.2.5 Asymmetric Actor-Critic with Different State Spaces
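
The abstract's idea of a state tensor whose dimensions stay fixed as the number of robots varies can be illustrated with a minimal sketch. It assumes a position-based mapping in which each object type is written into its own channel of a grid. The channel names, field dimensions, grid shape, and the `encode_state` and `shape` helpers are hypothetical choices made for illustration and are not taken from the thesis.

```python
from typing import Dict, List, Tuple

Position = Tuple[float, float]

# Hypothetical channel layout: one grid channel per object type.
CHANNELS = ("ball", "teammate", "opponent")


def encode_state(
    objects: Dict[str, List[Position]],
    field_size: Tuple[float, float] = (9.0, 6.0),  # assumed (width, height)
    grid_shape: Tuple[int, int] = (6, 9),          # assumed (rows, cols)
) -> List[List[List[float]]]:
    """Map object positions to a (channels, rows, cols) occupancy tensor.

    The output shape depends only on CHANNELS and grid_shape, never on
    how many objects of each type are present.
    """
    width, height = field_size
    rows, cols = grid_shape
    tensor = [[[0.0] * cols for _ in range(rows)] for _ in CHANNELS]
    for c, name in enumerate(CHANNELS):
        for x, y in objects.get(name, []):
            # Discretize continuous coordinates into a grid cell and
            # clamp positions on or beyond the boundary into the grid.
            col = min(max(int(x * cols / width), 0), cols - 1)
            row = min(max(int(y * rows / height), 0), rows - 1)
            tensor[c][row][col] = 1.0
    return tensor


def shape(tensor: List[List[List[float]]]) -> Tuple[int, int, int]:
    """Return the (channels, rows, cols) shape of a nested-list tensor."""
    return (len(tensor), len(tensor[0]), len(tensor[0][0]))


# Two scenarios with different numbers of robots.
one_vs_one = encode_state({
    "ball": [(4.5, 3.0)],
    "teammate": [(1.0, 1.0)],
    "opponent": [(8.0, 5.0)],
})
three_vs_two = encode_state({
    "ball": [(4.5, 3.0)],
    "teammate": [(1.0, 1.0), (2.5, 4.0), (7.0, 2.0)],
    "opponent": [(8.0, 5.0), (5.5, 0.5)],
})

print(shape(one_vs_one), shape(three_vs_two))
```

Both calls produce tensors of the same shape, so under these assumptions a policy network sized for one team configuration could take the other as input without any change to its input layer.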