Deep Reinforcement Learning for Adaptive Human Robotic Collaboration
DEGREE PROJECT IN THE FIELD OF TECHNOLOGY ENGINEERING PHYSICS AND THE MAIN FIELD OF STUDY COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2019

JOHAN FREDIN HASLUM
KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

Master in Computer Science
Date: April 11, 2019
Supervisor: Mårten Björkman
Examiner: Olov Engvall

Abstract

Robots are expected to become an increasingly common part of most humans' everyday lives. As the number of robots increases, so will the number of human-robot interactions. For these interactions to be valuable and intuitive, new advanced robotic control policies will be necessary. Current policies often lack flexibility, rely heavily on human expertise and are often programmed for very specific use cases.

A promising alternative is the use of Deep Reinforcement Learning, a family of algorithms that learn by trial and error. Following the recent success of Reinforcement Learning (RL) in areas previously considered too complex, RL has emerged as a possible method to learn Robotic Control Policies.

This thesis explores the possibility of using Deep Reinforcement Learning (DRL) as a method to learn Robotic Control Policies for Human Robotic Collaboration (HRC). Specifically, it evaluates whether DRL algorithms can be used to train a robot to collaboratively balance a ball with a human along a predetermined path on a table.

To evaluate whether this is possible, several experiments are performed in a simulator, where two robots jointly balance a ball, one emulating a human and one relying on the policy from the DRL algorithm.
The experiments performed suggest that DRL can be used to enable HRC that performs equivalently to or better than an emulated human performing the task alone. Further, the experiments indicate that the performance of less skilled human collaborators can be improved by cooperating with a DRL-trained robot.

Sammanfattning

The presence of robots is expected to become an increasingly common part of most people's everyday lives. As the number of robots increases, so does the number of human-robot interactions. For these interactions to be useful and intuitive, new advanced robot control strategies will be necessary. Current strategies often lack flexibility, depend heavily on human knowledge and are often programmed for very specific use cases.

A promising alternative is the use of Deep Reinforcement Learning, a family of algorithms that learn by trial and error, much like a human. Following the recent successes of Reinforcement Learning (RL), which has been applied to areas previously considered too complex, RL has now become a possible alternative to more established methods for learning robot control strategies.

This thesis investigates the possibility of using Deep Reinforcement Learning (DRL) as a method for learning such control strategies for human-robot collaboration. Specifically, it evaluates whether DRL algorithms can be used to train a robot and a human to jointly balance a ball along a predetermined path on a table.

To evaluate whether this is possible, several experiments are performed in a simulator in which two robots jointly balance a ball: one simulates a human and the other is a robot controlled by the DRL algorithm. The experiments performed suggest that DRL can be used to enable human-robot collaborations that perform as well as or better than a simulated human performing the task alone. Furthermore, the experiments indicate that the performance of less skilled human participants can be improved by collaborating with a robot controlled by a DRL algorithm.

Contents

1 Introduction
1.1 Motivation
1.2 Problem Specification
1.2.1 Research Questions
1.2.2 Scope & Delimitation
1.3 Ethics and Societal Impact
2 Background
2.1 Reinforcement Learning
2.1.1 Natures Way of Learning
2.1.2 Formulation
2.1.3 Reinforcement Learning Tools and Ideas
2.2 Deep Learning in Reinforcement Learning
2.3 Deep Reinforcement Learning Algorithms
2.3.1 Deep Q-Learning
2.3.2 Deep Deterministic Policy Gradient
2.3.3 A3C
2.3.4 TRPO
2.3.5 PPO
3 Related Work
3.1 Reinforcement Learning for Robotic Control
3.1.1 End-to-End Visuomotor Policies
3.1.2 Training in Simulation
3.1.3 Imitation Learning
3.1.4 Auto encoder Learning
3.1.5 Domain Randomization and Large Scale Data Collection
3.1.6 Prioritized Experience Replay
3.2 Human-Robot Collaboration
4 Method
4.1 Problem
4.2 Implementation
4.2.1 Physics Engine
4.2.2 Robot Environment
4.2.3 Human Movement Simulation
4.2.4 Reinforcement Learning Algorithm
4.2.5 Observation Space
4.2.6 Design choices
4.3 Experimental Setup
4.3.1 Common Details
4.3.2 Collaborative Balancing with Varying Level of Skilled Human Partner
4.3.3 Balancing with DRL Collaborator
4.3.4 Balancing with More Information
5 Experiments
5.1 Results
5.1.1 Analysis
5.1.2 Performance of Human Collaborator Acting Alone
5.1.3 Performance of Robot Collaborator Acting Alone
5.1.4 Performance of Human-Robot Collaborator
5.1.5 Performance of Robot-Robot Collaborator
5.1.6 Performance of Human-Robot Collaborator with more Information
6 Discussion
6.1 Conclusions
6.1.1 Research Questions
6.1.2 Unanswered Questions
6.1.3 Limitations and Improvements
7 Conclusion
Bibliography
8 Appendix

Chapter 1 Introduction

1.1 Motivation

As robots become increasingly common in society, the number of interactions between humans and robots will most likely increase significantly. The interplay between the two will become a part of everyday life for many people. For these interactions to be useful, they need to feel natural to the individual involved. This requires the robot not only to interact in a way that feels customary for humans in general, but also to adapt on a person-to-person basis.

Human collaboration involves complex organization and communication that result in an outcome greater than the sum of the individual capabilities. This advanced interplay between two or more individuals can often be carried out without much effort and in silence. For example, two people can easily carry a table together, relying only on the haptic feedback felt in their hands. This ability to adapt in a way that feels ordinary to humans has not yet been transferred to robots. Successfully equipping robots with this capability will likely be essential in the future.

Current research within the area of human-robot collaboration is focused on industry applications. An example of this is the use of human-robot pairs in car manufacturing. By combining the skillfulness of humans and their ability to learn quickly with the cost efficiency and physical strength of robots, efficiency is increased and operational cost is reduced [19].

The ability to learn how to best collaborate with humans is an active area of research. Although several different approaches have been suggested, few have shown great promise [9]. Specific problems, such as jointly lifting an object, have been solved; however, the proposed methods rely heavily on human expert knowledge to implement a functioning control system.
This is true not only for collaboration tasks, but for all robotic control problems. Since human expertise is costly and can be a scarce resource, teaching robots how to interact with their surroundings with the help of laymen, or with no human intervention at all, would enable cheaper and more accessible robotic control systems.

The possibility of teaching robots how to behave through means other than human-crafted control policies is researched extensively. One promising field is Deep Reinforcement Learning (DRL). DRL is a set of self-learning algorithms that rely on interaction and examples, thus requiring no expert knowledge. A lot of work is currently focused on the applicability of DRL to robotic control, and the vision of many researchers is for robots to learn in a way similar to humans, namely by learning to recognize visual and other sensory inputs and to map these inputs to appropriate actions.

The application of DRL to robotic control has shown promise, although it is still in the early stages of development. The goal of this thesis is to evaluate the applicability of DRL to human-robot collaboration. It also evaluates the impact of different sensory modalities on the performance of the algorithm.

1.2 Problem Specification

1.2.1 Research Questions

The questions that this thesis attempts to explore and answer are the following:

What is a suitable Deep Reinforcement Learning framework for learning adaptive human-robot interactions, such that robots can learn to collaborate with humans in what the humans perceive as a natural way?

What impact do the available sensor modalities have on the performance of such a framework?

1.2.2 Scope & Delimitation

The scope of this thesis is to evaluate the applicability of DRL algorithms to human-robot collaborative problems. This is evaluated using a toy problem, which represents the challenges involved in Human Robotic Interactions.
This toy problem is an adaptation of previous experiments used in other research projects involving human-robot and human-human col- laboration.