
A Survey of Behavior Learning Applications in Robotics – State of the Art and Perspectives

Alexander Fabisch¹, Christoph Petzoldt², Marc Otto², Frank Kirchner¹,²

¹German Research Center for Artificial Intelligence (DFKI GmbH), Robotics Innovation Center, Germany
²University of Bremen, Robotics Research Group, Germany

Corresponding author: Alexander Fabisch, German Research Center for Artificial Intelligence (DFKI GmbH), DFKI Bremen, Robotics Innovation Center, Robert-Hooke-Straße 1, D-28359 Bremen, Germany. Email: [email protected]

Abstract

Recent success of machine learning in many domains has been overwhelming, which often leads to false expectations regarding the capabilities of behavior learning in robotics. In this survey, we analyze the current state of machine learning for robotic behaviors. We will give a broad overview of behaviors that have been learned and used on real robots. Our focus is on kinematically or sensorially complex robots. That includes humanoid robots or parts of humanoid robots, for example, legged robots or robotic arms. We will classify presented behaviors according to various categories and we will draw conclusions about what can be learned and what should be learned. Furthermore, we will give an outlook on problems that are challenging today but might be solved by machine learning in the future and argue that classical robotics and other approaches from artificial intelligence should be integrated more with machine learning to form complete, autonomous systems.

Keywords

machine learning, reinforcement learning, imitation learning, behavior learning, robotic applications

Introduction

Machine learning and particularly deep learning (LeCun et al. 2015) made groundbreaking success possible in many domains, such as image classification (Krizhevsky et al. 2012), speech recognition (Hinton et al. 2012), playing video games (Mnih et al. 2015), and playing Go (Silver et al. 2016). It is unquestionable that learning from data, learning from experience and observations are keys to really adaptive and intelligent agents – virtual or physical. However, people are often susceptible to the fallacy that the state of the art in robotic control today heavily relies on machine learning. This is often not the case. An example for this is given by Irpan (2018): at the time of writing this paper, the humanoid robot Atlas from Boston Dynamics is one of the most impressive works in robot control. It is able to walk and run on irregular terrain, jump precisely with one or two legs, and even do a back flip (Boston Dynamics 2018). Irpan (2018) reports that people often assume that Atlas uses reinforcement learning. Publications from Boston Dynamics are sparse, but they do not include explanations of machine learning algorithms for control (Raibert et al. 2008; Nelson et al. 2012). Kuindersma et al. (2016) present their work with the robot Atlas, which includes state estimation and optimization methods for locomotion behavior. Robotic applications have demanding requirements on processing power, real-time computation, sample-efficiency, and safety, which often makes the application of state-of-the-art machine learning for robot behavior learning difficult. Results in the area of machine learning are impressive but they can lead to false expectations. This led us to the questions: what can and what should be learned?

Recent surveys of the field mostly focus on algorithmic aspects of machine learning (Billard et al. 2008; Argall et al. 2009; Kober et al. 2013; Kormushev et al. 2013; Tai and Liu 2016; Arulkumaran et al. 2017; Osa et al. 2018). In this survey, we take a broader perspective to analyze the state of the art in learning robotic behavior and explicitly do not focus on algorithms but on (mostly) real-world applications. We explicitly focus on applications with real robots, because it is much more demanding to integrate and learn behaviors in a complex robotic system operating in the real world. We give a very broad overview of considered behavior learning problems on real robotic systems. We categorize problems and solutions, analyze problem characteristics, and point out where and why machine learning is useful.

This article is structured as follows. We first present a detailed summary of selected highlights that advanced the state of the art in robotic behavior learning. We proceed with definitions of behavior and related terms. We present categories to distinguish and classify behaviors before we present a broad overview of the state of the art in robotic behavior learning problems. We conclude with a discussion of our findings and an outlook.


Selected Highlights

Among all the publications that we discuss here, we selected some highlights that we found to be relevant extensions of the repertoire of robotic behavior learning problems that can be solved. We briefly summarize these behavior learning problems and their solutions individually before we enter the discussion of the whole field from a broader perspective. We find it crucial to understand the algorithmic development and technical challenges in the field. It also gives a good impression of the current state of the art. Later in this article, we make a distinction whether the perception or the action part of these behaviors has been learned (see Figure 1).

An early work that combines behavior learning and robotics has been published by Kirchner (1997). A goal-directed walking behavior for the six-legged walking machine SIR ARTHUR with 16 degrees of freedom (DOF) and four light sensors has been learned. The behavior has been learned on three levels – (i) bottom: elementary swing and stance movements of individual legs are learned first, (ii) middle: these elementary actions are then used and activated in a temporal sequence to perform more complex behaviors like a forward movement of the whole robot, and (iii) top: a goal-achieving behavior in a given environment with external stimuli. The top-level behavior was able to make use of the light sensors to find a source of maximum light intensity. Reinforcement learning, a hierarchical version of Q-learning (Watkins 1989), has been used to learn the behavior. On the lowest level, individual reward functions for lifting up the leg, moving the leg to the ground, moving the leg backward in stance, and swinging the leg forward have been defined.

Peters et al. (2005) presented an algorithmic milestone in reinforcement learning for robotic systems. They specifically used a robot arm with seven degrees of freedom (DOF) to play tee-ball, a simplified version of baseball, where the ball is placed on a flexible shaft. Their solution combines imitation learning through kinesthetic teaching with dynamical movement primitives (DMPs) and policy search, which is an approach that has been used in many following works. In their work, Peters et al. (2005) used the natural actor-critic (NAC) for policy search. The goal was to hit the ball so that it flies as far as possible. The reward for policy search included a term that penalizes squared accelerations and rewards the distance. The distance is obtained from an estimated trajectory computed with trajectory samples that are measured with a vision system. An inverse dynamics controller has been used to execute motor commands. About 400 episodes were required to learn a successful batting behavior.

Ball-in-a-cup is a very challenging game. A ball is attached to a cup by a string. The player has to catch the ball with the cup by moving only the cup. Even human players require a significant amount of trials to solve the problem. Kober et al. (2008); Kober and Peters (2009) demonstrate that a successful behavior can be learned on a SARCOS arm and a Barrett WAM. A similar approach has been used: imitation learning with DMPs from motion capture or kinesthetic teaching and refinement with a policy search algorithm, in this case Policy Learning by Weighting Exploration with the Returns (PoWER). In addition, the policy takes the ball position into consideration. A perceptual coupling is learned to mitigate the influence of minor perturbations of the end-effector that can have significant influence on the ball trajectory. A successful behavior is learned after 75 episodes.

The problem of flipping a pancake with a pan has been solved by Kormushev et al. (2010b) with the same methods: a controller that is very similar to a DMP is initialized from kinesthetic teaching and refined with PoWER. The behavior has been learned with a torque-controlled Barrett WAM arm with 7 DOF. The artificial pancake has a weight of only 26 grams, which makes its motion less predictable because it is susceptible to the influence of air flow. For refinement, a complex reward function has been designed that takes into account the trajectory of the pancake (flipping and catching), which is measured with a marker-based motion capture system. After 50 episodes, the first successful catch was recorded. A remarkable finding is that the learned behavior includes a useful aspect that has not directly been encoded in the reward function: it made a compliant vertical movement for catching the pancake, which decreases the chance of the pancake bouncing off from the surface of the pan.

Table tennis with a Barrett WAM arm has been learned by Mülling et al. (2011, 2013). Particularly challenging is the advanced perception and state estimation problem. In comparison to previous work, behaviors have to take an estimate of the future ball trajectory into account when generating movements that determine where, when, and how the robot hits the ball. A vision system has been used to track the ball at 60 Hz. The ball position is tracked with an extended Kalman filter and ball trajectories are predicted with a simplified model that neglects the spin of the ball. 25 striking movements have been learned from kinesthetic teaching to form a library of movement primitives. A modified DMP version that allows setting a final velocity as a meta-parameter has been used to represent the demonstrations. Desired position, velocity and orientation of the racket are computed analytically for an estimated ball trajectory and a given target on the opponent's court and are given as meta-parameters to the modified DMP. In addition, based on these task parameters, a weighted average of known striking movements is computed by a gating network. This method is called mixture of movement primitives. The reward function encourages minimization of the distance between the desired goal on the opponent's court and the actual point where the ball hits the table. In the final experiment, a human played against the robot, serving balls on an area of 0.8 m × 0.6 m. Up to nine balls were returned in a row by the robot. Initially the robot was able to return 74.4 % of the balls and after playing one hour the robot was able to return 88 %.
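A recurring recipe in these highlights – tee-ball, ball-in-a-cup, pancake flipping, and table tennis – is to encode a demonstration as a movement primitive and to refine its parameters with episodic policy search. The following sketch illustrates that recipe with a one-dimensional DMP and a simplified PoWER-style reward-weighted update; the reward function, constants, and initialization are illustrative placeholders, not the setups of the cited works.

    import numpy as np

    # Minimal 1-D dynamical movement primitive (DMP), following Ijspeert et al.:
    # tau*dv = ALPHA*(BETA*(g - x) - v) + f(s),  tau*dx = v,  tau*ds = -ALPHA_S*s
    ALPHA, BETA, ALPHA_S, TAU = 25.0, 6.25, 4.0, 1.0
    N_BASIS, DT, T = 10, 0.01, 1.0
    C = np.exp(-ALPHA_S * np.linspace(0.0, 1.0, N_BASIS))  # basis centers in phase space
    H = N_BASIS / C                                        # basis widths (common heuristic)

    def rollout(w, x0=0.0, g=1.0):
        """Integrate the DMP for one episode and return the trajectory."""
        x, v, s, traj = x0, 0.0, 1.0, []
        for _ in range(int(T / DT)):
            psi = np.exp(-H * (s - C) ** 2)
            f = s * (g - x0) * (psi @ w) / (psi.sum() + 1e-10)  # forcing term
            v += DT / TAU * (ALPHA * (BETA * (g - x) - v) + f)
            x += DT / TAU * v
            s += DT / TAU * (-ALPHA_S * s)
            traj.append(x)
        return np.array(traj)

    def reward(traj):
        # Placeholder reward: pass a via-point in the middle of the movement and
        # avoid large velocities (NOT a reward function from the cited works).
        return -abs(traj[len(traj) // 2] - 0.8) - 1e-4 * np.sum(np.diff(traj) ** 2)

    w = np.zeros(N_BASIS)  # would normally be initialized from a demonstration
    for episode in range(100):
        # Explore around the current weights and average the perturbations,
        # weighted by their returns (simplified, in the spirit of PoWER).
        eps = [20.0 * np.random.randn(N_BASIS) for _ in range(10)]
        returns = np.array([reward(rollout(w + e)) for e in eps])
        soft = np.exp(returns - returns.max())
        w = w + sum(s_i * e for s_i, e in zip(soft, eps)) / soft.sum()
    print("final return:", reward(rollout(w)))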

Learning end-to-end behaviors that take raw camera images to compute corresponding motor torques (visual servoing) has been demonstrated impressively by Levine et al. (2016). They use the 7 DOF arm of a PR2 robot to learn a variety of isolated manipulation behaviors: hanging a coat hanger on a clothes rack, inserting a block into a shape sorting cube, fitting the claw of a toy hammer under a nail, and screwing a cap on a water bottle. The final behaviors use a convolutional neural network (CNN) to control the arm's movements at 20 Hz based on the visual input from a monocular RGB camera with a resolution of 240×240 pixels. A sophisticated training process involving several phases has been developed in this work. The first layer of the convolutional neural network is initialized from a neural network that has been pretrained on ImageNet (Deng et al. 2009). In a second pretraining step, the image processing part of the neural network is initialized by training a pose regression convolutional neural network that predicts 3D points that define the target objects involved in the task. Guided policy search is used to train the final policy. The whole state of the system is observed during this training phase and a local dynamic model is trained. An optimal control method that uses the full system state is used to obtain a "guiding policy". This guiding policy is used to train the neural network policy in a fully supervised setting. The final neural network policy, however, works directly on images that represent partial information about the state of the system without having the knowledge of the full system state that would only be available during training. The whole training process for a new behavior requires 3–4 hours.

Grasping has also been learned from raw monocular RGB camera images with a 7 DOF robot arm by Levine et al. (2017). In this application, the behavior is not learned end-to-end, but a neural network has been learned to predict the success of a motion command for a given camera image (and the camera image before the behavior is started). The behavior goes through a sequence of ten waypoints defined by the Cartesian end-effector position and the rotation of the 2-finger gripper around the z-axis. A motion command is selected in each step by an optimization procedure based on the predicted success of the motion command. The remarkable fact about this work is that a total amount of more than 800,000 plus 900,000 grasps collected in two datasets have been performed to train the grasp success prediction model and a maximum of 14 robots has been used in parallel to collect the data. A large variety of objects has been used to test the learned grasping behavior.

Definition of Behavior

Before we enter the discussion of robotic behaviors, we clarify several related terms. These are mostly taken from biology.

We borrow a definition of the term behavior from behavioral biology. Unfortunately, many behavioral biologists disagree in the definition of behavior (Levitis et al. 2009). Hence, we will select one and this is the one proposed by Levitis et al. (2009): "behaviour is the internally coordinated responses (actions or inactions) of whole living organisms (individuals or groups) to internal and/or external stimuli ..". Note that we excluded a part of the original definition as it only applies to biological systems. For our purposes we extend this definition to artificial systems like robots. Furthermore, Levitis et al. (2009) point out "Information processing may be a necessary substrate for behaviour, but we do not consider it a behaviour by itself." This is an important statement because it excludes perception, state estimation, and building world models from the definition of behavior while it may be part of a behavior.

There are other terms related to behavior and behavior learning that we use in the discussion. Shadmehr and Wise (2005, page 46) state "Once the CNS [central nervous system] selects the targets (or goals) of reach ... it must eventually compute a motor plan and generate the coordinated forces needed to achieve the goal, even if this computation evolves during the movement. The ability to achieve such goals typically requires a motor skill." Hence, we can distinguish the more general concept of a motor skill and an explicit and specific motor plan. The term skill is widely used. We define skill as a learned ability of an organism or artificial system. A skill is not the behavior but a behavioral template that can be adapted to a behavior for certain situations that are similar to those in which it was learned. A set of skills constitutes a skill library or motor repertoire. A motor plan is a sequence of actions to be taken in order to achieve a given goal. Another term that is often used in the context of robot skill learning is movement primitive. Movement primitives are "fundamental units of motor behavior", more precisely, "invisible elements of motor behavior that generate a regulated and stable mechanical response" (Giszter et al. 1993). More specifically, a movement primitive can represent a learned skill and a motor plan is a skill adapted to a specific situation.

Classification of Behaviors

Now that we have defined behavior and related terms, we will introduce categories to distinguish and classify behaviors and behavior learning problems. Note that some behaviors cannot clearly be categorized or some categories do not even apply to all behaviors. In contrast to Schaal and Atkeson (2010), we focus completely on classifying the problem and corresponding behavior, not on the method that is used to solve the problem or generate the behavior, and we use more refined categories to characterize these behaviors.

Domain: Behaviors are often useful only in specific domains. Sometimes similar but different behaviors are used in different domains. Examples for domains are manufacturing, healthcare, logistics, household, or games. We will explicitly exclude military applications. Here, we will follow a bottom-up approach to identify relevant domains that include a significant amount of learned behaviors.

Hierarchy of behaviors: Behaviors can have different timescales and levels of abstraction regarding goals. For example, keeping a household clean is more abstract and time-consuming than picking up a particular cup. Furthermore, behaviors can consist of sub-behaviors, as shown in Figure 6. A resource management behavior can achieve the goal of maintaining a storage filled by keeping track of the stored amount (stocktaking) and collecting resources (foraging) when necessary. As goals become more concrete and faster to achieve, their priority generally increases: in the example, keeping balance or avoiding an obstacle are often obligatory, leading to compromises in the achievement of higher-level goals. Sub-behaviors may be executed in parallel or in a sequence and generally, the type of their combination (output weighting, suppression, sequence) is learnable.
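To make the combination types concrete, the following toy sketch composes prioritized sub-behaviors into a higher-level behavior, loosely following the resource-management example above. It is an illustrative structure of our own, not an implementation from the cited literature.

    from dataclasses import dataclass
    from typing import Callable, Dict, List

    @dataclass
    class Behavior:
        """A sub-behavior maps a (toy) world state to an action if applicable."""
        name: str
        priority: int
        applicable: Callable[[Dict], bool]
        act: Callable[[Dict], str]

    def arbitrate(behaviors: List[Behavior], state: Dict) -> str:
        """Suppression-style combination: the applicable sub-behavior with the
        highest priority takes control of the actuators."""
        active = [b for b in behaviors if b.applicable(state)]
        return max(active, key=lambda b: b.priority).act(state) if active else "idle"

    keep_balance = Behavior("keep balance", 3, lambda s: s["tilting"], lambda s: "stabilize")
    avoid_obstacle = Behavior("avoid obstacle", 2, lambda s: s["obstacle"], lambda s: "sidestep")
    foraging = Behavior("foraging", 1, lambda s: s["storage"] < 10, lambda s: "collect")

    state = {"tilting": False, "obstacle": True, "storage": 4}
    print(arbitrate([keep_balance, avoid_obstacle, foraging], state))  # -> sidestep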

Organizing behaviors hierarchically has been demonstrated to be of practical relevance to organize hand-coded behaviors for the complex domain of robot soccer. The behavior specification languages XABSL (Loetzsch et al. 2006) and CABSL (Röfer 2018) are common among robot soccer teams. A hierarchical behavior structure is also useful to divide the learning procedure, as demonstrated by Kirchner (1997). Hierarchical behavior organization dates back at least to the field of behavior-based robotics (Arkin 1998), manifested, for example, in the subsumption architecture of Brooks (1986).

Perception and action: Behaviors often involve perception and action (see Figure 1). Some behaviors can be executed open-loop. They do not incorporate any sensory feedback after they have been started. Pure perception on the other hand does not match our definition of behavior. However, often a coupling between perception and action is required. Sometimes both components are learned, sometimes only the action is learned and sometimes there is a stronger focus on learning the perception part of the behavior. We will indicate which part of the behaviors are learned with this classification.

Figure 1. Perception and action. The red background indicates which parts of the behavior are learned. Sometimes both, perception and action, are learned and sometimes only some aspects are learned. The height of each bar indicates the complexity of the corresponding part. (The figure shows, for each highlighted publication – Kirchner (1997): goal-directed walking; Peters et al. (2005): tee-ball; Kober et al. (2008); Kober and Peters (2009): ball-in-a-cup; Kormushev et al. (2010b): pancake flipping; Mülling et al. (2011, 2013): table tennis; Levine et al. (2016): visuomotor control; Levine et al. (2017): grasping – which parts of perception, building an internal representation, and action are learned.)

Deliberative vs. reactive behaviors: Arkin (1998) distinguishes between deliberative and reactive control. This can be transferred directly to robotic behavior. Deliberative control often relies on a symbolic world model. Perception is not directly coupled to action, it is used to populate and update the world model. Actions are derived from the world model. Deliberative control is usually responding slowly with a variable latency and can be regarded as high-level intelligence. We define deliberative behaviors as behaviors that only have an indirect coupling between sensors and actuators through a form of world model. Behaviors that are learned completely are usually not deliberative. Only parts of deliberative behaviors are learned. Reactive control does not rely on a world model because it couples perception and action directly. It usually responds in real-time, relies on simple computation, and is a form of low-level intelligence. Reactive control architectures often combine multiple reactive behaviors. An interesting property of these architectures is that often unforeseen high-level behavior emerges from the interplay between robot and environment. Reflexive behavior is purely reactive behavior with a tight sensor-actuator coupling. Deliberative and reactive behaviors are often closed-loop behaviors. Behaviors without coupling between perception and action also exist. These are open-loop behaviors. Sometimes open-loop behaviors are triggered with a hard-coded rule based on sensor data. Note that sensor data used during the training phase is irrelevant for this classification, only sensor data during execution of the behavior is relevant.

Discrete vs. rhythmic behavior: Schaal et al. (2004) distinguish between two forms of movements: discrete and rhythmic movements. Discrete movements are point-to-point movements with a defined start and end point. Rhythmic movements are periodic without a start or end point or could be regarded as a sequence of similar discrete movements. Some behaviors might be rhythmic on one scale and discrete on another scale. This distinction has often been used for robotic behaviors. Hence, we adopt it for our survey. Schaal et al. (2004) show that discrete movements often involve higher cortical planning areas in humans and propose separate neurophysiological and theoretical treatment.
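The distinction also shows up directly in the phase variables that commonly drive movement primitives: a discrete movement is driven by a decaying phase with a defined end, a rhythmic movement by a periodic phase without one. A generic illustration, not tied to one specific formulation from the literature:

    import numpy as np

    DT, ALPHA_S, OMEGA = 0.01, 4.0, 2.0 * np.pi  # time step, decay rate, frequency

    def discrete_phase(duration=1.0):
        """Decaying phase s: 1 -> 0; the movement has a defined start and end."""
        s, out = 1.0, []
        for _ in range(int(duration / DT)):
            s += DT * (-ALPHA_S * s)
            out.append(s)
        return np.array(out)

    def rhythmic_phase(duration=3.0):
        """Periodic phase phi (mod 2*pi); the movement repeats without an end point."""
        phi, out = 0.0, []
        for _ in range(int(duration / DT)):
            phi = (phi + DT * OMEGA) % (2.0 * np.pi)
            out.append(phi)
        return np.array(out)

    print(discrete_phase()[-1])  # close to 0: the discrete movement has finished
    print(rhythmic_phase()[-1])  # somewhere in [0, 2*pi): the cycle continues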

Static vs. dynamic behavior: We introduce a classification of behaviors that distinguishes between dynamic behavior and static behavior. Momentum is very important in dynamic behaviors because it will either be transferred to the environment or it is required because the robot or the environment is not stable enough to maintain its state without momentum. Static behaviors can be interrupted at any time and then continued without affecting the outcome of the behavior. In practice, some behaviors also lie in between, because momentum is not important but interrupting the behavior might alter the result insignificantly. Some problems would usually be solved by a human with dynamic behaviors but when the behavior is executed slowly enough, it loses its dynamic properties. This is often the case when robots solve these kinds of problems. We call these kinds of behaviors quasi-static. This categorization is inspired, for example, by research in walking robots: a static walk can be stopped at any time and the robot will stay indefinitely at the same position (Benbrahim and Franklin 1997). A similar categorization into dynamic and static movement techniques is made in rock climbing (Wikipedia contributors 2018). A complementary definition for manipulation is provided by Mason and Lynch (1993): static manipulation is defined as an operation "that can be analyzed using only kinematics and static forces", quasi-static manipulation can be analyzed "using only kinematics, static forces, and quasi-static forces (such as frictional forces at sliding contacts)", and dynamic manipulation can be analyzed "using kinematics, static and quasi-static forces, and forces of acceleration".

Active vs. passive: Some behaviors are executed with the intention to actively change the state of the robot or the world. Others are only passive and often have the goal of maintaining a state like homeostasis, that is, a state of steady internal conditions. Change of the environment is a side effect. We borrow this idea from the behavior architecture of Rauch et al. (2012) but it can be applied to any level of behavior.

Locomotion vs. manipulation: Many implemented behaviors of existing robotic systems can be categorized as locomotion or manipulation. Locomotion includes all behaviors that move the robot and, thus, change the state of the robot in the world. Change of the environment is a side effect. Manipulation behaviors change the state of the environment. Changing the state of the robot is a side effect. Manipulation is typically characterized as mechanical work that modifies the arrangement of objects in the world.

System requirements: Behaviors have different requirements on the hardware design of the robot. Many locomotion behaviors require legs, manipulation behaviors require grippers, hands, and / or arms. Navigation and exploration behaviors often only require wheels. Some behaviors rely on particular sensors, for example, cameras, force-torque sensors, or distance sensors. We will mention the most important requirements in the description of the behaviors if they are not obvious. An example of an obvious requirement is that a walking robot needs something similar to legs.

Noise and uncertainty: Behavior learning applications are significantly more difficult if there is noise in state transitions or state perception. Sometimes the state is not fully observable and, hence, there is uncertainty in perception. Sometimes the state transition is not fully determined by the actions that the robot can execute because the environment itself is dynamic. This is another reason for uncertainty.

Robotic Behavior Learning Problems

Robotic behaviors can be learned with many different approaches. Two relevant branches are reinforcement learning and supervised learning. Recent surveys on reinforcement learning in robotics have been published by Kober et al. (2013); Kormushev et al. (2013). Deep reinforcement learning is a new field that makes use of the results from deep learning. Although there are only a few applications of deep reinforcement learning in robotics, results of these methods are interesting for behaviors that involve difficult perception problems. A recent survey of deep reinforcement learning has been published by Arulkumaran et al. (2017) and a survey of deep learning for robotic perception and control by Tai and Liu (2016). Supervised learning can be used to learn the perception part of a behavior, the action part, or both. If actions are learned supervised, this is called imitation learning or programming by demonstration. Surveys have been written by Billard et al. (2008); Argall et al. (2009); Osa et al. (2018). We do not discuss algorithms in this section. Please refer to these surveys or to other papers that we cite in this section to learn more about specific algorithms that can be used to learn behaviors. Neither do we discuss the reported performance of the solutions from the presented works.
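In its simplest supervised form (behavioral cloning), imitation learning amounts to fitting a regressor from observed states to demonstrated actions. A minimal sketch on fabricated data; any function approximator could take the place of the small network:

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)

    # Fabricated demonstrations: states (e.g., joint angles) -> actions (e.g., velocities)
    states = rng.uniform(-1.0, 1.0, size=(500, 4))
    actions = np.tanh(states @ rng.normal(size=(4, 2)))  # unknown "expert" mapping

    policy = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
    policy.fit(states, actions)  # supervised learning of the action part only

    new_state = rng.uniform(-1.0, 1.0, size=(1, 4))
    print("imitated action:", policy.predict(new_state))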


We will focus on kinematically or sensorially complex robots. That includes humanoid robots or parts of humanoid robots like legged robots or robotic arms. We only consider applications for unmanned aerial vehicles, autonomous underwater vehicles, or wheeled robots if the learned behaviors are relevant for humanoid robots. That excludes some early works that apply machine learning to robotic control, for example, Mahadevan and Connell (1992) learn a behavior to find and push a box with a wheeled robot, but also more recent work with deep reinforcement learning on robotic systems. We also do not discuss behaviors that have only been demonstrated in simulation because of the reality gap (Jakobi et al. 1995).

In this section, we try to capture the large variety of robotic behavior learning problems according to the presented definition of behavior. We group problems according to the categories introduced in the previous section and point out similarities and differences between, and difficulties of, these problems.

A histogram that shows the distribution of the analyzed papers by publication dates is displayed in Figure 2. Although we do not claim to have included definitely every relevant work, it shows that the number of applications of behavior learning to robotic systems has been growing fast in the last 10 years. Figure 3 shows how individual behaviors can be grouped by their domain of application. Some behaviors can of course be applied in several domains. These are elementary behaviors. Examples are walking and grasping. Table 1 summarizes the behavior learning problems, corresponding publications, and their categorization. The remainder of this section is separated into manipulation behaviors, locomotion behaviors, and behaviors that do not fit any of these categories.

Figure 2. Histogram of publication years of the considered works. (Axes: year, 1990–2015; number of publications, 0–25.)

Manipulation Behaviors

Figure 4 shows the categorization of manipulation behaviors that we used to structure this section. Manipulation behaviors change the state of the robot's environment, hence, we categorized behaviors by the softness of the manipulated object and the dynamics of the behavior. This is similar to how Sanchez et al. (2018) structured their survey about manipulation and sensing of deformable objects. We found this categorization to be useful to organize the publications that we present here. It might, however, not be easily applicable in other cases. For example, in case of a robot that moves a catheter (Tibebu et al. 2014), we would have to answer the question if the catheter is the manipulated object or part of the robot. If the catheter is part of the robot, what would be the manipulated object?

Fixed Objects (A)

Flipping a light switch: Buchli et al. (2011) investigate the task of flipping a light switch. The switch essentially is a via-point that has to be passed through very precisely in this kind of task. In addition to high accuracy, the flipping process itself requires the exertion of forces. In their work, the robot learns to be compliant when it can be and be stiff only when the task requires either high precision or exertion of forces. The problem could be extended to the recognition of the switch, which is not done here.

Open door: In contrast to flipping a switch, opening a door does not require precise trajectories. Additionally, more than just a via-point problem has to be solved: opening a door involves grasping the handle, closing the kinematic chain between gripper and handle, and finally moving the handle. The movements of the robot after grasping are restricted by the structure of the handle. Opening a door requires significant force exertion from the robot to the environment. Nemec et al. (2017) ignore the problem of grasping and only consider the problem of learning the unconstrained DOFs while the kinematic chain from the robot to the door is closed. Chebotar et al. (2017b); Gu et al. (2017) consider the problem of learning this behavior end-to-end from camera images to motor torques. Nemec et al. (2017); Englert and Toussaint (2018) ignore the perception part of the problem and assume known relative positions. Kalakrishnan et al. (2011); Kormushev et al. (2011a) use force sensors. The door considered by Kormushev et al. (2011a) does not have a handle but a horizontal bar that has to be pushed with a larger force than a standard door handle. It is also the only work in which the door has been pushed and not pulled. Nemec et al. (2017); Englert and Toussaint (2018) consider not only horizontal but also vertical handles.

Turning objects: Several manipulation problems involve turning fixed objects, for example, turning a valve (Carrera et al. 2012), or a crank (Petric et al. 2014), or screwing a cap on a (pill or water) bottle (Levine et al. 2016). The challenge is to reach a via-point and then hold and move an object on a circular path. These behaviors can be realized as rhythmic movements (Petric et al. 2014) or discrete movements (Carrera et al. 2012; Levine et al. 2016). They can be discrete when the object has to be turned only by a small angle (for example, 90 degrees, Carrera et al. (2012)) or when the robot can spin its wrist (Levine et al. 2016). Some works focus more on robustly reaching the target object (Carrera et al. 2012; Levine et al. 2016) and others on robustly turning the object itself (Petric et al. 2014). Carrera et al. (2012) exclude perception from learning, Levine et al. (2016) learn perception and action, and Petric et al. (2014) follow previously learned torque profiles.

Spatially Constrained Behavior (B)

Peg-in-a-hole: Inserting a peg in a hole is one of the most basic manipulation skills that we discuss in this article. It is the most frequent assembly operation (Gullapalli et al. 1994). The behavior can benefit from both visual (Levine et al. 2016) and force sensors (Gullapalli et al. 1994; Ellekilde et al. 2012; Kramberger et al. 2016), but it can also be done without any sensors (Chebotar et al. 2017a). While the most obvious application of this skill is found in assembly tasks (Gullapalli et al. 1994; Ellekilde et al. 2012; Kramberger et al. 2016; Levine et al. 2016), it can also be used to, for example, plug in a power plug (Chebotar et al. 2017a). The problem can be solved end-to-end from visual data to motor torques (Levine et al. 2016) or from force measurements to Cartesian positions (Gullapalli et al. 1994) as a purely reactive behavior. Alternatively, learning can be combined with search heuristics for the hole based on force measurements (Ellekilde et al. 2012; Kramberger et al. 2016). In the simplest case, the behavior is learned for a fixed relative transformation between robot and target (Chebotar et al. 2017a).

A more advanced assembly operation that involves multiple instances of the peg-in-a-hole problem has been learned by Laursen et al. (2018) to connect a pipe for a heating system. In this task, a passively compliant gripper holds a tool extension and has to use a tube feeder, nut feeder, and crimping machine. Only actions were learned and a safety mechanism prevented the system from serious collisions. Apart from that, the system learns blindly without any sensors.

Wiping: The motion required to solve sweeping, wiping, ironing or whiteboard cleaning tasks can be either discrete or rhythmic. Further, all these tasks require environmental interaction by exerting (specific) forces on external objects. Learning mostly focuses on finding parameters for the representation of the movement. Kormushev et al. (2010a, 2011a) let a robot learn a discrete ironing skill from demonstrated trajectories and additional force profiles. They also evaluated their work on a whiteboard cleaning task (Kormushev et al. 2011c). A similar task is surface wiping, which is investigated by Urbanek et al. (2004); Gams et al. (2014). Both works represent the wiping skill as a periodic movement. In this case, rhythmic motions are advantageous, as the complete surface can be wiped easily by executing the motion several times while shifting only the center point. The work from Gams et al. (2014) also uses force feedback to maintain contact with the surface.
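A generic pattern behind such wiping behaviors is a rhythmic motion in the surface plane combined with feedback that regulates the contact force, as in the force-feedback variant of Gams et al. (2014). The sketch below only illustrates this pattern with a simulated contact and illustrative gains; it is not the cited controller:

    import numpy as np

    DT, F_DESIRED, K_F = 0.01, 5.0, 0.0005  # time step, desired normal force (N), gain

    def measured_normal_force(z):
        """Stand-in for a force-torque sensor: stiff contact below z = 0."""
        return max(0.0, -z * 2000.0) + np.random.normal(0.0, 0.1)

    z = 0.01  # start slightly above the surface
    for t in np.arange(0.0, 3.0, DT):
        x = 0.10 * np.sin(2.0 * np.pi * t)  # rhythmic wiping motion in the plane
        y = 0.05 * np.cos(2.0 * np.pi * t)
        f = measured_normal_force(z)
        z += K_F * (f - F_DESIRED)  # move down if the force is too low, up if too high
        # a robot would track the Cartesian reference (x, y, z) here
    print("contact force after adaptation:", measured_normal_force(z))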

Prepared using sagej.cls Fabisch et al. 7

Figure 3. Mindmap of behavior learning applications. Applications are ordered by domain. Some behaviors are assigned to multiple domains and most of the elementary behaviors could also belong to multiple domains. (Domains shown: Elementary Manipulation, Elementary Locomotion, Food Preparation, Household, Human-robot Interaction, Manufacturing, Music and Art, Sports / Games – with entries ranging from grasping, pick and place, walking, and balancing to pizza preparation, dressing assistance, peg-in-a-hole, drumming, calligraphy, table tennis, and cooperative team behavior.)


Figure 4. Categorization of manipulation behaviors. Manipulation behaviors are categorized in two dimensions: softness and movability of the manipulated object (rigid body: fixed, spatially constrained, movable; soft body: divisible, deformable; granular media; fluids) and dynamics of the behavior (static, quasi-static, dynamic). Blue letters indicate the corresponding subsections: (A) fixed, static – screw cap on bottle, unscrew light bulb, turn valve / crank, open door, flip switch; (B) spatially constrained, static – peg-in-a-hole, writing, sweeping, wiping, ironing, clean whiteboard, rolling pizza dough; (C) + (I) movable, static – grasping, pick and place, in-hand manipulation, tumbling / tilting an object, archery; (D) deformable, static – untangle a rope, knot tying, hold or fold garment, dressing assistance; (E) divisible, static – cutting, peeling; (G) granular media, quasi-static – scooping; (B) + (I) spatially constrained, quasi-static – writing, calligraphy; (G) fluids, dynamic – pouring; (F) + (I) movable, dynamic – kicking, batting, throwing, maracas, pancake flipping, Astrojax, nunchaku flipping, drumming.

Besides the aforementioned household tasks, there are also industrial operations that require constant environmental contact. From these, grinding and polishing tasks have been investigated by Nemec et al. (2018). The goal of these tasks is to keep contact with a specific force exertion between a polishing/grinding machine and the treated object, which is manipulated by a robot with a desired orientation. Therefore, their approach reproduces the relative motion between object and tool. The contact point is estimated using the measured forces and torques and can be changed to optimize a defined criterion, for example, to minimize joint velocities. Sweeping has been considered by Alizadeh et al. (2014). The position of "dust" is obtained using computer vision and the behavior is adapted accordingly. Pervez et al. (2017) train a sweeping behavior end-to-end from visual inputs to collect trash placed at various positions between a fixed initial and goal position.

Handwriting: The goal of handwriting tasks is to resemble human writing as precisely and smoothly as possible. Complete words have been reproduced and generalized on real robots: Manschitz et al. (2018) learn to generalize a handwriting skill to unseen locations of a whiteboard which is defined as the target writing position. Berio et al. (2016) learn to dynamically draw graffiti tags. In comparison to the above-mentioned behavior, these drawings particularly require fluid and rapid manipulation of the pen to produce elegant and smooth sequences of letters. Precision is less important for this behavior.

Movable Objects (C)

Grasping: Grasping is a good example for a high diversity of similar but different task formulations. The problem of grasping is usually tightly coupled with perception, but it can be separated into perception and movement generation. Continuous feedback can be used to verify the grip although it can also be sufficient to perceive the target before the grasp attempt. Problem formulation for grasping varies in the degree of automation and the amount of other methods used in the process. Sometimes perception is learned and movement generation is done with other approaches and vice versa. Some approaches learn full reaching and grasping movements for known object locations (Gräve et al. 2010; Kalakrishnan et al. 2011; Stulp et al. 2011; Amor et al. 2012), others just learn to predict grasp poses (Lenz et al. 2015b; Johns et al. 2016; Pinto and Gupta 2016). Steil et al. (2004) only consider the problem of defining hand postures and Kroemer et al. (2009) the problem of learning hand poses relative to objects. A full grasping movement includes a reaching trajectory, positioning the gripper at the correct position, closing the gripper, and sometimes objects have to be moved into the right position before the gripper can be closed. From the works that are mentioned here, Gräve et al. (2010); Steil et al. (2004); Stulp et al. (2011) do not learn to use feedback from sensors, Kroemer et al. (2009) use features obtained from images, Kalakrishnan et al. (2011) use force measurements, Lenz et al. (2015b) use RGB-D images, Johns et al. (2016); Mahler et al. (2017) use depth images, and Lampe and Riedmiller (2013); Pinto and Gupta (2016); Levine et al. (2017) use RGB images. Figure 5 illustrates possible inputs and outputs of a component that generates grasping behavior.

Figure 5. Learning grasping from sensory information. Exemplary sensor data that could be used to generate grasping behaviors and possible outputs of a skill. (Inputs: depth image, RGB image, force / torque measurements; outputs: Cartesian reference, joint position or velocity reference, torque reference.)

A classification proposed by Bohg et al. (2014) distinguishes between grasping of known, familiar, and unknown objects. Familiar means that the robot did not encounter the objects before, but has seen similar objects. Most of the works that we present here fall into this category.


For grasping, other factors that influence the difficulty of the problem are the used hand or gripper and the objects that should be grasped. Very promising results are shown by Levine et al. (2017); Mahler et al. (2017). A large variety of different objects can be grasped with a two-finger gripper just based on images or depth images, respectively. However, there are still many options for improvements. The gripper can only grasp objects with top-down movements. In the real world, not all problems can be solved with these kinds of grasps. The gripper only has two fingers. Hands with more fingers have better control over grasped objects. Using force feedback and tactile sensors would certainly improve grasping in some situations. In a box full of objects, the approach of Levine et al. (2017) just picks a random object. In practice, this should be a parameter of the behavior. Also, it is not clear where and in which orientation the gripper holds the object. This does not seem to be a problem because most works just consider the grasping phase but not what happens afterwards. In a real application, most probably the object will have to be placed in some other location. Since the grasping is not as accurate as one would expect in many cases, knowing the orientation of the object inside the gripper is very useful information to prepare the placing behavior. This can be done either by in-hand manipulation, which usually requires more fingers, or by adjusting the final target position of the arm taking into consideration the object's orientation.

Pick and place: A skill that is very similar to grasping is pick and place. Some works assume that picking the object is already solved and learn only object placement (Ijspeert et al. 2013; Finn et al. 2017), others learn both pick and place in one policy (Stulp et al. 2012; Rahmatizadeh et al. 2018; Chebotar et al. 2017b). Some works only focus on movement execution (Ijspeert et al. 2013), others generalize from object features to trajectories (Kroemer and Sukhatme 2017), or even learn camera-based perception and action end-to-end for one specific object (Finn et al. 2017; Chebotar et al. 2017b). A very interesting work from Stulp et al. (2012) considers the special case of this problem under uncertainty. It assumes a state estimation approach to track the object's location which does not yield perfect results. In addition, a sequence of movements is learned. A variant of pick and place is placing a coat hanger on a rack. Levine et al. (2016) learned to perform this task end-to-end from camera images to motor torques. The next level of difficulty for simple pick and place tasks is placing objects precisely, for example, stacking boxes. An interesting work shows that this can be learned even with a low-cost manipulator that has play in its joints and a wobbling base (Deisenroth et al. 2015). While this can be easily interpreted as noise from a machine learning perspective, other methods usually fail without any informative prior knowledge. In their study, perception has not been learned but continuous feedback from a vision system has been used to generate appropriate actions. Duan et al. (2017) tackle a more difficult problem by learning a direct mapping from visual input to actions. In their work, however, a more precise robotic system has been used.

In-hand manipulation: As objects cannot always be picked up in a specific configuration, in-hand manipulation may be necessary to reposition the objects within a robot's hand. In general, this is a dexterous manipulation skill that requires a gripper with multiple fingers that can be driven individually. van Hoof et al. (2015) learn robot in-hand manipulation with unknown objects by using a passively compliant hand with two fingers and exploiting tactile feedback. They investigate an in-hand object rolling task and learn a control policy that generalizes to novel rollable cylindrical objects that differ in diameter, surface texture and weight. In their work, dynamics and kinematics of the compliant robot hand are unknown to the learning algorithm.

The hand used by Rajeswaran et al. (2018) has five fingers and pneumatic actuation. They consider the problem of learning in-hand rotation of elongated objects with and without the use of a wrist joint under varying initial conditions. The object can either be in the hand at the start of the behavior or picked up and moved to the desired configuration. Learning this skill is shown to be possible with only proprioceptive feedback. This includes pressure measurements, positions, and velocities of each joint.

Andrychowicz et al. (2018) learn a very complex in-hand manipulation skill: changing the orientation of a cube to any desired orientation in a robotic hand with five fingers. Two components are learned: a vision component that computes the object's pose from three camera images from significantly different, fixed perspectives and a policy component that uses the finger tip positions and the object pose to generate motion commands for the fingers. The finger tip positions are measured with a motion capture system which unfortunately makes the learned skill in its current form not suitable for use outside of the lab.
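The two-component decomposition of Andrychowicz et al. (2018) – a vision model for the object pose and a policy for the finger commands – yields a control loop of roughly the following shape. Both learned components are stubbed out here, and all signatures and dimensions are assumptions for illustration:

    import numpy as np

    def estimate_pose(images):
        """Stand-in for the learned vision component: three RGB views from fixed
        cameras -> object position and orientation (7-D, position + quaternion)."""
        return np.zeros(7)

    def policy(fingertip_positions, object_pose, target_pose):
        """Stand-in for the learned policy component -> finger motion commands;
        the output dimension is an assumption for a five-finger hand."""
        return np.zeros(20)

    def control_step(cameras, motion_capture, target_pose):
        """One step of the loop described in the text: vision -> pose -> policy."""
        images = [capture() for capture in cameras]  # three fixed viewpoints
        pose = estimate_pose(images)
        fingertips = motion_capture()  # the lab-only dependency noted above
        return policy(fingertips, pose, target_pose)

    cmd = control_step([lambda: np.zeros((200, 200, 3))] * 3,  # stubbed cameras
                       lambda: np.zeros((5, 3)),               # stubbed motion capture
                       target_pose=np.zeros(7))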


Tumbling / tilting an object: The challenge in quasi-static manipulation tasks like tumbling or tilting objects from one face to another is to control the position of the respective object over a period of time. Pollard and Hodgins (2004) generalize an object-tumbling skill to novel object sizes, shapes and friction coefficients. Kroemer and Sukhatme (2017) further enhance the difficulty by learning to tilt objects exactly around their defined pivotal corners. This task requires a high accuracy during the whole skill execution because the object's corner has to stay continuously in contact with the desired pivot point.

Deformable Objects (D)

Knot tying and untying: Tying a knot is a behavior that is frequently required, for instance, during surgical operations, in the household domain, for search and rescue, or for sailing, where threads or ropes are often used. van den Berg et al. (2010) demonstrate that a combination of behavior learning and optimal control can be used to learn fast and smooth knot tying with two manipulators consisting of 14 motors. This would be a particularly challenging task for planning algorithms that would have to reason about a three-dimensional soft body.

Similarly, untangling ropes and untying knots is required in the very same domains as well as for technical applications in which cables unintentionally tangle up. Wen Hao Lui and Saxena (2013) learn to predict the rope configuration and use it to choose several actions from a predefined set to untangle the rope.

Handling Garments: Corona et al. (2018) learn to handle garments, that is, arranging a garment from an unknown configuration to a reference configuration from which further steps can be executed, for example, folding it or dressing a person. The difficult part is the prediction of suitable grasp points from camera images. A bimanual setup has been used: one arm grasps a garment and presents it to an RGB-D camera, the garment is recognized, and two grasping points for the arms are identified to bring the garment to a reference configuration. Jeans, T-shirts, jumpers, and towels can be handled by the system.

Colomé and Torras (2018) learn to fold a polo shirt with two robotic arms. Each arm has 7 DOF. Only trajectories for the two arms are learned. An accurate model of the polo shirt and its interaction with the grippers of the arms is not available. The learned trajectories minimize wrinkles in the shirt and make it look as close to a reference rectangle as possible.

Erickson et al. (2018) consider the problem of robot-assisted dressing: while a human is holding his arm up and holds his posture strictly, a PR2 robot pulls a hospital gown onto the arm of the human. Physical implications of actions on people are learned from simulation. The learned model predicts forces on a person's body from the kinematic configuration and executed actions. The model is combined with model predictive control to solve the task. Hence, neither action nor perception are learned completely.

Divisible Objects (E)

Cutting: Cutting objects is a complex task as dynamics are induced during the process of object cutting. Cutting tasks can be found in various domains. For example, Lioutikov et al. (2016) consider the task of cutting vegetables in a kitchen scenario. In their work, the movement is divided into multiple steps, and afterwards executed autonomously as a sequence. The learned behavior generalizes to changed cutting positions. However, they consider neither the required forces to cut the objects nor the involved dynamics. As a result, the cutting motion has to be executed multiple times to finally slice the vegetable. Therefore, while Lioutikov et al. (2016) represent cutting motions as discrete behaviors, they recommend to represent them as rhythmic behaviors in future work. The difficulty of food-cutting tasks is further exacerbated if vegetables with different stiffness and shape are evaluated. In this case, the (non-linear) dynamics vary not only with time but also with different object types. As the hand-designing of such dynamics models is infeasible, Lenz et al. (2015a) aim to learn the prediction of these dynamics and the respective controllers directly from a dataset of about 1500 cuts. In the medical field, Thananjeyan et al. (2017) investigate surgical pattern cutting of deformable tissue phantoms in the context of laparoscopic surgery. As the task requires simultaneous tensioning and cutting, they learn a tensioning policy which depends on the specific cutting trajectory and maps the current state of the gauze to output a direction of pulling. Similar to the work from Lenz et al. (2015a), the dynamical deformation is difficult to observe or to model analytically. Therefore, they directly learn the cutting policy in an end-to-end fashion.

A similar task is peeling, which has been learned by Medina and Billard (2017). It is, however, modeled as a sequence of reaching, peeling and retracting. The peeling motion for a zucchini has been learned with one arm only, while another arm holds the zucchini.

Movable Objects, Dynamic Behavior (F)

Batting, throwing and kicking: For many games some sort of batting or throwing behavior is required, for example, hockey (Daniel et al. 2013; Chebotar et al. 2017a; Rakicevic and Kormushev 2017; Paraschos et al. 2018), golf (Maeda et al. 2016), minigolf (Khansari-Zadeh et al. 2012), billiard (Atkeson et al. 1997; Pastor et al. 2011), baseball (Peters et al. 2005; Peters and Schaal 2008), badminton (Liu et al. 2013), tennis (Ijspeert et al. 2002), table tennis (Kober et al. 2010; Mülling et al. 2011; Kober et al. 2012; Mülling et al. 2013), tetherball (Daniel et al. 2012; Parisi et al. 2015), darts (Kober et al. 2012), throwing (Gams et al. 2010; Ude et al. 2010; Kober et al. 2012; da Silva et al. 2014; Gutzeit et al. 2018), and kicking (Böckmann and Laue 2017; Hester et al. 2010; Asada et al. 1996). These are very dynamic manipulation behaviors because momentum from the end-effector has to be transferred to the manipulated object. We can distinguish between settings where a specific goal has to be reached by hitting or throwing an object directly (Chebotar et al. 2017a; Khansari-Zadeh et al. 2012; Rakicevic and Kormushev 2017; Paraschos et al. 2018; Gams et al. 2010; Ude et al. 2010; da Silva et al. 2014; Gutzeit et al. 2018) or indirectly (Daniel et al. 2013; Atkeson et al. 1997), or the distance or velocity has to be maximized (Pastor et al. 2011; Peters et al. 2005; Peters and Schaal 2008). Sometimes performing the motion was enough (Maeda et al. 2016; Liu et al. 2013; Ijspeert et al. 2002; Daniel et al. 2012; Böckmann and Laue 2017). Winning the game was the goal in the case of tetherball (Parisi et al. 2015), or scoring a goal in the case of soccer (Hester et al. 2010; Asada et al. 1996). An extension to the problem of hitting a specific goal is to hit a given goal from a target space, for example, along a line (Khansari-Zadeh et al. 2012), from an area (Kober et al. 2012; Gams et al. 2010; Ude et al. 2010; da Silva et al. 2014; Rakicevic and Kormushev 2017; Gutzeit et al. 2018), or from a discrete set of targets (Kober et al. 2012).
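Generalizing such hitting and throwing movements over a target space is commonly done by mapping the target (the task context) to meta-parameters of a movement primitive with a learned regressor. A minimal sketch on fabricated (target, meta-parameter) pairs – data and mapping are invented, and no single cited work is reproduced here:

    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(0)

    # Fabricated library: for a few practiced targets we know good meta-parameters
    targets = rng.uniform(0.5, 2.0, size=(25, 2))  # e.g., (distance, direction)
    meta_params = np.column_stack([
        2.0 * targets[:, 0] + 0.1 * rng.normal(size=25),  # e.g., release velocity
        0.5 * targets[:, 1],                              # e.g., final joint angle
    ])

    generalizer = Ridge(alpha=1e-3).fit(targets, meta_params)

    new_target = np.array([[1.2, 0.9]])  # a target that was never practiced
    print("predicted meta-parameters:", generalizer.predict(new_target))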

In some cases specialized machines have been used, for example, Atkeson et al. (1997) use a simple billiard robot or Liu et al. (2013) use a badminton robot with three DOF. In contrast, Pastor et al. (2011) use a humanoid robot to play billiard or Mülling et al. (2013) use robotic arms to play table tennis. In some works, only serve motions (Liu et al. 2013) or hitting static objects (Peters et al. 2005; Hester et al. 2010) are learned, in other works a moving object has to be hit (Mülling et al. 2013; Parisi et al. 2015). Perception and state estimation is not learned in any of the presented works, hence, behaviors that rely on perception and state estimation of moving targets (Parisi et al. 2015; Mülling et al. 2013) can be considered as deliberative. Most of these problems, however, have been solved without exteroceptive sensors. Kicking a ball with a legged humanoid represents a particular challenge because the robot has to keep balance. Böckmann and Laue (2017) execute a learned kick with manually implemented balancing and Hester et al. (2010) learn to perform a kick that avoids falling over while scoring a goal. State estimation uncertainty and noise is an issue if perception is involved in the skill, although this has not been mentioned explicitly in the works of Parisi et al. (2015); Mülling et al. (2013) in which state estimation methods have been used. Hence, we assume this has not been considered to be a significant problem. Learning the perception part of these behaviors has not been considered so far and would significantly increase the difficulty of the problems.

More dynamic manipulation behaviors: In ball-in-a-cup, a ball is attached to a cup by a string. The goal is to move the cup to catch the ball with it. A robot has to swing the ball up and catch it. The movements of the ball are very sensitive to small perturbations of the initial conditions or the trajectory of the end-effector (Kober et al. 2008). Successful behaviors are learned so that they take into account the ball position (Kober et al. 2008; Kober and Peters 2009) to compensate for perturbations, however, the perception part is not learned in any of these works. Kober et al. (2008) state that it is a hard motor learning task for children.

Another remarkable work is published by Kormushev et al. (2010b). The goal is to flip a pancake with a frying pan. It is a dynamic task and the pancake is susceptible to the influence of air flow, which makes it very hard to predict its trajectory.

Zhao et al. (2018) learn nunchaku flipping, which is a very dynamic behavior. A nunchaku is a weapon that consists of two sticks that are connected by a chain. A hand with haptic sensors and five fingers has been used. Zhao et al. (2018) emphasize that the task requires compound actions that have to be timed well, contact-rich interaction with the manipulated object, and handling an object with multiple parts of different materials and rigidities.

Balancing: A typical balancing example which is often used as a sample problem is balancing an inverted pendulum. Marco et al. (2016); Doerr et al. (2017) investigate this problem in a real-world manipulation scenario by utilizing a robot arm with seven DOF to balance an inverted pendulum. In their work, they learn parametrizations of a PID controller or a linear-quadratic regulator (LQR), respectively, while a motion capture system is used to track the angle of the balanced pole.
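Marco et al. (2016) and Doerr et al. (2017) tune the parameters of standard controllers such as an LQR for this task. For reference, the textbook LQR gain computation for a cart-pole linearized around the upright equilibrium is shown below; the physical constants and cost weights are illustrative and not those of the cited setups:

    import numpy as np
    from scipy.linalg import solve_continuous_are

    # Cart-pole linearized around the upright equilibrium (illustrative constants):
    # state = (cart position, cart velocity, pole angle, pole angular velocity)
    g, l, m_c, m_p = 9.81, 0.5, 1.0, 0.1
    A = np.array([[0.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, -m_p * g / m_c, 0.0],
                  [0.0, 0.0, 0.0, 1.0],
                  [0.0, 0.0, (m_c + m_p) * g / (m_c * l), 0.0]])
    B = np.array([[0.0], [1.0 / m_c], [0.0], [-1.0 / (m_c * l)]])
    Q = np.diag([1.0, 1.0, 10.0, 1.0])  # state cost: penalize the pole angle most
    R = np.array([[0.1]])               # control cost

    P = solve_continuous_are(A, B, Q, R)
    K = np.linalg.solve(R, B.T @ P)     # u = -K x stabilizes the pendulum
    print("LQR gains:", K)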
Pouring: An application which requires weakly dynamic movements with moderate precision is pouring liquids from a bottle into a cup. Learning focuses on the generalization of the movement to new goals (the position of the cup (Pastor et al. 2008)), changed initial positions (the position of the bottle (Chi et al. 2017)), or different object shapes and sizes (Brandl et al. 2014; Tamosiunaite et al. 2011). Tamosiunaite et al. (2011) learn both the shape of the trajectory and the goal position to generalize a trajectory to a different bottle. Similar to the pick-and-place applications detailed above, the elementary pouring problem can also be extended to a pick-and-pour task (Caccavale et al. 2018; Chi et al. 2017). In contrast to the above-mentioned works, which acquire the pouring trajectories from human demonstrations, robotic pouring behaviors can also be learned in an end-to-end fashion directly from videos (Sermanet et al. 2018).

Collision Avoidance (H)

Robotic manipulation behaviors can result in collisions with the robot's own body, other agents, or the environment. The latter is often termed obstacle avoidance, where the obstacles can be either static or dynamic. While static objects in the environment can be modeled well within a world model, dynamic obstacles are often circumnavigated with reactive behaviors. Both collision and obstacle avoidance are important in real-world manipulation scenarios. Koert et al. (2016) learn adaptation of trajectories in case of unforeseen static obstacles represented by a point cloud that has been obtained from a depth camera.

Miscellaneous (I)

There are also some more unusual behaviors that have been learned, but we will not discuss them in detail. Among these are archery (Kormushev et al. 2010c), which is similar to throwing a ball or darts but does not involve an accelerating trajectory, playing with the Astrojax toy (Paraschos et al. 2018), playing maracas (Paraschos et al. 2018), drumming (Ude et al. 2010), and calligraphy (Omair Ali et al. 2015).

Locomotion Behaviors

The design of locomotion behaviors is a challenge that increases with the kinematic complexity of the robot, its inherent stability, and the terrain to be traversed. Machine learning techniques can be used to provide solutions to locomotion problems, even with fundamental principles of locomotion not yet fully understood (Aguilar et al. 2016).


Locomotion problems can be organized hierarchically based on the controlled entities (single or multiple legs, joints of the robot body) as shown in Figure 6. On the lowest level, a PID controller may generate actuator commands to control the joints of a robot leg or the motors of its wheels to reach or maintain a certain position, velocity, or torque. By variation of its parameters, a joint controller can achieve meaningful reactive movements without knowledge of the kinematic structure. As an example, each joint can independently compensate for internal friction, or a certain reflex can be triggered locally at joint level (Kuehn et al. 2014). We exclude the level of joint control as it only modifies a given behavior generated on higher levels. Single-leg behaviors, such as the swing movement, can be defined in the Cartesian space of the end-effector and thus require inverse kinematics and/or dynamics transferring the behavior's output into joint space. Behaviors that command the full body, such as balancing or walking, often use other behaviors that only control single legs. High-level locomotion behaviors concatenate, combine, and steer full-body behaviors. For example, a navigation behavior for a humanoid robot controls the goal of a walking behavior. High-level behaviors could as well be controlled by other behaviors or overall objectives.
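To make the lowest level of Figure 6 concrete, the following sketch shows a textbook PID controller of the kind that could track a commanded joint position. It is a generic example for illustration, not a controller taken from any of the cited works.

```python
class PIDController:
    """Textbook PID controller for a single joint."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.previous_error = 0.0

    def step(self, setpoint, measurement):
        """Return an actuator command, e.g., a motor torque."""
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.previous_error) / self.dt
        self.previous_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

Varying the gains kp, ki, and kd is exactly the kind of parameter variation mentioned above: it changes the reactive characteristics of a joint without any knowledge of the kinematic structure.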
Walking (A)

The prime example of the locomotion category is walking. Walking is a very diverse robotic behavior learning problem. Its diversity stems, on the one hand, from the variety of different walking machines: six-legged (Maes and Brooks 1990; Kirchner 1997), quadrupedal (Kohl and Stone 2004; Kolter et al. 2008; Birdwell and Livingston 2007; Kolter and Ng 2009; Kalakrishnan et al. 2009; Zucker et al. 2011; Bartsch et al. 2016), and biped systems (Benbrahim and Franklin 1997; Matsubara et al. 2005; Geng et al. 2006; Kormushev et al. 2011c; Missura and Behnke 2015) have been considered for this paper. On the other hand, the problem formulation can be made more difficult by requiring the system to walk up stairs (Kolter and Ng 2009) or walk on irregular or rough terrain (Kolter et al. 2008; Kalakrishnan et al. 2009; Zucker et al. 2011). In principle, the problems of walking as fast (Kohl and Stone 2004), straight (Birdwell and Livingston 2007), energy-efficient (Kormushev et al. 2011c), or stable (Missura and Behnke 2015) as possible can be distinguished. While six-legged and quadrupedal systems are stable enough to prevent falling over in most situations and, hence, qualify for static behaviors, bipedal systems are often unstable, and it is a hard problem to prevent them from falling over. Hence, bipedal walking can be considered a dynamic learning problem. Walking is a rhythmic and active behavior. It is an elementary skill that can be used in many application domains; however, walking robots are in competition with wheeled robots, which are much more energy-efficient and precise in flat terrain. While walking itself is a rhythmic behavior, precise foot placement is usually a discrete behavior. Precise foot placement is required for climbing stairs (Kolter and Ng 2009) and walking on rough terrain (Kolter et al. 2008; Kalakrishnan et al. 2009; Zucker et al. 2011) on a lower level of behavior abstraction (see Figure 6). Those behaviors also combine learning methods with other planning and control methods. Bipedal robots are usually leaner than other walking machines, and they are able to move like humans and in the same environment, for example, go through very narrow paths (Benbrahim and Franklin 1997). Because bipedal walking is not statically stable per se, controllers have to compensate disturbances continuously. Either static stability or dynamic stability can be the goal of a bipedal walk. Often the problem of learning bipedal walking is restricted by supporting structures to the sagittal plane to simplify the balancing problem (Benbrahim and Franklin 1997; Matsubara et al. 2005; Geng et al. 2006), but not always (Kormushev et al. 2011c; Missura and Behnke 2015). However, behaviors are often prestructured to restrict and, hence, simplify the learning problem. For example, Missura and Behnke (2015) only learn the balancing part of the walk. Using sensory feedback is particularly important for bipedal walking. Apart from proprioceptive sensors (Matsubara et al. 2005), ground contact sensors have been used (Geng et al. 2006). Robustness to slightly irregular surfaces and changes of the robot's dynamics has also been considered (Matsubara et al. 2005) for bipedal walking.

A more difficult version of bipedal walking is riding a pedal racer. In principle, it is comparable, but it is crucial to exert a controlled force on the pedals. Hence, Gams et al. (2014) use a 6-DOF force-torque sensor in each foot of the bipedal robot to generate feedback to the learned behavior.

Dribbling (B)

Walking or running while controlling a ball is called dribbling. It can be used, for example, in basketball, handball, or soccer. Latzke et al. (2007) learned dribbling for soccer with a humanoid toy robot by "walking against the ball". The walking behavior is very simple because it only uses three motors. The goal is to learn how to score a goal with dribbling, starting from ten different initial ball positions at the middle of the field. Only high-level control, that is, setting a walking direction, has been learned. Positions of the ball and the goal are obtained from a world model.

Jumping (C)

If the walking robot is too small and the terrain too rough, jumping is sometimes necessary. Kolter and Ng (2009) show that this can be used to climb up large stairs with a small quadrupedal robot. With the same robot, Theodorou et al. (2010) learn to jump across a gap by maximizing the distance of the jump while jumping straight to prevent falling over. Unfortunately, Theodorou et al. (2010) could not evaluate their approach on the real system.

Standing Up (D)

A stand-up behavior is important for any biped robot acting in the real world. In general, the difficulty is that there exists no static solution as there is no joint linking the robot to the ground. For many robots, a robot-specific, pre-programmed stand-up movement is used instead of acquiring the skill by learning. However, Morimoto and Doya (2001) learn a dynamic stand-up policy both in simulation and on a real two-joint robot. The robot (incrementally) learns a skill to stand up dynamically by utilizing the momentum of its body mass. An inclination sensor measures the current state of the system, and motor torques are produced by the learned motor skill. The hierarchical learning architecture learns to generate postures by means of an upper-level policy and the movements to achieve the next posture (sub-goals) by means of a lower-level policy.


[Figure 6 appears here: a diagram of the behavior hierarchy. From top to bottom: OBJECTIVE (e.g., resource management), HIGH-LEVEL LOCOMOTION (foraging, stocktaking, exploration, navigation (visual, terrain)), FULL-BODY BEHAVIOR (relocating body or objects: dribbling (B), balancing (E), collision avoidance (F), walking (A) with biped, quadrupedal, and six-legged systems, jumping (C), standing up (D)), SINGLE LEG BEHAVIOR (swinging, stancing, kicking, manipulation), and JOINT CONTROL (actuator commands). Higher levels compose lower-level behaviors sequentially or in parallel.]

Figure 6. Hierarchy of behaviors with focus on locomotion. Inspired by Arkin (1998, page 49). For different levels of abstraction, exemplary behaviors are presented. Concrete movements of the body, a single extremity, or a joint are found on lower levels in this hierarchy. While machine learning may be used on all levels and intersections, this work focuses on behavior learning above the level of joint control.

Balancing (E)

Keeping balance is a fundamental locomotion requirement and has been achieved with various approaches by modifying different aspects of the motion. For example, a walking humanoid can be balanced by modifying the gait (Missura and Behnke 2015) or by using arm motions (Kuindersma et al. 2011), and a robot on two wheels can be balanced by controlling motor torques (Vlassis et al. 2009). Often behavior learning is combined with classical control approaches: Kuindersma et al. (2011) use an existing balance controller for normal balancing and only activate arm motions for postural recovery when the inertial measurement unit (IMU) detects perturbations through impacts of an external weight.

Collision Avoidance (F)

Learning collision avoidance seems to play a secondary role in manipulation (see paragraph Manipulation: Collision Avoidance). There are, however, many works in the context of locomotion, where it is mainly related to navigation problems. The publications discussed in this paragraph directly use images and vision systems. They present learned reactive collision avoidance behaviors. In the field of navigation, Tai et al. (2016) learn a collision avoidance strategy based on depth images in an indoor obstacle avoidance scenario. They use a mobile, wheeled robot that learns to move in corridors with a set of discrete actions. However, the robot only encounters static obstacles. Loquercio et al. (2018) investigate a civilian drone flight application. In their work, the drone learns to safely fly in the streets of a city by mapping each single input image directly to a drone steering angle and a collision probability to react to unforeseen obstacles. The behavior for navigation and obstacle avoidance is trained for urban environments from the viewpoint of bicycles and cars but can be generalized to novel situations like indoor environments or high altitudes without retraining. The outputs of the perception model are not directly used to control the drone but converted to movement commands with fixed rules. Similarly, Gandhi et al. (2017) also learn to navigate an unmanned aerial vehicle while avoiding obstacles. They use negative experiences, that is, a visual dataset of more than 11,500 crashes in various environments with random objects, in conjunction with positive data to learn to fly even in cluttered, dynamic indoor environments. The behavior is learned end-to-end by taking camera images and outputting probabilities for the motion commands "left", "right", and "straight". Kahn et al. (2017) learn uncertainty-aware collision avoidance, that is, given a camera image and a sequence of controls, the learned model will output a collision probability together with an estimate of uncertainty. The approach proceeds cautiously in unfamiliar environments and increases velocity in areas of higher confidence. Model predictive control is used to generate actions, while the cost model incorporates collision probability and uncertainty. The approach has been tested with a quadrotor and an RC car.
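The following schematic shows how a collision probability and an uncertainty estimate can enter the cost of such a model predictive controller. It loosely follows the description of Kahn et al. (2017) but is our own simplification; all names and the form of the cost are assumptions.

```python
import numpy as np

def cost(mean_speed, collision_probability, uncertainty, lam=1.0, mu=1.0):
    """Reward fast progress, penalize predicted collisions and model
    uncertainty, so the robot slows down in unfamiliar environments
    and speeds up where the model is confident."""
    return -mean_speed + lam * collision_probability + mu * uncertainty

def select_controls(model, image, candidate_control_sequences):
    """Pick the control sequence with the lowest cost. `model` is the
    learned predictor returning a collision probability and an
    uncertainty estimate for a camera image and a control sequence."""
    costs = []
    for controls in candidate_control_sequences:
        p_collision, sigma = model(image, controls)
        costs.append(cost(np.mean(np.abs(controls)), p_collision, sigma))
    return candidate_control_sequences[int(np.argmin(costs))]
```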

Navigation (G)

Assuming the robotic system knows how to walk or drive, where should it move? High-level locomotion behaviors like navigation and exploration are concerned with local direction generation, for example, navigation through complex natural environments (Silver et al. 2010), navigation to visually presented targets (Zhu et al. 2017), navigation to targets with known relative location (Pfeiffer et al. 2017), lane following (Chuang et al. 2018), reducing state estimation uncertainty in navigation (Oßwald et al. 2010), and navigating to a target position (Conn and Peters 2007). Most of the works discussed here are concerned with wheeled robots but are in principle transferable to walking robots. Classical navigation through natural terrain has been considered by Silver et al. (2010). They use planning to generate driving directions, but the generation of cost maps for the planner is learned. The cost maps are generated based on perceptual data: static data sources like satellite images, or onboard sensors like cameras and LiDAR. Zhu et al. (2017) consider the problem of visual navigation: actions in a 3D environment are predicted based on the current image from the robot's camera and an image of the target. The predicted actions result in a minimum path length to reach the goal. They show that navigation to different targets in a scene can be learned without retraining. The approach has been tested on a wheeled robot in an office environment. Pfeiffer et al. (2017) learn navigation to a given relative target location end-to-end from 2D laser range findings without a map. Steering commands are directly generated by the learned behavior. The goal was to navigate safely through obstacle-cluttered environments with a mobile platform. A similar problem is to learn lane following from camera images end-to-end. This has been done by Chuang et al. (2018). Oßwald et al. (2010) consider the problem of navigation with a humanoid robot that has noisy actuators and sensors. Motion commands are executed more inaccurately with walking robots compared to wheeled robots, and camera images are affected by motion blur. A navigation behavior has to trade off quality of pose estimation and walking speed. A vision-based pose estimation has been used, and navigation actions (forward, rotate left/right, stand still) for the robot have been learned that take into consideration the distance and angle to the goal and the pose uncertainty. The goal is to reach the destination reliably and as fast as possible. Conn and Peters (2007) solve a classical grid-world navigation problem in the real world. The laser scan data and orientation information are used by the behavior to generate one of the commands stop, turn left, turn right, or move forward.

As a side note, we would like to mention here that autonomous driving behaviors for cars also fall into the category of navigation. These behaviors can also be learned, as shown by Chen et al. (2015) and Bojarski et al. (2016). Because this topic is very broad and it is not of utmost importance for humanoid robots, we will not investigate it further here. The behaviors are often very specific to the domain; for example, Bojarski et al. (2016) present an approach to learn lane and road following, and Chen et al. (2015) learn driving in a car racing game.

Exploration (H)

Exploration behaviors use (lower-level) locomotion behaviors to gain knowledge about the robot's environment. Cocora et al. (2006) successfully transfer exploration behavior from other environments to a new environment to find the entrance of an office. The general problem that they try to solve is navigating to a room with an unknown location. While searching for it, only labels for neighboring rooms are provided to the robot. The required exploration behavior is achieved by learning an abstract navigation policy choosing actions based on the provided local knowledge. Kollar and Roy (2008) learn an exploration behavior for an unknown environment to maximize the accuracy of a map that is built with simultaneous localization and mapping (SLAM).

A special case of exploration behaviors are sampling routines aimed at acquiring relevant sensory input, often referred to as active sensing or active perception. Chen et al. (2011) state that "active perception mostly encourages the idea of moving a sensor to constrain interpretation of its environment". For example, a camera usually has a limited field of view; thus, the goal of an active sensing behavior is to move the part of the robot to which the camera is attached (or the whole robot) to reduce uncertainty about the scene. Kwok and Fox (2004) demonstrate how active sensing can be learned in the domain of robotic soccer: a quadrupedal robot has to determine its own location, the location of the ball, and the location of opponents on a soccer field with a camera to finally score a goal. The behavior considers the current estimate of the world state and its uncertainty from the state estimation component. It generates head motions to change the camera position. The robot is trained to score a goal. The active sensing behavior is executed while the normal soccer behavior is running.
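The principle behind such active sensing behaviors can be summarized in a few lines: among a set of candidate sensor motions, choose the one that is expected to reduce the uncertainty of the state estimate the most. The sketch below is a generic illustration of this idea, not the learned behavior of Kwok and Fox (2004); the function names are ours.

```python
import numpy as np

def choose_head_motion(candidate_motions, predicted_covariance):
    """Pick the camera motion that minimizes the expected uncertainty
    of the world-state estimate.

    predicted_covariance: function mapping a candidate motion to the
    expected posterior covariance of the state estimate (e.g., of the
    robot, ball, and opponent positions) after executing the motion.
    """
    scores = [np.trace(predicted_covariance(m)) for m in candidate_motions]
    return candidate_motions[int(np.argmin(scores))]
```

Here the trace of the covariance serves as a simple scalar measure of uncertainty; entropy would be an alternative.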
Other Behaviors

Some behaviors cannot generally, or not at all, be classified as locomotion or manipulation. We will discuss these behaviors in this section.

Human-robot Interaction

Human-robot interaction has become a feasible application through safe, compliant robot control and design. Robots can come into physical contact with humans in these scenarios. Robots that assist humans with their tasks are particularly appealing in the household and manufacturing domains. They can hold objects for a human (Ewerton et al. 2015), hand over objects to a human (Ewerton et al. 2015; Maeda et al. 2017), assist a human in putting on a shoe (Canal et al. 2018), lift (Evrard et al. 2009) or carry objects in collaboration with a human (Berger et al. 2012; Rozo et al. 2015), or drill screws placed by a human (Nikolaidis et al. 2013); hence, they show collaborative behavior. They can even interact socially with humans, for example, by giving a high five (Amor et al. 2014) or shaking hands (Huang et al. 2018). These behaviors are dynamic because they have to be synchronized with the human. Challenging tasks are the recognition of the human's intention and acting accordingly. Some authors focus on the intention recognition: Amor et al. (2014) only consider the problem of recognizing one interaction scenario by observing the human's motion, whereas Ewerton et al. (2015) and Maeda et al. (2017) consider the problem of distinguishing between several possible interaction scenarios.


In these works, only marker-based motion capture systems have been used to provide motion data from the human counterpart. The presented behaviors are active, discrete manipulation behaviors, and perception has not been considered. What makes carrying special is that it is a collaborative behavior which requires continuous observation of the co-worker's state and intention, because both agents are indirectly physically connected during the whole activity. Carrying an object in collaboration between a robotic arm and a human might require exerting a specific force on the object and, therefore, a method to measure these forces. Rozo et al. (2015) use a 6-axis force/torque sensor for this. In their application, the object can only be carried if both agents apply a force in opposite directions. In contrast, Berger et al. (2012) consider collaborative carrying as a whole-body problem with a humanoid. They adapt the walking direction of a robot according to the movement of its human counterpart. Deviations from learned expected movements are recognized, and the motion is adjusted accordingly. In this case, only part of the perception is learned. Carrying is often done with the robot following the human leader; such behaviors can be considered passive. The similar problem of lifting an object in collaboration has been considered by Evrard et al. (2009). They additionally learn to recognize whether the robot should take the leader or follower role during task execution. Hence, the learned behavior can be either active or passive. Canal et al. (2018) provide an example of a deliberative system, where low-level actions have been learned and high-level symbolic planning is used to organize communication and interaction with a human. They study the application of assisting a human in putting on a shoe.

The social acceptance of robots is an important aspect for future robots interacting with humans. One of the key factors in this context are natural motions, that is, the robot should not only reach a certain pose of the end-effector but also execute the motion in a human-like manner. To achieve this, Huang et al. (2018) present a hybrid space learning approach that learns and adapts robot trajectories in Cartesian and joint space simultaneously while taking into account various constraints in both spaces. They evaluate their approach on a humanoid robot in a hand-shaking task, consisting of a discrete reaching and a rhythmic waving motion, and adapt the movement to different areas for shaking hands. Nikolaidis et al. (2013) present results in a simplified human-robot collaboration scenario. The scenario should model the human-robot interaction challenges that occur in a hybrid team of a human and a robot that has to drill screws. The human has to place screws and the robot drills them. Although in the real-world scenario there are no real screws and no real drill, the robot learns to execute its motions in an order favored by the human. The problem of perceiving the human's current state is simplified by using a motion capture system.

Behavior Sequences

The very specific task of unscrewing a light bulb is a good example of sequential tasks that need to be decomposed into smaller subtasks to achieve the overall objective. Manschitz et al. (2016) infer an unknown number of such subtasks automatically from demonstrations of the overall task and learn how to sequence the subtasks in order to reproduce the complete task. In their work, the taught task sequence consists of approaching the light bulb, closing the end-effector, unscrewing the bulb by rotating the wrist stepwise (after each turn, the fingers are opened and the wrist is rotated back), pulling the light bulb out of the holder, and finally putting it into a box.
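Schematically, executing such a sequential task amounts to repeatedly running one movement primitive and letting a learned classifier decide which primitive comes next. The sketch below illustrates this structure; it is not the method of Manschitz et al. (2016), and all names are hypothetical.

```python
def run_sequential_task(primitives, next_subtask, sense, task_finished):
    """Execute a task as a learned sequence of subtasks.

    primitives:   dict mapping subtask names to executable skills,
                  e.g., {"approach": ..., "close_gripper": ..., "unscrew": ...}
    next_subtask: learned classifier mapping the current sensor
                  reading to the name of the next subtask.
    """
    subtask = "approach"
    while not task_finished():
        primitives[subtask]()            # run one subtask to completion
        subtask = next_subtask(sense())  # learned sequencing decision
```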
Besides the applications of pouring, cutting, and wiping, another typical kitchen task is cooking (see also pancake flipping described in the paragraph More dynamic manipulation behaviors) or, more specifically, food preparation. The preparation of food requires very structured behaviors with a fixed chronological order of actions. Therefore, the complete task has to be segmented into smaller sub-tasks. The order of these sub-tasks is typically managed by a higher-level monitoring system. Caccavale et al. (2018) picked the tasks of coffee and tea preparation to present their work on learning the execution of structured cooperative tasks from human demonstrations (Caccavale et al. (2017) investigated pizza preparation, though only in simulation). A similar approach was presented by Figueroa et al. (2016) on a pizza dough rolling task with the goal of achieving a desired size and shape of the pizza dough.

Soccer Skills

Soccer is one of the most extensively studied games in robotics. Besides walking, dribbling, and kicking, more high-level skills have been learned with simpler robotic systems or in simulation. For example, Müller et al. (2007) learn ball interception on a wheeled robot with known poses and velocities of the ball and the robot; Riedmiller et al. (2009) learn an aggressive defense behavior, also based on this information and on the pose and velocity of the opponent, but only in simulation; and Riedmiller and Gabel (2007) learn cooperative team behavior, also in simulation. Another example of a low-level behavior that has been learned for robotic soccer is capturing a ball with the chin of a dog-like robot (Fidelman and Stone 2004).

Adaptation to Defects

A kind of learned behavior that does not fit into any category, because it is more general and can be used in combination with any underlying behavior, is presented by Cully et al. (2015). The robot learned to adapt to defects. A walking behavior of a six-legged robot as well as pick-and-place with a manipulator with redundant joints have been considered.

Discussion

While we scanned the presented works, we made several interesting observations that we will summarize in this section. Some statements certainly depend on the machine learning method that is used, which we will indicate, but most of our statements apply universally.

What Makes the Domain Difficult?

Learning on physical robots is difficult. There are numerous reasons why much more machine learning is focused only on perception or is done in artificial environments, for example, physical simulations. We will summarize them here.


Robotic behaviors cannot be executed indefinitely often. Robots suffer from wear and tear, and hardware is often expensive (Kober et al. 2013). Robots can break. Robots can break things. Robots require maintenance, for example, battery changes and hardware repairs (Kohl and Stone 2004). Training data is often sparse. Learning methods must be effective with small datasets (Kohl and Stone 2004). The main reason why human supervision is usually required is that many behaviors require physical contact between robot and environment. Hence, imperfect behavior might break either the robot or the environment (Conn and Peters 2007; Englert and Toussaint 2018). Robots change their properties over time. Reasons can be wear or changing temperatures (Kober et al. 2013). Behaviors cannot be executed faster than real time. There is no way to speed this up like in simulations (Fidelman and Stone 2004), besides adding more robots, which require more maintenance work. Simulation is difficult. Dynamics of many robots and their environments are very complex and difficult to model. Kohl and Stone (2004) write that "robots are inherently situated in an unstructured environment with unpredictable sensor and actuator noise, namely the real world." The curse of dimensionality is an issue. Humanoid robots can have as many as forty or more state space dimensions (Morimoto and Doya 2001). Behaviors have to be able to deal with partial observability, uncertainty, and noise (Kober et al. 2013). They are also often hard to reproduce (Kober et al. 2013). Learning behaviors for robots in the real world is difficult for all of those reasons. Some of them can be mitigated in laboratory conditions, but this domain is still one of the hardest for today's machine learning algorithms.

When Should Behaviors Be Learned?

One of the main questions that we would like to answer with this article is which behaviors we should learn, given the availability of alternative approaches and the difficulties of applying machine learning to real robotic systems. It is often intuitively clear to machine learning and robotics researchers, but the intuition is often not underpinned by scientific evidence. The field is so diverse that it is easy to miss something.

We see several strengths of learned behaviors that have been mentioned quite often:

• Handling uncertainty and noise.
• Dealing with inaccurate or non-existing models.
• Learning can be better than hand-crafted solutions.
• They are easier to implement.
• They are often simple, sufficient or optimal heuristics.

We will back up these findings with sources in the following paragraphs. Machine learning is also considered to be the direction toward real artificial intelligence or, as Asada et al. (1996) put it: "The ultimate goal of AI and Robotics is to realize autonomous agents that organize their own internal structure in order to behave adequately with respect to their goals and the world. That is, they learn."

Uncertainty and noise are two predominant problems in robotics. Sensors and actuators undoubtedly suffer from noise. Noise, from the perspective of a robot, is part of nature and an intrinsic property of these devices. Mason (2012) points out that uncertainty has played a central role in robotics research since its beginning. Information about the world is usually incomplete and knowledge is not certain. This is the reason why probabilistic methods (see, for example, Thrun et al. (2005)) are so popular in the robotics community. Stulp et al. (2011, 2012) show that state estimation uncertainty in a pick-and-place problem can be compensated with an adapted motion. We illustrate how a compensatory motion can address the problem of state estimation uncertainty in Figure 7.

[Figure 7 appears here: a top view of an object's estimated position with an uncertainty ellipse and two grasping trajectories.]

Figure 7. Sketch of a robust grasping trajectory from top view. The ellipse indicates the uncertainty of the object's estimated position. A grasp that moves along the axis of highest variance of the estimate (blue trajectory) will succeed with a higher probability than a grasp that moves along the axis of lowest variance (red trajectory).
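The compensatory motion sketched in Figure 7 can be computed directly from the covariance of the position estimate: the robust approach direction is the eigenvector belonging to the largest eigenvalue. The following minimal example illustrates this principle; it is our own illustration, not code from the cited works.

```python
import numpy as np

def robust_approach_direction(covariance):
    """Return the grasp approach direction that is most tolerant to
    state estimation errors: the axis of highest variance of the
    object's estimated position (cf. Figure 7)."""
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    return eigenvectors[:, np.argmax(eigenvalues)]

# Example: the estimate is most uncertain along y, so we approach along y.
covariance = np.array([[0.01, 0.0],
                       [0.0, 0.09]])
print(robust_approach_direction(covariance))  # approximately [0, 1]
```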
An example of incomplete information is presented by Levine et al. (2017), where just a single RGB camera is used to learn grasping end-to-end. The distance and the three-dimensional structure of objects cannot be inferred from only one camera. However, objects are at the same distance to the robot when they are at the same position in the image. Hence, the system must implicitly learn the objects' distance. Laursen et al. (2018) explicitly design a method to help users in creating robust and uncertainty-tolerant trajectories for assembly operations which have previously been defined in simulation. Deisenroth et al. (2015) use a low-cost robotic manipulator and show that their method can compensate for actuator noise. Carrera et al. (2012) state that learning offers the adaptability and robustness that is required to solve their problem of turning a valve. Kober et al. (2008) learn a coupling of perception and action to handle perturbations of trajectories. Gullapalli et al. (1994) learn peg-in-a-hole insertion. They have sensor noise in position encoders and in a wrist force sensor and demonstrate that reinforcement learning can be used to generate robust insertion behavior. Johns et al. (2016) consider the problem of grasp pose prediction and state that "issuing commands to align a robot gripper with that precise pose is highly challenging in practice, due to the uncertainty in gripper pose which can arise from noisy measurements from joint encoders, deformation of kinematic links, and inaccurate calibration between the camera and the robot." They develop a method that explicitly addresses these uncertainties. Finally, Oßwald et al. (2010) state that the execution of motion commands is noisy on a humanoid robot due to backlash in joints and foot slippage, and that pose estimation during walking is more difficult because of motion blur. They explicitly learn a high-level navigation behavior that reduces the pose estimation uncertainty that arises from the noise.

When there is no model of the robot or the world, or existing models are too inaccurate, machine learning can compensate for that. This has been shown in the context of dynamic behaviors. It is hard to model dynamics correctly, but it is often not required. For example, Mülling et al. (2013) use state estimation to predict ball trajectories in table tennis but neglect the spin of the ball. Parisi et al. (2015) use a simplified model of the forward dynamics of a robotic arm with springs. The learned behavior was able to work with the simplified model. Kormushev et al. (2011c) consider the problem of energy minimization in a walking behavior that is used with a robot that has springs in its legs. They claim that it is nearly impossible to solve the problem analytically "due to the difficulty in modeling accurately the properties of the springs, the dynamics of the whole robot and various nonlinearities, such as stiction." In general, soft bodies and soft-body dynamics are difficult to model, but that would be required, for example, for cutting and knot-tying behaviors. Englert and Toussaint (2018) write that a "main issue is that the external degrees of freedom can only be manipulated through contacts, which are difficult to plan since a precise and detailed physical interaction model is often not available. This issue motivates the use of learning methods for manipulation skills that allow robots to learn how to manipulate the unknown environment." Colomé and Torras (2018) state that for problems that involve manipulation of non-rigid objects, accurate models are usually not available. Hence, they use machine learning to solve the task of folding a polo shirt.

Direct comparisons of machine learning and hand-crafted approaches have been done by Kohl and Stone (2004), Kwok and Fox (2004), Kober et al. (2008), and Parisi et al. (2015). These works show that learning is able to yield better behaviors than model-based or hand-tuned solutions. However, this result has to be read carefully because it is certainly subject to publication bias. To our knowledge, there is almost no publication in which machine learning for robotic behaviors and another method are compared with the result that machine learning is worse. Only Bargsten et al. (2016) compare machine learning with dynamic model identification to learn a model of inverse dynamics, with the result that the machine learning method is worse because it does not generalize well. It has to be noted, though, that dynamic model identification is also a data-driven method with incorporated physical prior knowledge. It is also not directly related to our survey because we excluded low-level control.
Learning approaches are often easier to implement because they are often general approaches and do not require problem-specific models. Sometimes it is easier to specify the problem and not the solution. A reward for reinforcement learning, for example, can encode the problem specification. Examples of problems where it is easy to define the reward are walking as fast or straight as possible, jumping as far as possible, throwing as close to a target as possible, or grasping: we could apply random perturbations after the grasp and measure whether the gripper still holds the object. While "walk as fast as possible" alone might not be a sufficient reward function, additional components of the reward function are usually intuitive and part of the problem specification: we can penalize behaviors that let the robot fall down or exert high forces on parts of the robot.
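As a concrete example, a reward function for learning to walk can be written down in a few lines. The following sketch is a generic illustration under our own assumptions (penalty weights, available measurements), not a reward used in any of the cited works.

```python
def walking_reward(forward_velocity, has_fallen, joint_forces,
                   fall_penalty=100.0, force_weight=0.01):
    """Reward forward progress, penalize falling and high forces."""
    reward = forward_velocity
    if has_fallen:
        reward -= fall_penalty
    reward -= force_weight * sum(f ** 2 for f in joint_forces)
    return reward
```

Note that this specifies only the problem (what good walking looks like), not the solution (how to move the joints).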
Kormushev et al. (2010b) also made an interesting observation: they found that the solution to the pancake flipping problem that has been discovered by learning contains an unexpected compliant catching behavior at the end of the movement. This prevents the pancake from bouncing off the pan. They conclude that "such undesigned discoveries made by the RL algorithm highlight its important role for achieving adaptable and flexible robots". Imitation learning is another method that is particularly easy to use from an end user's perspective. It enables users to teach robots new behaviors without requiring expert knowledge or programming skills (Alizadeh et al. 2014). We do not want to deny that tuning hyperparameters of a machine learning algorithm is a complex task and requires expert knowledge, but Parisi et al. (2015) found that tuning hyperparameters can be less time-intensive than building a mathematical model for a given task. Amor et al. (2014) justify the use of machine learning in the context of human-robot interaction: "programming robots for such interaction scenarios is notoriously hard, as it is difficult to foresee many possible actions and responses of the human counterpart". Matsubara et al. (2005) learn a walking behavior and point out the drawback of classical, model-based approaches: these require precise modeling of the dynamics of the robot and the environment. Fidelman and Stone (2004) state that their paper "is concerned with enabling a robot to learn high-level goal-oriented behaviors. Coding these behaviors by hand can be time-consuming, and it often leads to brittle solutions that need to be revised whenever the environment changes or the low-level skills that comprise the behavior are refined." Levine et al. (2017) start with the assumption that "incorporating complex sensory inputs such as vision directly into a feedback controller is exceedingly challenging" and show with their approach that learning complex emergent behavior can be done without much prior knowledge. Considering the long-term perspectives of robotics and artificial intelligence, the following two works are relevant. Cully et al. (2015) consider the problem of adapting to hardware defects, similar to injuries of animals. They found that "while animals can quickly adapt to a wide variety of injuries, current robots cannot 'think outside the box' to find a compensatory behavior when damaged: they are limited to their pre-specified self-sensing abilities, can diagnose only anticipated failure modes, and require a pre-programmed contingency plan for every type of potential damage, an impracticality for complex robots." Kirchner (1997) considers the problem of an autonomous robot that adapts its behavior online: "if we face the general problem to program real robots to achieve goals in real world domains, then, sooner or later, we will surely be confronted with problems for which a solution is not at hand and probably can not even be formulated off-line. In other words there are situations that the robot might encounter during interaction with the real world, that we are not able to foresee and we are therefore unable to precompile an appropriate set of reactions for it. Yet, the robot needs to find the set of reactions by itself. For this, learning is a necessity for real world robots."

Before we elaborate on the last point, we will draw an analogy to behaviors of biological systems. Most behavior learning algorithms that have been used in the works presented here do not guarantee optimality. We can consider the learned behaviors to be heuristics. Heuristics are often computationally efficient. That, however, does not make them second-best strategies. In real-world situations, where an agent is embodied in a physical system with sensors and actuators with noise and uncertainty, heuristics often yield useful behaviors. An often mentioned example of heuristic behavior is the gaze heuristic that is used to catch a ball that is high up in the air (Gigerenzer and Brighton 2009): "Fix your gaze on the ball, start running, and adjust your running speed so that the angle of gaze remains constant." The player will be at the position where the ball comes down. Other variables can be ignored, for example, distance, velocity, and spin of the ball, air resistance, and speed and direction of the wind. Gigerenzer (2008) explains why heuristics are useful in the case of human behavior. These arguments are also applicable in the case of robotic behaviors. An optimal solution to a real-world problem is often computationally intractable, for example, NP-hard, or so ill-defined that we do not know exactly what we should optimize for. In addition, real-world problems demand robustness of behaviors. More information and computation is not always better, according to Gigerenzer (2008). Reasoning often results in less successful behavior because of errors in the model. Robustness sometimes even requires ignoring or forgetting information.
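The gaze heuristic even fits into a single control rule. The sketch below is our own rendering of the verbal description; the proportional form and the sign convention are assumptions.

```python
def gaze_heuristic_speed(current_speed, gaze_angle, previous_gaze_angle,
                         gain=1.0):
    """Adjust running speed so that the gaze angle to the ball remains
    constant: counteract any change of the angle. No model of ball
    distance, velocity, spin, or wind is needed."""
    angle_change = gaze_angle - previous_gaze_angle
    return max(0.0, current_speed + gain * angle_change)
```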
From the papers that we read about learning robotic behaviors, the following publications back up these statements. van den Berg et al. (2010) consider the problem of cutting, which would be hard to model completely but has simple solutions. Benbrahim and Franklin (1997) state: "The fact that walking is most of the time done unconsciously suggests that maybe it does not require constant heavy computing in normal walking conditions." Kuindersma et al. (2011) learn balancing behaviors with arm motions and point out: "This general problem also has several attributes that make it interesting from a machine learning perspective: expensive evaluations, nonlinearity, stochasticity, and high-dimensionality. In our experiments, a low-dimensional policy space was identified . . . ".

We will conclude with another view on the question of why machine learning should be used. More than two decades ago, Thrun and Mitchell (1995) already tried to answer the same question. They distinguish between model-based approaches (with a model of the robot and the world) and learning. In a way, we can consider every approach that does not use machine learning to be model-based, because it either uses an explicit model (for example, planning, reasoning, or optimal control) or an implicit model (for example, behavior definitions with finite state machines or hard-coded motions). Learned behaviors also build models, but learned models directly encode real experience. Thrun and Mitchell (1995) identify four bottlenecks of model-based methods. There is a knowledge bottleneck: knowledge has to be provided by a human. While this is not totally accurate anymore, because robots are, for example, able to build detailed maps of their environment on their own, this is still an issue because a programmer still has to define how the data is interpreted: what is rigid and what is soft, which objects are movable and which are fixed? There is an engineering bottleneck: it requires a lot of time to implement and generate these explicit models. For example, realistic modeling and physics simulation of soft bodies, divisible bodies, deformable objects, fluids, or granular media are still difficult. There is a tractability bottleneck: many realistic problems are computationally complex or even intractable, which results in slow responses. For example, Kuindersma et al. (2016) report times of 1.5 or 10 minutes to plan simple jumping motions. There is a precision bottleneck: the robot must be able to execute plans accurately enough. This is still an issue and is becoming more relevant with flexible and compliant robots.

While all of the mentioned points are still valid, some of them also apply to state-of-the-art machine learning. The knowledge bottleneck is an issue if pre-structured policies or models are used, for example, dynamical movement primitives (Ijspeert et al. 2013). The tractability bottleneck has a counterpart in machine learning: a lot of experience might be required. As we have seen, simple heuristics are often sufficient, which means that neither pre-structuring nor restricting the policies or models necessarily results in bad performance, nor does learning necessarily require much data. The precision bottleneck is related to the reality gap (Jakobi et al. 1995), which is a problem if behaviors are learned in simulation and transferred to real systems. For example, Kwok and Fox (2004) report this problem.

An Analogy: Shifting from Deliberative to Reactive Behaviors

An often quoted statement from Whitehead (1911, page 61) is the following: "It is a profoundly erroneous truism ... that we should cultivate the habit of thinking of what we are doing. The precise opposite is the case. Civilization advances by extending the number of important operations which we can perform without thinking about them." Skilled human behavior is trained and repeated often. Such a learned behavior is good because we do not waste many computational resources. We are able to execute it fast and precisely. Norman (2013, pp. 100-101) states: "Conscious thinking takes time and mental resources. Well-learned skills bypass the need for conscious oversight and control: conscious control is only required for initial learning and for dealing with unexpected situations. Continual practice automates the action cycle, minimizing the amount of conscious thinking and problem-solving required to act. Most expert, skilled behavior works this way, whether it is playing tennis or a musical instrument, or doing mathematics and science. Experts minimize the need for conscious reasoning." In other words (Shadmehr and Wise 2005, page 2): "motor learning matters because it allows you to act while directing your attention and intellect toward other matters. Imagine that you needed to attend to all of the routine aspects of your reaching or pointing movements. Motor learning provides you with freedom from such a life." Exactly the same statement could be made for robotic behaviors. Learning individual skills also simplifies reasoning and planning, because planning can take place purely on a high level and solve the problem of combining individual skills.

An argument in favor of learning robotic behaviors is this analogy to well-learned human behavior. As we have seen, learned behaviors are mostly reactive behaviors or heuristics. This is the precise opposite of the very useful combination of mapping, state estimation, and planning, which we categorize as deliberative behavior. While state estimation and planning work without previous interaction with the environment, learned behaviors can be faster and can have a higher performance if enough data is available or trials are allowed. While deliberative behavior can be a safe first solution, it can be replaced by learned and reactive behaviors. This is actually very similar to what humans do.

In summary, there is an analogy between humans and robots: learned behavior can perform better while requiring fewer computational resources in comparison to high-level reasoning in certain problem domains.

When Should Behaviors Not Be Learned?

Imagine you are a robot and you are in a critical situation that you have never seen before. Dismukes et al. (2015) have advice for you: "identify and analyze decision options" and "step back mentally from the moment-to-moment demands ... to establish a high-level ... mental model that guides actions". Oh, you learned all of your behaviors end-to-end and you do not know how to build a high-level mental model? Tough luck!

Not everything should be learned. Learning in robotics often aims at reproducing the quality of human behavior that cannot be reached by conventional approaches. Humans are much better than robots at many tasks that require interpreting complex sensory data, involve noise and uncertainty, and demand fast and dynamic behavior. They are the best examples of a learning, physical agent that we have seen so far. It is probably hard to achieve better results than a human if we try to use the same design principles for robots. Also, humans make errors all the time, and the frequency of errors can even increase under external factors like stress (Dismukes et al. 2015). While we do not think that robots are prone to stress, we think that in learned robotic behaviors unexpected failures might often occur. A robot might encounter a situation that does not occur in the training set ("distributional shift", see Amodei et al. (2016)), or the agent learns continuously, which means that it also forgets. Sometimes it makes sense to rely on logical reasoning and model-based approaches. Ironically, Dismukes et al. (2015) propose the same for humans to reduce errors under stress. It is the quoted advice from the previous paragraph.

If a precise model of the world is available, planning and optimal control often generate new behaviors faster and do not require physical interaction with the real world before they provide a solution. For example, collision avoidance based on distance sensors and planning or reactive behaviors can be close to perfect, so that it is applicable in industrial scenarios (de Gea Fernández et al. 2017). If collision avoidance is learned, there is no guarantee of safety. In particular, there will be no safe collision avoidance during the learning phase, in which imperfect behaviors will be explored on the real system. Tassa et al. (2012) show that, even if the model is not accurate, model-predictive control (MPC; online trajectory optimization) with a finite horizon can be used to generate intelligent and robust get-up and balancing behaviors. It has to be noted, though, that optimal control and reinforcement learning are related (Sutton et al. 1992). In this article we make the distinction between reinforcement learning, which needs experience, and optimal control, which needs a model. Machine learning and optimal control can be combined (Levine et al. 2016; Erickson et al. 2018).

Learning systems are typically not good at repetitive tasks and tasks that demand high precision, for example, tasks that have to be executed in a factory. If the same car has to be produced several thousand times in precisely the same way, it is worth the effort to let a human design the process step by step. In a lot of situations it is even better to build specialized machines instead of using robots. Robots and behavior learning are only required if the system is confronted with changing requirements and environments.

Coordination of behaviors is a rather difficult task for machine learning at the moment. Whole-body control (Sentis and Khatib 2006) is quite successful in this domain. It makes it possible to prioritize tasks and solves everything online at a high frequency on the system. If, for example, an existing walking and an object manipulation behavior should be combined so that the robot keeps its balance, whole-body control is the method of choice. Whole-body control is effective because it uses domain-specific knowledge: the Jacobian of the robot. In order to exhibit similar behavior, a learned behavior would implicitly have to approximate the Jacobian. However, configuring whole-body control is challenging. Weighting and prioritizing subtasks such that the result "solves the task" is a difficult, manual task.
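The core of such a whole-body controller can be illustrated with the classical null-space projection scheme: a secondary task is only executed in the null space of a higher-priority task, so it cannot disturb it. The sketch below shows a simplified two-task version of this well-known scheme; real whole-body controllers are considerably more elaborate.

```python
import numpy as np

def prioritized_joint_velocities(J1, dx1, J2, dx2):
    """Simplified two-task prioritization via null-space projection.

    J1, dx1: Jacobian and desired task-space velocity of the primary
             task (e.g., balance).
    J2, dx2: Jacobian and desired task-space velocity of the secondary
             task (e.g., manipulation).
    Returns joint velocities that execute the secondary task only in
    the null space of the primary task."""
    J1_pinv = np.linalg.pinv(J1)
    null_space_projector = np.eye(J1.shape[1]) - J1_pinv @ J1
    return J1_pinv @ dx1 + null_space_projector @ np.linalg.pinv(J2) @ dx2
```

The Jacobians are exactly the domain-specific knowledge mentioned above; a learned behavior would have to approximate them implicitly.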
Perception for dynamic problems is challenging at the moment. It can be learned for static behaviors like grasping (Levine et al. 2017) or visual servoing (Levine et al. 2016), but it is nearly impossible at the moment to learn a catching behavior for a ball end-to-end, because the learned model has to solve difficult perception, tracking, and prediction problems while it must respond very fast. Birbach et al. (2011) impressively show how computer vision and state estimation can be used to track ball trajectories with an error of 1.5 cm in the predicted catch point. The perception takes about 25 ms and tracking about 10 ms per step. A ball catch rate of 80 % has been reached on a humanoid upper body.

Learned behavior can show emergent properties. While this is sometimes good, for example, in the case of the pancake flipping task (Kormushev et al. 2010b), it can also be disastrous. For example, in reinforcement learning or similar disciplines, learning algorithms often exploit ill-posed problem definitions. This is called "reward hacking" (Amodei et al. 2016, pages 7–11), and it is not necessarily immediately visible. This problem can be particularly challenging if the behavior should be used in a variety of different contexts and environments.

Interestingly, "playing soccer" is one of the most complex high-level behaviors that robots are able to perform today, and it is not learned. On the contrary, it is not even solved by methods that fall into the category of artificial intelligence. Hand-crafted behavior has been the state of the art for about two decades. Röfer (2018) states that "In the domain of RoboCup, real-time requirements and limited computational resources often prevent the use of planning-based approaches".

Between 2009 and 2017, three distinct teams won the RoboCup Standard Platform League (SPL), which is carried out every year. All of them used rather static behaviors: B-Human, UT Austin Villa, and rUNSWift. Little information about the behaviors used by UT Austin Villa is available, but the report accompanying their code release (Barrett et al. 2013) suggests that the behavior is hand-crafted. rUNSWift's behavior is hand-crafted and written in Python (Ashar et al. 2015). B-Human used XABSL (Loetzsch et al. 2006) and currently uses CABSL (Röfer 2018) to describe behaviors. Both languages are used to define hierarchical finite state machines for the robots' behavior. Only in 2018 did a team using a "dynamic strategy", Nao-Team HTWK, win the RoboCup SPL. They represent the problem of positioning players that are not close to the ball as an optimization problem and solve it (Mewes 2014). That, however, is only a part of the whole soccer behavior.
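A hierarchical finite state machine of this kind can be sketched in a few lines: a top-level behavior delegates to sub-behaviors, which may themselves be state machines. The example below is a hypothetical illustration in Python, not actual XABSL or CABSL code.

```python
def striker(world):
    """Top-level behavior: select a sub-behavior from the world model."""
    if not world["ball_seen"]:
        return search_ball(world)
    if world["ball_distance"] > 0.3:
        return go_to_ball(world)
    return kick(world)

def search_ball(world):
    return {"turn": 0.5, "walk": 0.0}   # rotate until the ball is seen

def go_to_ball(world):
    return {"turn": world["ball_angle"], "walk": 1.0}

def kick(world):
    return {"kick": "forward"}
```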
Complexity of Systems Is Increasing

Over the years, the complexity of robotic systems and of the posed problems has increased. A complex six-legged walking robot had 12 DOF at the beginning of the 90s (Maes and Brooks 1990). In 2016, a quadrupedal robot with two arms for manipulation had to handle 61 DOF (Bartsch et al. 2016). Controlling such a complex robot is still a challenging problem. Most of the presented works in the field of manipulation only have to handle six or seven DOF, while complex robots control 17 (Kormushev et al. 2011c) or 24 DOF (Bartsch et al. 2016) to generate a walking behavior, or 24 DOF for in-hand manipulation (Rajeswaran et al. 2018; Andrychowicz et al. 2018). For comparison, a well-studied biological system is the human body. It has an estimated total number of 244 DOF and a conservatively estimated number of 630 skeletal muscles (Zatsiorsky and Prilutsky 2012). It is, hence, a much more complex system to control than any of the robots that have been used in the works that we refer to in this survey. There is still a long way to go to reach the same level of flexibility and agility.

Not only the actuation capabilities are improving; the complexity of the sensors used has also increased considerably in almost three decades of behavior learning research on real robots. In early applications only very simple sensors were used, for example, four light sensors (Kirchner 1997). Alternatively, the perception problem has been decoupled from the action problem to solve it with computer vision and state estimation (Mülling et al. 2013; Parisi et al. 2015). In more recent works, raw camera images have been used directly by the learned behavior (Lampe and Riedmiller 2013; Levine et al. 2016, 2017) and RGB-D cameras have been used (Lenz et al. 2015b). RGB-D cameras are probably the most complex sensors that are used in learned behaviors today. Robotics research in general is already more advanced and we will see other complex sensors in addition to rather conventional cameras. For example, current robotic systems can have advanced tactile sensor arrays based on fiber-optic sensing principles (Bartsch et al. 2016).

Limited Versatility of Learned Skills

The works on bipedal walking are particularly interesting, since they allow a direct comparison between applications on real robots and applications in simulation and computer graphics. Peng et al. (2017) learned bipedal walking on two levels: a low-level walking behavior and a high-level behavior that generates the walking direction. The high-level behavior incorporates information about the surrounding terrain and has been used to follow trails, dribble a soccer ball towards a target, and navigate through static and dynamic obstacles. The low-level behavior only knows about the internal state of the walker and the desired goal of the high-level behavior and was trained to be robust against disturbances and terrain variations.
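This two-level decomposition can be summarized schematically as follows; the stub policies, dimensions, and update rates are placeholder assumptions for illustration, not the trained networks of Peng et al. (2017):

    import numpy as np

    # Schematic two-level walking controller: a slow high-level policy that
    # sees the surrounding terrain and emits a step goal, and a fast low-level
    # policy that only sees the walker's internal state and that goal.
    # Both policies are stubs so that the structure is runnable.

    def high_level_policy(terrain_map, goal):
        return goal[:2]                        # e.g. desired heading

    def low_level_policy(joint_state, step_goal):
        return 0.1 * np.tanh(joint_state) + 0.01 * step_goal.sum()

    joint_state = np.zeros(12)                 # internal state of the walker
    terrain_map = np.zeros((32, 32))           # local height map (placeholder)
    goal = np.array([1.0, 0.0, 0.0])

    for t in range(200):
        if t % 20 == 0:                        # high level runs much slower
            step_goal = high_level_policy(terrain_map, goal)
        joint_state = low_level_policy(joint_state, step_goal)

The appeal of this structure is that the low-level controller never needs to see the terrain, which keeps its input space small and its training tractable.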

Peng et al. (2018) also demonstrate how imitation and reinforcement learning can be used to generate realistic acrobatic movements: performing a cartwheel, backflip, frontflip, roll, vault, dancing, kicking, punching, standing up, etc. Those skills are then combined into a complex sequence of behaviors. In comparison, learned biped walking behaviors on real robots are usually only tested in controlled lab environments (Benbrahim and Franklin 1997; Matsubara et al. 2005; Geng et al. 2006; Kormushev et al. 2011c; Missura and Behnke 2015).

Walking is just one example of how skills that have been learned on real robots are often not versatile. Another example is grasping: the currently most impressive work, published by Levine et al. (2017), is applicable to a large variety of objects, but only if the camera is at a certain angle to the objects, and only vertical pinch grasps have been considered. Other behaviors, for example, tee-ball (Peters et al. 2005; Peters and Schaal 2008), pancake flipping (Kormushev et al. 2010b), plugging in a power plug (Chebotar et al. 2017a), and flipping a light switch (Buchli et al. 2011), do not even include the position of the manipulated object in their control loop. Many of the learned behaviors are hence still only applicable under controlled lab conditions.

Limited Variety of Considered Problems

In natural learning agents (also known as animals), there is evidence that the same learning mechanisms can be evolved and used to solve a variety of tasks: “A major role of the early vertebrate CNS [central nervous system] involved the guidance of swimming based on receptors that accumulated information from a relatively long distance, mainly those for vision and olfaction. The original vertebrate motor system later adapted into the one that controls your reaching and pointing movements.” (Shadmehr and Wise 2005, page 9) In behavior learning for robots, however, often the same simple problems are tackled again and again with only minor variations, but with a large variety of different learning algorithms. Learning efforts often focus on grasping, walking, and batting. Certainly, these problems are not solved yet (Johns et al. (2016): “Robot grasping is far from a solved problem.”). Furthermore, solving the exact same problem again is good for benchmarking. Yet, the variety of problems solved by learning is low. We should also try to solve a larger variety of problems to discover and tackle new challenges in behavior learning and to improve our set of tools. Examples are given in the outlook.

Most of the considered problems are also low-level motor skills. While this seems to be too simple at first, there is also a justification for it. Shadmehr and Wise (2005, page 1) state that motor learning, that is, learning of low-level behavior, uses the same basic mechanisms as higher forms of intelligence, for example, language and abstract reasoning. However, the goal should be to demonstrate that learning is possible and useful at all levels of behavior and to actually use its full potential.

Reasons for Current Limitations

What hinders robots from learning the same skills as humans with a similar performance these days? There are several reasons. We identify the main ones as algorithmic, computational, and hardware problems.

One of the most advanced fields of artificial intelligence is computer vision based on deep learning. In some specific benchmarks, computer vision is better than humans, but it is not as robust as a human, as has been demonstrated with adversarial examples (Szegedy et al. 2013). In addition, semantic segmentation, tracking objects in videos, and object detection with a large number of classes are examples of very active research topics in which humans are a lot better. Computer vision is one example of a domain on which behavior learning builds. When we learn grasping (Levine et al. 2017) or visual servoing (Levine et al. 2016) end-to-end, we make use of the results from computer vision research. While we do not reach human-level performance in these areas, we can hardly surpass it in real-world behavior learning problems. Also, reinforcement learning algorithms are not yet at the point where they are sample-efficient enough to learn complex behaviors from a reasonable amount of data.
One of the most impressive works in this field at the moment is from Andrychowicz et al. (2018). They learned complex in-hand manipulation skills to rotate a cube into any desired orientation. Approximately 100 years of experience were used during the training process. Still, the robustness of the skill is not comparable to an average human: on average 26.4 consecutive rotations succeed when 50 is the maximum length of an experiment. Certainly no human spent 100 years learning exclusively in-hand manipulation to reach a much better level of performance.

Many state-of-the-art algorithms in machine learning also have high demands on processing power during the prediction phase (Silver et al. 2016; Levine et al. 2017; Andrychowicz et al. 2018), which makes them slow in reaction time and maybe not even suitable for autonomous systems that have to budget their energy; moreover, training on the robotic system itself might be infeasible.

Probably the main reason why so many researchers do not learn complex skills for robots in reality is that robots break too easily. The absence of training data from dangerous situations is a problem. It motivated Gandhi et al. (2017) to record a dataset of drones crashing into obstacles. In contrast, humans fail and fall all the time and gain lots of negative experiences. There is probably not a single professional soccer match that has been played over the full length in which no player falls down unexpectedly and, yet, most players are not seriously injured. Humans are colliding all the time with objects when they move things around, for example, while eating at an overly full dinner table. The difference is that humans are flexible, soft, and lightweight. A human is lightweight compared to similarly strong robots: humans' force-to-weight ratio is much better. The best Olympic weight lifters can move weights that are more than twice as heavy as they are. Humans are also extremely flexible. As already mentioned, they have about 244 DOF and 630 skeletal muscles (Zatsiorsky and Prilutsky 2012) and most of their body is soft, while one of the most complex robots today has 61 DOF and consists mostly of stiff and rigid parts (Bartsch et al. 2016) that are at the same time very fragile. A new actuation paradigm is required for robots that solve dynamic, partially observable problems. Haddadin et al. (2009) propose to use elastic joints in the domain of robot soccer. Elastic joints make robots more robust, make collaboration or competition with humans safer, and would enable higher maximum joint speeds. Controlling elastic joints is more complex though. In addition, humans have many sensors (tactile, acoustic, vestibular) that are used to recognize unexpected events, and they can react accordingly: they have learned to fall, or to stop moving the arm before they pull the bottle down from the dining table.

Outlook

We will conclude with several recommendations that we consider important and an outlook on future behavior learning problems that could be tackled.

Ways to Simplify Learning Problems

Kirchner (1997) states: “we believe that learning has to be used but it needs to be biased. If we attempt to solve highly complex problems, like the general robot learning problem, we must refrain from tabula rasa learning and begin to incorporate bias that will simplifies [sic] the learning process.”

Ways to simplify the learning problem are to not learn everything from scratch (knowledge transfer), to not learn everything end-to-end (combination with other methods), to learn while a safe, deliberative method is operating, or to learn in a controlled environment (bootstrapping).

Knowledge transfer: Knowledge can be transferred from similar tasks, similar systems, or similar environments. In the optimal case, multiple almost identical robots are used to learn the same task in the same environment (Gu et al. 2017; Levine et al. 2017). Levine et al. (2017) also show that data transfer from one robot to another robot in the same environment, solving a similar task, is beneficial (if actions are represented in task space). Levine et al. (2016) also show that pretraining is a key factor for success when very complex behaviors are trained end-to-end.

In our opinion, more research should be done on lifelong learning. It could lead to robust, sample-efficient artificial intelligence that is able to solve a multitude of tasks and, hence, share knowledge.

Lifelong learning is defined by Silver et al. (2013): “Lifelong Machine Learning, or LML, considers systems that can learn many tasks over a lifetime from one or more domains. They efficiently and effectively retain the knowledge they have learned and use that knowledge to more efficiently and effectively learn new tasks.” We believe that this can be much more efficient than learning everything from scratch. Coming back to the example of in-hand manipulation (Andrychowicz et al. 2018), perceiving the object's pose or several strategies used in the manipulation behavior are components that could be shared with many other tasks that are related to manipulation of movable objects.

We have to find ways to share knowledge between similar and dissimilar robots and between similar and dissimilar tasks. In theory, sharing knowledge between robots in the form of training sets or pretrained models is much easier than sharing knowledge between humans, who can only absorb knowledge through their senses. Bozcuoglu et al. (2018) propose a similar approach: they share ontologies and execution logs on the cloud platform openEASE. The knowledge can be transferred to other environments or other robots. The same approach could be used to share pretrained models or training data to learn behaviors.

Combination with other methods: Combining existing approaches for perception and state estimation with machine learning has been shown to be effective by Mülling et al. (2013) and Parisi et al. (2015). Similarly, combining existing approaches for planning and machine learning has been shown to be effective by Lenz et al. (2015b). Also, model predictive control has been combined with a learned uncertainty-aware perception model by Kahn et al. (2017). Nemec et al. (2017) combine machine learning and structured search with physical constraints. To generate walking behaviors, often classical models like the linear inverted pendulum (Kajita et al. 2001) are used or a zero moment point (Vukobratović and Borovac 2005) is computed; mostly, only parts of complex walking behaviors are learned. We think this is still a valid method to verify and understand what is happening on the system, to reduce the amount of physical interaction with the world that is required to learn the behavior, and to obtain solutions that are safer. Geng et al. (2006) confirm this for their application. They state that: “Building and controlling fast biped robots demands a deeper understanding of biped walking than for slow robots.” Englert and Toussaint (2018) state: “One way to reduce ... difficulties is by exploiting the problem structure and by putting prior knowledge into the learning process.” Loquercio et al. (2018) show remarkable results with an almost end-to-end learning approach for collision avoidance on a drone, yet they do not want to replace “map-localize-plan” approaches and believe that “learning-based and traditional approaches will one day complement each other”. An example of a promising idea that shows how established methods can be combined with machine learning is the incorporation of Kalman filters in a neural network. This approach has been presented by Kassahun et al. (2008). However, we have to make sure that we do not artificially limit the amount of learnable behaviors by introducing too strong constraints or too simple models. For example, requiring the zero moment point (ZMP) to be in a support polygon is a strong restriction. It is an artificially constructed, simple model of dynamical stability, developed to avoid at all costs that expensive robots fall and break. It limits the capabilities of a robot; for example, running would be very hard to implement with a ZMP approach. Furthermore, Yang et al. (2017) state that this approach prohibits advanced balancing behaviors. Making basic physical knowledge available to the learning algorithm can be beneficial without restricting the amount of learnable behaviors though. As an alternative to the ZMP approach, we can compute the centroidal momentum (Orin and Goswami 2008; Orin et al. 2013) and make it available to the learning algorithm. When a translation from joint space to Cartesian space is required or useful, we can use the Jacobian. For dynamics, we can make use of the equations of motion.
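In standard notation, with joint positions q, these two pieces of prior knowledge take their usual textbook form (not specific to any of the cited works):

    \dot{x} = J(q)\,\dot{q}, \qquad M(q)\,\ddot{q} + C(q, \dot{q})\,\dot{q} + g(q) = \tau

where x is the Cartesian pose of a reference frame such as the end effector, J(q) is the Jacobian, M(q) is the joint-space inertia matrix, C(q, \dot{q})\,\dot{q} collects Coriolis and centrifugal terms, g(q) contains gravity terms, and \tau are the joint torques. Providing a learning algorithm with these mappings injects physical structure without prescribing a particular gait or stability criterion.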
Bootstrapping: An obvious situation where the combination of behavior learning with another method is safer is manipulation with a superimposed collision avoidance behavior. While the robot is learning to grasp, it can safely be guided around obstacles. These “safety mechanisms” could also be used to bootstrap learning and to collect data safely before we shift to the purely learned behavior that might perform better. It is even possible to use additional equipment or a controlled environment to provide additional information to bootstrap learning. This has been done, for example, by Levine et al. (2016) to reduce the required amount of data. Englert and Toussaint (2018) also demonstrate that a combination of optimal control, episodic reinforcement learning, and inverse optimal control in the training phase can be safe and efficient. The problem of safe exploration has also been discussed in more detail by Amodei et al. (2016, pages 14–17).
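A minimal sketch of such a superimposed safety mechanism is a wrapper around the learned policy; the predicate and controller below are hypothetical stand-ins, not an implementation from any of the cited works:

    # The learned policy proposes an action; a hand-crafted monitor overrides
    # the proposal whenever it would violate a safety constraint. During early
    # training the override is active most of the time and still collects data.

    def safe_execute(observation, learned_policy, violates_constraint,
                     safe_controller):
        action = learned_policy(observation)
        if violates_constraint(observation, action):
            action = safe_controller(observation)  # deliberative fallback
        return action

    # Trivial placeholder components for illustration:
    policy = lambda obs: obs["speed"] + 1.0          # learned proposal
    check = lambda obs, action: action > obs["speed_limit"]
    fallback = lambda obs: obs["speed_limit"]        # known-safe behavior

    print(safe_execute({"speed": 0.5, "speed_limit": 1.0},
                       policy, check, fallback))     # prints 1.0 (override)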

Comparability and Reproducibility

Shadmehr and Wise (2005) convey the idea that the same computational principles that allow earlier forms of life to move in their environment later enabled higher forms of intelligence like language and reasoning. The intelligence of animals and humans evolves with the complexity of the problems that it solves. This is illustrated by Faisal et al. (2010), who investigated the production of early prehistoric (Oldowan) and later (Acheulean) stone tools. Oldowan tools are simpler and their production requires less complex behaviors. The production of Acheulean tools requires the activation of brain regions associated with higher-level behavior organization. The development of more complex behavior coordination could even be linked to the development of more complex forms of communication. The development of complex manipulation behaviors required more intellectual capacities, which could then also be applied to another domain – in this case: language. This is an important finding for us as roboticists. Translated to our work, it means that more complex problems require the development of better behavior learning algorithms. These algorithms could potentially also be used in other domains for which they have not been directly designed. Hence, advancing at both frontiers could benefit the whole field.

Artificial intelligence has advanced by setting challenging benchmark problems, for example, the problem of playing chess against a human or the RoboCup initiative that has a similar goal but combines AI with robotics (Kitano et al. 1997): “The Robot World-Cup Soccer (RoboCup) is an attempt to foster AI and intelligent robotics research by providing a standard problem where a wide range of technologies can be integrated and examined.” In recent years we have seen major advances in reinforcement learning also because clearly defined benchmarks are available, for example, the Atari learning environment (Bellemare et al. 2015) and OpenAI Gym (Brockman et al. 2016). These benchmarks make comparisons of existing approaches easier. It is also simpler to reproduce results because it is easy to check whether a reimplementation of an algorithm gives the same result as in the original publication. Hence, we recommend defining benchmarks for robotic behavior learning.
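Part of what makes such benchmarks effective is their small, fixed interface against which any algorithm can be evaluated. The loop below uses the classic OpenAI Gym API as it was defined around the time of writing; the environment name is an arbitrary example, and a random policy stands in for a learned one:

    import gym

    # Standard evaluation loop against a Gym benchmark environment.
    env = gym.make("CartPole-v1")
    returns = []
    for episode in range(5):
        observation = env.reset()
        done, total_reward = False, 0.0
        while not done:
            action = env.action_space.sample()  # replace with a learned policy
            observation, reward, done, info = env.step(action)
            total_reward += reward
        returns.append(total_reward)
    print(sum(returns) / len(returns))  # a directly comparable score

A robotic behavior learning benchmark would need an equally small and fixed interface, which is much harder to define when real hardware is part of the loop.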
A problem is that often similar problems are solved but with varying conditions; for example, in the context of grasping we observed that the objects are often different, although there are standardization efforts: the YCB object and model set is an example (Calli et al. 2015a,b, 2017). These efforts have to be fostered and supported. Also, new benchmarks have to be created. We can also learn here from the diagnosis and treatment of human patients. An example of a “benchmark” for humans is the box and block test (Mathiowetz et al. 1985): the patient has to move colored blocks from one box to another as fast as possible. We think that a set of benchmark problems should be selected, standardized, formalized, and described in detail so that results are easily comparable.

Games and sports are particularly good candidates for benchmark problems because they have a clear set of rules and standardized material, they are usually easy to understand, and they offer a variety of challenging problems. We have seen that a large number of behavior learning problems already come from this domain. Mostly subproblems like kicking or batting a ball have been extracted and learned. More advanced benchmarks would also include tasks with less strict rules, for example, setting a table.

Benchmarking in the context of robotics, however, is difficult because software can usually not be tested in isolation. Simulations could be used to address this problem, but they often lead to solutions that are not transferable to reality, neither the learned behavior nor the learning algorithm. The RoboCup Standard Platform League (SPL) solves this problem by requiring that each competing team uses the same hardware. This is not an optimal solution because most robots are expensive and research institutes are usually not able to buy a new robot just to compete in a specific benchmark. We can offer no perfect solution for this problem. We can only propose that a cheap robotic platform that is sufficient for a variety of benchmarks should be developed.

The Future of Behavior Learning Problems

Mason (2012) writes: “What percentage of human's manipulative repertoire have robots mastered? Nobody can answer this question.” We can say exactly the same about any other category of robotic behaviors. At least we now have a rough overview of which behaviors have been learned. We will now try to talk about what is still missing.

At the moment, most behaviors are learned in isolation. On a complete system, the learned behavior will interfere with high-level behaviors and other behaviors on the same level that might even have higher priority, for example, balancing or collision avoidance. There might even be other learned behaviors; for example, a learned walking behavior and a learned throwing behavior could be executed in parallel. Executing multiple behaviors in parallel has effects on the whole system. These problems are neglected if behaviors are learned in isolation. Throwing a ball while walking makes the balancing part of the walking behavior more difficult, and grasping an object while collision avoidance is active might result in different reaching trajectories. Sometimes combining two behaviors might require one of these behaviors to be changed completely. For example, in the case of throwing while running, the whole locomotion and balancing behavior might have to be altered to absorb the high forces that are exerted during the throw.

Figure 8 illustrates two possible roadmaps for walking behaviors. Currently, we are able to learn walking with quadrupedal or six-legged robots. There are two alternative routes illustrated that we could take from there: the “ball sports route” and the “parkour route”. Ball sports in this example include soccer, basketball, or handball. It is to some extent possible to learn bipedal walking, which requires more advanced balancing behavior than walking with more legs. Fast bipedal running is already a much more complex task because it is a highly dynamic behavior that cannot easily be solved with classical stability criteria and control approaches. Running and dribbling a ball requires solving a much more complex perception problem and precise foot placement or hand movements. Combining this behavior with the requirement to throw or kick a ball will introduce a difficult coordination problem: throwing will have an impact on the balancing part of the running behavior. A good solution will predict this impact and counteract it already while the throw is performed. However, throwing a ball at a fixed goal is easy in comparison to passing the ball to a teammate. In this case, the robot has to anticipate the behavior of the teammate to pass the ball to a location where the teammate will be able to make use of it. Another future research direction could lead via climbing to parkour. Legged robots unfold their full potential in rough and irregular terrain, where precise perception of the environment, foot placement, and robust balancing are required. This has already been learned to some extent. A more difficult scenario would be climbing up a mountain with steep slopes, where not only the feet but all body parts must be controlled; for example, a humanoid would have to use its arms. The robot must be flexible enough to balance on steep and rough terrain. A next possible step would be one of the most difficult sports that humans are able to perform: parkour. It requires “understanding” the environment, that is, knowing what you can do with it to find the fastest and most direct way over obstacles. The whole body is involved, and it is often required to turn off basic safety mechanisms, for example, to perform a double kong vault, where the body is almost turned upside down with the hands on the obstacle directing momentum and the feet above the head to get out of the way.


Figure 8. Roadmaps for walking robots: starting from walking, a “ball sports route” leads via bipedal balancing and running to dribbling, running and throwing, and running and passing, while a “parkour route” leads via rough-terrain walking and climbing to parkour. Sources: running from Stephane Kempinaire (URL: http://www.mynewsdesk.com/se/puma-nordic/images/puma-aw14_ff_bolt-325510; license: CC BY 3.0), dribbling from flickr user tsavoja (URL: https://www.flickr.com/photos/tsavoja/4106568938/; license: CC BY-SA 2.0), throwing while running from flickr user RFEBM Balonmano (URL: https://www.flickr.com/photos/125948220@N02/14826033503/; license: CC BY-SA 2.0), passing while running from flickr user Terry Gilbert (URL: https://flic.kr/p/QDhaKN; license: CC BY 2.0), parkour from flickr user THOR (URL: https://www.flickr.com/photos/geishaboy500/3090363361/; license: CC BY 2.0); all other photos are from DFKI RIC and can be found at https://robotik.dfki-bremen.de/

There are low-hanging fruits to increase the spectrum of learned behaviors. Examples are the locomotion behaviors of running, climbing ladders, jumping over obstacles, jumping precisely or jumping as high as possible with one or two legs, front or back flips, swimming, and paddling. In the kitchen domain, there are stirring, chopping, and opening cans or bottles. In the household domain, the problems of folding sheets or clothes can be very challenging because these problems are very hard to model. In the manufacturing domain, the skills of hammering, sawing, sewing, splitting wood, shoveling, drilling, and tool use in general are relevant. While perception has been fully learned, for example, for grasping and collision avoidance, this has not been considered so far for very dynamic problems like catching, batting, or kicking balls. There are few publications concerned with learning high-level game playing in real physical games, for example, learning the coordination of multiple robots in soccer. For interaction with humans, performing gestures and other physical interaction behaviors, for example, various forms of hand shaking, could be learned. Interesting balancing problems often come from sports like surfing, skating, or skiing.

There are not many learned behaviors that require advanced spatio-temporal and causal reasoning beyond unscrewing a light bulb. Assembling furniture, tidying up a room, cooking a complete meal, or solving puzzles are examples of this kind of problem.

Creating a system that solves not just one problem but a variety of complex tasks is even more difficult. It involves the integration of hardware components, software components, and behaviors. Building complex systems is a challenge in itself, but it is required to create more sophisticated complex behaviors.

Learned behaviors can usually not be explained. Robots cannot reason about them. They cannot explain why they selected a certain action or why it works. We have not yet seen robots that combine existing learned behaviors into new sequences or combinations of behaviors to solve tasks that they have not seen before.

Given the current development in behavior learning and in computer vision, we expect that the next big steps will be made by deep learning and by solving more and more complex perception problems. This direction of artificial intelligence research has its justification in Moravec's paradox: “it is comparatively easy to make computers exhibit adult level performance on intelligence tests or playing checkers, and difficult or impossible to give them the skills of a one-year-old when it comes to perception and mobility” (Moravec 1988, p. 15). However, we emphasize that for complex behaviors not only complex perception but also complex control is required. It is not sufficient to control a 7 DOF arm to realize a versatile, flexible, and autonomous humanoid robot. We should strive towards pushing the limits in terms of kinematic complexity, like the work of Andrychowicz et al. (2018), who control a complex, human-like hand.

In summary, there is still a long way to go to build robots that are able to perform as well as humans in these tasks, but we think that learning behaviors is one of the best ways that we have to acquire these skills once the robotic hardware is sufficient. Mason (2012) formulated a conjecture about robotics research: “[..] it is just possible that our field is still in its infancy. I do not have a compelling argument for this view, but it is telling that we have no effective way to measure our progress toward long-range goals.” Our outlook on which skills we should try to master by behavior learning in the future, particularly the discussion of the roadmap displayed in Figure 8, is also a confirmation of this.
Acknowledgements

This work was supported through grants from the European Union's Horizon 2020 research and innovation program (No H2020-FOF 2016 723853), from the German Federal Ministry for Economic Affairs and Energy (BMWi, No 50RA1703), and from the German Federal Ministry of Education and Research (BMBF, No 01IW18003).


We thank Hendrik Wiese, Elsa Andrea Kirchner, Matias Valdenegro-Toro, and Malte Wirkus for useful hints and discussions. The definitions of skill and motion plan that we use in this article have been developed in discussions with Elsa Andrea Kirchner, Lisa Gutzeit, José de Gea Fernández, Alexander Dettmann, Sebastian Stock, Dennis Mronga, Nils Niemann, and Sebastian Bartsch, whom we would like to thank for their contributions. We would particularly like to thank José de Gea Fernández and Thomas M. Roehr for their valuable feedback.


Table 1. Overview of learned behaviors and the publications in which they were presented. An entry marked with ` is an instance of the more general behavior above it. In the original layout, each behavior is additionally marked along several categories: perception and action (referring to the part of the behavior that has been learned), deliberative and reactive (referring to the complete behavior; behaviors are considered deliberative if models of the world or of the robot in the world are constructed), discrete or rhythmic, static or dynamic, active or passive, and locomotion or manipulation. These per-behavior symbol marks are not recoverable from the source layout, so only the behavior–publication mapping is reproduced here.

flipping a light switch: Buchli et al. (2011)
open door: Kalakrishnan et al. (2011); Gu et al. (2017); Kormushev et al. (2010a, 2011a); Nemec et al. (2017); Chebotar et al. (2017b); Englert and Toussaint (2018)
valve turning: Carrera et al. (2012)
crank-turning: Petric et al. (2014)
screw cap on bottle: Levine et al. (2016)
peg-in-a-hole: Gullapalli et al. (1994); Ellekilde et al. (2012); Levine et al. (2016); Kramberger et al. (2016)
` power plug: Chebotar et al. (2017a)
` connect a pipe: Laursen et al. (2018)
ironing: Kormushev et al. (2010a, 2011a)
whiteboard cleaning: Kormushev et al. (2011b)
grinding / polishing: Nemec et al. (2018)
wiping: Urbanek et al. (2004); Gams et al. (2014)
sweeping: Alizadeh et al. (2014); Pervez et al. (2017)
handwriting: Manschitz et al. (2018); Berio et al. (2016)
calligraphy: Omair Ali et al. (2015)
grasping: Steil et al. (2004); Kroemer et al. (2009); Gräve et al. (2010); Stulp et al. (2011); Kalakrishnan et al. (2011); Amor et al. (2012); Lampe and Riedmiller (2013); Lenz et al. (2015b); Pinto and Gupta (2016); Johns et al. (2016); Levine et al. (2017); Mahler et al. (2017)
pick & place: Stulp et al. (2012); Ijspeert et al. (2013); Rahmatizadeh et al. (2018); Chebotar et al. (2017b); Kroemer and Sukhatme (2017); Levine et al. (2016); Finn et al. (2017)
block stacking: Deisenroth et al. (2015); Duan et al. (2017)
in-hand manipulation: van Hoof et al. (2015); Rajeswaran et al. (2018); Andrychowicz et al. (2018)
tumbling / tilting objects: Pollard and Hodgins (2004); Kroemer and Sukhatme (2017)
hockey: Daniel et al. (2013); Chebotar et al. (2017a); Rakicevic and Kormushev (2017); Paraschos et al. (2018)
knot tying: van den Berg et al. (2010)
knot untying: Wen Hao Lui and Saxena (2013)
folding a shirt: Colomé and Torras (2018)
holding garment: Corona et al. (2018)
dressing assistance: Erickson et al. (2018)
cutting: Lioutikov et al. (2016); Lenz et al. (2015a); Thananjeyan et al. (2017)
peeling: Medina and Billard (2017)
scooping: Schenck et al. (2017)
pouring: Pastor et al. (2008); Tamosiunaite et al. (2011); Brandl et al. (2014); Chi et al. (2017); Sermanet et al. (2018); Caccavale et al. (2018)
collision avoidance: Koert et al. (2016)
golf: Maeda et al. (2016)
minigolf: Khansari-Zadeh et al. (2012)
billiard: Atkeson et al. (1997); Pastor et al. (2011)
baseball: Peters et al. (2005)
badminton: Liu et al. (2013)
tennis: Ijspeert et al. (2002)
table tennis: Kober et al. (2010); Mülling et al. (2011); Kober et al. (2012); Mülling et al. (2013)
tetherball: Daniel et al. (2012); Parisi et al. (2015)
darts: Kober et al. (2012)
throwing: Gams et al. (2010); Ude et al. (2010); Kober et al. (2012); da Silva et al. (2014); Gutzeit et al. (2018)
kicking: Böckmann and Laue (2017); Hester et al. (2010); Asada et al. (1996)
ball-in-a-cup: Kober et al. (2008); Kober and Peters (2009)
pancake flipping: Kormushev et al. (2010b)
nunchaku flipping: Zhao et al. (2018)
archery: Kormushev et al. (2010c)
astrojax: Paraschos et al. (2018)
maracas: Paraschos et al. (2018)
drumming: Ude et al. (2010)
balancing on wheels: Vlassis et al. (2009)
postural recovery: Kuindersma et al. (2011)
balancing inv. pendulum: Marco et al. (2016); Doerr et al. (2017)
walking:
` six legs: Maes and Brooks (1990); Kirchner (1997)
` quadrupedal: Birdwell and Livingston (2007); Kohl and Stone (2004); Bartsch et al. (2016)
` biped: Benbrahim and Franklin (1997); Matsubara et al. (2005); Geng et al. (2006); Kormushev et al. (2011c); Missura and Behnke (2015)
walking up stairs: Kolter and Ng (2009)
walking on rough terrain: Kolter et al. (2008); Kalakrishnan et al. (2009); Zucker et al. (2011)
pedal racer: Gams et al. (2014)
jumping: Kolter and Ng (2009); Theodorou et al. (2010)
dribbling: Latzke et al. (2007)
standing up: Morimoto and Doya (2001)
collision avoidance: Tai et al. (2016); Loquercio et al. (2018); Gandhi et al. (2017); Kahn et al. (2017)
ball interception: Müller et al. (2007)
defense behavior: Riedmiller et al. (2009)
cooperative behavior: Riedmiller and Gabel (2007)
capturing a ball: Fidelman and Stone (2004)
visual navigation: Zhu et al. (2017)
navigation: Silver et al. (2010)
navigation: Conn and Peters (2007)
navigation: Pfeiffer et al. (2017)
lane following: Chuang et al. (2018)
navigation and estimation: Oßwald et al. (2010)
navigation with exploration: Cocora et al. (2006)
exploration: Kollar and Roy (2008)
active sensing: Kwok and Fox (2004)
unscrewing a light bulb: Manschitz et al. (2016)
coffee / tea preparation: Caccavale et al. (2018)
pizza preparation: Caccavale et al. (2017)
pizza dough rolling: Figueroa et al. (2016)
high five: Amor et al. (2014)
hand shaking: Huang et al. (2018)
hand-over: Ewerton et al. (2015); Maeda et al. (2017)
holding: Ewerton et al. (2015)
carrying: Rozo et al. (2015); Berger et al. (2012)
lifting: Evrard et al. (2009)
putting on a shoe: Canal et al. (2018)
collaborative drilling: Nikolaidis et al. (2013)


References Symposium on Robotics. pp. 1–6. Aguilar J, Zhang T, Qian F, Kingsbury M, McInroe B, Mazouchova Barrett S, Genter K, He Y, Hester T, Khandelwal P, Menashe J N, Li C, Maladen R, Gong C, Travers M, Hatton RL, Choset and Stone P (2013) Ut austin villa 2012: Standard platform H, Umbanhowar PB and Goldman DI (2016) A review league world champions. In: Chen X, Stone P, Sucar LE and on locomotion robophysics: the study of movement at the der Zant TV (eds.) RoboCup-2012: Robot Soccer World Cup intersection of robotics, soft matter and dynamical systems. XVI, Lecture Notes in Artificial Intelligence. Berlin: Springer Reports on Progress in Physics 79(11): 110001. DOI: Verlag. 10.1088/0034-4885/79/11/110001. URL http://stacks. Bartsch S, Manz M, Kampmann P, Dettmann A, Hanff H, Langosz iop.org/0034-4885/79/i=11/a=110001?key= M, von Szadkowski K, Hilljegerdes J, Simnofske M, Kloss P, crossref.ca2966c57b06c47b4fe1725316799370. Meder M and Kirchner F (2016) Development and control of Alizadeh T, Calinon S and Caldwell DG (2014) Learning from the multi-legged robot mantis. In: International Symposium on demonstrations with partially observable task parameters. In: Robotics (ISR). ISBN 978-3-8007-4231-8, pp. 379–386. IEEE International Conference on Robotics and Automation Bellemare MG, Naddaf Y, Veness J and Bowling M (2015) (ICRA). pp. 3309–3314. DOI:10.1109/ICRA.2014.6907335. The arcade learning environment: An evaluation platform for Amodei D, Olah C, Steinhardt J, Christiano PF, Schulman J and general agents. In: International Conference on Artificial Mane´ D (2016) Concrete problems in AI safety. CoRR Intelligence (IJCAI), IJCAI’15. AAAI Press. ISBN 978-1- http://dl.acm.org/ abs/1606.06565. URL http://arxiv.org/abs/1606. 57735-738-4, pp. 4148–4152. URL citation.cfm?id=2832747.2832830 06565. . Amor HB, Kroemer O, Hillenbrand U, Neumann G and Peters J Benbrahim H and Franklin JA (1997) Biped dynamic walking using (2012) Generalization of human grasping for multi-fingered reinforcement learning. Robotics and Autonomous Systems 22: robot hands. In: IEEE/RSJ International Conference on 283–302. Intelligent Robots and Systems (IROS). IEEE. ISBN 978-1- Berger E, Vogt D, Poenisch C, Amor HB and Jung B 4673-1737-5, pp. 2043–2050. (2012) Cooperative human-robot manipulation tasks. In: Amor HB, Neumann G, Kamthe S, Kroemer O and Peters J (2014) Beyond Robot Grasping - Modern Approaches for Learning Interaction primitives for human-robot cooperation tasks. In: Dynamic Manipulation, IEEE/RSJ International Conference IEEE International Conference on Robotics and Automation on Intelligent Robots and Systems (IROS). (ICRA). pp. 2831–2837. DOI:10.1109/ICRA.2014.6907265. Berio D, Calinon S and Leymarie FF (2016) Learning dynamic Andrychowicz M, Baker B, Chociej M, Jozefowicz R, McGrew graffiti strokes with a compliant robot. In: IEEE/RSJ B, Pachocki J, Petron A, Plappert M, Powell G, Ray A, International Conference on Intelligent Robots and Systems Schneider J, Sidor S, Tobin J, Welinder P, Weng L and (IROS). pp. 3981–3986. DOI:10.1109/IROS.2016.7759586. Zaremba W (2018) Learning dexterous in-hand manipulation. Billard A, Calinon S, Dillmann R and Schaal S (2008) Robot https://arxiv.org/abs/1808.00177 . programming by demonstration. In: Siciliano B and Khatib Argall BD, Chernova S, Veloso M and Browning B (2009) A O (eds.) Springer Handbook of Robotics. Berlin, Heidelberg: survey of robot learning from demonstration. Robotics and Springer Berlin Heidelberg. ISBN 978-3-540-30301-5, pp. 
Autonomous Systems 57(5): 469–483. 1371–1394. DOI:10.1007/978-3-540-30301-5 60. Arkin RC (1998) Behavior-based Robotics. 1st edition. Cambridge, Birbach O, Frese U and Bauml¨ B (2011) Realtime perception for MA, USA: MIT Press. ISBN 0262011654. catching a flying ball with a mobile humanoid. In: IEEE International Conference on Robotics and Automation (ICRA). Arulkumaran K, Deisenroth MP, Brundage M and Bharath AA pp. 5955–5962. DOI:10.1109/ICRA.2011.5980138. (2017) Deep reinforcement learning: A brief survey. IEEE Signal Processing Magazine 34(6): 26–38. DOI:10.1109/MSP. Birdwell N and Livingston S (2007) Reinforcement learning in 2017.2743240. sonsor-guided robots. Technical report, University of Tennesse. Asada M, Noda S, Tawaratsumida S and Hosoda K (1996) Purposive behavior acquisition for a real robot by vision- Bockmann¨ A and Laue T (2017) Kick motions for the nao robot based reinforcement learning. Machine Learning 23(2): 279– using dynamic movement primitives. In: Behnke S, Sheh R, 303. DOI:10.1023/A:1018237008823. URL https://doi. Sarıel S and Lee DD (eds.) RoboCup: Robot World Cup. Cham: org/10.1023/A:1018237008823. Springer International Publishing. ISBN 978-3-319-68792-6, pp. 33–44. Ashar J, Ashmore J, Hall B, Harris S, Hengst B, Liu R, Mei (Jacky) Z, Pagnucco M, Roy R, Sammut C, Sushkov O, Teh B and Bohg J, Morales A, Asfour T and Kragic D (2014) Data-driven Tsekouras L (2015) Robocup spl 2014 champion team paper. grasp synthesis: A survey. IEEE Transactions on Robotics In: Bianchi RAC, Akin HL, Ramamoorthy S and Sugiura K 30(2): 289–309. DOI:10.1109/TRO.2013.2289018. (eds.) RoboCup 2014: Robot World Cup XVIII. Cham: Springer Bojarski M, Testa DD, Dworakowski D, Firner B, Flepp B, Goyal International Publishing. ISBN 978-3-319-18615-3, pp. 70–81. P, Jackel LD, Monfort M, Muller U, Zhang J, Zhang X, Zhao Atkeson CG, Moore AW and Schaal S (1997) Locally weighted J and Zieba K (2016) End to end learning for self-driving cars. http://arxiv.org/abs/ learning for control. Artificial Intelligence Review 11(1): 75– CoRR abs/1604.07316. URL 1604.07316 113. DOI:10.1023/A:1006511328852. URL https://doi. . org/10.1023/A:1006511328852. Boston Dynamics (2018) Atlas - the world’s most dynamic https://www.bostondynamics.com/ Bargsten V, d Gea Fernandez J and Kassahun Y (2016) humanoid. atlas Experimental robot inverse dynamics identification using . [Online; accessed 6-October-2018]. classical and machine learning techniques. In: International Bozcuoglu AK, Kazhoyan G, Furuta Y, Stelter S, Beetz M, Okada K and Inaba M (2018) The exchange of knowledge using cloud

Prepared using sagej.cls Fabisch et al. 31

robotics. In: IEEE International Conference on Robotics and Chen C, Seff A, Kornhauser A and Xiao J (2015) DeepDriving: Automation (ICRA). IEEE. Learning Affordance for Direct Perception in Autonomous Brandl S, Kroemer O and Peters J (2014) Generalizing pouring Driving. In: IEEE International Conference on Computer actions between objects using warped parameters. In: Vision (ICCV). IEEE. ISBN 978-1-4673-8391-2, pp. 2722– IEEE-RAS International Conference on Humanoid Robots 2730. DOI:10.1109/ICCV.2015.312. (Humanoids). pp. 616–621. DOI:10.1109/HUMANOIDS. Chen S, Li Y and Kwok NM (2011) Active vision in robotic 2014.7041426. systems: A survey of recent developments. The International Brockman G, Cheung V, Pettersson L, Schneider J, Schulman J, Journal of Robotics Research 30(11): 1343–1377. DOI:10. Tang J and Zaremba W (2016) Openai gym. 1177/0278364911410755. URL https://doi.org/10. Brooks R (1986) A robust layered control system for a . 1177/0278364911410755. IEEE Journal on Robotics and Automation 2(1): 14–23. DOI: Chi M, Yao Y, Liu Y, Teng Y and Zhong M (2017) 10.1109/JRA.1986.1087032. Learning motion primitives from demonstration. Advances Buchli J, Stulp F, Theodorou E and Schaal S (2011) Learn- in Mechanical Engineering 9(12): 1687814017737260. DOI: ing variable impedance control. International Jour- 10.1177/1687814017737260. URL https://doi.org/ nal of Robotics Research 30(7): 820–833. DOI:10. 10.1177/1687814017737260. 1177/0278364911402527. URL https://doi.org/10. Chuang TK, Lin NC, Chen JJ, Hung CH, Huang YW, Teng CH, 1177/0278364911402527. Huang H, Yu LF, Giarre´ L and Wang HC (2018) Deep trail- Caccavale R, Saveriano M, Finzi A and Lee D (2018) Kinesthetic following robotic guide dog in pedestrian environments for teaching and attentional supervision of structured tasks in people who are blind and visually impaired - learning from human–robot interaction. Autonomous Robots DOI:10. virtual and real worlds. In: IEEE International Conference on 1007/s10514-018-9706-9. URL https://doi.org/10. Robotics and Automation (ICRA). IEEE. 1007/s10514-018-9706-9. Cocora A, Kersting K, Plagemann C, Burgard W and Raedt LD Caccavale R, Saveriano M, Fontanelli G, Ficuciello F, Lee D and (2006) Learning relational navigation policies. In: IEEE/RSJ Finzi A (2017) Imitation learning and attentional supervision International Conference on Intelligent Robots and Systems of dual-arm structured tasks. In: ICDL-EPIROB. URL http: (IROS). pp. 2792–2797. DOI:10.1109/IROS.2006.282061. //elib.dlr.de/113326/. Colome´ A and Torras C (2018) Dimensionality reduction for Calli B, Singh A, Bruce J, Walsman A, Konolige K, Srinivasa dynamic movement primitives and application to bimanual S, Abbeel P and Dollar AM (2017) Yale-CMU-Berkeley manipulation of clothes. IEEE Transactions on Robotics 34(3): dataset for robotic manipulation research. International 602–615. DOI:10.1109/TRO.2018.2808924. Journal of Robotics Research 36(3): 261–268. DOI:10. Conn K and Peters RA (2007) Reinforcement learning with a 1177/0278364917700714. URL https://doi.org/10. supervisor for a mobile robot in a real-world environment. 1177/0278364917700714. In: International Symposium on Computational Intelligence in Calli B, Singh A, Walsman A, Srinivasa S, Abbeel P and Dollar Robotics and Automation. pp. 73–78. DOI:10.1109/CIRA. AM (2015a) The YCB object and Model set: Towards common 2007.382878. benchmarks for manipulation research. In: International Corona E, Alenya` G, Gabas A and Torras C (2018) Active Conference on Advanced Robotics (ICAR). 
pp. 510–517. DOI: garment recognition and target grasping point detection 10.1109/ICAR.2015.7251504. using deep learning. Pattern Recognition 74: 629 Calli B, Walsman A, Singh A, Srinivasa S, Abbeel P and Dollar – 641. DOI:https://doi.org/10.1016/j.patcog.2017.09.042. AM (2015b) Benchmarking in Manipulation Research: Using URL http://www.sciencedirect.com/science/ the Yale-CMU-Berkeley Object and Model Set. IEEE Robotics article/pii/S0031320317303941. Automation Magazine 22(3): 36–52. DOI:10.1109/MRA.2015. Cully A, Clune J, Tarapore D and Mouret JB (2015) Robots that 2448951. can adapt like animals. Nature 521(7553): 503–507. DOI: Canal G, Pignat E, Alenya G, Calinon S and Torras C (2018) Joining 10.1038/nature14422. URL http://dx.doi.org/10. high-level symbolic planning with low-level motion primitives 1038/nature14422. in adaptive HRI: application to dressing assistance. In: IEEE da Silva BC, Baldassarre G, Konidaris G and Barto A (2014) International Conference on Robotics and Automation (ICRA). Learning parameterized motor skills on a humanoid robot. In: p. 6. IEEE International Conference on Robotics and Automation Carrera A, Ahmadzadeh SR, Ajoudani A, Kormushev P, Carreras (ICRA). pp. 5239–5244. DOI:10.1109/ICRA.2014.6907629. M and Caldwell DG (2012) Towards autonomous robotic Daniel C, Neumann G, Kroemer O and Peters J (2013) Learning valve turning. Cybernetics and Information Technologies sequential motor tasks. In: IEEE International Conference 12(3): 17–26. URL http://kormushev.com/papers/ on Robotics and Automation (ICRA). pp. 2626–2632. DOI: Carrera_CIT-2012.pdf. 10.1109/ICRA.2013.6630937. Chebotar Y, Hausman K, Zhang M, Sukhatme G, Schaal S and Daniel C, Neumann G and Peters J (2012) Learning concurrent Levine S (2017a) Combining model-based and model-free motor skills in versatile solution spaces. In: IEEE/RSJ updates for trajectory-centric reinforcement learning. In: International Conference on Intelligent Robots and Systems International Conference on Machine Learning (ICML). (IROS). pp. 3591–3597. Chebotar Y, Kalakrishnan M, Yahya A, Li A, Schaal S and Levine de Gea Fernandez´ J, Mronga D, Gunther¨ M, Knobloch T, Wirkus S (2017b) Path integral guided policy search. In: IEEE M, Schroer¨ M, Trampler M, Stiene S, Kirchner E, Bargsten International Conference on Robotics and Automation (ICRA). V, Banziger¨ T, Teiwes J, Kruger¨ T and Kirchner F (2017) Multimodal sensor-based whole-body control for human–robot

Prepared using sagej.cls 32 Journal Title XX(X)

collaboration in industrial settings. Robotics and Autonomous HRI.2016.7451881. Systems 94: 102 – 119. DOI:https://doi.org/10.1016/j. Finn C, Yu T, Zhang T, Abbeel P and Levine S (2017) One-shot robot.2017.04.007. URL http://www.sciencedirect. visual imitation learning via meta-learning. In: Conference com/science/article/pii/S0921889016305127. on Robot Learning (CoRL). pp. 357–368. URL http:// Deisenroth MP, Fox D and Rasmussen CE (2015) Gaussian pro- proceedings.mlr.press/v78/finn17a.html. cesses for data-efficient learning in robotics and control. IEEE Gams A, Petric and T, Zandlajpah L and Ude A (2010) Opti- Transactions on Pattern Analysis and Machine Intelligence mizing parameters of trajectory representation for movement 37(2): 408–423. DOI:10.1109/TPAMI.2013.218. generalization: robotic throwing. In: Workshop on Robotics Deng J, Dong W, Socher R, Li LJ, Li K and Fei-Fei L (2009) in Alpe-Adria-Danube Region (RAAD). pp. 161 –166. DOI: ImageNet: A Large-Scale Hierarchical Image Database. In: 10.1109/RAAD.2010.5524592. CVPR09. Gams A, van den Kieboom J, Vespignani M, Guyot L, Ude A Dismukes R, Goldsmith T and Kochan J (2015) Effects of acute and Ijspeert A (2014) Rich periodic motor skills on humanoid stress on aircrew performance: Literature review and analysis robots: Riding the pedal racer. In: IEEE International of operational aspects. Technical Report TM-2015-218930, Conference on Robotics and Automation (ICRA). pp. 2326– NASA Ames Research Center, Moffett Field, CA. 2332. DOI:10.1109/ICRA.2014.6907181. Doerr A, Nguyen-Tuong D, Marco A, Schaal S and Trimpe S (2017) Gandhi D, Pinto L and Gupta A (2017) Learning to fly by crashing. Model-based policy search for automatic tuning of multivariate In: IEEE/RSJ International Conference on Intelligent Robots pid controllers. In: IEEE International Conference on Robotics and Systems (IROS). pp. 3948–3955. and Automation (ICRA). pp. 5295–5301. Geng T, Porr B and Worg¨ otter¨ F (2006) Fast biped Duan Y, Andrychowicz M, Stadie BC, Ho J, Schneider J, Sutskever walking with a reflexive controller and real-time policy I, Abbeel P and Zaremba W (2017) One-shot imitation searching. In: Weiss Y, Scholkopf¨ B and Platt JC learning. CoRR abs/1703.07326. (eds.) Advances in Neural Information Processing Ellekilde L, Nemec B, Liljekrans D, Savarimuthu T, Kraft D, Abu- Systems (NIPS). MIT Press, pp. 427–434. URL http: Dakka F, Ude A and Kruger¨ N (2012) Robust peg-in-hole //papers.nips.cc/paper/2769-fast-biped- manipulation motivated by a human tele-operating strategy. In: walking-with-a-reflexive-controller-and- Beyond Robot Grasping - Modern Approaches for Learning real-time-policy-searching.pdf. Dynamic Manipulation, IEEE/RSJ International Conference Gigerenzer G (2008) Why heuristics work. Perspectives on on Intelligent Robots and Systems (IROS). Psychological Science 3(1): 20–29. DOI:10.1111/j.1745-6916. Englert P and Toussaint M (2018) Learning manipulation 2008.00058.x. URL https://doi.org/10.1111/j. skills from a single demonstration. International Jour- 1745-6916.2008.00058.x. PMID: 26158666. nal of Robotics Research 37(1): 137–154. DOI:10. Gigerenzer G and Brighton H (2009) Homo heuristicus: Why 1177/0278364917743795. URL https://doi.org/10. biased minds make better inferences. Topics in Cognitive 1177/0278364917743795. Science 1(1): 107–143. DOI:10.1111/j.1756-8765.2008. Erickson Z, Clever HM, Turk G, Liu CK and Kemp CC (2018) Deep 01006.x. URL http://dx.doi.org/10.1111/j. haptic model predictive control for robot-assisted dressing. 
In: 1756-8765.2008.01006.x. IEEE International Conference on Robotics and Automation Giszter SF, Mussa-Ivaldi FA and Bizzi E (1993) Convergent (ICRA). IEEE. force fields organized in the frog’s spinal cord. Journal of Evrard P, Gribovskaya E, Calinon S, Billard A and Kheddar A Neuroscience 13: 467–491. (2009) Teaching physical collaborative tasks: object-lifting Grave¨ K, Stuckler¨ J and Behnke S (2010) Learning motion skills case study with a humanoid. In: IEEE-RAS International from expert demonstrations and own experience using gaussian Conference on Humanoid Robots (Humanoids). pp. 399–404. process regression. In: ISR/ROBOTIK. VDE Verlag, pp. 1–8. DOI:10.1109/ICHR.2009.5379513. Gu S, Holly E, Lillicrap T and Levine S (2017) Deep reinforcement Ewerton M, Neumann G, Lioutikov R, Amor HB, Peters J and learning for robotic manipulation with asynchronous off-policy Maeda G (2015) Learning multiple collaborative tasks with updates. In: IEEE International Conference on Robotics and a mixture of interaction primitives. In: IEEE International Automation (ICRA). pp. 3389–3396. DOI:10.1109/ICRA.2017. Conference on Robotics and Automation (ICRA). pp. 1535– 7989385. 1542. DOI:10.1109/ICRA.2015.7139393. Gullapalli V, Franklin JA and Benbrahim H (1994) Acquiring robot Faisal A, Stout D, Apel J and Bradley B (2010) The manipulative skills via reinforcement learning. IEEE Control Systems 14(1): complexity of lower paleolithic stone toolmaking. PLOS 13–24. DOI:10.1109/37.257890. ONE 5(11): 1–11. DOI:10.1371/journal.pone.0013718. Gutzeit L, Fabisch A, Otto M, Metzen JH, Hansen J, Kirchner F URL https://doi.org/10.1371/journal.pone. and Kirchner EA (2018) The BesMan learning platform for 0013718. automated robot skill learning. Frontiers in Robotics and AI Fidelman P and Stone P (2004) Learning ball acquisition on a DOI:10.3389/frobt.2018.00043. physical robot. In: International Symposium on Robotics Haddadin S, Laue T, Frese U, Wolf S, Albu-Schaffer¨ A and Automation (ISRA). URL http://www.cs.utexas. and Hirzinger G (2009) Kick it with elasticity: Safety edu/users/ai-lab/?fidelman:isra04. and performance in human–robot soccer. Robotics Figueroa N, Ureche ALP and Billard A (2016) Learning complex and Autonomous Systems 57(8): 761 – 775. DOI: sequential tasks from demonstration: A pizza dough rolling https://doi.org/10.1016/j.robot.2009.03.004. URL case study. In: ACM/IEEE International Conference on http://www.sciencedirect.com/science/ Human-Robot Interaction (HRI). pp. 611–612. DOI:10.1109/ article/pii/S0921889009000530. Humanoid

Prepared using sagej.cls Fabisch et al. 33

Hester T, Quinlan M and Stone P (2010) Generalized model learning for reinforcement learning on a humanoid robot. In: IEEE International Conference on Robotics and Automation (ICRA). pp. 2369–2374. DOI:10.1109/ROBOT.2010.5509181.
Hinton G, Deng L, Yu D, Dahl G, Mohamed Ar, Jaitly N, Senior A, Vanhoucke V, Nguyen P, Sainath T and Kingsbury B (2012) Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Processing Magazine 29(6): 82–97.
Huang Y, Silverio J, Rozo L and Caldwell DG (2018) Hybrid probabilistic trajectory optimization using null-space exploration. In: IEEE International Conference on Robotics and Automation (ICRA). IEEE.
Ijspeert A, Nakanishi J, Pastor P, Hoffmann H and Schaal S (2013) Dynamical movement primitives: Learning attractor models for motor behaviors. Neural Computation 25(2): 328–373.
Ijspeert JA, Nakanishi J and Schaal S (2002) Movement imitation with nonlinear dynamical systems in humanoid robots. In: IEEE International Conference on Robotics and Automation (ICRA). Washington, May 11-15 2002. URL http://www-clmc.usc.edu/publications/I/ijspeert-ICRA2002.pdf.
Irpan A (2018) Deep reinforcement learning doesn't work yet. https://www.alexirpan.com/2018/02/14/rl-hard.html. [Online; accessed 13-October-2018].
Jakobi N, Husbands P and Harvey I (1995) Noise and the reality gap: The use of simulation in evolutionary robotics. In: Morán F, Moreno A, Merelo JJ and Chacón P (eds.) Advances in Artificial Life. Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 704–720.
Johns E, Leutenegger S and Davison AJ (2016) Deep learning a grasp function for grasping under gripper pose uncertainty. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 4461–4468. DOI:10.1109/IROS.2016.7759657.
Kahn G, Villaflor A, Pong V, Abbeel P and Levine S (2017) Uncertainty-aware reinforcement learning for collision avoidance. CoRR abs/1702.01182. URL http://arxiv.org/abs/1702.01182.
Kajita S, Kanehiro F, Kaneko K, Yokoi K and Hirukawa H (2001) The 3D linear inverted pendulum mode: a simple modeling for a biped walking pattern generation. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), volume 1. pp. 239–246. DOI:10.1109/IROS.2001.973365.
Kalakrishnan M, Buchli J, Pastor P and Schaal S (2009) Learning locomotion over rough terrain using terrain templates. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 167–172. DOI:10.1109/IROS.2009.5354701.
Kalakrishnan M, Righetti L, Pastor P and Schaal S (2011) Learning force control policies for compliant manipulation. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 4639–4644. DOI:10.1109/IROS.2011.6095096.
Kassahun Y, de Gea J, Edgington M, Metzen JH and Kirchner F (2008) Accelerating neuroevolutionary methods using a Kalman filter. In: Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation, GECCO '08. New York, NY, USA: ACM. ISBN 978-1-60558-130-9, pp. 1397–1404. DOI:10.1145/1389095.1389365. URL http://doi.acm.org/10.1145/1389095.1389365.
Khansari-Zadeh SM, Kronander K and Billard A (2012) Learning to play minigolf: A dynamical system-based approach. Advanced Robotics.
Kirchner F (1997) Q-learning of complex behaviours on a six-legged walking machine. In: EUROMICRO Workshop on Advanced Mobile Robots. pp. 51–58. DOI:10.1109/EURBOT.1997.633565.
Kitano H, Asada M, Kuniyoshi Y, Noda I, Osawa E and Matsubara H (1997) RoboCup: A challenge problem for AI. AI Magazine 18(1): 73–85.
Kober J, Bagnell JA and Peters J (2013) Reinforcement learning in robotics: A survey. International Journal of Robotics Research 32(11): 1238–1274.
Kober J, Mohler B and Peters J (2008) Learning perceptual coupling for motor primitives. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 834–839. DOI:10.1109/IROS.2008.4650953.
Kober J, Mülling K, Krömer O, Lampert CH, Schölkopf B and Peters J (2010) Movement templates for learning of hitting and batting. In: IEEE International Conference on Robotics and Automation (ICRA). pp. 853–858. DOI:10.1109/ROBOT.2010.5509672.
Kober J and Peters JR (2009) Policy search for motor primitives in robotics. In: Koller D, Schuurmans D, Bengio Y and Bottou L (eds.) Advances in Neural Information Processing Systems (NIPS). Curran Associates, Inc., pp. 849–856. URL http://papers.nips.cc/paper/3545-policy-search-for-motor-primitives-in-robotics.pdf.
Kober J, Wilhelm A, Öztop E and Peters J (2012) Reinforcement learning to adjust parametrized motor primitives to new situations. Autonomous Robots 33(4): 361–379.
Koert D, Maeda G, Lioutikov R, Neumann G and Peters J (2016) Demonstration based trajectory optimization for generalizable robot motions. In: IEEE-RAS International Conference on Humanoid Robots (Humanoids). pp. 515–522. DOI:10.1109/HUMANOIDS.2016.7803324.
Kohl N and Stone P (2004) Machine learning for fast quadrupedal locomotion. In: AAAI Conference on Artificial Intelligence. pp. 611–616. URL http://nn.cs.utexas.edu/?kohl:aaai04.
Kollar T and Roy N (2008) Trajectory optimization using reinforcement learning for map exploration. International Journal of Robotics Research 27(2): 175–196. DOI:10.1177/0278364907087426. URL https://doi.org/10.1177/0278364907087426.
Kolter JZ, Abbeel P and Ng AY (2008) Hierarchical apprenticeship learning with application to quadruped locomotion. In: Platt JC, Koller D, Singer Y and Roweis ST (eds.) Advances in Neural Information Processing Systems (NIPS). Curran Associates, Inc., pp. 769–776. URL http://papers.nips.cc/paper/3253-hierarchical-apprenticeship-learning-with-application-to-quadruped-locomotion.pdf.
Kolter JZ and Ng AY (2009) Policy search via the signed derivative. In: Robotics: Science and Systems (RSS). URL http://www.roboticsproceedings.org/rss05/p27.html.
Kormushev P, Calinon S and Caldwell DG (2010a) Approaches for learning human-like motor skills which require variable stiffness during execution. In: IEEE International Conference on Humanoid Robots (Humanoids), Workshop on Humanoid Robots Learning from Human Interaction. Nashville, USA. URL http://kormushev.com/papers/Kormushev_Humanoids2010_workshop.pdf.
Kormushev P, Calinon S and Caldwell DG (2010b) Robot motor skill coordination with EM-based reinforcement learning. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 3232–3237. DOI:10.1109/IROS.2010.5649089.
Kormushev P, Calinon S and Caldwell DG (2011a) Imitation learning of positional and force skills demonstrated via kinesthetic teaching and haptic input. Advanced Robotics 25(5): 581–603. URL http://kormushev.com/papers/Kormushev_AdvancedRobotics_2011.pdf.
Kormushev P, Calinon S and Caldwell DG (2013) Reinforcement learning in robotics: Applications and real-world challenges. Robotics 2(3): 122–148. DOI:10.3390/robotics2030122. URL http://kormushev.com/papers/Kormushev_MDPI_2013.pdf.
Kormushev P, Calinon S, Saegusa R and Metta G (2010c) Learning the skill of archery by a humanoid robot iCub. In: IEEE-RAS International Conference on Humanoid Robots (Humanoids). Nashville, USA, pp. 417–423. URL http://kormushev.com/papers/Kormushev_Humanoids-2010.pdf.
Kormushev P, Nenchev DN, Calinon S and Caldwell DG (2011b) Upper-body kinesthetic teaching of a free-standing humanoid robot. In: IEEE International Conference on Robotics and Automation (ICRA). Shanghai, China, pp. 3970–3975. URL http://kormushev.com/papers/Kormushev_ICRA_2011.pdf.
Kormushev P, Ugurlu B, Calinon S, Tsagarakis N and Caldwell DG (2011c) Bipedal walking energy minimization by reinforcement learning with evolving policy parameterization. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). San Francisco, USA, pp. 318–324. URL http://kormushev.com/papers/Kormushev-IROS2011.pdf.
Kramberger A, Piltaver R, Nemec B, Gams M and Ude A (2016) Learning of assembly constraints by demonstration and active exploration. Industrial Robot 43(5): 524–534. DOI:10.1108/IR-02-2016-0058. URL http://dirros.openscience.si/IzpisGradiva.php?lang=slv&id=6368.
Krizhevsky A, Sutskever I and Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Pereira F, Burges CJC, Bottou L and Weinberger KQ (eds.) Advances in Neural Information Processing Systems (NIPS). Curran Associates, Inc., pp. 1097–1105. URL http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf.
Kroemer O, Detry R, Piater J and Peters J (2009) Active learning using mean shift optimization for robot grasping. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 2610–2615. DOI:10.1109/IROS.2009.5354345.
Kroemer O and Sukhatme GS (2017) Feature selection for learning versatile manipulation skills based on observed and desired trajectories. In: IEEE International Conference on Robotics and Automation (ICRA). pp. 4713–4720. DOI:10.1109/ICRA.2017.7989546. URL http://ieeexplore.ieee.org/document/7989546/.
Kuehn D, Bernhard F, Burchardt A, Schilling M, Stark T, Zenzes M and Kirchner F (2014) Distributed computation in a quadrupedal robotic system. International Journal of Advanced Robotic Systems 11(7): 110. DOI:10.5772/58733. URL https://doi.org/10.5772/58733.
Kuindersma S, Deits R, Fallon M, Valenzuela A, Dai H, Permenter F, Koolen T, Marion P and Tedrake R (2016) Optimization-based locomotion planning, estimation, and control design for the Atlas humanoid robot. Autonomous Robots 40(3): 429–455. DOI:10.1007/s10514-015-9479-3. URL https://doi.org/10.1007/s10514-015-9479-3.
Kuindersma S, Grupen R and Barto A (2011) Learning dynamic arm motions for postural recovery. In: IEEE-RAS International Conference on Humanoid Robots (Humanoids). pp. 7–12. DOI:10.1109/Humanoids.2011.6100881.
Kwok C and Fox D (2004) Reinforcement learning for sensing strategies. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), volume 4. pp. 3158–3163. DOI:10.1109/IROS.2004.1389903.
Lampe T and Riedmiller M (2013) Acquiring visual servoing reaching and grasping skills using neural reinforcement learning. In: International Joint Conference on Neural Networks (IJCNN). pp. 1–8. DOI:10.1109/IJCNN.2013.6707053.
Latzke T, Behnke S and Bennewitz M (2007) Imitative reinforcement learning for soccer playing robots. In: Lakemeyer G, Sklar E, Sorrenti DG and Takahashi T (eds.) RoboCup: Robot Soccer World Cup. Berlin, Heidelberg: Springer Berlin Heidelberg. ISBN 978-3-540-74024-7, pp. 47–58.
Laursen J, Sorensen L, Schultz U, Kraft D and Ellekilde LP (2018) Adapting parameterized motions using iterative learning and online collision detection. In: IEEE International Conference on Robotics and Automation (ICRA). IEEE.
LeCun Y, Bengio Y and Hinton G (2015) Deep learning. Nature 521: 436–444. DOI:10.1038/nature14539. URL http://dx.doi.org/10.1038/nature14539.
Lenz I, Knepper R and Saxena A (2015a) DeepMPC: Learning Deep Latent Features for Model Predictive Control. In: Robotics: Science and Systems (RSS). Robotics: Science and Systems Foundation. ISBN 978-0-9923747-1-6. DOI:10.15607/RSS.2015.XI.012. URL http://www.roboticsproceedings.org/rss11/p12.pdf.
Lenz I, Lee H and Saxena A (2015b) Deep learning for detecting robotic grasps. International Journal of Robotics Research 34(4-5): 705–724. DOI:10.1177/0278364914549607. URL http://dx.doi.org/10.1177/0278364914549607.
Levine S, Finn C, Darrell T and Abbeel P (2016) End-to-end training of deep visuomotor policies. Journal of Machine Learning Research 17(39): 1–40. URL http://jmlr.org/papers/v17/15-522.html.
Levine S, Pastor P, Krizhevsky A, Ibarz J and Quillen D (2017) Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. International Journal of Robotics Research. DOI:10.1177/0278364917710318. URL https://doi.org/10.1177/0278364917710318.
Levitis DA, Lidicker WZ and Freund G (2009) Behavioural biologists do not agree on what constitutes behaviour. Animal Behaviour 78(1): 103–110. DOI:10.1016/j.anbehav.2009.03.018. URL http://www.sciencedirect.com/science/article/pii/S0003347209001730.
Lioutikov R, Kroemer O, Maeda G and Peters J (2016) Learning Manipulation by Sequencing Motor Primitives with a Two-Armed Robot. In: Intelligent Autonomous Systems, Advances in Intelligent Systems and Computing. Springer, Cham. ISBN 978-3-319-08337-7 978-3-319-08338-4, pp. 1601–1611. URL https://link.springer.com/chapter/10.1007/978-3-319-08338-4_115.
Liu M, Depraetere B, Pinte G, Grondman I and Babuška R (2013) Model-free and model-based time-optimal control of a badminton robot. In: Asian Control Conference (ASCC). pp. 1–6. DOI:10.1109/ASCC.2013.6606242.
Loetzsch M, Risler M and Jungel M (2006) XABSL - a pragmatic approach to behavior engineering. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 5124–5129. DOI:10.1109/IROS.2006.282605.
Loquercio A, Maqueda AI, del Blanco CR and Scaramuzza D (2018) DroNet: Learning to Fly by Driving. IEEE Robotics and Automation Letters 3(2): 1088–1095. DOI:10.1109/LRA.2018.2795643.
Maeda G, Ewerton M, Koert D and Peters J (2016) Acquiring and generalizing the embodiment mapping from human observations to robot skills. IEEE Robotics and Automation Letters 1(2): 784–791.
Maeda GJ, Neumann G, Ewerton M, Lioutikov R, Kroemer O and Peters J (2017) Probabilistic movement primitives for coordination of multiple human–robot collaborative tasks. Autonomous Robots 41(3): 593–612. DOI:10.1007/s10514-016-9556-2. URL https://doi.org/10.1007/s10514-016-9556-2.
Maes P and Brooks RA (1990) Learning to coordinate behaviors. In: AAAI Conference on Artificial Intelligence, AAAI'90. AAAI Press. ISBN 0-262-51057-X, pp. 796–802. URL http://dl.acm.org/citation.cfm?id=1865609.1865619.
Mahadevan S and Connell J (1992) Automatic programming of behavior-based robots using reinforcement learning. Artificial Intelligence 55(2): 311–365. DOI:10.1016/0004-3702(92)90058-6. URL http://www.sciencedirect.com/science/article/pii/0004370292900586.
Mahler J, Liang J, Niyaz S, Laskey M, Doan R, Liu X, Ojea JA and Goldberg K (2017) Dex-Net 2.0: Deep learning to plan robust grasps with synthetic point clouds and analytic grasp metrics. In: Robotics: Science and Systems (RSS).
Manschitz S, Gienger M, Kober J and Peters J (2016) Probabilistic decomposition of sequential force interaction tasks into movement primitives. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 3920–3927. DOI:10.1109/IROS.2016.7759577.
Manschitz S, Gienger M, Kober J and Peters J (2018) Mixture of attractors: A novel movement primitive representation for learning motor skills from demonstrations. IEEE Robotics and Automation Letters (RA-L) 3(2): 926–933.
Marco A, Hennig P, Bohg J, Schaal S and Trimpe S (2016) Automatic LQR tuning based on Gaussian process global optimization. In: IEEE International Conference on Robotics and Automation (ICRA).
Mason MT (2012) Creation myths: The beginnings of robotics research. IEEE Robotics Automation Magazine 19(2): 72–77. DOI:10.1109/MRA.2012.2191437.
Mason MT and Lynch K (1993) Dynamic manipulation. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), volume 1. pp. 152–159.
Mathiowetz V, Federman S and Wiemer D (1985) Box and block test of manual dexterity: Norms for 6–19 year olds. Canadian Journal of Occupational Therapy 52(5): 241–245. DOI:10.1177/000841748505200505. URL https://doi.org/10.1177/000841748505200505.
Matsubara T, Morimoto J, Nakanishi J, Sato M and Doya K (2005) Learning CPG-based biped locomotion with a policy gradient method. In: IEEE-RAS International Conference on Humanoid Robots (Humanoids). pp. 208–213. DOI:10.1109/ICHR.2005.1573569.
Medina JR and Billard A (2017) Learning stable task sequences from demonstration with linear parameter varying systems and hidden Markov models. Conference on Robot Learning (CoRL): 175–184.
Mewes F (2014) Entwicklung einer dynamischen Spielstrategie auf der humanoiden Roboterplattform NAO [Development of a dynamic playing strategy on the humanoid robot platform NAO]. Bachelor's thesis. Available online at robocup.imn.htwk-leipzig.de/documents/BA_Florian_Mewes.pdf.
Missura M and Behnke S (2015) Online learning of bipedal walking stabilization. KI - Künstliche Intelligenz 29(4): 401–405. DOI:10.1007/s13218-015-0387-7. URL https://doi.org/10.1007/s13218-015-0387-7.
Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G, Petersen S, Beattie C, Sadik A, Antonoglou I, King H, Kumaran D, Wierstra D, Legg S and Hassabis D (2015) Human-level control through deep reinforcement learning. Nature 518(7540): 529–533. URL http://dx.doi.org/10.1038/nature14236.
Moravec H (1988) Mind Children: The Future of Robot and Human Intelligence. Cambridge, MA, USA: Harvard University Press. ISBN 0-674-57616-0.
Morimoto J and Doya K (2001) Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning. Robotics and Autonomous Systems 36(1): 37–51. DOI:10.1016/S0921-8890(01)00113-0. URL http://www.sciencedirect.com/science/article/pii/S0921889001001130.
Müller H, Lauer M, Hafner R, Lange S, Merke A and Riedmiller M (2007) Making a robot learn to play soccer using reward and punishment. In: Hertzberg J, Beetz M and Englert R (eds.) KI: Advances in Artificial Intelligence. Berlin, Heidelberg: Springer Berlin Heidelberg. ISBN 978-3-540-74565-5, pp. 220–234.
Mülling K, Kober J, Krömer O and Peters J (2013) Learning to select and generalize striking movements in robot table tennis. International Journal of Robotics Research 32(3).
Mülling K, Kober J and Peters J (2011) A biomimetic approach to robot table tennis. Adaptive Behavior 19(5): 359–376.
Nelson G, Saunders A, Neville N, Swilling B, Bondaryk J, Billings D, Lee C, Playter R and Raibert M (2012) Petman: A humanoid robot for testing chemical protective clothing. Journal of the Robotics Society of Japan 30(4): 372–377. DOI:10.7210/jrsj.30.372.
Nemec B, Yasuda K, Mullennix N, Likar N and Ude A (2018) Learning by Demonstration and Adaptation of Finishing Operations Using Virtual Mechanism Approach. In: IEEE International Conference on Robotics and Automation (ICRA). p. 7.
Nemec B, Žlajpah L and Ude A (2017) Door opening by joining reinforcement learning and intelligent control. In: International Conference on Advanced Robotics (ICAR). pp. 222–228. DOI:10.1109/ICAR.2017.8023522.
Nikolaidis S, Lasota P, Rossano G, Martinez C, Fuhlbrigge T and Shah J (2013) Human-robot collaboration in manufacturing: Quantitative evaluation of predictable, convergent joint action. In: IEEE ISR 2013. pp. 1–6. DOI:10.1109/ISR.2013.6695625.
Norman DA (2013) The Design of Everyday Things. Revised and expanded edition. Basic Books. ISBN 9780465050659.
Omair Ali, Pervez A and Lee D (2015) Robotic calligraphy: Learning from character images. In: International Workshop on Human-Friendly Robotics.
Orin DE and Goswami A (2008) Centroidal momentum matrix of a humanoid robot: Structure and properties. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 653–659. DOI:10.1109/IROS.2008.4650772.
Orin DE, Goswami A and Lee SH (2013) Centroidal dynamics of a humanoid robot. Autonomous Robots 35(2): 161–176. DOI:10.1007/s10514-013-9341-4. URL https://doi.org/10.1007/s10514-013-9341-4.
Osa T, Pajarinen J, Neumann G, Bagnell JA, Abbeel P and Peters J (2018) An algorithmic perspective on imitation learning. Foundations and Trends in Robotics 7(1-2): 1–179. DOI:10.1561/2300000053. URL http://dx.doi.org/10.1561/2300000053.
Oßwald S, Hornung A and Bennewitz M (2010) Learning reliable and efficient navigation with a humanoid. In: IEEE International Conference on Robotics and Automation (ICRA). pp. 2375–2380. DOI:10.1109/ROBOT.2010.5509420.
Paraschos A, Daniel C, Peters J and Neumann G (2018) Using probabilistic movement primitives in robotics. Autonomous Robots 42(3): 529–551. DOI:10.1007/s10514-017-9648-7. URL https://doi.org/10.1007/s10514-017-9648-7.
Parisi S, Abdulsamad H, Paraschos A, Daniel C and Peters J (2015) Reinforcement learning vs human programming in tetherball robot games. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 6428–6434. DOI:10.1109/IROS.2015.7354296.
Pastor P, Hoffmann H and Schaal S (2008) Movement generation by learning from demonstration and generalization to new targets. In: Adaptive Motion of Animals and Machines (AMAM).
Pastor P, Kalakrishnan M, Chitta S, Theodorou E and Schaal S (2011) Skill learning and task outcome prediction for manipulation. In: IEEE International Conference on Robotics and Automation (ICRA). Shanghai, China, May 9-13. URL http://www-clmc.usc.edu/publications/P/pastor-ICRA2011.pdf.
Peng XB, Abbeel P, Levine S and van de Panne M (2018) DeepMimic: Example-guided deep reinforcement learning of physics-based character skills. ACM Transactions on Graphics (Proc. SIGGRAPH 2018, to appear) 37(4).
Peng XB, Berseth G, Yin K and Van De Panne M (2017) DeepLoco: dynamic locomotion skills using hierarchical deep reinforcement learning. ACM Transactions on Graphics 36(4): 1–13. DOI:10.1145/3072959.3073602. URL http://dl.acm.org/citation.cfm?doid=3072959.3073602.
Pervez A, Mao Y and Lee D (2017) Learning deep movement primitives using convolutional neural networks. In: IEEE-RAS International Conference on Humanoid Robots (Humanoids). pp. 191–197. DOI:10.1109/HUMANOIDS.2017.8246874.
Peters J and Schaal S (2008) Reinforcement learning of motor skills with policy gradients. Neural Networks 21(4): 682–697. DOI:10.1016/j.neunet.2008.02.003. URL http://www.sciencedirect.com/science/article/pii/S0893608008000701. Robotics and Neuroscience.
Peters J, Vijayakumar S and Schaal S (2005) Natural actor-critic. In: European Conference on Machine Learning, volume 3720. Springer, pp. 280–291. URL http://www-clmc.usc.edu/publications/P/peters-ECML2005.pdf.
Petric T, Gams A, Žlajpah L and Ude A (2014) Online learning of task-specific dynamics for periodic tasks. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 1790–1795. DOI:10.1109/IROS.2014.6942797.
Pfeiffer M, Schaeuble M, Nieto J, Siegwart R and Cadena C (2017) From perception to decision: A data-driven approach to end-to-end motion planning for autonomous ground robots. In: IEEE International Conference on Robotics and Automation (ICRA). pp. 1527–1533. DOI:10.1109/ICRA.2017.7989182.
Pinto L and Gupta A (2016) Supersizing self-supervision: Learning to grasp from 50k tries and 700 robot hours. In: IEEE International Conference on Robotics and Automation (ICRA). pp. 3406–3413. DOI:10.1109/ICRA.2016.7487517.
Pollard NS and Hodgins JK (2004) Generalizing demonstrated manipulation tasks. In: Boissonnat JD, Burdick J, Goldberg K and Hutchinson S (eds.) Algorithmic Foundations of Robotics V. Berlin, Heidelberg: Springer Berlin Heidelberg. ISBN 978-3-540-45058-0, pp. 523–539. DOI:10.1007/978-3-540-45058-0_31. URL https://doi.org/10.1007/978-3-540-45058-0_31.
Rahmatizadeh R, Abolghasemi P, Behal A and Bölöni L (2018) From virtual demonstration to real-world manipulation using LSTM and MDN. In: AAAI Conference on Artificial Intelligence. URL https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16194.
Raibert M, Blankespoor K, Nelson G and Playter R (2008) BigDog, the rough-terrain quadruped robot. IFAC Proceedings Volumes 41(2): 10822–10825. DOI:10.3182/20080706-5-KR-1001.01833. URL http://www.sciencedirect.com/science/article/pii/S1474667016407020. 17th IFAC World Congress.
Rajeswaran A, Kumar V, Gupta A, Vezzani G, Schulman J, Todorov E and Levine S (2018) Learning complex dexterous manipulation with deep reinforcement learning and demonstrations. International Journal of Robotics Research. Accepted.
Rakicevic N and Kormushev P (2017) Efficient robot task learning and transfer via informed search in movement parameter space. In: Workshop on Acting and Interacting in the Real World: Challenges in Robot Learning, Advances in Neural Information Processing Systems (NIPS).
Rauch C, Köhler T, Schröer M, Berghöfer E and Kirchner F (2012) A concept of a reliable three-layer behaviour control system for cooperative autonomous robots. In: KI: Advances in Artificial Intelligence.
Riedmiller M and Gabel T (2007) On experiences in a complex and competitive gaming domain: Reinforcement learning meets RoboCup. In: IEEE Symposium on Computational Intelligence and Games. pp. 17–23. DOI:10.1109/CIG.2007.368074.
Riedmiller M, Gabel T, Hafner R and Lange S (2009) Reinforcement learning for robot soccer. Autonomous Robots 27(1): 55–73. DOI:10.1007/s10514-009-9120-4. URL https://doi.org/10.1007/s10514-009-9120-4.
Röfer T (2018) CABSL - C-based agent behavior specification language. In: RoboCup: Robot World Cup, Lecture Notes in Artificial Intelligence. Springer.
Rozo L, Bruno D, Calinon S and Caldwell DG (2015) Learning optimal controllers in human-robot cooperative transportation tasks with position and force constraints. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 1024–1030. DOI:10.1109/IROS.2015.7353496.
Sanchez J, Corrales JA, Bouzgarrou BC and Mezouar Y (2018) Robotic manipulation and sensing of deformable objects in domestic and industrial applications: a survey. International Journal of Robotics Research: 1–29. DOI:10.1177/0278364918779698.
Schaal S and Atkeson CG (2010) Learning control in robotics. IEEE Robotics Automation Magazine 17(2): 20–29. DOI:10.1109/MRA.2010.936957.
Schaal S, Sternad D, Osu R and Kawato M (2004) Rhythmic movement is not discrete. Nature Neuroscience 7(10): 1137–1144. URL http://www-clmc.usc.edu/publications/S/schaal-NatureNeuro2004.pdf.
Schenck C, Tompson J, Levine S and Fox D (2017) Learning robotic manipulation of granular media. In: Levine S, Vanhoucke V and Goldberg K (eds.) Conference on Robot Learning (CoRL), Proceedings of Machine Learning Research, volume 78. PMLR, pp. 239–248. URL http://proceedings.mlr.press/v78/schenck17a.html.
Sentis L and Khatib O (2006) A whole-body control framework for humanoids operating in human environments. In: IEEE International Conference on Robotics and Automation (ICRA). pp. 2641–2648. DOI:10.1109/ROBOT.2006.1642100.
Sermanet P, Lynch C, Chebotar Y, Hsu J, Jang E, Schaal S and Levine S (2018) Time-contrastive networks: Self-supervised learning from video. In: IEEE International Conference on Robotics and Automation (ICRA). URL http://arxiv.org/abs/1704.06888.
Shadmehr R and Wise SP (2005) The Computational Neurobiology of Reaching and Pointing. MIT Press. ISBN 0-262-19508-9.
Silver D, Bagnell JA and Stentz A (2010) Learning from demonstration for autonomous navigation in complex unstructured terrain. International Journal of Robotics Research 29(12): 1565–1592. DOI:10.1177/0278364910369715. URL http://dx.doi.org/10.1177/0278364910369715.
Silver D, Huang A, Maddison CJ, Guez A, Sifre L, van den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M, Dieleman S, Grewe D, Nham J, Kalchbrenner N, Sutskever I, Lillicrap T, Leach M, Kavukcuoglu K, Graepel T and Hassabis D (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529(7587): 484–489. DOI:10.1038/nature16961.
Silver DL, Yang Q and Li L (2013) Lifelong machine learning systems: Beyond learning algorithms. In: AAAI Spring Symposium: Lifelong Machine Learning, AAAI Technical Report, volume SS-13-05. AAAI.
Steil J, Röthling F, Haschke R and Ritter H (2004) Situated robot learning for multi-modal instruction and imitation of grasping. Robotics and Autonomous Systems 47(2): 129–141. DOI:10.1016/j.robot.2004.03.007. URL http://www.sciencedirect.com/science/article/pii/S0921889004000430. Robot Learning from Demonstration.
Stulp F, Theodorou E, Buchli J and Schaal S (2011) Learning to grasp under uncertainty. In: IEEE International Conference on Robotics and Automation (ICRA). Shanghai, China. URL http://www-clmc.usc.edu/publications/S/stulp-ICRA2011.pdf.
Stulp F, Theodorou EA and Schaal S (2012) Reinforcement learning with sequences of motion primitives for robust manipulation. IEEE Transactions on Robotics 28(6): 1360–1370. DOI:10.1109/TRO.2012.2210294.
Sutton RS, Barto AG and Williams RJ (1992) Reinforcement learning is direct adaptive optimal control. IEEE Control Systems Magazine 12(2): 19–22. DOI:10.1109/37.126844.
Szegedy C, Zaremba W, Sutskever I, Bruna J, Erhan D, Goodfellow IJ and Fergus R (2013) Intriguing properties of neural networks. CoRR abs/1312.6199. URL http://arxiv.org/abs/1312.6199.
Tai L, Li S and Liu M (2016) A deep-network solution towards model-less obstacle avoidance. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 2759–2764. DOI:10.1109/IROS.2016.7759428.
Tai L and Liu M (2016) Deep-learning in mobile robotics - from perception to control systems: A survey on why and why not. CoRR abs/1612.07139. URL http://arxiv.org/abs/1612.07139.
Tamosiunaite M, Nemec B, Ude A and Wörgötter F (2011) Learning to pour with a robot arm combining goal and shape learning for dynamic movement primitives. Robotics and Autonomous Systems 59(11): 910–922. DOI:10.1016/j.robot.2011.07.004. URL http://www.sciencedirect.com/science/article/pii/S0921889011001254.
Tassa Y, Erez T and Todorov E (2012) Synthesis and stabilization of complex behaviors through online trajectory optimization. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 4906–4913. DOI:10.1109/IROS.2012.6386025.
Thananjeyan B, Garg A, Krishnan S, Chen C, Miller L and Goldberg K (2017) Multilateral surgical pattern cutting in 2D orthotropic gauze with deep reinforcement learning policies for tensioning. In: IEEE International Conference on Robotics and Automation (ICRA). pp. 2371–2378. DOI:10.1109/ICRA.2017.7989275.
Theodorou E, Buchli J and Schaal S (2010) Reinforcement learning of motor skills in high dimensions: A path integral approach. In: IEEE International Conference on Robotics and Automation (ICRA). pp. 2397–2403. URL http://www-clmc.usc.edu/publications/T/theodorou-ICRA2010.pdf.
Thrun S, Burgard W and Fox D (2005) Probabilistic Robotics (Intelligent Robotics and Autonomous Agents). The MIT Press. ISBN 0262201623.
Thrun S and Mitchell TM (1995) Lifelong robot learning. Robotics and Autonomous Systems 15(1): 25–46.
Tibebu AT, Yu B, Kassahun Y, Poorten EV and Tran PT (2014) Towards autonomous robotic catheter navigation using reinforcement learning. In: Joint Workshop on New Technologies for Computer/Robot Assisted Surgery (CRAS). pp. 163–166.
Ude A, Gams A, Asfour T and Morimoto J (2010) Task-specific generalization of discrete and periodic dynamic movement primitives. IEEE Transactions on Robotics 26(5): 800–815.
Urbanek H, Albu-Schäffer A and van der Smagt P (2004) Learning from demonstration: repetitive movements for autonomous service robotics. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), volume 4. pp. 3495–3500. DOI:10.1109/IROS.2004.1389957.
van den Berg J, Miller S, Duckworth D, Hu H, Wan A, Fu XY, Goldberg K and Abbeel P (2010) Superhuman performance of surgical tasks by robots using iterative learning from human-guided demonstrations. In: IEEE International Conference on Robotics and Automation (ICRA). pp. 2074–2081. DOI:10.1109/ROBOT.2010.5509621.
van Hoof H, Hermans T, Neumann G and Peters J (2015) Learning robot in-hand manipulation with tactile features. In: IEEE-RAS International Conference on Humanoid Robots (Humanoids). pp. 121–127. DOI:10.1109/HUMANOIDS.2015.7363524.
Vlassis N, Toussaint M, Kontes G and Piperidis S (2009) Learning model-free robot control by a Monte Carlo EM algorithm. Autonomous Robots 27(2): 123–130. DOI:10.1007/s10514-009-9132-0. URL https://doi.org/10.1007/s10514-009-9132-0.
Vukobratović M and Borovac B (2005) Zero-moment point - thirty five years of its life. International Journal of Humanoid Robotics 1(1): 157–173. URL http://www.cs.cmu.edu/~cga/legs/vukobratovic.pdf.
Watkins C (1989) Learning from Delayed Rewards. PhD Thesis, King's College, Cambridge, UK. URL http://www.cs.rhul.ac.uk/~chrisw/new_thesis.pdf.
Wen Hao Lui and Saxena A (2013) Tangled: Learning to untangle ropes with RGB-D perception. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE. ISBN 978-1-4673-6358-7 978-1-4673-6357-0, pp. 837–844. DOI:10.1109/IROS.2013.6696448. URL http://ieeexplore.ieee.org/document/6696448/.
Whitehead AN (1911) Introduction to Mathematics. New York: Henry Holt.
Wikipedia contributors (2018) Glossary of climbing terms — Wikipedia, the free encyclopedia. URL https://en.wikipedia.org/w/index.php?title=Glossary_of_climbing_terms&oldid=859314596. [Online; accessed 6-October-2018].
Yang C, Komura T and Li Z (2017) Emergence of human-comparable balancing behaviours by deep reinforcement learning. In: IEEE-RAS International Conference on Humanoid Robots (Humanoids). pp. 372–377. DOI:10.1109/HUMANOIDS.2017.8246900.
Zatsiorsky V and Prilutsky B (2012) Biomechanics of Skeletal Muscles. Human Kinetics. ISBN 9781450428842.
Zhao L, Zhao Y, Patil S, Davies D, Wang C, Lu L and Ouyang B (2018) Robot composite learning and the nunchaku flipping challenge. In: IEEE International Conference on Robotics and Automation (ICRA). IEEE.
Zhu Y, Mottaghi R, Kolve E, Lim JJ, Gupta A, Fei-Fei L and Farhadi A (2017) Target-driven visual navigation in indoor scenes using deep reinforcement learning. In: IEEE International Conference on Robotics and Automation (ICRA). pp. 3357–3364. DOI:10.1109/ICRA.2017.7989381.
Zucker M, Ratliff N, Stolle M, Chestnutt J, Bagnell JA, Atkeson CG and Kuffner J (2011) Optimization and learning for rough terrain legged locomotion. International Journal of Robotics Research 30(2): 175–191. DOI:10.1177/0278364910392608. URL https://doi.org/10.1177/0278364910392608.
