74. Learning from Humans
Multimedia Contents

Aude G. Billard, Sylvain Calinon, Rüdiger Dillmann

This chapter surveys the main approaches developed to date to endow robots with the ability to learn from human guidance. The field is best known as robot programming by demonstration, robot learning from/by demonstration, apprenticeship learning, and imitation learning. We start with a brief historical overview of the field. We then summarize the various approaches taken to solve four main questions: what, how, when, and whom to imitate. We emphasize the importance of choosing well the interface and the channels used to convey the demonstrations, with an eye on interfaces providing force control and force feedback. We then review algorithmic approaches to model skills individually and as a compound, and algorithms that combine learning from human guidance with reinforcement learning. We close with a look at the use of language to guide teaching and a list of open issues.

74.1 Learning of Robots
   74.1.1 Principle
   74.1.2 Brief History
74.2 Key Issues When Learning from Human Demonstrations
   74.2.1 When and Whom to Imitate
   74.2.2 How to Imitate and How to Solve the Correspondence Problem
74.3 Interfaces for Demonstration
74.4 Algorithms to Learn from Humans
   74.4.1 Learning Individual Motions
   74.4.2 Learning Compound Actions
   74.4.3 Incremental Teaching Methods
   74.4.4 Combining Learning from Humans with Other Learning Techniques
   74.4.5 Learning from Humans, a Form of Human–Robot Interaction
74.5 Conclusions and Open Issues in Robot LfD
Video-References
References

74.1 Learning of Robots

Robot learning from humans relates to situations in which the robot learns from interacting with a human. This must be contrasted with the vast body of work on robot learning where the robot learns on its own, that is, through trial and error and without external guidance. In this chapter, we cover works that combine reinforcement learning (RL) with techniques that use human guidance, e.g., to bootstrap the search in RL. However, we exclude from this survey all works that use purely reinforcement learning, even though one could argue that providing a reward is one form of human guidance. We consider that providing a reward function is akin to providing an objective function and hence refer the reader to the companion chapter on machine learning for robotics. We also exclude works where the robot learns implicitly from being in the presence of a human while the human is not actively coaching the robot, as these works are covered in the companion chapter on social robotics. We hence focus our survey on all works where the human is actively teaching the robot by providing demonstrations of how to perform the task.

Various terminologies have been used to refer to this body of work. These include programming by demonstration (PbD), learning from human demonstration (LfD), imitation learning, and apprenticeship learning. All of these refer to a general paradigm for enabling robots to autonomously perform new tasks by observing and learning from humans performing these tasks.

74.1.1 Principle

Rather than requiring users to analytically decompose and manually program a desired behavior, work in LfD-PbD takes the view that an appropriate robot controller can be derived from observations of a human's own performance thereof. The aim is for robot capabilities to be more easily extended and adapted to novel situations, even by users without programming ability:

The main principle of robot learning from demonstration is that end-users can teach robots new tasks without programming.

Consider a household robot capable of performing manipulation tasks. One task that an end-user may desire the robot to perform is to prepare a meal, such as preparing an orange juice for breakfast (Fig. 74.1 and VIDEO 29). Doing so may involve multiple subtasks, such as juicing the orange, throwing the rest of the orange in the trash, and pouring the liquid into a cup. Further, every time this meal is prepared, the robot will need to adapt its motion to the fact that the location and type of each object (cup, juicer) may change.

In a traditional programming scenario, a human programmer would have to code a robot controller that is capable of responding to any situation the robot may face. The overall task may need to be broken down into tens or hundreds of smaller steps, and each one of these steps should be tested for robustness before the robot leaves the factory. If and when failures occur in the field, highly skilled technicians would need to be dispatched to update the system for the new circumstances.

Instead, LfD allows the end-user to program the robot simply by showing it how to perform the task – no coding is required. Then, when failures occur, the end-user only needs to provide more demonstrations rather than calling for professional help. LfD hence seeks to endow robots with the ability to learn what it means to perform a task by generalizing from several observations (Fig. 74.1 and VIDEO 29). LfD is not a record-and-play technique: LfD implies learning and, hence, generalization.

Fig. 74.1 (a) The teacher gives several demonstrations of the task of juicing an orange, changing the location of each item to allow the robot to generalize correctly. That is, the robot should be able to infer, by comparing the demonstrations, that only the relative locations matter, as opposed to the exact locations as recorded in a global coordinate system. (b) The robot can then reproduce the task even when the objects are located in positions not seen in the demonstrations (VIDEO 29)

Next, we give a brief historical overview of the way the field evolved over the years. This is followed, in Sect. 74.2, by an introduction to the issues at the core of LfD. In Sect. 74.3, we discuss the crucial role that the interface used for LfD plays in the success of the teaching, emphasizing how the choice of interface determines the type of information that can be conveyed to the robot. Finally, in Sect. 74.4, we give a generic overview of the main approaches to solving LfD and conclude with an outlook on open issues.

74.1.2 Brief History

Robot learning from demonstration started in the 1980s. Then, and still to a large extent now, robots had to be explicitly and tediously hand programmed for each task they had to perform. PbD sought to minimize, or even eliminate, this difficult step.

The rationale for moving from purely preprogrammed robots to very flexible user-based interfaces for training the robot to perform a task is threefold. First and foremost, PbD is a powerful mechanism for reducing the complexity of search spaces for learning. When observing either good or bad examples, one can reduce the search for a possible solution by either starting the search from the observed good solution (a local optimum), or, conversely, by eliminating from the search space what is known to be a bad solution. Imitation learning is thus a powerful tool for enhancing and accelerating learning in both animals and artifacts.
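As a toy illustration of this first rationale, consider seeding a simple hill-climbing search with a demonstrated solution instead of starting tabula rasa. The task, cost function, and starting points below are all invented for illustration; this is a minimal sketch of the general idea, not an algorithm from the LfD literature:

```python
import random

def local_search(start, cost, step=0.1, iters=200, seed=0):
    """Hill-climb with Gaussian perturbations, accepting only improvements."""
    rng = random.Random(seed)
    best, best_c = list(start), cost(start)
    for _ in range(iters):
        cand = [x + rng.gauss(0, step) for x in best]
        c = cost(cand)
        if c < best_c:
            best, best_c = cand, c
    return best, best_c

# Hypothetical reaching task: the (unknown) optimum is at (0.7, 0.2).
cost = lambda p: (p[0] - 0.7) ** 2 + (p[1] - 0.2) ** 2

# Tabula rasa: start from an arbitrary point in the workspace.
_, c_scratch = local_search([0.0, 0.0], cost)

# Demonstration-bootstrapped: start from a (noisy) human demonstration.
_, c_demo = local_search([0.65, 0.25], cost)

print(c_scratch, c_demo)
```

The demonstrated starting point is already near the optimum, so the search explores only a small neighborhood of the solution space rather than all of it.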
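The kind of generalization illustrated in Fig. 74.1, inferring that relative rather than absolute positions are the invariant worth retaining, can be sketched by comparing the variability of the two encodings across demonstrations. The coordinates and variability score below are invented for illustration:

```python
import statistics

def spread(samples):
    """Per-dimension population standard deviation, summed: a crude variability score."""
    return sum(statistics.pstdev(dim) for dim in zip(*samples))

# Hypothetical demonstrations: the teacher placed the juicer somewhere
# different each time, but always ended the reach 5 cm above it.
juicer = [(0.2, 0.4), (0.6, 0.1), (0.4, 0.7)]
hand_at_grasp = [(0.2, 0.45), (0.6, 0.15), (0.4, 0.75)]

absolute = hand_at_grasp
relative = [(h[0] - j[0], h[1] - j[1]) for h, j in zip(hand_at_grasp, juicer)]

# The relative encoding is (nearly) constant across demonstrations,
# so it is the representation worth retaining for reproduction.
print(spread(absolute), spread(relative))
```

A low spread for the relative encoding and a high one for the absolute encoding is exactly the comparison that lets the robot conclude that only the hand-to-juicer offset matters.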
Second, imitation learning offers an implicit means of training a machine, such that explicit and tedious programming of a task by a human user can be minimized or eliminated. Imitation learning is thus a natural means of interacting with a machine that would be accessible to lay people.

Third, studying and modeling the coupling of perception and action, which is at the core of imitation learning, helps us to understand the mechanisms by which the self-organization of perception and action could arise during development. The reciprocal interaction of perception and action could explain how competence in motor control can be grounded in the rich structure of perceptual variables and, vice versa, how the processes of perception can develop as a means to create successful actions.

The promises of PbD were thus multiple. On the one hand, one hoped that it would make learning faster, in contrast to trial-and-error methods trying to learn the skill tabula rasa. On the other hand, one expected that, being user-friendly, the methods would enhance the application of robots in human daily environments.

Early work encoded the states and the actions of a demonstration according to symbolic relationships, such as in contact, close-to, move-to, grasp-object, move-above, etc. Appropriate numerical definitions of these symbols (i.e., when an object would be considered as close-to or far-from) were given as prior knowledge to the system. A complete demonstration was thus encoded in a graph-based representation, where each state constituted a graph node and each action a directed link between two nodes. Symbolic reasoning could then unify different graphical representations of the same task by merging and deleting nodes [74.2].

Munch et al. [74.6] suggested the use of machine learning (ML) techniques to recognize elementary operators (EOs), thus defining a discrete set of basic motor skills, with industrial robotics applications in mind. In this early work, the authors already established several key issues of PbD in robotics. These include questions such as how to generalize a task, how to reproduce a skill in a completely novel situation, how to evaluate a reproduction attempt, and how to better define the role
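A minimal sketch of such a graph-based encoding, using invented symbolic states and actions, might look as follows; merging the shared nodes of two demonstrations yields a single task graph in which a state can branch into alternative demonstrated continuations:

```python
# One demonstration = an alternating sequence of symbolic states and actions.
demo1 = ["hand-free", ("grasp-object", "orange"), "holding-orange",
         ("move-to", "juicer"), "orange-at-juicer"]
demo2 = ["hand-free", ("grasp-object", "orange"), "holding-orange",
         ("move-to", "trash"), "orange-at-trash"]

def to_graph(demo):
    """Encode a demonstration as {state: {(action, next_state), ...}}."""
    graph = {}
    for i in range(0, len(demo) - 2, 2):
        state, action, nxt = demo[i], demo[i + 1], demo[i + 2]
        graph.setdefault(state, set()).add((action, nxt))
        graph.setdefault(nxt, set())
    return graph

def merge(g1, g2):
    """Unify two task graphs: identical symbolic states collapse into one node."""
    merged = {s: set(edges) for s, edges in g1.items()}
    for s, edges in g2.items():
        merged.setdefault(s, set()).update(edges)
    return merged

task = merge(to_graph(demo1), to_graph(demo2))
# "holding-orange" now branches into the two demonstrated continuations.
print(sorted(task["holding-orange"]))
```

The merge step here only collapses states with identical labels; the symbolic reasoning cited above went further, also deleting nodes judged redundant, which this sketch omits.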