74. Learning from Humans

Aude G. Billard, Sylvain Calinon, Rüdiger Dillmann

This chapter surveys the main approaches developed to date to endow robots with the ability to learn from human guidance. The field is best known as programming by demonstration, robot learning from/by demonstration, apprenticeship learning, and imitation learning. We start with a brief historical overview of the field. We then summarize the various approaches taken to solve four main questions: when, what, who, and how to imitate. We emphasize the importance of choosing well the interface and the channels used to convey the demonstrations, with an eye on interfaces providing force control and force feedback. We then review algorithmic approaches to model skills individually and as a compound, and algorithms that combine learning from human guidance with reinforcement learning. We close with a look at the use of language to guide teaching and a list of open issues.

74.1 Learning of Robots
  74.1.1 Principle
  74.1.2 Brief History
74.2 Key Issues When Learning from Human Demonstrations
  74.2.1 When and Whom to Imitate
  74.2.2 How to Imitate and How to Solve the Correspondence Problem
74.3 Interfaces for Demonstration
74.4 Algorithms to Learn from Humans
  74.4.1 Learning Individual Motions
  74.4.2 Learning Compound Actions
  74.4.3 Incremental Teaching Methods
  74.4.4 Combining Learning from Humans with Other Learning Techniques
  74.4.5 Learning from Humans, a Form of Human–Robot Interaction
74.5 Conclusions and Open Issues in Robot LfD
Video-References
References

74.1 Learning of Robots

Robot learning from humans relates to situations in which the robot learns from interacting with a human. This must be contrasted with the vast body of work on robot learning where the robot learns on its own, that is, through trial and error and without external guidance. In this chapter, we cover works that combine reinforcement learning (RL) with techniques that use human guidance, e.g., to bootstrap the search in RL. However, we exclude from this survey all works in which human guidance is reduced to providing a reward, even though one could argue that providing a reward is one form of human guidance. We consider that providing a reward function is akin to providing an objective function and hence refer the reader to the companion chapter on machine learning. We also exclude works where the robot learns implicitly from being in the presence of a human, while the human is not actively coaching the robot, as these works are covered in the companion chapter on social robotics. We hence focus our survey on all works where the human is actively teaching the robot, by providing demonstrations of how to perform the task.

Various terminologies have been used to refer to this body of work. These include programming by demonstration (PbD), learning from human demonstration (LfD), imitation learning, and apprenticeship learning. All of these refer to a general paradigm for enabling robots to autonomously perform new tasks from observing humans performing these tasks. The aim is for robot capabilities to be more easily extended and adapted to novel situations, even by users without programming ability.

Next, we give a brief historical overview of the way the field evolved over the years. This is followed, in Sect. 74.2, by an introduction to the issues at the core of LfD. In Sect. 74.3, we discuss the crucial role that the interface used for teaching plays in the success of the learning, emphasizing how the choice of interface determines the type of information that can be conveyed to the robot. Finally, in Sect. 74.4, we give a generic view of the main approaches to solving LfD and conclude with an outlook on open issues.

74.1.1 Principle

The main principle of robot learning from demonstration is that end-users can teach robots new tasks without programming. Rather than requiring users to analytically decompose and manually program a desired behavior, work in LfD takes the view that an appropriate robot controller can be derived from observations of a human's own performance thereof, and hence seeks to endow robots with the ability to learn what it means to perform a task by generalizing from several observations (Fig. 74.1). LfD is not a record and play technique: LfD implies learning and, henceforth, generalization.

Consider a household robot capable of performing manipulation tasks. One task that an end-user may desire the robot to perform is to prepare an orange juice for a breakfast meal, such as the one shown in VIDEO 29. Doing so may involve multiple subtasks, such as juicing the orange, throwing the rest of the orange in the trash, and pouring the liquid into a cup. Further, every time this meal is prepared, the robot will need to adapt its motion to the fact that the location and type of object (cup, juicer) may change.

In a traditional programming scenario, a human programmer would have to code a robot controller that is capable of responding to any situation the robot may face. The overall task may need to be broken down into tens or hundreds of smaller steps, and each one of these steps should be tested for robustness prior to the robot leaving the factory. If failures were to occur in the field, highly skilled technicians would need to be dispatched to update the system for the new circumstances. Instead, LfD allows the end-user to program the robot simply by showing it how to perform the task – no coding is required. Then, when failures occur, the end-user only needs to provide more demonstrations, rather than calling for professional help. In the orange juice example, the teacher does several demonstrations of the task, changing the location of each item to allow the robot to generalize correctly. That is, by comparing the demonstrations, the robot should be able to infer that only the relative locations of the objects matter, as opposed to the exact locations as recorded in a global coordinate system. The robot can then reproduce the task even when the objects are located in positions not seen in the demonstrations.

Fig. 74.1 (a) The teacher does several demonstrations of the task of juicing an orange, changing the position of each item across demonstrations. (b) The robot can then reproduce the task even when the objects are located in positions not seen in the demonstrations (VIDEO 29)

74.1.2 Brief History

Robot learning from demonstration started in the 1980s. Then, and still to a large extent now, robots had to be explicitly and tediously hand programmed for each task they had to perform. LfD sought to minimize, or even eliminate, this difficult step.

The rationale for moving from purely preprogrammed robots to very flexible, user-based interfaces for training the robot to perform a task is threefold. First and foremost, PbD is a powerful mechanism for reducing the complexity of search spaces for learning. When observing either good or bad examples, one can reduce the search for a possible solution, by either starting the search from the observed good solution (local optima) or, conversely, by eliminating from the search space what is known to be a bad solution. Imitation learning is, thus, a powerful tool for enhancing and accelerating learning in both animals and artifacts.


Second, imitation learning offers an implicit means of training a machine, such that explicit and tedious programming of a task by a human user can be minimized or eliminated. Imitation learning is thus a natural means of interacting with a machine that would be accessible to lay people.

Third, studying and modeling the coupling of perception and action, which is at the core of imitation learning, helps us to understand the mechanisms by which the self-organization of perception and action could arise during development. The reciprocal interaction of perception and action could explain how competence in motor control can be grounded in the rich structure of perceptual variables and, vice versa, how the processes of perception can develop as a means to create successful actions.

The promises of PbD were thus multiple. On the one hand, one hoped that it would make learning faster, in contrast to trial-and-error methods trying to learn the skill tabula rasa. On the other hand, one expected that, being user-friendly, the methods would enhance the application of robots in human daily environments.

At the beginning of the 1980s, LfD, known then as programming by demonstration (PbD), started attracting attention in manufacturing robotics. PbD appeared as a promising route to automate the tedious manual programming of robots, reducing the costs involved in the development and maintenance of robots in the factory.

As a first approach to PbD, symbolic reasoning was commonly adopted in robotics [74.1–5], with processes referred to as teach-in, guiding, or play-back methods. In these works, PbD was performed through manual (teleoperated) control. The position of the end-effector and the forces applied on the manipulated object were stored throughout the demonstrations, together with the positions and orientations of the obstacles and of the target. This sensorimotor information was then segmented into discrete subgoals (key points along the trajectory) and into appropriate pre-defined actions to attain these subgoals. Actions were commonly chosen to be simple point-to-point movements, of the kind industrial robots employed at this time. Examples of subgoals would be, e.g., the robot's gripper orientation and position in relation to the goal [74.3]. Consequently, the demonstrated task was segmented into a sequence of state-action-state transitions.

To take into account the variability of human motion and the noise inherent to the sensors capturing the movements, it appeared necessary to develop a method that would consolidate all demonstrated movements. For this purpose, the state-action-state sequence was converted into symbolic if-then rules, describing the states and the actions according to symbolic relationships, such as in contact, close-to, move-to, grasp-object, move-above, etc. Appropriate numerical definitions of these symbols (i.e., when would an object be considered as close-to or far-from) were given as prior knowledge to the system. A complete demonstration was thus encoded in a graph-based representation, where each state constituted a graph node and each action a directed link between two nodes. Symbolic reasoning could then unify different graphical representations of the same task by merging and deleting nodes [74.2].

Munch et al. [74.6] suggested the use of machine learning (ML) techniques to recognize elementary operators (EOs), thus defining a discrete set of basic motor skills, with industrial robotics applications in mind. In this early work, the authors already established several key issues of PbD in robotics. These include questions such as how to generalize a task, how to reproduce a skill in a completely novel situation, how to evaluate a reproduction attempt, and how to better define the role of the user during learning. Munch et al. [74.6] admitted that generalizing over a sequence of discrete actions was only one part of the problem, since the controller of the robot also required the learning of continuous trajectories to control the actuators. They proposed to overcome the missing parts of the learning process by delegating them to the user, who took an active role in the teaching process.

These early works highlighted the importance of providing a set of examples that are usable by the robot: (1) by constraining the demonstrations to modalities that the robot can understand; and (2) by providing a sufficient number of examples to achieve the desired generality. They noted the importance of providing an adaptive controller to reproduce the task in new situations, that is, of being able to adjust an already acquired program. The evaluation of a reproduction attempt was also delegated to the user, by letting him/her provide additional examples of the skill in the regions of the learning space that had not yet been covered. In this way, the teacher/expert could control the generalization capabilities of the robot.

With the increasing development of mobile and humanoid robots, the field went on to adopt an interdisciplinary approach, taking into account evidence of specific neural mechanisms for visuomotor imitation in primates [74.7–9] and of developmental stages of imitation capacities in children [74.10, 11]. The latter promoted the introduction of socially driven behavior in the robot to sustain interaction and improve teaching [74.12, 13], and of an interactive teaching process in which the robot takes a more active role and may ask the user for additional sources of information when needed.

New learning challenges were thus set forth. Robots were expected to show a high degree of flexibility and versatility, both in their learning system and in their control system, in order to interact naturally with human users and demonstrate similar skills (e.g., by moving in the same rooms and manipulating the same tools as humans). Robots were expected to act more and more human-like, so that their behavior would be more predictable and acceptable.

The field progressively moved from simply copying the demonstrated movements to generalizing across sets of demonstrations. As machine learning progressed, LfD started incorporating more of those tools to tackle both the perception issue, i.e., how to generalize across demonstrations, and the production issue, i.e., how to generalize the movement to new situations. Initially, tools such as artificial neural networks (ANNs) [74.24, 25], radial-basis function networks (RBFs) [74.26], and fuzzy logic [74.27] were quite popular. These have lately been replaced by hidden Markov models (HMMs) [74.28–33] and various non-linear regression techniques [74.21, 34, 35], as we will discuss in more detail in Sect. 74.4.

Recent progress has affected mostly the interfaces at the basis of the teaching. Traditional ways of guiding/teleoperating the robot have been progressively replaced by more user-friendly interfaces, such as vision [74.16, 17], data gloves [74.20], speech commands [74.36, 37], and kinesthetic teaching (i.e., manually guiding the robot's arms through the motion) [74.38]. Eventually, the notion of robot programming by demonstration was replaced by the more biological labeling of imitation learning, and a large part of current works in LfD follow a conceptual approach very similar to that of these prior works.

74.2 Key Issues When Learning from Human Demonstrations

As mentioned in the beginning, learning from demonstration (LfD) has at its core the development of algorithms that are generic in their representation of the skills and in the way they generate the skills. The field has identified a number of key problems that need to be solved to ensure such a generic approach to transferring skills across various agents and situations. These have been formulated as a set of generic questions, namely what to imitate, how to imitate, when to imitate, and who to imitate [74.21–23]. These questions were formulated in response to the large body of diverse work in robotics that could not easily be unified under a small number of coherent operating principles. The four questions and their solutions aim at being generic, in the sense of making no assumptions on the type of skills that may be transmitted.

74.2.1 When and Whom to Imitate

Whom and when to imitate has been largely unexplored so far; hence, to date, really only the first two questions have been addressed. Figure 74.2 and VIDEO 97 illustrate how these two problems can be solved in a principled manner through statistical observation of the demonstrations.

Fig. 74.2 (a) A robot learns how to make a chess move (namely moving the queen forward) by generalizing across different demonstrations of the task performed in slightly different situations (different starting positions of the hand). The robot records the trajectories of its joints and learns to extract the invariant features of the task (what-to-imitate problem), i.e., that the task constraints are reduced to a subpart of the motion located in a plane defined by the three chess pieces. (b) The robot reproduces the skill in a new context (for a different initial position of the chess piece) by finding an appropriate controller that satisfies both the task constraints and constraints relative to its body limitation (how-to-imitate problem) (after [74.21])

How to Determine the Evaluation Metric
What to imitate relates to the problem of determining which aspects of the demonstration should be imitated. For a given task, certain observable or affectable properties may be irrelevant and safely ignored. For instance, if the demonstrator always approaches a location from the north, is it necessary for the robot to do the same?


The answer to this question strongly influences whether or not a derived robot controller is a successful imitation – a robot that approaches from the south is appropriately trained if direction is not important, but needs further education if it is. This issue is related to questions of signal versus noise and is answered by determining the metric by which the resulting behavior is evaluated. Different ways can be taken to address this issue. The simplest approach is to take a statistical perspective and deem as relevant the parts (dimension, region of input space) of the data that are consistently measured across all demonstration instances [74.21]. If the dimension of the data is too high, such an approach may require too many demonstrations to gather enough statistics. An alternative is then to have the teacher help the robot determine what is relevant, by pointing out the parts of the task that are most important. A minimal example of the statistical perspective is sketched below.

In summary, what to imitate removes consideration of details that, while perceptible/performable, do not matter for the task. It participates in determining the metric by which the reproduction of the robot can be measured. In continuous control tasks, what to imitate relates to the problem of defining automatically the feature space for learning, as well as the constraints and the cost function. In discrete control tasks, such as those treated by reinforcement learning and symbolic reasoning, what to imitate relates to the problem of how to define the state and action space, and of how to automatically learn the pre/post conditions in an autonomous decision system.
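To make the statistical perspective concrete, the following minimal sketch weights each dimension of a set of time-aligned demonstrations by its consistency across demonstrations, so that dimensions reproduced faithfully in every demonstration dominate the resulting imitation metric. The function names and the inverse-variance weighting rule are illustrative assumptions, not a specific algorithm from the works cited above:

import numpy as np

def relevance_weights(demos):
    """demos: array of shape (n_demos, n_timesteps, n_dims), time-aligned."""
    var_across_demos = demos.var(axis=0)       # (n_timesteps, n_dims)
    mean_var = var_across_demos.mean(axis=0)   # per-dimension variance
    weights = 1.0 / (mean_var + 1e-6)          # consistent dims -> large weight
    return weights / weights.sum()

def imitation_cost(reproduction, demos, weights):
    """Weighted distance between a candidate reproduction and the demo mean."""
    target = demos.mean(axis=0)
    err = (reproduction - target) ** 2         # (n_timesteps, n_dims)
    return float((err * weights).sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0, 1, 50)
    # Dimension 0 is consistent across demos (task-relevant); dim 1 varies freely.
    demos = np.stack(
        [np.column_stack([np.sin(2*np.pi*t) + 0.01*rng.standard_normal(50),
                          rng.uniform(-1, 1) * np.ones(50)])
         for _ in range(5)])
    print("relevance weights:", relevance_weights(demos))

In this toy example the weight concentrates on the dimension that all demonstrations reproduce faithfully, which is exactly the part of the signal a what-to-imitate metric should penalize deviations on.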

74.2.2 How to Imitate and How to Solve the Correspondence Problem

How to imitate consists in determining how the robot will actually perform the learned behaviors to maximize the metric found when solving the what to imitate problem. Often, a robot cannot act exactly the same way as a human does, due to differences in physical embodiment. For example, if the demonstrator uses a foot to move an object, is it acceptable for a wheeled robot to bump it, or should it use a gripper instead? If the metric does not have appendage-specific terms, it may not matter.

This issue is closely related to the correspondence problem [74.36]. Robots and humans, while inhabiting the same space, interacting with the same objects, and perhaps even being superficially similar, still perceive and interact with the world in fundamentally different ways. To evaluate the similarity between human behavior and that of robots, we must first deal with the fact that humans and robots may occupy different state spaces, of perhaps different dimensions. We identify two different ways in which states of demonstrator and imitator can be said to correspond, and give brief examples:

- Perceptual equivalence: Due to differences between human and robot sensory capabilities, the same scene may appear to be very different. For instance, while a human may identify humans and gestures from color and intensity, a robot may use depth measurements to observe the same scene (Fig. 74.3a). Another point of comparison is tactile sensing. Most tactile sensors allow robots to perceive contact, but do not offer information about temperature, in contrast to the human skin. Moreover, the low resolution of robots' tactile sensors does not allow robots to discriminate across the variety of existing textures, as the human skin does. As the same data may, therefore, not be available to both humans and robots, successfully teaching a robot may require a good understanding of the robot's sensors and their limitations. LfD explores the limits of these perceptual equivalences by building interfaces that either automatically correct for or make explicit these differences.
- Physical equivalence: Due to differences between human and robot embodiments, humans and robots may perform different actions to accomplish the same physical effect. For instance, even when performing the same task (soccer), humans and robots may interact with the environment in different ways (Fig. 74.3b). Humans run and kick, while robots roll and bump. Solving this discrepancy in motor capabilities is akin to solving the how to imitate problem to achieve the same effect.

We can think of perceptual equivalence as dealing with the manner in which the agents perceive the world: it requires making sure that the information necessary to perform the task is available to both humans and robots. Physical equivalence deals with the manner in which agents affect and interact with the world, so that the task is performable by both agents.

Fig. 74.3 (a,b) Perceptual equivalence (adapted from [74.42]). (c) Physical equivalence. The humanoid robot has the same arrangement of principal articulations as the human demonstrator, but different limb lengths and joint angle limits. The other robot has a different number and arrangement of articulations, which makes the mapping problem more challenging (illustration created with the V-REP simulator [74.43]). (d,e) Offline full-body motion transfer taking into account the kinematic and dynamic disparity between the human and the humanoid [74.44]. See also VIDEO 98 and VIDEO 99 for examples of mapping of full body motion from humans to humanoids

There is no generic way to solve the correspondence problem. Typically, the robot may compute a path (in Cartesian space) for its end-effector that is close to the path followed by the human hand, while relying on inverse kinematics to find the appropriate joint displacements. In the football example above, this would require the robot to determine a path for its center of mass that corresponds to the path followed by the human's right foot when projected on the ground. Clearly, this equivalence is very task dependent. Recent solutions to this problem for hand motion and body motion can be found in [74.45, 46].
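As a concrete illustration of the typical solution just described, the following minimal sketch tracks a demonstrated Cartesian hand path with a robot arm of different proportions, using damped least-squares inverse kinematics to find the joint displacements. The planar two-link arm, its link lengths, and all gains are illustrative assumptions, not a specific published method:

import numpy as np

L1, L2 = 1.0, 0.8                      # robot link lengths (differ from the human's)

def fk(q):
    # Forward kinematics of a planar 2-link arm.
    x = L1*np.cos(q[0]) + L2*np.cos(q[0] + q[1])
    y = L1*np.sin(q[0]) + L2*np.sin(q[0] + q[1])
    return np.array([x, y])

def jacobian(q):
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-L1*s1 - L2*s12, -L2*s12],
                     [ L1*c1 + L2*c12,  L2*c12]])

def retarget(hand_path, q0, damping=0.1, iters=20):
    """Follow a demonstrated Cartesian hand path with the robot's own joints."""
    q, joint_path = q0.copy(), []
    for target in hand_path:
        for _ in range(iters):                 # damped least-squares IK step
            J = jacobian(q)
            err = target - fk(q)
            dq = J.T @ np.linalg.solve(J @ J.T + damping**2 * np.eye(2), err)
            q = q + dq
        joint_path.append(q.copy())
    return np.array(joint_path)

if __name__ == "__main__":
    t = np.linspace(0, np.pi, 30)
    demo = np.column_stack([1.2 + 0.3*np.cos(t), 0.5 + 0.3*np.sin(t)])
    qs = retarget(demo, q0=np.array([0.3, 0.5]))
    print("final tracking error:", np.linalg.norm(fk(qs[-1]) - demo[-1]))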
74.3 Interfaces for Demonstration

The interface used to provide demonstrations plays a key role in the way the information is gathered and transmitted. We distinguish the following major trends:

1. One may directly record human motions. If one is interested solely in the kinematics of the motion, one may use any of the various existing motion tracking systems, whether these are based on vision, exoskeletons, or other types of wearable motion sensors. The left-hand side of Fig. 74.4 and VIDEO 98 show an example of full body motion tracking during walking using vision. The motion of the human body is first extracted from the background using a model of the human body. This model is subsequently mapped to an avatar and then to the humanoid robot DB at ATR, Kyoto, Japan. These external means of tracking return precise measurements of the angular displacement of the limbs and joints. They have been used in various works for LfD of full body motion [74.47–49]. These methods are advantageous in that they allow the human to move freely. However, they require solutions to the correspondence problem, i.e., the problem of how to transfer motion from human to robot when both differ in the kinematics and dynamics of their body or, in other words, when the configuration spaces differ in dimension and size. This is typically done by mapping the joints that are tracked visually to a model of the human body that matches closely that of the robot. Such a mapping would be particularly difficult to perform when the walking machine differs importantly from the human body. This problem of mapping actions across two dissimilar bodies was evoked earlier and refers to the correspondence problem.

2. Second, there are techniques such as kinesthetic teaching, where the robot is physically guided through the task by the human. This approach simplifies the correspondence problem by letting the user demonstrate the skill in the robot's own environment and with the robot's own capabilities. It also provides a natural teaching interface to correct a skill reproduced by the robot. Recent advances in skin technology offer the possibility to teach robots how to exploit tactile sensing, e.g., contact on an object (Fig. 74.4 middle and VIDEO 104). By exploiting the compliance of the iCub robot's fingers, the teacher can teach the robot how to adapt the posture of the fingers in response to a change in tactile sensing measured at the robot's finger tips [74.50].


One main drawback of kinesthetic teaching is that the human must often use more degrees of freedom to move the robot than the number of degrees of freedom moved on the robot. This is visible in Fig. 74.4: to move the fingers of one hand of the robot, the teacher must use both hands. This limits the type of tasks that can be taught through kinesthetic teaching. Typically, tasks that would require moving both hands simultaneously could not be taught this way. One could proceed incrementally, teaching first the task for the right hand and then, while the robot replays the motion with its right hand, teaching the motion of the left hand. However, this may prove to be cumbersome. The external trackers reviewed above are more amenable to teaching coordinated motion between several limbs.

3. Third, there are immersive teleoperation scenarios, where a human operator is limited to using the robot's own sensors and effectors to perform the task. Teleoperation may be done using simple joysticks or other remote control devices, including haptic devices (Fig. 74.4 bottom and VIDEO 101). The latter have the advantage that they allow the teacher to teach tasks that require precise control of forces, whereas joysticks provide only kinematic information (position, speed). Teleoperation is advantageous compared to external motion tracking systems, as it solves the correspondence problem entirely: the system directly records the perception and action in the robot's configuration space. It is also advantageous compared to kinesthetic training, as it allows training the robot from a distance and is, hence, particularly suited for teaching navigation and locomotion patterns; the teacher no longer needs to share the same space with the robot. Teleoperation is usually used to transmit the kinematics of motion. For instance, in [74.51], the acrobatic trajectories of a helicopter are learned by recording the motion of the helicopter when teleoperated by an expert pilot. In [74.52], a robot dog is taught to play soccer by a human guiding it via a joystick. In recent work, however, teleoperation has also been used successfully to teach a humanoid robot balancing techniques [74.53]. Learning to react to perturbations is done through a haptic interface attached to the torso of the demonstrator, which measures the interaction forces when the human is pushed around.
The kinematics of motion of the demonstrator are directly transmitted to the robot through teleoperation and are combined with the haptic information to train a model of motion conditioned on the perceived forces.
The disadvantage of teleoperation techniques is that the teacher often needs training to learn to use the remote control device. Teleoperation using a simple joystick allows guiding only a subset of the degrees of freedom. To control all degrees of freedom, very complex, exoskeleton-type devices must be used, which can be cumbersome. Moreover, teleoperation prevents the teacher from observing all the sensorial information required to perform the task. For instance, teleoperation, even when using a haptic device, poorly renders the contacts perceived at the robot's end-effector. To palliate this, one may provide the teacher with visualization interfaces to simulate the interaction forces.

4. Lastly, one can use explicit information, such as that conveyed by speech, to provide additional advice and comments on the demonstration [74.18, 57, 58] and VIDEO 103. Speech is a very natural means of communication among humans and, hence, is viewed as an easy way to allow the end-user to communicate with robots. However, it necessitates that a vocabulary that is understandable to the robot, and grounded in the actions and perceptions of the robot, be defined beforehand. While this restricts teaching to discrete state–action pairs, it is particularly useful for symbolic reasoning.

Fig. 74.4 (a) Demonstration by visual tracking of gestures (after [74.54], VIDEO 98 and VIDEO 99). (b) Demonstration by kinesthetic teaching (after [74.55] and VIDEO 104). (c) Demonstration by teleoperation (after [74.56] and VIDEO 101)

Each teaching interface has its pros and cons. It is thus interesting to investigate how these interfaces could be used in conjunction, to exploit the complementary information provided by each modality [74.62].

74.4 Algorithms to Learn from Humans

Current approaches to encoding skills through LfD can be broadly divided into two trends: a low-level representation of the skill, taking the form of a non-linear mapping between sensory and motor information, and a high-level representation of the skill that decomposes the skill into a sequence of action-perception units.

While the majority of work in LfD uses solely the demonstrations for learning, a growing number of works develops methods by which LfD can be combined with other learning techniques. One group of works investigates how to combine imitation learning with reinforcement learning, a method by which the robot learns through trial and error to maximize a given reward. Other works take inspiration from the way humans teach each other and introduce interactive and bidirectional teaching scenarios, whereby the robot becomes an active partner during the teaching phase. We briefly review the main principles underlying each of these areas below.

74.4.1 Learning Individual Motions

Individual motions/actions (e.g., juicing the orange, trashing it, and pouring the liquid into the cup in the example shown in Fig. 74.1) could be taught separately instead of simultaneously. The human teacher would then provide one or more examples of each submotion. If learning proceeds from the observation of a single instance of the motion/action, one calls this one-shot learning [74.12]. Teaching can also proceed in batch mode, after recording several demonstrations, or incrementally, by recursively adding more information trial by trial [74.50, 63–65]. When learning in batch mode, learning considers all examples and draws inference by comparing the individual demonstrations. Inference is usually based on a statistical analysis, where the demonstration signals are modeled via a probability density function, exploiting various non-linear regression techniques stemming from machine learning. Popular methods these days include Gaussian processes, Gaussian mixture models, and support vector machines; a minimal sketch of such probabilistic encoding is given below.

LfD encodes human movements in either joint space, task space, or torque space [74.59]. The encoding may be specific to cyclic motion [74.22], discrete motion [74.60], or a combination of both [74.61]. Examples of LfD of locomotion patterns can be found in [74.61]. To make sure that the controller is not akin to simple record and play, prior knowledge is provided in the form of primitive motion patterns; learning then consists of instantiating the parameters modulating these motion patterns.

Choosing properly the variables to encode a particular movement is crucial, as it already implies an important part of the solution to the problem of defining what is important to imitate.

Fig. 74.5 Probabilistic encoding of motion in a subspace of reduced dimensionality (VIDEO 102)

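The following minimal sketch illustrates the batch, probability-density view of motion encoding described above: a Gaussian mixture model (GMM) is fit to joint (time, position) data pooled from several demonstrations, and Gaussian mixture regression (GMR) conditions on time to regenerate a smooth reference trajectory. The component count, the simple EM initialization, and all numerical details are illustrative assumptions:

import numpy as np

def gauss(x, m, c):
    # Multivariate Gaussian density evaluated at the rows of x.
    d = x.shape[1]
    dx = x - m
    sol = np.linalg.solve(c, dx.T).T
    return np.exp(-0.5*np.sum(dx*sol, axis=1)) / np.sqrt((2*np.pi)**d * np.linalg.det(c))

def fit_gmm(data, k=5, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    n, d = data.shape
    mu = data[rng.choice(n, k, replace=False)]
    cov = np.stack([np.cov(data.T) + 1e-4*np.eye(d) for _ in range(k)])
    pi = np.full(k, 1.0/k)
    for _ in range(iters):
        # E-step: responsibilities of each component for each datapoint.
        resp = np.stack([pi[j]*gauss(data, mu[j], cov[j]) for j in range(k)], axis=1)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and covariances.
        nk = resp.sum(axis=0)
        pi = nk / n
        mu = (resp.T @ data) / nk[:, None]
        for j in range(k):
            diff = data - mu[j]
            cov[j] = (resp[:, j, None]*diff).T @ diff / nk[j] + 1e-6*np.eye(d)
    return pi, mu, cov

def gmr(t_query, pi, mu, cov):
    """Condition the GMM on time (dim 0) to predict position (dim 1)."""
    preds = []
    for t in t_query:
        h = np.array([pi[j]*gauss(np.array([[t]]), mu[j, :1], cov[j, :1, :1])[0]
                      for j in range(len(pi))])
        h /= h.sum()
        xs = [mu[j, 1] + cov[j, 1, 0]/cov[j, 0, 0]*(t - mu[j, 0]) for j in range(len(pi))]
        preds.append(float(h @ np.array(xs)))
    return np.array(preds)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    t = np.tile(np.linspace(0, 1, 100), 5)                 # five noisy demonstrations
    x = np.sin(2*np.pi*t) + 0.05*rng.standard_normal(t.size)
    pi, mu, cov = fit_gmm(np.column_stack([t, x]))
    print(gmr(np.array([0.25, 0.5, 0.75]), pi, mu, cov))   # approx. [1, 0, -1]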

Encoding often encompasses the use of dimensionality reduction techniques that project the recorded signals into a latent space of motion of reduced dimensionality. These techniques may either perform local linear transformations [74.66–68] or exploit global non-linear methods [74.59, 69, 70] (Fig. 74.5). Additionally, task-specific rating functions [74.71] and simulation-based optimization [74.72] have been investigated to identify relevant learning features.

Teaching Force-Control Tasks
While most LfD work to date has focused on learning the kinematics of motion, by recording the position of the end-effector and/or the position of the robot's joints, more recently some works have investigated the transmission of force-based signals through human demonstration [74.56, 73–76]. See VIDEO 478 and VIDEO 479 for examples of kinesthetic teaching of compliant motion. Transmitting information about force is difficult for humans and robots alike: force can be sensed only when performing the task ourselves. Current efforts, hence, seek to develop methods by which one may embody the robot, allowing human and robot to simultaneously perceive the forces applied when performing the task. A new and exciting line of research, hence, leverages recent advances in the design of haptic devices and tactile sensing, and the development of torque-controlled and variable impedance actuated systems, to teach force-control tasks through human demonstration.

74.4.2 Learning Compound Actions

Learning complex tasks, composed of a combination and juxtaposition of individual motions, is the ultimate goal of LfD. There are two major ways to proceed to the learning of such complex tasks:

1. One may first learn models of all of the individual motions, using demonstrations of each of these actions individually [74.77, 78], and then learn the right sequencing/combination of these actions in a second stage, either by observing a human performing the whole task [74.79, 80] or through reinforcement learning [74.81]. However, this approach assumes that there is a known set of all necessary primitive actions. For specific tasks this may be true, but to date there does not exist a database of general purpose primitive actions, and it is unclear whether the variability of human motion may really be reduced to a finite list.

2. The alternative is to observe the human performing the complete task and to automatically segment the task to extract the primitive actions, which may then become task-dependent, see e.g., [74.82, 83]. This has the advantage of learning, in one swipe, both the primitive actions and the way they should be combined. One issue that arises is that the number of primitive tasks is often unknown, and there could be multiple possible segmentations that must be considered [74.52]; a minimal sketch of such automatic segmentation follows.
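The following minimal sketch illustrates the second strategy: a continuous demonstration is automatically cut into candidate primitive actions, here with a simple near-zero-velocity heuristic. The thresholds and the pause-detection rule are illustrative assumptions; published systems use considerably more robust statistical segmentation:

import numpy as np

def segment(trajectory, dt=0.01, vel_eps=0.05, min_len=10):
    # Cut the trajectory where the speed drops near zero (a candidate
    # boundary between primitive actions).
    vel = np.linalg.norm(np.gradient(trajectory, dt, axis=0), axis=1)
    pauses = vel < vel_eps
    cuts, last = [0], 0
    for i in range(1, len(trajectory)):
        if pauses[i] and not pauses[i-1] and i - last >= min_len:
            cuts.append(i); last = i
    cuts.append(len(trajectory))
    return [trajectory[a:b] for a, b in zip(cuts[:-1], cuts[1:]) if b - a >= min_len]

if __name__ == "__main__":
    t = np.linspace(0, 1, 200)
    reach = np.column_stack([t, np.zeros_like(t)])   # move right...
    pause = np.tile([[1.0, 0.0]], (30, 1))           # ...stop briefly...
    lift = np.column_stack([np.ones_like(t), t])     # ...then move up
    demo = np.vstack([reach, pause, lift])
    print([len(s) for s in segment(demo)])           # split at the pause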
Other examples include learning how to sequence known behaviors to enable complex navigation tasks, through the imitation of more knowledgeable robots or humans [74.9, 84, 85], and learning how to sequence primitive motions for full body motion in humanoid robots [74.25, 33, 86].

A large body of these works uses a symbolic representation of both the learning and the encoding of the task [74.6, 30, 85, 87–91]. This symbolic way of encoding skills may take several forms. One common way is to segment and encode the task according to sequences of predefined actions, described symbolically. Encoding and regenerating the sequences of these actions can, however, be done using classical machine learning techniques, such as HMMs [74.30].

Often, these actions are encoded in a hierarchical manner. In [74.85], a graph-based approach is used to generalize an object moving skill, using a wheeled mobile robot. In this model, each node in the graph represents a complete behavior, and generalization takes place at the level of the topological representation of the graph. The latter is updated incrementally.

References [74.88, 89] follow a similar hierarchical and incremental approach to encode various household tasks (such as setting the table and putting dishes in a dishwasher) (Fig. 74.6 and VIDEO 103). There, learning consists in identifying a sequence of predefined, elementary actions, which is further combined into a hierarchical task network. By analyzing multiple demonstrations, the ordering of elementary actions is learned, resulting in a precedence graph. The precedence graph defines a partial ordering on the set of learned elementary actions, which can be exploited to execute elementary actions in parallel, extracting symbolic rules that manage the way each object must be handled. A minimal sketch of learning such a precedence graph is given below.

Fig. 74.6 (a) Training center with dedicated sensors, including data gloves with attached tactile sensors and magnetic field trackers. (b) Task precedence graphs learned by the system for the setting the table task. (c) Initial task precedence graph after the first three demonstrations. (d) Final precedence graph after observing additional examples (after [74.88]) (VIDEO 103)
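In the following minimal sketch, an ordering constraint between two elementary actions is kept only if it holds in every observed demonstration, and the transitive reduction is retained for readability. The action names and the exact rule are illustrative assumptions, not the specific algorithm of [74.88, 89]:

import itertools

def precedence_graph(demonstrations):
    # Keep edge (a, b) only if a occurred before b in every sequence
    # containing both actions.
    actions = set(itertools.chain(*demonstrations))
    edges = set()
    for a, b in itertools.permutations(actions, 2):
        if all(seq.index(a) < seq.index(b)
               for seq in demonstrations if a in seq and b in seq):
            edges.add((a, b))
    # Drop edges implied by transitivity.
    redundant = {(a, c) for (a, b) in edges for (b2, c) in edges
                 if b == b2 and (a, c) in edges}
    return edges - redundant

demos = [["grasp-cup", "open-dishwasher", "place-cup", "close-dishwasher"],
         ["open-dishwasher", "grasp-cup", "place-cup", "close-dishwasher"]]
print(sorted(precedence_graph(demos)))
# grasp-cup and open-dishwasher remain unordered (they may run in parallel);
# both must precede place-cup, which must precede close-dishwasher.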
In [74.92], the approach was extended to learning subsymbolic goal and constraint descriptions for each elementary action. In the execution phase, the robot applies motion planning to generate a motion that reaches the goals while obeying the constraints. The resulting task description mimics the strategy that humans follow when performing the task. Based on the subsymbolic goal and constraint descriptions, the robot can reason about, and adapt the strategy to, changes in object location, obstacle occurrence, and varying start configurations.

Reference [74.90] also exploits a hierarchical approach to encoding a skill in terms of behaviors. The skill consists in moving through a maze, where a wheeled robot must avoid several kinds of obstacles and reach a set of pre-defined subgoals. The particularity of this approach lies in the use of several kinds of symbolic representations of the skill, which are applied to explore the role of the teacher in guiding the incremental learning of the robot.

The approaches reviewed above assume a deterministic world, where actions unfold uniquely from the perception of the current state of the world. However, robots operating in real-world environments will observe the world using imperfect sensors, and the effects of their actions may be stochastic. To account for the stochasticity of the robot's perceptions and actions, Schmidt-Rohr et al. [74.93] use a model of the task with partially observable Markov decision processes (POMDPs). At run time, an optimal (in a maximum likelihood sense) decision is then taken.

Finally, [74.91] took a symbolic approach to encoding human motions as sets of pre-defined postures, positions, or configurations, considering different levels of granularity for the symbolic representation of the motion. This a priori knowledge is then used to explore the correspondence problem through several simulated setups, including motion in the joint space of arm links and displacements of objects on a two-dimensional plane.

The main advantage of these symbolic approaches is that high-level skills (consisting of sequences of symbolic cues) can be learned efficiently through an interactive process. However, because of the symbolic nature of their encoding, the methods rely on a large amount of prior knowledge to predefine the important cues and to segment those efficiently. The statistical approach described previously is an interesting way to extract autonomously the important features of the task and, thus, to avoid putting too much prior knowledge in the system. However, it requires a large number of valid demonstrations to draw statistically valid inferences, and it is not reasonable to assume that a lay user will perform many demonstrations of the same task. Hence, to be amenable to lay users, learning should require as few demonstrations as possible.


74.4.3 Incremental Teaching Methods

Ideally, one would like the robot to be bootstrapped with some initial knowledge, so that the robot can start right away to perform the task, and human training would be used solely to help the robot gradually improve its performance. Incremental learning approaches that gradually refine task knowledge as more examples become available pave the way towards LfD systems suitable for such continuous and long-life robot learning. Figure 74.7 and VIDEO 104 show an example of such incremental teaching of a simple skill.

These incremental learning methods use various forms of deixis, as well as verbal and non-verbal interactions, to guide the robot's attention to the important parts of the demonstration or to particular mistakes produced by the robot during the reproduction of the task. Such incremental and guided learning is often referred to as scaffolding or molding of the robot's knowledge, and is key to teaching robots tasks of increasing complexity [74.90, 94].

Research on the use of incremental learning techniques for robot LfD has contributed to the development of methods for learning complex tasks within the household domain from as few demonstrations as possible. Moreover, it has contributed to the development and application of machine learning methods that allow a continuous and incremental refinement of the task model. Such systems have sometimes been referred to as background knowledge-based or EM deductive LfD-systems, as presented in [74.95, 96]. They usually require very few, or even only a single, user demonstration to generate executable task descriptions. The main objective of this line of research is to build a meta-representation of the knowledge that the robot has acquired on the task and to apply reasoning methods on this knowledge database (Fig. 74.6). In this scenario, reasoning involves recognizing, learning, and representing repetitive tasks.

Pardowitz et al. [74.97] discuss how different forms of knowledge can be balanced in an incremental learning system. The system relies on building task precedence graphs, which encode hypotheses that the system makes on the sequential structure of a task. Learning the task precedence graphs allows the system to schedule its operations most flexibly while still meeting the goals of the task (see [74.98] for details). Task precedence graphs are directed, acyclic graphs that contain a temporal precedence relation and can be learned incrementally. Incremental learning of task precedence graphs leads to a more general and flexible representation of the task knowledge (Fig. 74.6 and VIDEO 105).

74.4.4 Combining Learning from Humans with Other Learning Techniques

To recall, a main argument for the development of LfD methods was that they would speed up learning by providing examples of good solutions. This assumption, however, is realistic only if the context for the reproduction is sufficiently similar to that of the demonstration. We saw previously that the use of a dynamical systems-based representation at the trajectory level allows the robot to depart to some extent from a learned trajectory to reach the target, even when both the object and the hand of the robot have moved from the locations shown during the demonstration; a minimal sketch of this idea is given below. There are, however, situations in which such an approach would fail, for instance, when placing a large obstacle in the robot's pathway (Fig. 74.8). Besides, robots and humans may differ significantly in their kinematics and dynamics of motion and, although there are a variety of ways to bypass the so-called correspondence problem, relearning a new model may still be required in special cases.

To allow the robot to relearn to perform a task in any new situation, it appeared important to combine LfD methods with other motor learning techniques. Reinforcement learning (RL) appeared particularly suitable for this type of problem. Indeed, imitation learning is limited in that it requires the robot to learn only from what has been demonstrated. Reinforcement learning, in contrast, allows the robot to discover new control policies through free exploration of the state-action space. Approaches that combine imitation learning and reinforcement learning aim at exploiting the strengths of both algorithms to overcome their respective drawbacks. Demonstrations are used to guide the exploration in reinforcement learning, reducing the time it takes for RL to find an adequate control policy, while allowing the robot to depart from the demonstrated behavior.
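The following minimal sketch illustrates the dynamical-systems idea invoked above: a spring-damper system pulls toward the goal while a forcing term, recovered from a single demonstration, shapes the path (in the spirit of dynamic movement primitives). Changing the goal at reproduction time adapts the motion rather than replaying it. The gains and the unscaled, time-indexed forcing term are illustrative simplifications:

import numpy as np

def learn_forcing(demo, dt, K=25.0, D=10.0):
    # Invert the spring-damper dynamics along the demonstration to
    # recover the forcing term f(t) that reproduces the recorded path.
    vel = np.gradient(demo, dt)
    acc = np.gradient(vel, dt)
    goal = demo[-1]
    return acc - K*(goal - demo) + D*vel

def reproduce(f, start, goal, dt, K=25.0, D=10.0):
    # Integrate x_dd = K*(goal - x) - D*x_d + f(t) from a new start and goal.
    x, v, path = start, 0.0, []
    for ft in f:
        a = K*(goal - x) - D*v + ft
        v += a*dt
        x += v*dt
        path.append(x)
    return np.array(path)

if __name__ == "__main__":
    dt = 0.01
    t = np.linspace(0.0, 1.0, 101)
    demo = t + 0.3*np.sin(np.pi*t)          # demonstrated 1-D motion from 0 to 1
    f = learn_forcing(demo, dt)
    replay  = reproduce(f, start=0.0, goal=1.0, dt=dt)   # same goal: mimics demo
    adapted = reproduce(f, start=0.0, goal=1.5, dt=dt)   # new goal: adapts path
    print(replay[-1], adapted[-1])          # each ends near its respective goal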


Fig. 74.7 An incremental learning strategy (task demonstration, learner replay, tactile correction), where a manipulation skill is first demonstrated through the use of a data glove. After a first reproduction trial, the skill is refined through kinesthetic teaching, by exploiting the tactile capabilities of the iCub humanoid robot (after [74.50]) (VIDEO 104)

Figure 74.8 and VIDEO 105 show examples of techniques that use RL in conjunction with LfD to improve the robot's performance beyond that of the demonstrator. Early work on using LfD in RL started in the 1990s, with learning to swing up and control an inverse pendulum [74.100, 101] and learning industrial tasks like peg-in-hole with a robot arm [74.102, 103]. More recent efforts include [74.104–108], tackling robust control of the upper body of humanoid robots in various manipulation tasks, learning an archery skill [74.113], and learning how to hit a snooker ball [74.109].

Demonstrations can be used in different ways to bootstrap RL. They may be used as initial roll-outs, from which an initial estimate of the policy is computed [74.110], or to generate an initial set of primitives [74.111]; RL is then used to learn how to select across these primitives. Demonstrations can also be used to limit the search space covered by RL [74.112], or to estimate the reward function [74.115]. Finally, RL and imitation learning can be used in conjunction by letting the demonstrator take over part of the control during a trial run [74.114].

Variants on Reinforcement Learning
Another way to enable the robot to learn a control strategy, through a combination of self-experimentation and learning from watching others, is to evolve populations of agents that mimic each other. Such an evolutionary approach using genetic algorithms has been investigated by a number of authors, e.g., for the learning of manipulation skills [74.26], navigation strategies [74.116], or the sharing of a common vocabulary to name sensoriperceptions and actions [74.105].

While most of the works that combine imitation learning with reinforcement learning assume the reward to be known, inverse reinforcement learning (IRL) offers a framework to determine automatically both the reward and the optimal control policy [74.81]. When using human demonstrations to guide learning, IRL solves jointly the what to imitate and the how to imitate problems. Other approaches to estimating the reward or cost function automatically have been proposed; see, for instance, the maximum margin planning technique [74.117] and the automatic extraction of constraints [74.118].

Underlying all IRL works is the assumption of a consistent reward function. When demonstrations are provided by multiple experts, this assumes that all experts optimize the same objectives. This is constraining and does not exploit the variability of the ways in which humans may solve the same task. Recent approaches consider multiple experts and identify multiple different reward functions [74.119, 120]. This allows the robots to learn multiple (albeit suboptimal) ways to perform the same task. The hope is that this multiplicity of policies will make the controller more robust, offering alternative ways to complete the task when the context no longer allows the robot to perform the task in the optimal way.

The vast majority of work on LfD relies on successful demonstrations of the desired task by the human. It hence assumes that all demonstrations are good demonstrations, and discards those that are deemed a poor proxy of what would be a good demonstration. Recent work has also investigated the possibility that demonstrations may instead be failed attempts at performing the task [74.121, 122] (VIDEO 476 and VIDEO 477). Note that such demonstrations are never completely incorrect. Learning from failed demonstrations then attempts to discover which parts of the demonstrations were correct and which were incorrect, so as to improve solely the incorrect parts. In this context, what to imitate addresses the questions of what and what not to imitate. It offers an interesting alternative to approaches that combine imitation learning and reinforcement learning, in that no explicit reward needs to be determined.

Fig. 74.8 Illustration of the use of reinforcement learning in policy parameter space to refine a skill initially learned from demonstration (after [74.99]) (VIDEO 105)

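To make the combination concrete, the following minimal sketch bootstraps a simple policy search with a demonstration: the candidate trajectory is initialized from the demonstrated motion and then refined by stochastic hill climbing on a reward that the demonstration alone cannot optimize, here because an obstacle appears at reproduction time. The reward terms and the (1+1) search are illustrative assumptions, not any specific method cited above:

import numpy as np

def reward(traj, goal, obstacle, clearance=0.3):
    # Penalize passing close to the obstacle, missing the goal, and jerkiness.
    d_obs = np.linalg.norm(traj - obstacle, axis=1).min()
    effort = np.sum(np.diff(traj, axis=0)**2)
    return (-10.0*max(0.0, clearance - d_obs)
            - np.linalg.norm(traj[-1] - goal) - 0.1*effort)

def refine(demo, goal, obstacle, iters=2000, sigma=0.02, seed=0):
    # (1+1) stochastic hill climbing in trajectory space, started at the demo.
    rng = np.random.default_rng(seed)
    best, best_r = demo.copy(), reward(demo, goal, obstacle)
    for _ in range(iters):
        cand = best + sigma*rng.standard_normal(best.shape)
        cand[0] = demo[0]                      # keep the start fixed
        r = reward(cand, goal, obstacle)
        if r > best_r:
            best, best_r = cand, r
    return best, best_r

if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 20)[:, None]
    demo = np.hstack([t, np.zeros_like(t)])    # demonstrated straight-line reach
    goal, obstacle = np.array([1.0, 0.0]), np.array([0.5, 0.0])
    refined, r = refine(demo, goal, obstacle)
    print("demo reward:", reward(demo, goal, obstacle), "refined reward:", r)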

Part G | 74.4 2006 Learning from Humans 74.4 Algorithms to Learn from Humans 2007 aspect of the transfer process. As this transfer problem Recent lines of research in interactive LfD seeks is complex and involves a combination of social mech- to give a more active role to the teacher in a bidi- anisms, several insights from human–robot interaction rectional teaching process [74.15, 137, 138]. Robots (HRI) were explored to make efficient use of the teach- become more active partners and can indicate which ing capabilities of the human user, [74.123–125]for portion of the demonstration was unclear. Teachers may surveys. Next, we briefly survey some of these works. in turn refine the robot’s knowledge by providing com- The development of algorithms for detecting social plementary information where the robot is performing cues given implicitly or explicitly by the teacher dur- poorly. This supplementary information may consist of ing training and the integration of those as part of other additional rounds of demonstrations of the complete generic mechanisms for LfD has become the focus of task [74.139], or may be limited to subparts of the a large body of work in LfD. Such social cues can task [74.140, 141]. The information can be conveyed be viewed as a way to introduce priors in a statistical through specific task’s features, such as a list of way- learning system, and, by so doing, to speed up learning. points [74.142]. The robot is then left free to interpolate Indeed, several hints can be used to transfer a skill not a trajectory using these key points. only by demonstrating the task multiple times but also The design of such incremental teaching methods by highlighting the important components of the skill. calls for machine learning techniques that enable the This can be achieved by various means, using different incorporation of new data in a robust manner. It also modalities. opens the door to the design of other human–robot inter- A large body of work explored the use of point- facing systems, including the use of speech, which leads ing and gazing (Fig. 74.9 left and VIDEO 106 )as to meaningful dialogs between humans and robots. An a way of conveying the intention of the user [74.79, example of such bidirectional teaching is given on the 126–132]. Vocal deixis, using a standard speech recog- right-hand side of Fig. 74.10. The robot asks for help nition engine, has also been explored widely [74.79, during or after teaching, verifying that its understanding 133]. In [74.88], the user makes vocal comments to of the task is correct [74.14]. This teaching interaction highlight the steps of the teaching that are deemed as is tailored to let the user become an active participant being the most important. In [74.134, 135], only the in the learning process (and not only a model of expert prosody of the speech pattern is looked at, rather than behavior). the exact content of the speech, as a way to infer some By taking inspiration from the human tutelage information on the user’s communicative intent. paradigm, [74.15] shows that a socially guided ap- In [74.136], these social cues are learned through proach can improve both the human–robot interaction an imitative game, whereby the user imitates the robot. and the machine learning process by taking into ac- This allows the robot to build a user-specific model of count human benevolence. That work highlights the these social pointers, and, hence be more robust to de- role of the teacher in organizing the skill into manage- tecting those. 
able steps and maintaining an accurate mental model

Fig. 74.9 Illustration of the use of social cues to speed up the imitation learning process. Here, gazing and pointing information are used to select probabilistically the objects relevant for the manipulation skill (VIDEO 106)


Fig. 74.10 Example of an active teaching scenario. The robot asks for help during or after teaching, verifying that its understanding of the task is correct (after [74.14]) ( VIDEO 107 ) ]. Thus, different ]. ] and joint motion 36 152 146 , , the goals of a given ac- ded and robot-initiated 151 145 , , 84 ], recent works, inspired by the 145 , 150 97 – understanding 147 In work to date, teaching is usually done by a single Determining the way humans learn to both ex- This concludes our survey. As the reader can see, Understanding the goal of the task is still only half learning techniques have proven successful infor allowing collaborative improvement ofswitching the learnt between policy human-gui by learning. However, there dodetermine not when yet it exist is best protocolslearning to to modes switch available. between The the various answertask dependent. may, in fact, be teacher, or teachers with anto explicit teach. concept More of work the needsrelated task to be to done to conflicting addresswith issues different demonstrations styles. Similarly, across teachersman are teachers usually beings, hu- butagent. could This instead agent could be be aor an more a knowledgeable computer arbitrary robot simulation. expert Finally, anothertle relatively explored lit- question relatestransfer to the skills problem across of multipleple how robots agents, to (i. including e., teaching multi- isvarious done learner from robots). a Early teacher work in robot this to direction was replication, [74. trajectory following [74. above rationale, start from the assumptionis that imitation not just aboutbut observing rather and about replicating the motion, tion (see the above survey ofautomatically approaches the to reward determining or what to imitate). tract the goalsthese of goals a a setto hierarchy of our of observed understanding preference of actions thecess is and underlying to fundamental give decisional imitation. pro- While wein have that surveyed area, recent work itto is tackling important these to issuesa recall that probabilistic other have approach approaches previously tosequential followed application explain of goals the and derivationlearning apply of and this manipulatory to tasks enable requiringsubsets sequencing of of goals [74. models, modes and communication channels, should be used in conjunction toboth find from a the solutionachieves point what that the of demonstrator is seeks view to optimal teach of the robot. the imitatorthe and issues that of whatnected and and to how a to large imitate extent remain are only tightly partly con- solved. of the picture, as thereing may the be several goal ways(or of of optimal) achiev- for the the task.be demonstrator appropriate Moreover, may for not what the necessarily imitator is [74. feasible is ]use LfD 138 approach to HRI cation, the user provides process, that is, by letting assumes a fixed, given form ]. While a longstanding trend LfD ] provides experiments where 144 90 molding or programming by demonstration ] highlights the importance of an active LfD 143 approached the problem from the standpoint of ) is progressing rapidly, pushing back limits and The combination of reinforcement learning and im- Generally, work in Reference [74. Finally, a core idea of the LfD PbD Robots and Humans ( 74.5 Conclusions and Open IssuesResearch in Robot in LfD of the learner’s understanding. Reference [74. for the robot’s control policy, and learnsrameters. 
There is, moreover, strong evidence that imitation is goal directed, that is, that actions are meant to fulfill a specific purpose and to convey the intention of the actor [74.144]. While a longstanding trend in LfD assumes a fixed, given form for the robot's control policy and learns appropriate parameters, whether for goal replication, trajectory following [74.146, 147] or joint motion replication [74.84, 145, 150], recent works, inspired by the above rationale, start from the assumption that imitation is not just about observing and replicating the motion, but rather about understanding the goals of a given action (see the above survey of approaches to determining automatically the reward, or what to imitate).

74.5 Conclusions and Open Issues in Robot LfD

Research in LfD (PbD) is progressing rapidly, pushing back limits and posing new questions all the time. As such, any list of limitations and open questions is bound to be incomplete and out of date. However, there are a few long-standing limitations and open questions that bear further attention.

To date, there are several different forms of policies in common usage, and there is no clear correct (or dominant) technique. Furthermore, it is possible that a system could be provided with multiple possible representations of controllers and select which is most appropriate.

The combination of reinforcement learning and imitation learning has been shown to be more effective in addressing the acquisition of skills that require fine tuning of the robot's dynamics. Likewise, interactive learning techniques have proven successful in allowing for improvement of the learnt policy by switching between human-guided and robot-initiated learning. However, there do not yet exist protocols to determine when it is best to switch between the various learning modes available. The answer may, in fact, be task dependent.

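As an illustration of what such a switching protocol might look like, the sketch below alternates between requesting a demonstration and letting the robot practice autonomously, driven by a running confidence estimate, loosely in the spirit of confidence-based approaches such as [74.139]. The threshold rule, the toy confidence model, and all class and function names are assumptions made for this sketch rather than an established protocol.

```python
import random

def run_learning_session(policy, teacher, n_episodes=20, conf_threshold=0.7):
    """Alternate between human-guided and robot-initiated learning: ask
    the teacher for a demonstration while the policy is unsure, switch
    to autonomous practice once confidence is high enough."""
    for episode in range(n_episodes):
        if policy.confidence() < conf_threshold:
            demo = teacher.demonstrate()          # human-guided mode
            policy.update_from_demonstration(demo)
        else:
            outcome = policy.practice()           # robot-initiated mode
            policy.update_from_experience(outcome)

class ToyPolicy:
    """Stand-in policy whose confidence grows with data; purely
    illustrative, no real learning happens here."""
    def __init__(self):
        self.n_demos, self.n_trials = 0, 0
    def confidence(self):
        return 1.0 - 1.0 / (1.0 + 0.5 * self.n_demos + 0.2 * self.n_trials)
    def update_from_demonstration(self, demo):
        self.n_demos += 1
    def practice(self):
        return random.random()                    # fake task outcome
    def update_from_experience(self, outcome):
        self.n_trials += 1

class ToyTeacher:
    def demonstrate(self):
        return "trajectory"                       # placeholder demonstration

policy = ToyPolicy()
run_learning_session(policy, ToyTeacher())
print(f"final confidence: {policy.confidence():.2f}")
```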
Determining the way humans learn to both extract the goals of a set of observed actions and give these goals a hierarchy of preference is fundamental to our understanding of the decisional process underlying imitation. While we have surveyed recent work in that area, it is important to recall that other approaches have previously followed a probabilistic approach to explain the derivation of goals [74.151, 152] and applied this to enable learning of manipulatory tasks requiring the sequencing and application of subsets of goals [74.97].

Understanding the goal of the task is still only half of the picture, as there may be several ways of achieving the goal. Moreover, what is feasible (or optimal) for the demonstrator may not necessarily be appropriate for the imitator [74.36]. Thus, different models, modes and communication channels should be used in conjunction to find a solution that is optimal from the point of view of the imitator and that achieves what the demonstrator seeks to teach the robot.

In work to date, teaching is usually done by a single teacher, or teachers with an explicit concept of the task to teach. More work needs to be done to address issues related to conflicting demonstrations across teachers with different styles. Similarly, teachers are usually human beings, but could instead be an arbitrary expert agent, such as a more knowledgeable robot or a computer simulation. Finally, another relatively little explored question relates to the problem of how to transfer skills across multiple agents, including multiple robots (i.e., teaching is done from a teacher robot to various learner robots). Early work in this direction was done in the 1990s [74.115, 153, 154]. This work, however, has so far been reduced to transfer of navigation or communication skills across swarms of simple mobile robots.

Experiments in LfD have mostly focused on a single task (or a set of closely related tasks), and each experiment starts with a tabula rasa. As learning of complex tasks progresses, means to store and reuse prior knowledge at a large scale will have to be devised. Learning stages, akin perhaps to those found in child development, may be required. There will need to be a formalism to allow the robot to select information, to reduce redundant information, select features, and store new data efficiently.

This concludes our survey. As the reader can see, the issues of what and how to imitate are tightly connected and to a large extent remain only partly solved.

Video-References

VIDEO 29 Demonstrations and reproduction of the task of juicing an orange, available from http://handbookofrobotics.org/view-chapter/74/videodetails/29
VIDEO 97 Demonstrations and reproduction of moving a chessman, available from http://handbookofrobotics.org/view-chapter/74/videodetails/97
VIDEO 98 Full-body motion transfer under kinematic/dynamic disparity, available from http://handbookofrobotics.org/view-chapter/74/videodetails/98
VIDEO 99 Demonstration by visual tracking of gestures, available from http://handbookofrobotics.org/view-chapter/74/videodetails/99
VIDEO 100 Demonstration by kinesthetic teaching, available from http://handbookofrobotics.org/view-chapter/74/videodetails/100
VIDEO 101 Demonstration by teleoperation of humanoid HRP-2, available from http://handbookofrobotics.org/view-chapter/74/videodetails/101
VIDEO 102 Probabilistic encoding of motion in a subspace of reduced dimensionality, available from http://handbookofrobotics.org/view-chapter/74/videodetails/102
VIDEO 103 Reproduction of dishwasher unloading task based on task precedence graph, available from http://handbookofrobotics.org/view-chapter/74/videodetails/103
VIDEO 104 Incremental learning of finger manipulation with tactile capability, available from http://handbookofrobotics.org/view-chapter/74/videodetails/104
VIDEO 105 Policy refinement after demonstration, available from http://handbookofrobotics.org/view-chapter/74/videodetails/105
VIDEO 106 Exploitation of social cues to speed up learning, available from http://handbookofrobotics.org/view-chapter/74/videodetails/106
VIDEO 107 Active teaching, available from http://handbookofrobotics.org/view-chapter/74/videodetails/107
VIDEO 476 Learning from failure I, available from http://handbookofrobotics.org/view-chapter/74/videodetails/476
VIDEO 477 Learning from failure II, available from http://handbookofrobotics.org/view-chapter/74/videodetails/477
VIDEO 478 Learning compliant motion from human demonstration, available from http://handbookofrobotics.org/view-chapter/74/videodetails/478
VIDEO 479 Learning compliant motion from human demonstration II, available from http://handbookofrobotics.org/view-chapter/74/videodetails/479

References

74.1 T. Lozano-Perez: Robot programming, Proceedings IEEE 71(7), 821–841 (1983)
74.2 B. Dufay, J.-C. Latombe: An approach to automatic robot programming based on inductive learning, Int. J. Robotics Res. 3(4), 3–20 (1984)
74.3 A. Levas, M. Selfridge: A user-friendly high-level robot teaching system, IEEE Int. Conf. Robotics, Atlanta (1984) pp. 413–416
74.4 A.B. Segre, G. DeJong: Explanation-based manipulator learning: Acquisition of planning ability through observation, IEEE Conf. Robotics Autom., St. Louis (1985) pp. 555–560
74.5 A.M. Segre: Machine Learning of Robot Assembly Plans (Kluwer, Boston 1988)
74.6 S. Muench, J. Kreuziger, M. Kaiser, R. Dillmann: Robot programming by demonstration (RPD) - Using machine learning and user interaction methods for the development of easy and comfortable robot programming systems, Proc. Int. Symp. Indus. Robots (ISIR) (1994) pp. 685–693
74.7 A. Billard: Imitation: A review. In: The Handbook of Brain Theory and Neural Network, 2nd edn., ed. by M.A. Arbib (MIT Press, Cambridge 2002) pp. 566–569
74.8 E. Oztop, M. Kawato, M.A. Arbib: Mirror neurons and imitation: A computationally guided review, Neural Netw. 19(3), 254–271 (2006)
74.9 J. Demiris, G. Hayes: Imitation as a dual-route process featuring predictive and learning components: A biologically-plausible computational model. In: Imitation in Animals and Artifacts, ed. by C. Nehaniv, K. Dautenhahn (MIT Press, Cambridge 2002)
74.10 J. Nadel, A. Revel, P. Andry, P. Gaussier: Toward communication: First imitations in infants, low-functioning children with autism and robots, Interact. Stud. 5(1), 45–74 (2004)
74.11 F. Kaplan, P.-Y. Oudeyer: The progress-drive hypothesis: An interpretation of early imitation. In: Models and Mechanisms of Imitation and Social Learning: Behavioural, Social and Communication Dimensions, ed. by K. Dautenhahn, C. Nehaniv (Cambridge Univ. Press, Cambridge 2007) pp. 361–377
74.12 B.D. Argall, M. Veloso, B. Browning: Teacher feedback to scaffold and refine demonstrated motion primitives on a mobile robot, Robotics Auton. Syst. 59(3/4), 243–255 (2011)
74.13 B. Robins, K. Dautenhahn, C.L. Nehaniv, N.A. Mirza, D. Francois, L. Olsson: Sustaining interaction dynamics and engagement in dyadic child-robot interaction kinesics: Lessons learnt from an exploratory study, IEEE Int. Workshop Robot Human Int. Commun. (ROMAN) (2005) pp. 716–722
74.14 M. Cakmak, A.L. Thomaz: Designing robot learners that ask good questions, ACM/IEEE Int. Conf. Human-Robot Int. (HRI) (2012)
74.15 C. Breazeal, A. Brooks, J. Gray, G. Hoffman, C. Kidd, H. Lee, J. Lieberman, A. Lockerd, D. Chilongo: Tutelage and collaboration for humanoid robots, Int. J. Human. Robots 1(2), 315–348 (2004)
74.16 Y. Kuniyoshi, M. Inaba, H. Inoue: Teaching by showing: Generating robot programs by visual observation of human performance, Proc. Int. Symp. Ind. Robots, Tokyo (1989) pp. 119–126
74.17 Y. Kuniyoshi, M. Inaba, H. Inoue: Learning by watching: Extracting reusable task knowledge from visual observation of human performance, IEEE Trans. Robotics Autom. 10(6), 799–822 (1994)
74.18 M. Ehrenmann, O. Rogalla, R. Zöllner, R. Dillmann: Teaching service robots complex tasks: Programming by demonstration for workshop and household environments, Proc. IEEE Int. Conf. Field Serv. Robotics (FRS) (2001)
74.19 C.P. Tung, A.C. Kak: Automatic learning of assembly tasks using a dataglove system, IEEE/RSJ Int. Conf. Intell. Robots Syst., Pittsburgh (1995) pp. 1–8
74.20 K. Ikeuchi, T. Suchiro: Towards an assembly plan from observation, Part I: Assembly task recognition using face-contact relations (polyhedral objects), Proc. IEEE Int. Conf. Robot. Autom. (ICRA), Vol. 3 (1992) pp. 2171–2177
74.21 S. Calinon, F. Guenter, A. Billard: On learning, representing and generalizing a task in a humanoid robot, IEEE Trans. Syst. Man Cybern. B 37(2), 286–298 (2007)
74.22 M. Ito, K. Noda, Y. Hoshino, J. Tani: Dynamic and interactive generation of object handling behaviors by a small humanoid robot using a dynamic neural network model, Neural Netw. 19(3), 323–337 (2006)
74.23 T. Inamura, N. Kojo, M. Inaba: Situation recognition and behavior induction based on geometric symbol representation of multimodal sensorimotor patterns, IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS) (2006) pp. 5147–5152
74.24 S. Liu, H. Asada: Teaching and learning of deburring robots using neural networks, Proc. IEEE Int. Conf. Robotics Autom. (ICRA) (1993) pp. 339–345
74.25 A. Billard: Learning motor skills by imitation: A biologically inspired robotic model, J. Cybern. Syst. 32(1/2), 155–193 (2001)
74.26 M. Kaiser, R. Dillmann: Building elementary robot skills from human demonstration, Proc. IEEE Int. Conf. Robotics Autom. (ICRA), Vol. 3 (1996) pp. 2700–2705
74.27 R. Dillmann, M. Kaiser, A. Ude: Acquisition of elementary robot skills from human demonstration, Int. Symp. Intell. Robotics Syst. (SIRS) (1995) pp. 1–38
74.28 W. Yang: Hidden Markov model approach to skill learning and its application in telerobotics, Proc. IEEE Int. Conf. Robotics Autom. (ICRA) (1993) pp. 396–402
74.29 P.K. Pook, D.H. Ballard: Recognizing teleoperated manipulations, Proc. IEEE Int. Conf. Robotics Autom., Atlanta (1993) pp. 578–585
74.30 G.E. Hovland, P. Sikka, B.J. McCarragher: Skill acquisition from human demonstration using a hidden Markov model, Proc. IEEE Int. Conf. Robotics Autom., Minneapolis (1996) pp. 2706–2711
74.31 S.K. Tso, K.P. Liu: Hidden Markov model for intelligent extraction of robot trajectory command from demonstrated trajectories, Proc. IEEE Int. Conf. Ind. Technol. (ICIT) (1996) pp. 294–298
74.32 C. Lee, Y. Xu: Online, interactive learning of gestures for human/robot interfaces, Proc. IEEE Int. Conf. Robotics Autom. (ICRA), Vol. 4 (1996) pp. 2982–2987
74.33 D. Kulic, W. Takano, Y. Nakamura: Incremental learning, clustering and hierarchy formation of whole body motion patterns using adaptive hidden Markov chains, Int. J. Robotics Res. 27(7), 761–784 (2008)
74.34 D. Nguyen-Tuong, M. Seeger, J. Peters: Local Gaussian process regression for real time online model learning and control, Adv. Neural Inf. Process. Syst. 21, 1193–1200 (2009)
74.35 S.M. Khansari Zadeh, A. Billard: Learning stable non-linear dynamical systems with Gaussian mixture models, IEEE Trans. Robotics 27(5), 943–957 (2011)
74.36 C. Nehaniv, K. Dautenhahn: Of hummingbirds and helicopters: An algebraic framework for interdisciplinary studies of imitation and its applications. In: Interdisciplinary Approaches to Robot Learning, Vol. 24, ed. by J. Demiris, A. Birk (World Scientific, Singapore 2000) pp. 136–161


74.37 C.L. Nehaniv: Nine billion correspondence problems and some methods for solving them, Proc. Int. Symp. Imit. Anim. Artifacts (2003) pp. 93–95
74.38 P. Bakker, Y. Kuniyoshi: Robot see, robot do: An overview of robot imitation, AISB Workshop Learn. Robot. Anim., Brighton (1996)
74.39 M. Skubic, R.A. Volz: Acquiring robust, force-based assembly skills from human demonstration, IEEE Trans. Robotics Autom. 16(6), 772–781 (2000)
74.40 M. Yeasin, S. Chaudhuri: Toward automatic robot programming: Learning human skill from visual data, IEEE Trans. Syst. Man Cybern. B 30(1), 180–185 (2000)
74.41 J. Zhang, B. Rössler: Self-valuing learning and generalization with application in visually guided grasping of complex objects, Robotics Auton. Syst. 47(2/3), 117–127 (2004)
74.42 M. Frank, M. Plaue, H. Rapp, U. Koethe, B. Jaehne, F.A. Hamprecht: Theoretical and experimental error analysis of continuous-wave time-of-flight range cameras, Opt. Eng. 48(1), 013602 (2009)
74.43 M. Freese, S. Singh, F. Ozaki, N. Matsuhira: Virtual robot experimentation platform v-rep: A versatile 3d robot simulator, Proc. Int. Conf. Simul. Model. Progr. Auton. Robots (SIMPAR) (2010) pp. 51–62
74.44 S. Hak, N. Mansard, O. Ramos, L. Saab, O. Stasse: Capture, recognition and imitation of anthropomorphic motion, IEEE-RAS Int. Conf. Robotics Autom. (2012) pp. 3539–3540
74.45 G. Gioioso, G. Salvietti, M. Malvezzi, D. Prattichizzo: An object-based approach to map human hand synergies onto robotic hands with dissimilar kinematics. In: Robotics - Science and Systems VIII, ed. by N. Roy, P. Newman, S. Srinivasa (MIT Press, Cambridge 2012) pp. 97–105
74.46 A. Shon, K. Grochow, A. Hertzmann, R. Rao: Learning shared latent structure for image synthesis and robotic imitation, Adv. Neural Inf. Process. Syst. (NIPS) 18, 1233–1240 (2006)
74.47 A. Ude, C.G. Atkeson, M. Riley: Programming full-body movements for humanoid robots by observation, Robotics Auton. Syst. 47, 93–108 (2004)
74.48 S. Kim, C. Kim, B. You, S. Oh: Stable whole-body motion generation for humanoid robots to imitate human motions, Proc. IEEE/RSJ Int. Conf. Intell. Robotics Syst. (IROS) (2009)
74.49 S. Nakaoka, A. Nakazawa, F. Kanehiro, K. Kaneko, M. Morisawa, H. Hirukawa, K. Ikeuchi: Learning from observation paradigm: Leg task models for enabling a biped humanoid robot to imitate human dances, Int. J. Robotics Res. 26(8), 829–844 (2007)
74.50 E.L. Sauser, B.D. Argall, G. Metta, A.G. Billard: Iterative learning of grasp adaptation through human corrections, Robotics Auton. Syst. 60(1), 55–71 (2012)
74.51 A. Coates, P. Abbeel, A.Y. Ng: Learning for control from multiple demonstrations, Proc. 25th Int. Conf. Mach. Learn. (2008)
74.52 D. Grollman, O.C. Jenkins: Incremental learning of subtasks from unsegmented demonstration, Int. Conf. Intell. Robots Syst. (2010)
74.53 L. Peternel, J. Babic: Humanoid robot posture-control learning in real-time based on human sensorimotor learning ability, IEEE Int. Conf. Robotics Autom. (ICRA), Karlsruhe (2013)
74.54 A. Ude: Robust estimation of human body kinematics from video, Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS) (1999) pp. 1489–1494
74.55 B. Akgun, M. Cakmak, K. Jiang, A.L. Thomaz: Keyframe-based learning from demonstration, Int. J. Soc. Robotics 4, 343–355 (2012)
74.56 P. Evrard, E. Gribovskaya, S. Calinon, A. Billard, A. Kheddar: Teaching physical collaborative tasks: Object-lifting case study with a humanoid, Proc. IEEE-RAS Int. Conf. Humanoid Robots (Humanoids), Paris (2009) pp. 399–404
74.57 C. Chao, M. Cakmak, A.L. Thomaz: Designing interactions for robot active learners, IEEE Trans. Auton. Mental Dev. 2(2), 108–118 (2010)
74.58 S. Calinon, A. Billard: PDA interface for humanoid robots, Proc. IEEE Int. Conf. Humanoid Robots (Humanoids) (2003)
74.59 A. Shon, K. Grochow, R. Rao: Robotic imitation from human motion capture using Gaussian processes, Proc. IEEE/RAS Int. Conf. Humanoid Robots (Humanoids) (2005)
74.60 Y. Wu, Y. Demiris: Towards one shot learning by imitation for humanoid robots, IEEE-RAS Int. Conf. Robotics Autom. (ICRA) (2010)
74.61 J. Nakanishi, J. Morimoto, G. Endo, G. Cheng, S. Schaal, M. Kawato: Learning from demonstration and adaptation of biped locomotion, Robotics Auton. Syst. 47(2/3), 79–91 (2004)
74.62 D. Lee, C. Ott: Incremental kinesthetic teaching of motion primitives using the motion refinement tube, Auton. Robot. 31(2), 115–131 (2011)
74.63 A. Ude: Trajectory generation from noisy positions of object features for teaching robot paths, Robotics Auton. Syst. 11(2), 113–127 (1993)
74.64 J. Yang, Y. Xu, C.S. Chen: Human action learning via hidden Markov model, IEEE Trans. Syst. Man Cybern. A 27(1), 34–44 (1997)
74.65 K. Yamane, Y. Nakamura: Dynamics filter - concept and implementation of online motion generator for human figures, IEEE Trans. Robotics Autom. 19(3), 421–432 (2003)
74.66 S. Vijayakumar, S. Schaal: Locally weighted projection regression: An O(n) algorithm for incremental real time learning in high dimensional spaces, Proc. Int. Conf. Mach. Learn. (ICML) (2000) pp. 288–293
74.67 S. Vijayakumar, A. D'souza, S. Schaal: Incremental online learning in high dimensions, Neural Comput. 17(12), 2602–2634 (2005)
74.68 N. Kambhatla: Local Models and Gaussian Mixture Models for Statistical Data Processing, PhD Thesis (Oregon Graduate Institute of Science and Technology, Portland 1996)
74.69 K. Grochow, S.L. Martin, A. Hertzmann, Z. Popovic: Style-based inverse kinematics, Proc. ACM Int. Conf. Comput. Gr. Interact. Tech. (SIGGRAPH) (2004) pp. 522–531
74.70 K.F. MacDorman, R. Chalodhorn, M. Asada: Periodic nonlinear principal component neural networks for humanoid motion segmentation, generalization, and generation, Proc. Int. Conf. Pattern Recogn. (ICPR) (2004) pp. 537–540
74.71 M. Mühlig, M. Gienger, J.J. Steil, C. Goerick: Automatic selection of task spaces for imitation learning, IEEE/RSJ Int. Conf. Intell. Robot. Syst. (IROS) (2009) pp. 4996–5002
74.72 R. Jäkel, P. Meißner, S. Schmidt-Rohr, R. Dillmann: Distributed generalization of learned planning models in robot programming by demonstration, IEEE/RSJ Int. Conf. Intell. Robot. Syst. (2011)
74.73 A. Gams, M. Do, A. Ude, T. Asfour, R. Dillmann: On-line periodic movement and force-profile learning for adaptation to new surfaces, Proc. IEEE-RAS Int. Conf. Human. Robot. (2010) pp. 560–565
74.74 P. Kormushev, S. Calinon, D. Caldwell: Imitation learning of positional and force skills demonstrated via kinesthetic teaching and haptic input, Adv. Robotics 25(5), 581–603 (2011)
74.75 L. Rozo, S. Calinon, D.G. Caldwell, P. Jimenez, C. Torras: Learning collaborative impedance-based robot behaviors, Proc. AAAI Conf. Artif. Intell., Bellevue (2013) pp. 1422–1428
74.76 L. Peternel, T. Petric, E. Oztop, J. Babic: Teaching robots to cooperate with humans in dynamic manipulation tasks based on multi-modal human-in-the-loop approach, Auton. Robots 36(1/2), 123–136 (2014)
74.77 C. Daniel, G. Neumann, J. Peters: Learning concurrent motor skills in versatile solution spaces, Proc. IEEE Int. Conf. Intell. Robotics Syst. (IROS'2012) (2012) pp. 3591–3597
74.78 O. Mangin, P.-Y. Oudeyer: Unsupervised learning of simultaneous motor primitives through imitation, IEEE Int. Conf. Dev. Learn. (2011)
74.79 R. Dillmann: Teaching and learning of robot tasks via observation of human performance, Robotics Auton. Syst. 47(2/3), 109–116 (2004)
74.80 A. Skoglund, B. Iliev, B. Kadmiry, R. Palm: Programming by demonstration of pick-and-place tasks for industrial manipulators using task primitives, Int. Symp. Comput. Intell. Robotics Autom. (2007)
74.81 K. Muelling, J. Kober, O. Kroemer, J. Peters: Learning to select and generalize striking movements in robot table tennis, Int. J. Robotics Res. 32(3), 280–298 (2013)
74.82 D. Kulic, C. Ott, C. Lee, J. Ishikawa, Y. Nakamura: Incremental learning of full body motion primitives and their sequencing through human motion observation, Int. J. Robotics Res. 31(3), 330–345 (2012)
74.83 S. Niekum, S. Osentoski, G. Konidaris, A. Barto: Learning and generalization of complex tasks from unstructured demonstrations, IEEE Int. Conf. Intell. Robotics Syst. (2012) pp. 5239–5246
74.84 P. Gaussier, S. Moga, J.P. Banquet, M. Quoy: From perception-action loop to imitation processes: A bottom-up approach of learning by imitation, Appl. Artif. Intell. 7(1), 701–729 (1998)
74.85 M.N. Nicolescu, M.J. Mataric: Natural methods for robot task learning: Instructive demonstrations, generalization and practice, Proc. Int. Jt. Conf. Auton. Agents Multiagent Syst. (AAMAS) (2003) pp. 241–248
74.86 J. Tani, M. Ito: Self-organization of behavioral primitives as multiple attractor dynamics: A robot experiment, IEEE Trans. Syst. Man Cybern. A 33(4), 481–488 (2003)
74.87 H. Friedrich, S. Muench, R. Dillmann, S. Bocionek, M. Sassin: Robot programming by demonstration (RPD): Supporting the induction by human interaction, Mach. Learn. 23(2), 163–189 (1996)
74.88 M. Pardowitz, R. Zoellner, S. Knoop, R. Dillmann: Incremental learning of tasks from user demonstrations, past experiences and vocal comments, IEEE Trans. Syst. Man Cybern. B 37(2), 322–332 (2007)
74.89 S. Ekvall, D. Kragic: Learning task models from multiple human demonstrations, Proc. IEEE Int. Symp. Robot Human Int. Commun. (RO-MAN) (2006) pp. 358–363
74.90 J. Saunders, C.L. Nehaniv, K. Dautenhahn: Teaching robots by moulding behavior and scaffolding the environment, Proc. ACM SIGCHI/SIGART Conf. Human-Robot Interaction (HRI) (2006) pp. 118–125
74.91 A. Alissandrakis, C.L. Nehaniv, K. Dautenhahn: Correspondence mapping induced state and action metrics for robotic imitation, IEEE Trans. Syst. Man Cybern. B 37(2), 299–307 (2007)
74.92 R. Jäkel, S.R. Schmidt-Rohr, S.W. Rühl, A. Kasper, Z. Xue, R. Dillmann: Learning of planning models for dexterous manipulation based on human demonstrations, Int. J. Soc. Robotics 4(4), 437–448 (2012)
74.93 S.R. Schmidt-Rohr, M. Lösch, R. Jäkel, R. Dillmann: Programming by demonstration of probabilistic decision making on a multi-modal service robot, Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS) (2010)
74.94 C. Breazeal, M. Berlin, A. Brooks, J. Gray, A.L. Thomaz: Using perspective taking to learn from ambiguous demonstrations, Robotics Auton. Syst. 54, 385–393 (2006)
74.95 Y. Sato, K. Bernardin, H. Kimura, K. Ikeuchi: Task analysis based on observing hands and objects by vision, IEEE/RSJ Int. Conf. Intell. Robots Syst., Lausanne (2002) pp. 1208–1213
74.96 R. Zoellner, M. Pardowitz, S. Knoop, R. Dillmann: Towards cognitive robots: Building hierarchical task representations of manipulations from human demonstration, Int. Conf. Robotics Autom. (ICRA), Barcelona (2005)


74.97 M. Pardowitz, R. Zöllner, R. Dillmann: Incremental learning of task sequences with information-theoretic metrics, Proc. Eur. Robotics Symp. (EUROS06) (2005)
74.98 M. Pardowitz, R. Zöllner, R. Dillmann: Learning sequential constraints of tasks from user demonstrations, Proc. IEEE-RAS Int. Conf. Humanoid Robots (HUMANOIDS05) (2005) pp. 424–429
74.99 S. Calinon, P. Kormushev, D.G. Caldwell: Compliant skills acquisition and multi-optima policy search with EM-based reinforcement learning, Robotics Auton. Syst. 61(4), 369–379 (2013)
74.100 C.G. Atkeson, A.W. Moore, S. Schaal: Locally weighted learning for control, Artif. Intell. Rev. 11(1–5), 75–113 (1997)
74.101 J. Peters, S. Vijayakumar, S. Schaal: Reinforcement learning for humanoid robotics, Proc. IEEE Int. Conf. Humanoid Robots (Humanoids) (2003)
74.102 T. Yoshikai, N. Otake, I. Mizuuchi, M. Inaba, H. Inoue: Development of an imitation behavior in humanoid kenta with reinforcement learning algorithm based on the attention during imitation, Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS) (2004) pp. 1192–1197
74.103 D.C. Bentivegna, C.G. Atkeson, G. Cheng: Learning tasks from observation and practice, Robotics Auton. Syst. 47(2/3), 163–169 (2004)
74.104 P. Kormushev, S. Calinon, R. Saegusa, G. Metta: Learning the skill of archery by a humanoid robot iCub, Proc. IEEE Int. Conf. Human. Robots, Nashville (2010)
74.105 P. Pastor, M. Kalakrishnan, S. Chitta, E. Theodorou, S. Schaal: Skill learning and task outcome prediction for manipulation, IEEE Int. Conf. Robotics Autom. (2011)
74.106 J. Kober, J. Peters: Policy search for motor primitives in robotics, Mach. Learn. 84(1/2), 171–203 (2011)
74.107 P. Kormushev, S. Calinon, D.G. Caldwell: Robot motor skill coordination with EM-based reinforcement learning, Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), Taipei (2010) pp. 3232–3237
74.108 N. Jetchev, M. Toussaint: Fast motion planning from experience: Trajectory prediction for speeding up movement generation, Auton. Robots 34(1/2), 111–127 (2013)
74.109 F. Guenter, M. Hersch, S. Calinon, A. Billard: Reinforcement learning for imitating constrained reaching movements, RSJ Adv. Robotics 21(13), 1521–1544 (2007)
74.110 B.D. Ziebart, A. Maas, A. Bagnell, A.K. Dey: Maximum entropy inverse reinforcement learning, Proc. AAAI Conf. Artif. Intell. (2008)
74.111 P. Abbeel, A. Coates, A. Ng: Autonomous helicopter aerobatics through apprenticeship learning, Int. J. Robotics Res. 29(13), 1608–1639 (2010)
74.112 S. Ross, G. Gordon, J.A. Bagnell: A reduction of imitation learning and structured prediction to no-regret online learning, Proc. 14th Int. Conf. Artif. Intell. Stat. (AISTATS11) (2011)
74.113 Y.K. Hwang, K.J. Choi, D.S. Hong: Self-learning control of cooperative motion for a humanoid robot, Proc. IEEE Int. Conf. Robotics Autom. (ICRA) (2006) pp. 475–480
74.114 B. Jansen, T. Belpaeme: A computational model of intention reading in imitation, Robotics Auton. Syst. 54(5), 394–402 (2006)
74.115 A. Billard, K. Dautenhahn: Grounding communication in autonomous robots: An experimental study, Robotics Auton. Syst. 24(1/2), 71–81 (1998)
74.116 P. Abbeel, A. Ng: Apprenticeship learning via inverse reinforcement learning, Int. Conf. Mach. Learn. (2004)
74.117 N. Ratliff, A.J. Bagnell, M. Zinkevich: Maximum margin planning, Int. Conf. Mach. Learn. (2006)
74.118 A. Billard, S. Calinon, F. Guenter: Discriminative and adaptive imitation in uni-manual and bi-manual tasks, Robotics Auton. Syst. 54, 370–384 (2006)
74.119 J. Choi, K. Kim: Nonparametric Bayesian inverse reinforcement learning for multiple reward functions, Adv. Neural Inf. Process. Syst. 25, 305–313 (2012)
74.120 A.K. Tanwani, A. Billard: Transfer in inverse reinforcement learning for multiple strategies, IEEE/RSJ Int. Conf. Intell. Robots Syst. (2013)
74.121 D.H. Grollman, A. Billard: Donut as I do: Learning from failed demonstrations, IEEE Int. Conf. Robotics Autom. (2011)
74.122 A. Rai, G. de Chambrier, A. Billard: Learning from failed demonstrations in unreliable systems, IEEE-RAS Int. Conf. Humanoid Robots (2013)
74.123 M. Goodrich, A. Schultz: Human-robot interaction: A survey, Found. Trend. Human-Comput. Int. 1(3), 203–275 (2007)
74.124 T. Fong, I. Nourbakhsh, K. Dautenhahn: A survey of socially interactive robots, Robotics Auton. Syst. 42(3/4), 143–166 (2003)
74.125 C. Breazeal, B. Scassellati: Robots that imitate humans, Trends Cogn. Sci. 6(11), 481–487 (2002)
74.126 B. Scassellati: Imitation and mechanisms of joint attention: A developmental structure for building social skills on a humanoid robot, Lect. Notes Comput. Sci. 1562, 176–195 (1999)
74.127 H. Kozima, H. Yano: A robot that learns to communicate with human caregivers, Int. Workshop Epigenet. Robotics (2001)
74.128 H. Ishiguro, T. Ono, M. Imai, T. Kanda: Development of an interactive humanoid robot Robovie - An interdisciplinary approach, Springer Tracts Adv. Robotics 6, 179–192 (2003)
74.129 K. Nickel, R. Stiefelhagen: Pointing gesture recognition based on 3d-tracking of face, hands and head orientation, Int. Conf. Multimodal Interfaces (ICMI) (2003) pp. 140–146
74.130 M. Ito, J. Tani: Joint attention between a humanoid robot and users in imitation game, Int. Conf. Dev. Learn. (ICDL) (2004)
74.131 V.V. Hafner, F. Kaplan: Learning to interpret pointing gestures: Experiments with four-legged autonomous robots, Lect. Notes Comput. Sci. 3575, 225–234 (2005)
74.132 C. Breazeal, D. Buchsbaum, J. Gray, D. Gatenby, B. Blumberg: Learning from and about others: Towards using imitation to bootstrap the social understanding of others by robots, Artif. Life 11(1/2), 31–62 (2005)
74.133 P.F. Dominey, M. Alvarez, B. Gao, M. Jeambrun, A. Cheylus, A. Weitzenfeld, A. Martinez, A. Medrano: Robot command, interrogation and teaching via social interaction, Proc. IEEE-RAS Int. Conf. Humanoid Robots (Humanoids) (2005)
74.134 A.L. Thomaz, M. Berlin, C. Breazeal: Robot science meets social science: An embodied computational model of social referencing, Workshop Toward Soc. Mech. Sci. (CogSci) (2005) pp. 7–17
74.135 C. Breazeal, L. Aryananda: Recognition of affective communicative intent in robot-directed speech, Auton. Robots 12(1), 83–104 (2002)
74.136 S. Calinon, A. Billard: Teaching a humanoid robot to recognize and reproduce social cues, Proc. IEEE Int. Symp. Robot Human Int. Commun. (RO-MAN) (2006) pp. 346–351
74.137 Y. Yoshikawa, K. Shinozawa, H. Ishiguro, N. Hagita, T. Miyamoto: Responsive robot gaze to interaction partner, Proc. Robotics Sci. Syst. (RSS), Philadelphia (2006)
74.138 S. Calinon, A. Billard: What is the teacher's role in robot programming by demonstration? - Toward benchmarks for improved learning, Interact. Stud. 8(3), 441–464 (2007), Spec. Issue Psychol. Benchmarks Human-Robot Int.
74.139 S. Chernova, M. Veloso: Interactive policy learning through confidence-based autonomy, J. Artif. Intell. Res. 34, 1–25 (2009)
74.140 B.D. Argall, E.L. Sauser: Tactile guidance for policy adaptation, Found. Trend. Robotics 1(2), 79–133 (2010)
74.141 S. Calinon, A. Billard: Active teaching in robot programming by demonstration, Proc. IEEE Int. Symp. Robot Human Int. Commun. (RO-MAN), Jeju (2007) pp. 702–707
74.142 D. Silver, A. Bagnell, A. Stentz: Active learning from demonstration for robust autonomous navigation, IEEE Int. Conf. Robotics Autom. (ICRA'12) (2012)
74.143 M. Riley, A. Ude, C. Atkeson, G. Cheng: Coaching: An approach to efficiently and intuitively create humanoid robot behaviors, Proc. IEEE-RAS Int. Conf. Humanoid Robots (Humanoids) (2006) pp. 567–574
74.144 H. Bekkering, A. Wohlschlaeger, M. Gattis: Imitation of gestures in children is goal-directed, Q. J. Exp. Psychol. 53A(1), 153–164 (2000)
74.145 M. Nicolescu, M.J. Mataric: Task learning through imitation and human-robot interaction. In: Models and Mechanisms of Imitation and Social Learning in Robots, Humans and Animals (MIT Press, Cambridge 2006) pp. 407–424
74.146 J. Demiris, G. Hayes: Imitative learning mechanisms in robots and humans, Proc. 5th Eur. Workshop Learn. Robots, ed. by V. Klingspor (1996) pp. 9–16
74.147 J. Aleotti, S. Caselli: Robust trajectory learning and approximation for robot programming by demonstration, Robotics Auton. Syst. 54(5), 409–413 (2006)
74.148 M. Ogino, H. Toichi, Y. Yoshikawa, M. Asada: Interaction rule learning with a human partner based on an imitation faculty with a simple visuo-motor mapping, Robotics Auton. Syst. 54, 414–418 (2006)
74.149 A. Billard, M. Matarić: Learning human arm movements by imitation: Evaluation of a biologically-inspired connectionist architecture, Robotics Auton. Syst. 37(2/3), 145–160 (2001)
74.150 A.J. Ijspeert, J. Nakanishi, S. Schaal: Movement imitation with nonlinear dynamical systems in humanoid robots, IEEE Int. Conf. Robotics Autom. (ICRA2002) (2002) pp. 1398–1403
74.151 R.H. Cuijpers, H.T. van Schie, M. Koppen, W. Erlhagen, H. Bekkering: Goals and means in action observation: A computational approach, Neural Netw. 19(3), 311–322 (2006)
74.152 M.W. Hoffman, D.B. Grimes, A.P. Shon, R.P.N. Rao: A probabilistic model of gaze imitation and shared attention, Neural Netw. 19(3), 299–310 (2006)
74.153 A. Billard: Drama, a connectionist architecture for on-line learning and control of autonomous robots: Experiments on learning of a synthetic proto-language with a doll robot, Ind. Robot 26(1), 59–66 (1999)
74.154 P. Gaussier, S. Moga, J.P. Banquet, J. Nadel: Learning and communication via imitation: An autonomous robot perspective, IEEE Trans. Syst. Man Cybern. A 31(5), 431–442 (2001)
