Towards a Robot Learning Architecture
From: AAAI Technical Report WS-93-06. Compilation copyright © 1993, AAAI (www.aaai.org). All rights reserved.

Joseph O'Sullivan*
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213
email: josu][email protected]

*This research was sponsored in part by the Avionics Lab, Wright Research and Development Center, Aeronautical Systems Division (AFSC), U.S. Air Force, Wright-Patterson AFB, OH 45433-6543 under Contract F33616-90-C-1466, Arpa Order No. 7697. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government.

Abstract

I summarize research toward a robot learning architecture intended to enable a mobile robot to learn a wide range of find-and-fetch tasks. In particular, this paper summarizes recent research in the Learning Robots Laboratory at Carnegie Mellon University on aspects of robot learning, and our current work toward integrating and extending this within a single architecture. In previous work we developed systems that learn action models for robot manipulation, learn cost-effective strategies for using sensors to approach and classify objects, learn models of sonar sensors for map building, learn reactive control strategies via reinforcement learning and compilation of action models, and explore effectively. Our current efforts aim to coalesce these disjoint approaches into a single robot learning agent that learns to construct action models in a real-world environment, learns models of visual and sonar sensors for object recognition, and learns efficient reactive control strategies via reinforcement learning techniques utilizing these models.

1 Introduction

The Learning Robots Laboratory at Carnegie Mellon University focuses on combining perceptual, reasoning and learning abilities in autonomous mobile robots. Successful systems have to cope with incorrect state descriptions caused by poor sensors, converge with a limited number of examples since a robot cannot execute millions of trials, and interact with the environment both for efficient exploration towards new knowledge and exploitation of learned facts, all while remaining robust.

The ultimate goal of the research undertaken in the laboratory is a robot that continuously improves its performance through learning. Such a robot must be capable of an autonomous existence during which its world knowledge is continuously refined from experience as well as from teachings.

It is all too easy to assume that a learning agent has unrealistic initial capabilities such as a prepared environment map or perfect knowledge of its actions. To ground our research, we use a Heath/Zenith Hero 2000 robot (named simply "Hero"), a commercial wheeled mobile manipulator with a two-finger hand, as a testbed on which success or failure is judged.

In this paper, I describe the steps being taken to design and implement a learning robot agent. In Design Principles, I outline the requirements I believe a successful agent must meet. Thereafter, in Learning Robot Results, I summarize previous bodies of work in the laboratory, each of which investigates a subset of our convictions. Our current work drawing these various threads together into a single learning robot architecture is presented in A Cohesive Robot Architecture. In Approach and Fetch - a Demonstration, I present a prototype system which demonstrates that performance can improve through learning. Finally, I summarize the current research, pointing out limitations and the work that remains.

2 Design Principles

What is required of a learning agent? Mitchell has argued [Mitchell, 1990] that a learning agent can be understood in terms of several types of performance metrics:

• Correctness: The prediction of the effects of its actions in the world must become increasingly better for completing a task.

• Perceptiveness: Increasingly relevant features which impact its success must be constructed.

• Reactivity: The time required to choose "correct" actions must become increasingly short.

• Optimality: The sequence of actions chosen must become increasingly efficient for task completion.

In addition, an agent must be effective, that is, have appropriate sensors and actions to be capable of performing tasks.
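These metrics suggest simple quantities an agent could log over the course of its trials. The following Python sketch is purely illustrative (it is not from any of the laboratory's systems, and all names are invented); it records three of the four metrics, since perceptiveness is harder to reduce to a single number:

```python
# Hypothetical per-trial bookkeeping for the performance metrics above.
from dataclasses import dataclass, field


@dataclass
class PerformanceMonitor:
    prediction_hits: int = 0     # action outcomes predicted correctly
    prediction_total: int = 0
    decision_times: list = field(default_factory=list)  # seconds per decision
    steps_taken: list = field(default_factory=list)     # actions per completed task

    def record_prediction(self, predicted_state, observed_state) -> None:
        self.prediction_total += 1
        if predicted_state == observed_state:
            self.prediction_hits += 1

    def correctness(self) -> float:
        """Fraction of action outcomes predicted correctly; should rise."""
        return self.prediction_hits / max(1, self.prediction_total)

    def reactivity(self) -> float:
        """Mean time to choose an action; should fall with learning."""
        return sum(self.decision_times) / max(1, len(self.decision_times))

    def optimality(self) -> float:
        """Mean number of actions per completed task; should fall."""
        return sum(self.steps_taken) / max(1, len(self.steps_taken))
```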
3 Learning Robot Results

Individual systems have been developed that explore various facets of the creation of a learning robot agent.

Learning, planning, and even simply reacting to events are difficult without reliable knowledge of the outcomes of actions. The usual human-programming method of defining "what actions do" is both inefficient and often ineffectual. Christiansen analysed conditions under which robotic systems can learn action models allowing for automated planning and successful execution of strategies [Christiansen, 1992]. He developed systems that generated action models consisting of sets of funnels, where each funnel mapped a region of task action space to a reduced region of the state space. He demonstrated that such funnels can be acquired for continuous tasks using negligible prior knowledge, and that a simple planner is sufficient for then generating plans, provided the learning mechanism is robust to noise and non-determinism and the planner is capable of reasoning about the reliabilities associated with each action model.
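To make the funnel idea concrete: a funnel can be read as a mapping that takes any state in a precondition region, under a given action, into a smaller predicted region, with an empirically estimated reliability. The sketch below is an illustration under that reading, not code from Christiansen's system; the one-dimensional interval representation, the names, and the depth-limited planner are all assumptions made for exposition:

```python
# Illustrative funnel-style action models (not Christiansen's implementation).
from dataclasses import dataclass
from typing import List, Optional, Tuple

Region = Tuple[float, float]  # a 1-D state interval (lo, hi), for simplicity


@dataclass
class Funnel:
    action: str
    pre: Region         # region of task action space the funnel applies to
    post: Region        # reduced region of state space it maps into
    reliability: float  # fraction of observed trials that landed in `post`


def plan(state: float, goal: Region, funnels: List[Funnel],
         min_reliability: float = 0.8, depth: int = 5) -> Optional[List[str]]:
    """Depth-limited search over funnels, discarding unreliable models."""
    if goal[0] <= state <= goal[1]:
        return []
    if depth == 0:
        return None
    for f in funnels:
        if f.pre[0] <= state <= f.pre[1] and f.reliability >= min_reliability:
            # Continue planning from a representative point of the post-region.
            midpoint = (f.post[0] + f.post[1]) / 2
            rest = plan(midpoint, goal, funnels, min_reliability, depth - 1)
            if rest is not None:
                return [f.action] + rest
    return None
```

The `min_reliability` threshold stands in for the planner's reasoning about per-model reliabilities: a funnel observed to succeed too rarely is simply never planned through.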
Once action models have been discovered, sensing to decide which action to take can have varying costs. The time it takes a physical sensor to obtain information varies widely from sensor to sensor. Hero's camera, a passive device, is an order of magnitude faster than active sensing using a wrist-mounted sonar. Yet sonar information is more appropriate than vision when ambiguity exists about the distance to an object. This led Tan to investigate learning cost-effective strategies for using sensors to approach and classify objects [Tan, 1992]. He developed a cost-sensitive learning system called CSL for Hero which, given a set of unknown objects and models of both sensors and actions, learns where to sense, which sensor to use, and which action to apply.

Learning to model sensors involves capturing knowledge independent of any particular environment that a robot might face, while learning the typical environments in which the robot is known to operate. Thrun investigated learning such models by combining artificial neural networks and local, instance-based learning techniques [Thrun, 1993]. He demonstrated that learning these models provides an efficient means of knowledge transfer from previously explored environments to new environments.

A robot acting in a real-world situation must respond quickly to changes in its environment. Two disparate approaches have been investigated. Blythe & Mitchell developed an autonomous robot agent that initially constructed explicit plans to solve problems in its domain using prior knowledge of action preconditions and postconditions [Blythe and Mitchell, 1989]. This "Theo-agent" converges to a reactive control strategy by compiling previous plans into stimulus-response rules using explanation-based learning [Mitchell, 1990]. The agent can then respond directly to features in the environment with an appropriate action by querying this rule set.

Conversely, Lin applied artificial neural network based reinforcement learning techniques to create reactive control strategies, in which the agent receives from the environment a scalar performance feedback constructed so that maximum reward occurs when the task is completed successfully and, typically, some form of punishment is presented when the agent fails. The agent must then maximize the cumulative reinforcement, which corresponds to developing successful strategies for the task. By using artificial neural networks, Lin demonstrated that the agent was able to generalize to unforeseen events and to survive in moderately complex dynamic environments. However, although reinforcement learning was more successful than action compilation at self-improvement in a real-world domain, convergence of learning typically took longer, and the plans produced at early stages of learning were dangerous to the robot.

Another issue in robot learning is the question of when to exploit current plans and when to explore further in the hope of discovering hidden shortcuts. Thrun evaluated the impact of exploration knowledge in tabula rasa environments, where no a priori knowledge, such as action models, is provided, demonstrating the superiority of one particular directed exploration rule, counter-based exploration [Thrun, 1992].

4 A Cohesive Robot Architecture

Following the individual lessons learned from each of these approaches, it remains to scale up and coalesce these systems into a single robot architecture. In addition, whereas previously the laboratory robot relied upon sonar for perception, we require an effective robot agent to include a visual sensor, both for speed of sensing (reaction times approaching a fraction of a second are possible) and for an increase in discrimination capabilities. An architecture is created based upon the following components (see Figure 1):

• An action model learning mechanism which compiles domain-independent knowledge relating

Figure 1: Schema of the Robot Architecture. The Learning Agent comprises a Sensor Model Learner, an Action Model Learner, a Task Completion Recognizer, a Rational Planner, a Reinforcement Learner, and a Selection Mechanism. The bold lines have been implemented in the prototype "approach and fetch" system.
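As a rough illustration of how the components in Figure 1 might be wired together, consider the following Python sketch. It is an assumption-laden mock-up, not the laboratory's implementation: every class and method name is invented for exposition, and the selection mechanism shown simply prefers a learned reactive rule when one exists, falling back to deliberative planning otherwise:

```python
# Hypothetical wiring of the Figure 1 components; all names are invented.
class SensorModelLearner:
    def interpret(self, raw_percepts):
        """Map raw sonar/vision readings to a state estimate."""
        return tuple(raw_percepts)  # placeholder state representation


class ActionModelLearner:
    def __init__(self):
        self.models = {}  # action -> observed (state, next_state) outcomes

    def update(self, state, action, next_state):
        self.models.setdefault(action, []).append((state, next_state))


class TaskCompletionRecognizer:
    def done(self, state, goal) -> bool:
        return state == goal


class RationalPlanner:
    def choose(self, state, goal, action_models):
        """Deliberative choice using learned action models (stub)."""
        return "explore"  # fall-back action when no plan is found


class ReinforcementLearner:
    def __init__(self):
        self.rules = {}  # state -> (action, estimated value)

    def choose(self, state):
        rule = self.rules.get(state)
        return rule[0] if rule else None


class SelectionMechanism:
    """Prefer a learned reactive response; otherwise defer to the planner."""

    def choose(self, state, goal, rl, planner, action_models):
        action = rl.choose(state)
        if action is not None:
            return action
        return planner.choose(state, goal, action_models)
```

The design choice this mirrors is that reactive responses and deliberative planning coexist behind a single selection point, so that, as in the Theo-agent and Lin's work above, compiled or reinforced knowledge can gradually displace slow planning.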