An Integrated Robotic System for Spatial Understanding and Situated Interaction in Indoor Environments

Hendrik Zender (1), Patric Jensfelt (2), Óscar Martínez Mozos (3), Geert-Jan M. Kruijff (1), and Wolfram Burgard (3)

(1) Language Technology Lab, German Research Center for Artificial Intelligence (DFKI), Saarbrücken, Germany — {zender, gj}@dfki.de
(2) Centre for Autonomous Systems, Royal Institute of Technology, Stockholm, Sweden — [email protected]
(3) Department of Computer Science, University of Freiburg, Freiburg, Germany — {omartine, burgard}@informatik.uni-freiburg.de

Abstract

A major challenge in robotics and artificial intelligence lies in creating robots that are to cooperate with people in human-populated environments, e.g. for domestic assistance or elderly care. Such robots need skills that allow them to interact with the world and the humans living and working therein. In this paper we investigate the question of spatial understanding of human-made environments. The functionalities of our system comprise perception of the world, natural language, learning, and reasoning. For this purpose we integrate state-of-the-art components from different disciplines in AI, robotics and cognitive systems into a mobile robot system. The work focuses on the description of the principles we used for the integration, including cross-modal integration, ontology-based mediation, and multiple levels of abstraction of perception. Finally, we present experiments with the integrated "CoSy Explorer" system (http://www.cognitivesystems.org) and list some of the major lessons that were learned from its design, implementation, and evaluation.

Copyright © 2007, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Introduction

Robots are gradually moving out of the factories and into our homes and offices, for example as domestic assistants. Through this development robots will increasingly be used by people with little or no formal training in robotics. Communication and interaction between robots and humans become key issues for these systems.

A cornerstone for robotic assistants is their understanding of the space they are to be operating in: an environment built by people for people to live and work in. The research questions we are interested in concern spatial understanding, and its connection to acting and interacting in indoor environments. Comparing the way robots typically perceive and represent the world with the findings from cognitive psychology about how humans do it, it is evident that there is a large discrepancy. If robots are to understand humans and vice versa, robots need to make use of the same concepts to refer to things and phenomena as a person would do. Bridging the gap between human and robot spatial representations is thus of paramount importance. Our approach addresses these questions from a viewpoint of cognitive systems, taking inspiration from AI and cognitive science alike.

We believe that true progress in the science of cognitive systems for real-world scenarios requires a multi-disciplinary approach. In this paper we present our experiences of integrating a number of state-of-the-art components from different disciplines in AI into a mobile robot system. The functionalities of our system comprise perception of the world (place and object recognition, people tracking, mapping, and self-localization), natural language (situated, mixed-initiative spoken dialogue), learning, and finally reasoning about places and objects.

The paper describes the principles we used for the integration of the "CoSy Explorer" system: cross-modal integration, ontology-based mediation, and multiple levels of abstraction of perception to move between quantitative and qualitative representations of differing granularity.

Related Work

There are several approaches that integrate different techniques in mobile robots that interact in populated environments. Rhino (Burgard et al. 2000) and Robox (Siegwart et al. 2003) are robots that work as tour-guides in museums. Both robots rely on an accurate metric representation of the environment and use limited dialogue to communicate with people. Also (Theobalt et al. 2002) and (Bos, Klein, & Oka 2003) present a mobile robot, Godot, endowed with natural language dialogue capabilities. They do not only focus on navigation, but rather propose a natural language interface for their robot. The main difference to our approach is that they do not capture the semantic aspects of a spatial entity.

Other works use the integration of different modalities to obtain a more complete representation of the environment where the robot acts. (Galindo et al. 2005) present a mapping approach containing two parallel hierarchies, spatial and conceptual, connected through anchoring. For acquiring the map the robot is tele-operated, as opposed to our method that relies on an extended notion of human-augmented mapping. Other commands are given as symbolic task descriptions for the built-in AI planner, whereas in our system the communication with the robot is entirely based on natural language dialogue.

Robotic applications using the (Hybrid) Spatial Semantic Hierarchy (Beeson et al. 2007; MacMahon, Stankiewicz, &

Figure 1: The information processing in the integrated "CoSy Explorer" system. [Diagram: the multi-layered conceptual spatial map (metric feature map, navigation map, topological map, and conceptual map with recognized objects and area concepts such as Room, Corridor, Office, and Kitchen), the communication subsystem (speech recognition, parsing, interpretation, dialogue planning, BDI mediation, realization, and speech synthesis), and the navigation subsystem (SLAM, mapping, object recognition, place classifier, and people tracker, operating on laser readings and odometry).]

Kuipers 2006) also use different modalities for the integration of multiple representations of spatial knowledge. These approaches are particularly well-suited to ground linguistic expressions and reasoning about spatial organization in route descriptions. Compared to our implementation these approaches do not exhibit an equally high level of integration of the different perception and (inter-)action modalities.

Finally, the robot Biron is endowed with a system that integrates spoken dialogue and visual localization capabilities on a robotic platform similar to ours (Spexard et al. 2006). This system differs from ours in the individual techniques chosen and in the degree to which conceptual spatial knowledge and linguistic meaning are grounded in, and contribute to, the robot's situation awareness.

System Integration Overview

In this section we give an overview of the different subsystems that our approach integrates. These subsystems will be explained in more detail in successive sections.

Fig. 1 sketches the connections between the different modalities implemented in our robot. The robot acquires information about the environment using different sensors. This information is used for object recognition, place classification, mapping and people tracking. All these perception components are part of the navigation subsystem, which uses the sensors for self-localization and motion planning. The information is then used to create a multi-layered conceptual and spatial representation of the man-made environment the robot is acting in. Some of the information needed at the conceptual level to complete this representation is given by the user through spoken dialogue. The communication between the user and the robot supports mixed initiative: either the user explains some concepts to the robot, or it is the robot that poses questions to the user.

The complete system was implemented and integrated on an ActivMedia PeopleBot mobile platform. The robot is equipped with a SICK laser range finder with a 180° field of view mounted at a height of 30 cm, which is used for the metric map creation, for people following, and for the semantic classification of places. Additionally, the robot is equipped with a camera for object detection, which is mounted on a pan-tilt unit (PTU). Just like in human-human communication where spoken language is the main modality, the user can talk to the robot using a bluetooth headset and the robot replies using a set of speakers. The on-board computer runs the Player software for control and access to the hardware, and the Festival speech synthesizer. The rest of the system, including the Nuance speech recognition software, runs on off-board machines that are interconnected using a wireless network.

Perception

Mapping and Localization

To reach a high level of autonomy the robot needs the ability to build a map of the environment that can be used to navigate and stay localized. To this end we use a feature-based Simultaneous Localization And Mapping (SLAM) technique. The geometric primitives consist of lines extracted from laser range scans. The mathematical framework for integrating feature measurements is the Extended Kalman Filter. The implementation is based on (Folkesson, Jensfelt, & Christensen 2005).

Object Recognition

A fundamental capability for a cognitive system interacting with humans is the ability to recognize objects. We use an appearance-based method. Each object is modeled with a set of highly discriminative image features (SIFT) (Lowe 2004). The recognition is achieved with a Best-Bin-First (Beis & Lowe 1997) search approach for fast feature-matching between a new image and the object models. The system is limited to recognizing instances rather than classes of objects.

Place Classification

As the robot navigates through the environment, the surroundings are classified into one of two semantic labels, namely Room or Corridor. The approach uses simple geometrical features extracted from laser range scans to learn a place classifier in a supervised manner (Martínez Mozos et al. 2006). This place classification relies on having a 360° field of view around the robot using two laser range finders. As the robot used here has only one laser scanner at the front covering a restricted 180° field of view, we follow (Martínez Mozos et al. 2006) and maintain a local map around the robot, which permits us to simulate the rear beams. The learning process can be carried out in a different environment (Martínez Mozos et al. 2007).

People Tracking

Keeping track of the people around the robot is important in an interactive scenario. We use a people tracking system that relies on laser range scans, similar to (Schulz et al. 2003). People following is realized by sending the position of the robot's guide to the navigation system. The PTU is used to turn the camera towards the person the robot believes it needs to follow, thus providing a basic form of gaze feedback.

Language and Dialogue

Dialogue System

Our system is endowed with a natural language dialogue system. It enables the robot to have a situated, mixed-initiative spoken dialogue with its human user (Kruijff et al. 2007). On the basis of a string-based representation that is generated from spoken input through a speaker-independent speech recognition software, the Combinatory Categorial Grammar (CCG) parser of OpenCCG (Baldridge & Kruijff 2003) analyzes the utterance syntactically and derives a semantic representation in the form of a Hybrid Logics Dependency Semantics (HLDS) logical form (Baldridge & Kruijff 2002). The dialogue system mediates the content from the speech input to the mapping or navigation subsystem in order to initiate the desired action of the robot or to collect pieces of information necessary to generate an answer. The answer string is then generated by the OpenCCG realizer and sent to a text-to-speech engine.

The user can use spoken commands to control the robot, e.g. for near navigation, initiating or stopping people following, or sending the robot to a specific location. Moreover, the user can augment the robot's internal map by naming objects and places in the robot's environment, and conduct a situated dialogue about the spatial organization with the robot.

Interactive Map Acquisition

The multi-layered representation is created using an enhanced method for concurrent semi-supervised map acquisition, i.e. the combination of a user-driven supervised map acquisition process with autonomous exploration by the robot. This process is based on the notion of Human-Augmented Mapping (Topp & Christensen 2005). In our implementation, the map acquisition process is actively supported by the dialogue system.

The map can be acquired during a so-called guided tour in which the user shows the robot around and continuously teaches the robot new places and objects. During a guided tour, the user can command the robot to follow him or instruct the robot to perform navigation tasks. Our system does not require an initial complete guided tour. It is also possible to incrementally teach the robot new places and objects at any time the user wishes. With every new piece of information, the robot's internal representations become more complete. Still, the robot can always perform actions in, and conduct meaningful dialogue about, the aspects of its environment that are already known to it.

Following the approach in (Kruijff et al. 2006), the robot can also initiate a clarification dialogue if it detects an inconsistency in its spatial representation, illustrating the mixed-initiative capabilities of the dialogue system.

Multi-Layered Spatial Representation

Driven by the research question of spatial understanding and its connection to acting and interacting in indoor environments, we want to generate spatial representations that enable a mobile robot to conceptualize human-made environments similar to the way humans do. Guided by findings in cognitive psychology (McNamara 1986), we assume that topological areas are the basic spatial units suitable for situated human-robot interaction. We also hypothesize that the way people refer to a place is determined by the functions people ascribe to that place, and that the linguistic description of a place leads people to anticipate the functional properties or affordances of that place. In addition to accommodating the high-level needs regarding conceptual reasoning and understanding, the spatial representation must also support safe navigation and localization of the robot. To this end we use a multi-layered spatial representation (Zender & Kruijff 2007) in the tradition of approaches like (Buschka & Saffiotti 2004) and (Kuipers 2000). Each layer serves an important purpose for the overall system (Fig. 2).

Layer 1: Metric Map

The first layer comes from the SLAM component and contains a metric representation of the environment in an absolute frame of reference. The features in the map typically correspond to walls and other flat structures in the environment.

Layer 2: Navigation Map

The second layer contains a navigation map represented by a graph. This representation establishes a model of free space and its connectivity, i.e. reachability, and is based on the notion of a roadmap of virtual free-space markers (Latombe 1991), (Newman et al. 2002). As the robot navigates through the environment, a marker (navigation node) is dropped whenever the robot has traveled a certain distance from the closest existing marker. We distinguish two kinds of navigation nodes: place nodes and doorway nodes.
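The marker-dropping scheme described above can be sketched as follows. The function and node names, and the 1 m spacing threshold, are assumptions for illustration (the paper does not give the actual threshold):

```python
import math

NODE_SPACING = 1.0  # assumed spacing in meters; not specified in the paper

def maybe_drop_node(nodes, pose, is_doorway=False):
    """Append a new navigation node when the robot has traveled far
    enough from every existing marker, or when a doorway is detected."""
    x, y = pose
    nearest = min((math.hypot(x - nx, y - ny) for nx, ny, _ in nodes),
                  default=float("inf"))
    if is_doorway or nearest >= NODE_SPACING:
        nodes.append((x, y, "doorway" if is_doorway else "place"))
    return nodes

nodes = []
for step in range(5):                                # drive down a corridor
    maybe_drop_node(nodes, (0.4 * step, 0.0))
maybe_drop_node(nodes, (2.0, 0.0), is_doorway=True)  # narrow opening detected
```

Running this drops place nodes at 0 m and 1.2 m, skips poses still within the spacing threshold, and forces a doorway node at the narrow opening, mirroring how the roadmap stays sparse while still recording place transitions.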

Doorway nodes are added when the robot passes through a narrow opening; they indicate the transition between different places and represent possible doors. Each place node is labeled as Corridor or Room by the place classifier. As the robot moves, we store the classifications of the last N poses of the robot in a buffer. When a new node is added, we compute the majority vote from this buffer to increase the robustness of the classification.

Layer 3: Topological Map

Previous studies (McNamara 1986) show that humans segment space into regions that roughly correspond to spatial areas. The borders of these regions may be defined physically, perceptually, or may be purely subjective to the human. Walls in the robot's environment are the physical boundaries of areas. Doors are a special case of physical boundaries that permit access to other areas. Our topological map divides the set of nodes in the navigation graph into areas based on the existence of doorway nodes.

Layer 4: Conceptual Map

The conceptual map provides the link between the low-level maps and the communication system used for situated human-robot interaction by grounding linguistic expressions in representations of spatial entities, such as instances of rooms or objects. It is also in this layer that knowledge about the environment stemming from other modalities, such as vision and dialogue, is anchored to the metric and topological maps.

Our system uses a commonsense OWL ontology of an indoor environment that describes taxonomies of room types (is-a relations) and typical objects found therein (has-a relations). These conceptual taxonomies are handcrafted, and only instances of concepts can be added to the ontology during run-time. The RACER reasoning system can infer information about the world that is neither given verbally nor actively perceived. The reasoner works on acquired knowledge (topological areas, detected objects, area classifications, etc.) and asserted knowledge (e.g. the user says "This is the living room") gathered during interactive map acquisition, together with innate conceptual knowledge represented in the office environment ontology. The conceptual map thus enables the robot to generate and resolve linguistic references to spatial areas in a way that accommodates the findings of (Topp et al. 2006): namely, that this reference varies from situation to situation and from speaker to speaker.

Figure 2: Multi-layered representation. [Diagram: the four layers — metric map, navigation map, topological map (area1, area2), and conceptual map — with knowledge marked as innate, acquired, asserted, or inferred, and linked by is-a and has-a relations, e.g. Corridor and Room are Areas, LivingRoom is a Room, TVSet is a LivingRoomObj, and tvset1 is an instance.]

Experiments

To test the functionalities of our system we ran several experiments in which the robot learns its environment while interacting with a tutor. The experiments were conducted with two different PeopleBot mobile platforms at two different locations.

Before running the experiments, the system needs to have some initial knowledge: for one, the ontology representing the general knowledge about the environment; furthermore, the classification of places is based on previous general knowledge about the geometry of rooms and corridors. Finally, the robot has to recognize different objects, such as couches or TV sets, using vision. Because we do instance recognition rather than categorization, the objects we want to be recognized must be presented to the robot beforehand.

One of the experiments is explained in more detail in this section. A video of it is available on the Internet (http://www.dfki.de/cosy/www/media). Although the experiment was conducted non-stop, it can be divided into different situations, which are explained next.

Place Classification

The experiment starts in the corridor, where the user asks the robot to follow him through the corridor, entering a room. Using the method for laser-based place classification, the robot correctly classifies the places along the trajectory (Corridor and Room, respectively) and updates its conceptual representation.

Clarification Dialogues

Our door detector creates some false positives in cluttered rooms. Assuming few false negatives in the detection of doors, we get great improvements by enforcing that it is not possible to change rooms without passing through a door. If this happens, a clarification dialogue is initiated. To test this situation we put a bucket close to a table in the room, creating the illusion of a doorway when using only the laser scanner as a sensor. The robot passes through this false doorway and comes back to a previously visited node. It then infers that there is an inconsistency in its spatial representation and initiates a clarification dialogue asking if there was a door previously. The user denies this fact and the corresponding layers in the representation are updated.
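The kind of inference the reasoner performs over the conceptual map — specializing a laser-derived place label using the objects detected there — can be illustrated with a toy sketch. This is a hand-rolled Python stand-in, not OWL/RACER; the concept names follow the paper's examples, and the dictionaries are a drastically simplified assumption:

```python
# Toy stand-in for the ontology reasoning (not OWL/RACER).
IS_A = {"Corridor": "Area", "Room": "Area", "LivingRoom": "Room"}
# A more specific room concept is suggested by its typical objects (has-a).
HAS_A = {"LivingRoom": {"Couch", "TVSet"}}

def infer_concept(laser_label, observed_objects):
    """Start from the laser-based label and specialize it when the
    detected objects match a more specific room concept."""
    concept = laser_label
    for room, typical in HAS_A.items():
        if IS_A.get(room) == concept and typical & observed_objects:
            concept = room
    return concept

print(infer_concept("Room", set()))               # prints "Room"
print(infer_concept("Room", {"Couch", "TVSet"}))  # prints "LivingRoom"
```

The second call mirrors the experiment below: the laser classifier only supports the indefinite description "a room", while the detected couch and TV set let the reasoner commit to the more specific concept.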

Inferring New Spatial Concepts

Using the inference on our ontology, the robot is able to come up with more specific concepts than the ones the laser-based place classification yielded. While staying in the room, the robot is asked for the current place, and it answers with the indefinite description "a room", which is inferred from the place classification. Then the robot is asked to look around. This command activates the vision-based object detection capabilities of the robot. The robot detects a couch, and then a television set. After that, the user asks the robot for the name of the place. Because of the inference over the detected objects and places, the robot categorizes the place as a LivingRoom.

Situation Awareness and Functional Awareness

Here we show how social capabilities can be added to the system by taking advantage of our spatial representation; e.g., the robot must behave appropriately when the user is opening a door. Continuing with the experiment, the user asks the robot to follow him while he approaches a doorway. The robot knows from the navigation map where the doorway is and keeps a long distance to the user when he is near the door. Keeping a long distance around doors is motivated by the fact that the user needs more space when opening or closing the door. The robot then continues following the user, again decreasing its distance to him once he has passed the door.

Improving Human-Robot Communication and Understanding

Finally, we show how we can achieve natural human-robot interaction. As an example, the robot is asked to go to the television. The robot then navigates to the node where the television was observed. The TV set is not a place, but people often indicate only the objects found in a place and assume that the place is known.

Lessons Learned

We believe that integrating different modalities leads to significant synergies in building up a more complete understanding of the spatial organization of an environment, particularly towards a semantic understanding. Moreover, we think that our work made technological progress on the basis of identifying and addressing scientific questions underlying cognitive systems that understand.

In addition to the synergies that integrating many components brings in terms of more complete knowledge and more capabilities, integration also increases complexity and presents problems that arise from the fact that the real world is unpredictable to some extent.

In a scenario where the robot continuously interacts with a user and is facing her/him most of the time, the information content of the sensor input suffers, as the user occupies a large part of the field of view. In our case, the camera was mounted on a pan-tilt unit and could have been used to actively look for objects and build a metric map using visual information while following the user. However, this conflicts with the use of the camera to indicate the focus of attention on the user. As a result, most of the time the camera only sees the user and not the environment. Therefore, we opted for giving the user the possibility to instruct the robot to "have a look around." The user's presence not only disturbs the camera-based object recognition but also the performance of the laser-based place classification. In order to increase the reliability of the resulting classifications, we took two steps. First, a rear-view laser scanner is simulated by ray-tracing in the local obstacle map, and the simulated and the real laser scanner are used together as a 360° laser range finder. Second, for determining a robust classification of a navigation node, we compute the majority vote of consecutive classifications around that node.

In addition to practical issues like the ones explained previously, the experiments we ran in real environments highlighted new necessities for the system. For example, spatial referencing needs to be improved in both directions of the communication, using several modalities. This would allow the user to indicate a specific object through, e.g., gesture or gaze direction when saying "This is X". This is also an issue when the robot asks "Is there a door HERE?".

Furthermore, the experiments highlighted the need for non-monotonic reasoning; that is, knowledge must not be written in stone. Erroneously acquired or asserted knowledge will otherwise lead to irrecoverable errors in inferred knowledge.

When it comes to the natural language dialogue system, flexibility is a centerpiece for robotic systems that are to be operated by non-expert users. Such free dialogue (as opposed to a controlled language with a fixed inventory of command phrases) can be achieved by modeling the grammar of the domain in as much detail as possible. We are currently investigating how to exploit the benefit of corpora of experimentally gathered data from human-robot dialogues in the domestic setting (Maas & Wrede 2006).

Conclusions

In this paper we presented an integrated approach for creating conceptual representations that support situated interaction and spatial understanding. The approach is based on maps at different levels of abstraction that represent spatial and functional properties of typical indoor office environments. The system includes a linguistic framework that allows for situated dialogue and interactive map acquisition.

Our work also shows there is a limit to certain engineering perspectives we took, and that there are further scientific questions we will need to address if we want to develop more advanced cognitive systems. Integration has played an important role in getting to this point: without a system running in realistic environments, the questions and answers would mostly have been purely academic.

Acknowledgments

This work was supported by the EU FP6 IST Cognitive Systems Integrated Project "CoSy" FP6-004250-IP.
References

Baldridge, J., and Kruijff, G.-J. M. 2002. Coupling CCG and hybrid logic dependency semantics. In Proc. of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), 319–326.

Baldridge, J., and Kruijff, G.-J. M. 2003. Multi-modal combinatory categorial grammar. In Proc. of the 10th Conference of the European Chapter of the Association for Computational Linguistics (EACL).

Beeson, P.; MacMahon, M.; Modayil, J.; Murarka, A.; Kuipers, B.; and Stankiewicz, B. 2007. Integrating multiple representations of spatial knowledge for mapping, navigation, and communication. In Interaction Challenges for Intelligent Assistants, AAAI Spring Symposium Series.

Beis, J. S., and Lowe, D. G. 1997. Shape indexing using approximate nearest-neighbour search in high-dimensional spaces. In Proc. of the Conference on Computer Vision and Pattern Recognition (CVPR), 1000–1006.

Bos, J.; Klein, E.; and Oka, T. 2003. Meaningful conversation with a mobile robot. In Proc. of the Research Note Sessions of the 10th Conference of the European Chapter of the Association for Computational Linguistics (EACL), 71–74.

Burgard, W.; Cremers, A.; Fox, D.; Hähnel, D.; Lakemeyer, G.; Schulz, D.; Steiner, W.; and Thrun, S. 2000. Experiences with an interactive museum tour-guide robot. Artificial Intelligence 114(1–2).

Buschka, P., and Saffiotti, A. 2004. Some notes on the use of hybrid maps for mobile robots. In Proc. of the 8th Int. Conference on Intelligent Autonomous Systems.

Folkesson, J.; Jensfelt, P.; and Christensen, H. 2005. Vision SLAM in the measurement subspace. In Proc. of the IEEE International Conference on Robotics and Automation (ICRA), 30–35.

Galindo, C.; Saffiotti, A.; Coradeschi, S.; Buschka, P.; Fernández-Madrigal, J.; and González, J. 2005. Multi-hierarchical semantic maps for mobile robotics. In Proc. of the IEEE/RSJ Int. Conference on Intelligent Robots and Systems (IROS).

Kruijff, G.-J. M.; Zender, H.; Jensfelt, P.; and Christensen, H. I. 2006. Clarification dialogues in human-augmented mapping. In Proc. of the 1st ACM Conference on Human-Robot Interaction (HRI), 282–289.

Kruijff, G.-J. M.; Zender, H.; Jensfelt, P.; and Christensen, H. I. 2007. Situated dialogue and spatial organization: What, where. . . and why? International Journal of Advanced Robotic Systems, special section on Human and Robot Interactive Communication 4(1):125–138.

Kuipers, B. 2000. The Spatial Semantic Hierarchy. Artificial Intelligence 119:191–233.

Latombe, J. C. 1991. Robot Motion Planning. Boston, MA: Academic Publishers.

Lowe, D. G. 2004. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60(2):91–110.

Maas, J. F., and Wrede, B. 2006. BITT: A corpus for topic tracking evaluation on multimodal human-robot-interaction. In Proc. of the Int. Conference on Language Resources and Evaluation (LREC).

MacMahon, M.; Stankiewicz, B.; and Kuipers, B. 2006. Walk the talk: Connecting language, knowledge, and action in route instructions. In Proc. of the 21st National Conference on Artificial Intelligence (AAAI).

Martínez Mozos, O.; Rottmann, A.; Triebel, R.; Jensfelt, P.; and Burgard, W. 2006. Semantic labeling of places using information extracted from laser and vision sensor data. In IEEE/RSJ IROS Workshop: From Sensors to Human Spatial Concepts.

Martínez Mozos, O.; Triebel, R.; Jensfelt, P.; Rottmann, A.; and Burgard, W. 2007. Supervised semantic labeling of places using information extracted from sensor data. Robotics and Autonomous Systems 55(5):391–402.

McNamara, T. 1986. Mental representations of spatial relations. Cognitive Psychology 18:87–121.

Newman, P.; Leonard, J.; Tardós, J.; and Neira, J. 2002. Explore and return: Experimental validation of real-time concurrent mapping and localization. In Proc. of the IEEE Int. Conference on Robotics and Automation (ICRA).

Schulz, D.; Burgard, W.; Fox, D.; and Cremers, A. B. 2003. People tracking with a mobile robot using sample-based joint probabilistic data association filters. International Journal of Robotics Research 22(2):99–116.

Siegwart, R.; Arras, K. O.; Bouabdallah, S.; Burnier, D.; Froidevaux, G.; Greppin, X.; Jensen, B.; Lorotte, A.; Mayor, L.; Meisser, M.; Philippsen, R.; Piguet, R.; Ramel, G.; Terrien, G.; and Tomatis, N. 2003. Robox at Expo.02: A large scale installation of personal robots. Robotics and Autonomous Systems 42:203–222.

Spexard, T.; Li, S.; Wrede, B.; Fritsch, J.; Sagerer, G.; Booij, O.; Zivkovic, Z.; Terwijn, B.; and Kröse, B. J. A. 2006. BIRON, where are you? Enabling a robot to learn new places in a real home environment by integrating spoken dialog and visual localization. In Proc. of the IEEE/RSJ Int. Conference on Intelligent Robots and Systems (IROS).

Theobalt, C.; Bos, J.; Chapman, T.; Espinosa-Romero, A.; Fraser, M.; Hayes, G.; Klein, E.; Oka, T.; and Reeve, R. 2002. Talking to Godot: Dialogue with a mobile robot. In Proc. of the IEEE/RSJ Int. Conference on Intelligent Robots and Systems (IROS), 1338–1343.

Topp, E. A., and Christensen, H. I. 2005. Tracking for following and passing persons. In Proc. of the IEEE/RSJ Int. Conference on Intelligent Robots and Systems (IROS).

Topp, E. A.; Hüttenrauch, H.; Christensen, H.; and Severinson Eklundh, K. 2006. Bringing together human and robotic environment representations – a pilot study. In Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

Zender, H., and Kruijff, G.-J. M. 2007. Multi-layered conceptual spatial mapping for autonomous mobile robots. In Schultheis, H.; Barkowsky, T.; Kuipers, B.; and Hommel, B., eds., Control Mechanisms for Spatial Knowledge Processing in Cognitive / Intelligent Systems, Papers from the AAAI Spring Symposium, 62–66.
