REFLEXIVE COLLISION RESPONSE with VIRTUAL SKIN Roadmap Planning Meets Reinforcement Learning

REFLEXIVE COLLISION RESPONSE WITH VIRTUAL SKIN Roadmap Planning Meets Reinforcement Learning Mikhail Frank, Alexander Forster¨ and Jurgen¨ Schmidhuber Dalle Molle Institute for Artificial Intelligence (IDSIA), CH-6928 Manno-Lugano, Switzerland Facolta` di Scienze Informatiche, Universita` della Svizzera Italiana, CH-6904 Lugano, Switzerland Dipartimento Tecnologie Innovative, Scuola Universitaria Professionale della Svizzera Italiana CH-6928 Manno-Lugano, Switzerland Keywords: Roadmap planning, Reinforcement learning, Collision avoidance, Robotics framework. Abstract: Prevalent approaches to motion synthesis for complex robots offer either the ability to build up knowledge of feasible actions through exploration, or the ability to react to a changing environment, but not both. This work proposes a simple integration of roadmap planning with reflexive collision response, which allows the roadmap representation to be transformed into a Markov Decision Process. Consequently, roadmap planning is extended to changing environments, and the adaptation of the map can be phrased as a reinforcement learning problem. An implementation of the reflexive collision response is provided, such that the reinforcement learning problem can be studied in an applied setting. The feasibility of the software is analyzed in terms of runtime performance, and its functionality is demonstrated on the iCub humanoid robot. 1 INTRODUCTION a way to improve their adaptiveness and exploit the versatility of modern hardware. This will undoubt- Currently available industrial robots are employed to edly require a broad spectrum of behaviors that are do repetitive work in structured environments, and applicable under a variety of environmental con- their highly specialized nature is therefore unprob- straints/configurations. At the highest level, the ver- lematic, or even desirable. However the next gener- satile agent/controller must solve a variety of differ- ation of robot helpers is expected to tackle a much ent problems by identifying relevant constraints and wider variety of applications, working alongside peo- developing or invoking appropriate behaviors. How- ple in homes, schools, hospitals, and offices. The ever we, as engineers and programmers, are not likely hardware exists already. State-of-the-art humanoid to be able to explicitly and accurately predict the robots such as the NASA/GM Robonaut 2 (Diftler wide range of constraints and operating conditions et al., 2011), the Willow Garage Personal Robot 2 that will be encountered in homes, schools, hospi- (http://www.willowgarage.com), the iCub from the tals, and offices, where the next generation of robots Italian Institute of Technology (IIT) (Metta et al., should serve. This is one motivation for the develop- 2008), and Toyota’s Partner Robots (Takagi, 2006; mental approach to robotics, which focuses on sys- Kusuda, 2008) are impressive machines with many tems that adaptively and incrementally build a reper- degrees of freedom (DOFs). Physically speaking, toire of actions and/or behaviors from experience. they should be capable of doing a much wider va- To effectively learn from experience, an riety of jobs than their industrial ancestors. Behav- agent/controller must explore. However given iors however are still programmed manually by ex- the fragility of robotic hardware, exploratory be- perts and the resulting programs are generally engi- havior can be quite hazardous. Self-collision and neered to solve a particular instance (or at best a set of collision with fixed objects in the environment related instances) of a task. Consequently, these ad- usually lead to calibration problems and/or broken vanced robots are endowed with relatively few, highly components, both of which require time-consuming specialized control programs, and their versatility re- maintenance efforts that interrupt experimentation. mains quite limited. For these reasons, implementing exploratory be- In order to realize the potential of modern hu- havior in simulation is an appealing alternative to manoid robots, especially with respect to service in the real world. In fact this is the approach favored unstructured, dynamic environments, we must find in most of the path planning literature, which fo- 642 Frank M., Förster A. and Schmidhuber J.. REFLEXIVE COLLISION RESPONSE WITH VIRTUAL SKIN - Roadmap Planning Meets Reinforcement Learning. DOI: 10.5220/0003883206420651 In Proceedings of the 4th International Conference on Agents and Artificial Intelligence (SSIR-2012), pages 642-651 ISBN: 978-989-8425-95-9 Copyright c 2012 SCITEPRESS (Science and Technology Publications, Lda.) REFLEXIVE COLLISION RESPONSE WITH VIRTUAL SKIN - Roadmap Planning Meets Reinforcement Learning cuses primarily on off-line search for feasible mo- ment is phrased as a reinforcement learning (RL) tions that interpolate a starting configuration and a problem. goal configuration. Many such planning algorithms 3. Topological changes to the roadmap can be han- offer desirable theoretical properties, such as proba- dled within the MDP framework. bilistic completeness. However, the configuration of the workspace/environment must not change between In section 6, we introduce our open source soft- the simulated synthesis and validation of a motion ware framework, which implements the tight cou- and the execution of the motion by hardware. Since pling of planning and reactive control modules. In the static environment assumption does not usually section 7, we conduct a brief feasibility study on the hold in the real world, the practical applicability of implemented software, which was developed in col- most path planning algorithms outside the laboratory laboration with the EU project IM-CLeVeR. Finally, is quite limited. in section 8 we discuss outstanding issues and future Exploratory behavior can of course be imple- work . mented on hardware, provided that there exists an on- line monitoring mechanism to prevent harmful collisions. Such a system must identify infeasible mo- 2 THE MOTION PLANNING tions and revise them in real-time. Run-time perfor- PROBLEM mance is clearly of primary concern, and most on-line collision avoidance approaches therefore revise mo- For an arm (or a humanoid upper body) working in tions according to some heuristics that are computed 3D space, the motion planning problem is formalized locally, in the neighborhood of impending collisions. as follows: The workspace, Fast though it is, re-planning motions according to lo- cal constraints only is indeed short-sighted, and on- W = 3 (1) line collision avoidance systems therefore make rela- R tively poor path planners. contains a robot composed of n links: We propose that an embodied, intelligent agent, n [ capable of adapting to a changing real world envi- A(q) = Ai(q) ⊂ W (2) ronment, requires both the ability to plan with re- i=1 spect to global task constraints and the ability to re- Each link, Ai, is represented by a semi-algebraic act to changes in its environment. Furthermore, it is model. The vector of joint positions our intuition that the planner should learn from the reactions to its plans, which requires a tightly inte- n q 2 C = R (3) grated approach to planning and control. In this work we consider the theoretical implications of coupling denotes the configuration of the robot A, and kine- a roadmap planner to a controller that provides re- matic constraints yield a proper functional mapping flexive reaction to collisions. Furthermore, we intro- q ! A(q). Furthermore, there exists an obstacle re- duce Virtual Skin, our open-source framework, which gion composed of m obstacles: monitors the state of a physical robot in real-time, m [ facilitating applied research on tightly coupled plan- O = Oi ⊂ W (4) ner/controler systems. i=1 The remainder of the paper proceeds as follows: Each obstacle, Oi, is also expressed as a semi- In section 2, the motion planning problem is defined. algebraic model. Prevalent approaches to motion planning and reactive To find feasible motions, we must disambiguate collision avoidance are discussed in sections 3 and 4, the feasible region of the configuration space, Cf ree ⊂ respectively. In section 5 we describe our assump- C, from the infeasible region: tions concerning the motion planning problem for non-static environments, and we propose the integra- Cin f easible = CO [Csel f ⊂ C (5) tion of reactive collision response into the roadmap C denotes the set of configurations q for which A(q) planning approach. The tight coupling of planning O intersects O: and control offers the considerable benefits that: 1. Roadmap planning is extended to non-static envi- CO = fq 2 C j A(q) \ O 6= ?g ⊂ C (6) ronments by transforming the roadmap graph into In equation 5 Csel f denotes the set of configurations q a Markov Decision Process (MDP). for which A(q) is self-intersecting. To handle this, let 2. Adaptation of the map to changes in the environ- P denote a set of pairs of indices ( j;k) 2 P such that 643 ICAART 2012 - International Conference on Agents and Artificial Intelligence j 6= k and j;k 2 f1;2;:::ng, which correspond to the Cin f easible must remain constant during the course of poses of two links A j and Ak that are not allowed to planning. The primary drawback of single query al- collide. As a matter of convenience, indices of links gorithms is their high complexity, which is O(mn), that share a common joint are typically excluded from where m is linear sampling density and n is the dimen- P. The set of self colliding poses can then be ex- sionality of the configuration space. A recent state- pressed: of-the-art algorithm, BT-RRT (Perez et al., 2011), requires 10 seconds to find a feasible solution to a rel- [ atively easy planning problem, wherein two arms (12 Csel f = fq 2 C j A j(q) \ Ak(q) 6= ?g ⊂ C (7) DOF) must circumvent the edge of a table. Moreover, [ j;k]2P the initial solution, the result of stochastic search, is The motion planning problem is essentially to find quite circuitous, and BT-RRT requires an additional 125 seconds to smooth the motion by minimizing a a trajectory T(qinitial;qgoal) ⊂ Cf ree that interpolates initial and goal configurations while not intersecting cost function in the style of optimal control.

Load more