REFLEXIVE COLLISION RESPONSE with VIRTUAL SKIN Roadmap Planning Meets Reinforcement Learning

REFLEXIVE COLLISION RESPONSE WITH VIRTUAL SKIN Roadmap Planning Meets Reinforcement Learning Mikhail Frank, Alexander Forster,¨ and Jurgen¨ Schmidhuber Dalle Molle Institute for Artificial Intelligence (IDSIA), CH-6928 Manno-Lugano, Switzerland Faculty of Informatics, Universita` della Svizzera italiana, CH-6904 Lugano Dipartimento tecnologie innovative, Scuola universitaria professionale della Svizzera italiana, CH-6928 Manno-Lugano Keywords: Roadmap Planning: Reinforcement Learning: Collision Avoidance: Robotics Framework Abstract: Prevalent approaches to motion synthesis for complex robots offer either the ability to build up knowledge of feasible actions through exploration, or the ability to react to a changing environment, but not both. This work proposes a simple integration of roadmap planning with reflexive collision response, which allows the roadmap representation to be transformed into a Markov Decision Process. Consequently, roadmap planning is extended to changing environments, and the adaptation of the map can be phrased as a reinforcement learning problem. An implementation of the reflexive collision response is provided, such that the reinforcement learning problem can be studied in an applied setting. The feasibility of the software is analyzed in terms of runtime performance, and its functionality is demonstrated on the iCub humanoid robot. 1 INTRODUCTION versatility of modern hardware. This will undoubt- edly require a broad spectrum of behaviors that are Currently available industrial robots are employed to applicable under a variety of environmental con- do repetitive work in structured environments, and straints/configurations. At the highest level, the ver- their highly specialized nature is therefore unprob- satile agent/controller must solve a variety of differ- lematic, or even desirable. However the next gener- ent problems by identifying relevant constraints and ation of robot helpers is expected to tackle a much developing or invoking appropriate behaviors. How- wider variety of applications, working alongside peo- ever we, as engineers and programmers, are not likely ple in homes, schools, hospitals, and offices. The to be able to explicitly and accurately predict the hardware exists already. State-of-the-art humanoid wide range of constraints and operating conditions robots such as the NASA/GM Robonaut 2 (Diftler that will be encountered in homes, schools, hospi- et al., 2011), the Willow Garage Personal Robot 2 tals, and offices, where the next generation of robots (http://www.willowgarage.com), the iCub from the should serve. This is one motivation for the develop- Italian Institute of Technology (IIT) (Metta et al., mental approach to robotics, which focuses on sys- 2008), and Toyota’s Partner Robots (Takagi, 2006; tems that adaptively and incrementally build a reper- Kusuda, 2008) are impressive machines with many toire of actions and/or behaviors from experience. degrees of freedom (DOFs). Physically speaking, To effectively learn from experience, an they should be capable of doing a much wider va- agent/controller must explore. However given riety of jobs than their industrial ancestors. Behav- the fragility of robotic hardware, exploratory be- iors however are still programmed manually by ex- havior can be quite hazardous. Self-collision and perts and the resulting programs are generally engi- collision with fixed objects in the environment neered to solve a particular instance (or at best a set of usually lead to calibration problems and/or broken related instances) of a task. Consequently, these ad- components, both of which require time-consuming vanced robots are endowed with relatively few, highly maintenance efforts that interrupt experimentation. specialized control programs, and their versatility re- Therefore, if exploration is to be implemented using mains quite limited. physical hardware directly, an online monitoring In order to realize the potential of modern hu- mechanism is required to prevent harmful collisions. manoid robots, especially with respect to service in It is worthy of note that many biological organisms unstructured, dynamic environments, we must find are equipped with exactly such a system. Skin facil- a way to improve their adaptiveness and exploit the itates collision detection, and reflexes help resolve dangerous situations, allowing the agent to learn from 2 THE MOTION PLANNING mistakes without suffering catastrophic damage. PROBLEM It is also noteworthy that in light of the fragility of robotic hardware, implementing exploratory behavior For an arm (or a humanoid upper body) working in in simulation is an appealing alternative to the real 3D space, the motion planning problem is formalized world. In fact this is the approach favored in most as follows: The workspace, of the path planning literature, however it requires 3 the restrictive assumption that the configuration of the W = R (1) workspace/environment does not change between the contains a robot composed of n links: simulated synthesis and validation of the behavior and the execution of the behavior by hardware in the real n [ world. In practice, the static environment assumption A(q) = Ai(q) ⊂ W (2) may not hold, and therefore if motions/behaviors that i=1 are planned offline are to be used in any but the most Each link, Ai, is represented by a semi-algebraic controlled environment, an online monitoring mecha- model. The vector of joint positions nism is still required to prevent harmful collisions. n The remainder of this paper proceeds as follows: q 2 C = R (3) In section 2, the motion planning problem is defined. denotes the configuration of the robot A, and kine- Prevalent approaches to motion planning and reactive matic constraints yield a proper functional mapping collision avoidance are discussed in sections 3 and 4, q ! A(q). Furthermore, there exists an obstacle re- respectively. Therein, we observe that currently avail- gion composed of m obstacles: able planning and collision avoidance methods offer either the ability to build up knowledge of feasible m [ actions through exploration, or the ability to react to O = Oi ⊂ W (4) a changing environment, but not both. In section 5 i=1 we describe our assumptions concerning the motion Each obstacle, Oi, is also expressed as a semi- planning problem for non-static environments, and algebraic model. we propose a very simple way to integrate roadmap To find feasible motions, we must disambiguate planning with reactive collision response. In contrast the feasible region of the configuration space, Cf ree ⊂ to available methods, our reflexive collision response C, from the infeasible region: approach offers the considerable benefits that: Cin f easible = CO [Csel f ⊂ C (5) 1. Roadmap planning is extended to non-static envi- CO denotes the set of configurations q for which A(q) ronments by transforming the roadmap graph into intersects O: a Markov Decision Process (MDP). C = fq 2 C j A(q) \ O 6= g ⊂ C (6) 2. Adaptation of the map to changes in the environ- O ? ment is phrased as a reinforcement learning (RL) In equation 5 Csel f denotes the set of configurations q problem. for which A(q) is self-intersecting. To handle this, let P denote a set of pairs of indices ( j;k) 2 P such that 3. Topological changes to the state-action map can j 6= k and j;k 2 f1;2;:::ng, which correspond to the be handled within the MDP framework. poses of two links A j and Ak that are not allowed to collide. As a matter of convenience, indices of links 4. Provided that unexpected contact with obstacles that share a common joint are typically excluded from can be detected physically, the response behavior P. The set of self colliding poses can then be ex- need not be generated by a computational model pressed: of the robot. [ C = fq 2 C j A (q) \ A (q) 6= g ⊂ C (7) In section 6, we discuss implementation issues sel f j k ? [ j;k]2P and in section 7, we conduct a brief feasibility study on the implemented software, which was developed The motion planning problem is essentially to find in collaboration with the EU project IM-CLeVeR. Fi- a trajectory T(qinitial;qgoal) ⊂ Cf ree that interpolates nally, in section 8 we discuss outstanding issues and initial and goal configurations while not intersecting future work . Cin f easible. 3 PLANNING ALGORITHMS primary motivation for multiple query planning algorithms, which typically utilize a “roadmap” data G(V;E) V = One approach to solving the above problem is to structure in the form of a graph , where fv ;v ;:::v g 2 C E plan ahead of time. Algorithms that take this ap- 1 2 n f ree and is a set of pairs of indices ( j;k) 2 V j 6= k j;k 2 f ; ;:::ng proach are generally called “Path Planning” or “Mo- such that and 1 2 . With E tion Planning” algorithms, as they plan and validate each member of is associated a verified collision free trajectory T(Ei) ⊂ Cf ree. The roadmap approach feasible motions, which can later be passed to the n robot as reference trajectories. There exists a vast lit- reduces each query from a search in R to graph erature on this topic, and we refer the interested reader search, which can be carried out by Dijkstra’s shortest O(jVj · log(jVj)) to the excellent, recent text book by Stephen LaValle path algorithm in time. Importantly, (LaValle, 2006). Here we focus on the only class of the roadmap graph represents a natural crossroads be- motion planning algorithms that scale to the high di- tween motion planning as an engineering discipline mensional configuration spaces of humanoid robots. and the field of artificial intelligence, as it is a spe- cial case of an MDP, where “states” are configurations Sampling based motion planning algorithms q 2 C , “actions” are trajectories q(t) ⊂ C , and probe the configuration space with some sampling f ree f ree all state transition probabilities are equal to one.

Load more