Fast Path Planning Using Experience Learning from Obstacle Patterns
Olimpiya Saha and Prithviraj Dasgupta
Computer Science Department, University of Nebraska at Omaha
Omaha, NE 68182, USA
{osaha, [email protected]}

Copyright (c) 2016, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

We consider the problem of robot path planning in an environment where the location and geometry of obstacles are initially unknown, while reusing relevant knowledge about collision avoidance learned from the robot's previous navigational experience. Our main hypothesis in this paper is that the path planning times for a robot can be reduced if it can refer to previous maneuvers it used to avoid collisions with obstacles during earlier missions, and adapt that information to avoid obstacles during its current navigation. To verify this hypothesis, we propose an algorithm called LearnerRRT that first uses a feature matching algorithm called Sample Consensus Initial Alignment (SAC-IA) to efficiently match currently encountered obstacle features with past obstacle features, and then uses an experience-based learning technique to adapt the previously recorded obstacle avoidance trajectories corresponding to the matched feature to the current scenario. The feature matching and machine learning techniques are integrated into the robot's path planner so that the robot can rapidly and seamlessly update its path in real time to circumvent an obstacle it encounters and continue to move towards its goal. We have conducted several experiments using a simulated Coroware Corobot robot within the Webots simulator to verify the performance of our proposed algorithm, with different start and goal locations and different obstacle geometries and placements, and have compared our approach to a state-of-the-art sampling-based path planner. Our results show that the proposed LearnerRRT algorithm performs much better than Informed RRT*: when given the same time, our algorithm finished its task successfully, whereas Informed RRT* could only achieve 10-20 percent of the optimal distance.

Introduction

Autonomous navigation is one of the fundamental problems in robotics, arising in several real-life applications such as unmanned search and rescue, autonomous exploration and surveillance, and domestic applications such as automated waste cleanup or vacuum cleaning. We consider the navigation problem for a robot in an unstructured environment where the location and geometry of obstacles are initially unknown or known only coarsely. To navigate in such an environment, the robot has to find a collision-free path in real time by determining and dynamically updating a set of waypoints that connect the robot's initial position to its goal position. While there are several state-of-the-art path planners available for robot path planning (Choset et al. 2005), these planners usually replan the path to the goal from scratch every time the robot encounters an obstacle that obstructs its path to the goal - an operation that can consume considerable time (on the order of minutes or even hours) if the environment is complex, with many obstacles. Excessive path planning time also drains the robot's energy (battery) available for its operations and degrades the overall performance of the robot's mission.

To address this problem, in this paper we make the observation that, although obstacles can be geometrically dissimilar, there exist some generic features that are common across most obstacles. If a robot can be trained to navigate around obstacles with some basic geometric patterns, it can adapt and synthesize these movements to navigate around more complex obstacles without having to learn the motion around those obstacles from scratch. In this manner, navigation can be learned incrementally by a robot, without having to be trained independently for each new environment it is placed in. To realize our proposed approach, we describe an algorithm called LearnerRRT, in which we first train a robot by navigating it in an environment where obstacle shapes have some common geometric patterns. The features of the obstacles perceived by the robot's sensors, and its movements or actions to avoid the obstacles, are stored in summarized format within a repository maintained inside the robot. Later on, when the robot is presented with a navigation task in a new environment that requires it to maneuver around obstacles, it retrieves the obstacle feature-action pair whose obstacle features have the highest resemblance to its currently perceived obstacle features. The robot then mimics the actions retrieved from its previous maneuver after adapting them to the current obstacle.
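As a rough illustration of this retrieve-and-adapt idea, the Python sketch below (our own minimal sketch, not the paper's implementation) keeps a repository of obstacle-feature/action-sequence pairs, retrieves the best match for a newly perceived obstacle, and adapts the stored maneuver. The names ObstacleExperience, match_score, and adapt_actions are hypothetical; the paper matches features with SAC-IA and uses a richer experience-based adaptation than the toy versions shown here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

import numpy as np

Action = Tuple[float, float]  # (turn angle theta in radians, distance d to move)


@dataclass
class ObstacleExperience:
    """One entry of the robot's experience repository (hypothetical layout):
    a summarized obstacle feature descriptor plus the obstacle-avoidance
    actions that worked for that obstacle during training."""
    feature: np.ndarray
    avoidance_actions: List[Action]


def match_score(current: np.ndarray, stored: np.ndarray) -> float:
    """Placeholder similarity between obstacle features. The paper uses
    SAC-IA point-cloud alignment; here we simply negate the Euclidean
    distance between fixed-length descriptors for illustration."""
    return -float(np.linalg.norm(current - stored))


def adapt_actions(actions: List[Action], heading_offset: float) -> List[Action]:
    """Toy adaptation step: re-express the first recorded turn relative to the
    robot's current heading. The paper's experience-based adaptation is richer."""
    theta0, d0 = actions[0]
    return [(theta0 + heading_offset, d0)] + list(actions[1:])


def avoid_obstacle(current_feature: np.ndarray,
                   heading_offset: float,
                   repository: List[ObstacleExperience],
                   min_score: float = -1.0) -> Optional[List[Action]]:
    """Retrieve the most similar stored obstacle and adapt its maneuver.
    Returns None when nothing in the repository is similar enough."""
    if not repository:
        return None
    best = max(repository, key=lambda e: match_score(current_feature, e.feature))
    if match_score(current_feature, best.feature) < min_score:
        return None
    return adapt_actions(best.avoidance_actions, heading_offset)
```

Returning None in the sketch mirrors the fallback described later: when no stored obstacle is sufficiently similar, the robot plans from scratch with Informed RRT* to avoid negative transfer.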
We have tested our algorithm on a simulated Corobot robot using the Webots simulator within different environments having different obstacle geometries and spatial distributions, while varying the start and goal locations of the navigation task given to the robot. Our results show that our proposed algorithm LearnerRRT can perform path planning in real time in a more time-effective manner than sampling-based techniques like Informed RRT*. When given the same time, our algorithm was able to reach the goal in the different environments discussed, whereas Informed RRT* had covered only approximately 10-20 percent of the optimal distance between the start and goal locations.

Related Work

Applications of machine learning to robot navigation have been a topic of interest in the robotics community over recent years. In one of the earliest works in this direction, Fernández and Veloso (Fernández and Veloso 2006) proposed a policy reuse technique in which an agent calculates its expected rewards from possible actions within a reinforcement learning framework to decide whether to use the action prescribed by its learned policies or to explore new actions. Da Silva and Mackworth (Da Silva and Mackworth 2010) extended this work by adding spatial hints in the form of expert advice about the world states.

More recently, transfer learning has been proposed to leverage knowledge gained in one task for a related but different task (Taylor and Stone 2009) by transferring appropriate knowledge from related source tasks. Most transfer learning techniques rely on a source task selection strategy, where the most suitable source task is selected from a pool of source tasks and applied to the target task in order to solve it. A related problem with such an approach is that if the source task is selected incorrectly, the target task can be performed poorly owing to irrelevant or 'negative' transfer of knowledge from the source task. In (Taylor, Kuhlmann, and Stone 2007), the authors addressed this problem using a transfer hierarchy. Approaches such as a human-provided mapping (Torrey et al. 2006) and a statistical relational model (Torrey and Shavlik 2010) to assess similarities between a source and a target task have also been proposed to mitigate negative transfer. Other techniques to learn source-to-target task mappings efficiently include an experience-based learning framework called MASTER (Taylor, Kuhlmann, and Stone 2008) and an experts algorithm used to select a candidate policy for solving an unknown Markov Decision Process task (Talvitie and Singh 2007). Our paper is along this direction of work.

A related line of research reuses previous planning experience wherever possible; such techniques reduce to planning from scratch when no past experiences can be reused. In the Lightning and Thunder frameworks (Berenson, Abbeel, and Goldberg 2012; Coleman et al. 2014), a robot learns paths from past experiences in a high-dimensional space and reuses them effectively in the future to solve a given navigation task. In most of these approaches, the metric for assessing similarity to previous navigation tasks is based on the similarity between the start and goal locations of a previous task and those of the navigation task at hand. Similar to these approaches, our LearnerRRT algorithm exploits past knowledge when a considerable amount of relevance between the current obstacle and a previously encountered obstacle is identified. Otherwise, to avoid negative transfer, it reverts to planning from scratch, using a state-of-the-art motion planner called Informed RRT* (Gammell, Srinivasa, and Barfoot 2014) to navigate in the target environment. However, in contrast to these approaches, our method learns and reuses actions at a finer level of granularity, at individual obstacles rather than between the start and goal locations of previous navigation paths, which allows the learned behavior to apply across environments that vary in size and in obstacle size and location.
Problem Formulation

Consider a wheeled robot situated within a bounded environment Q ⊆ ℝ². We use q ∈ Q to denote a configuration of the robot, Q_free ⊂ Q to denote the free space, and Q_obs = Q − Q_free to denote the space occupied by obstacles in the environment. The action set for the robot's motion is given by A ⊂ {[−π, π] × ℝ}; an action is denoted as a = (θ, d) ∈ A, where θ ∈ [−π, π] and d ∈ ℝ are the angle (in radians) the robot needs to turn and the distance it needs to move, respectively. Performing an action a in configuration q takes the robot to a new configuration q′, denoted mathematically as a(q) = q′. A path is an ordered sequence of actions, P = (a₁, a₂, a₃, ...). Let T denote a navigation task for the robot, T = (q_start, q_goal), where q_start, q_goal ∈ Q_free denote the start and goal locations of T, respectively. The objective of the robot is to find a sequence of actions that guarantees a collision-free navigation path connecting q_start and q_goal.
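To make the notation concrete, the following small sketch (ours, not the paper's) encodes a configuration q, an action a = (θ, d), and a task T = (q_start, q_goal), and applies actions as a(q) = q′. The paper treats configurations as points in ℝ², so carrying the robot's heading alongside q is an assumption we add here to make the position update well defined.

```python
import math
from typing import List, Tuple

Config = Tuple[float, float]   # q = (x, y), a point in Q ⊆ R^2
Action = Tuple[float, float]   # a = (theta, d): turn by theta radians, then move d
Task = Tuple[Config, Config]   # T = (q_start, q_goal)


def apply_action(q: Config, heading: float, a: Action) -> Tuple[Config, float]:
    """a(q) = q': turn by theta from the current heading, then move distance d."""
    theta, d = a
    new_heading = heading + theta
    x, y = q
    q_next = (x + d * math.cos(new_heading), y + d * math.sin(new_heading))
    return q_next, new_heading


def execute_path(q_start: Config, heading: float, path: List[Action]) -> Config:
    """Follow an ordered sequence of actions P = (a1, a2, a3, ...)."""
    q = q_start
    for a in path:
        q, heading = apply_action(q, heading, a)
    return q


# Example task T = (q_start, q_goal) and a two-action path that reaches the goal.
task: Task = ((0.0, 0.0), (2.0, 1.0))
q_start, q_goal = task
print(execute_path(q_start, 0.0, [(0.0, 2.0), (math.pi / 2, 1.0)]))  # ~ q_goal = (2.0, 1.0)
```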