Solving Robot Navigation Problems with Initial Pose Uncertainty Using Real-Time Heuristic Search
From: AIPS 1998 Proceedings. Copyright © 1998, AAAI (www.aaai.org). All rights reserved.

Sven Koenig, College of Computing, Georgia Institute of Technology, [email protected]
Reid G. Simmons, Computer Science Department, Carnegie Mellon University, [email protected]

Abstract

We study goal-directed navigation tasks in mazes, where the robots know the maze but do not know their initial pose (position and orientation). These search tasks can be modeled as planning tasks in large non-deterministic domains whose states are sets of poses. They can be solved efficiently by interleaving planning and plan execution, which can reduce the sum of planning and plan-execution time because it allows the robots to gather information early. We show how Min-Max LRTA*, a real-time heuristic search method, can solve these and other planning tasks in non-deterministic domains efficiently. It allows for fine-grained control over how much planning to do between plan executions, uses heuristic knowledge to guide planning, and improves its plan-execution time as it solves similar planning tasks, until its plan-execution time is at least worst-case optimal. We also show that Min-Max LRTA* solves the goal-directed navigation tasks fast, converges quickly, and requires only a small amount of memory.

[Figure 1: Goal-Directed Navigation Task #1 — the maze with the goal pose and the possible starting poses (the starting belief)]
Introduction

Situated agents (such as robots) have to take their planning time into account to solve planning tasks efficiently (Good 1971). For single-instance planning tasks, for example, they should attempt to minimize the sum of planning and plan-execution time. Finding plans that minimize the plan-execution time is often intractable. Interleaving planning and plan execution is a general principle that can reduce the planning time and thus also the sum of planning and plan-execution time for sufficiently fast agents in non-deterministic domains (Genesereth & Nourbakhsh 1993). Without interleaving planning and plan execution, the agents have to find a large conditional plan that solves the planning task. When interleaving planning and plan execution, on the other hand, the agents have to find only the beginning of such a plan. After the execution of this subplan, the agents repeat the process from the state that actually resulted from the execution of the subplan instead of all states that could have resulted from its execution. Since actions are executed before their complete consequences are known, the agents are likely to incur some overhead in terms of the number of actions executed, but this is often outweighed by the computational savings gained. As an example we consider goal-directed navigation tasks in mazes where the robots know the maze but do not know their initial pose (position and orientation). Interleaving planning and plan execution allows the robots to make additional observations, which reduces their pose uncertainty and thus the number of situations that their plans have to cover. This makes subsequent planning more efficient.

Agents that interleave planning and plan execution have to overcome two problems: first, they have to make sure that they make progress towards the goal instead of cycling forever; second, they should be able to improve their plan-execution time as they solve similar planning tasks, because otherwise they do not behave efficiently in the long run in case similar planning tasks unexpectedly repeat. We show how Min-Max LRTA*, a real-time heuristic search method that extends LRTA* (Korf 1990) to non-deterministic domains, can be used to address these problems. Min-Max LRTA* interleaves planning and plan execution and plans only in the part of the domain around the current state of the agents. This is the part of the domain that is immediately relevant for them in their current situation. It allows for fine-grained control over how much planning to do between plan executions, uses heuristic knowledge to guide planning, and improves its plan-execution time as it solves similar planning tasks, until its plan-execution time is at least worst-case optimal. We show that Min-Max LRTA* solves the goal-directed navigation tasks fast, converges quickly, and requires only a small amount of memory.

Planning methods that interleave planning and plan execution have been studied before. This includes assumptive planning, deliberation scheduling (including anytime algorithms), on-line algorithms and competitive analysis, real-time heuristic search, reinforcement learning, robot exploration techniques, and sensor-based planning. The proceedings of the AAAI-97 Workshop on On-Line Search give a good overview of these techniques. The contributions of this paper are threefold: First, this paper extends our Min-Max LRTA* (Koenig & Simmons 1995), a real-time heuristic search method for non-deterministic domains, from look-ahead one to arbitrary look-aheads. It shows how the search space of Min-Max LRTA* can be represented more compactly than is possible with minimax trees and discusses how it can be solved efficiently using methods from Markov game theory. Second, this paper illustrates an advantage of real-time heuristic search methods that has not been studied before. It was already known that real-time heuristic search methods are efficient alternatives to traditional search methods in deterministic domains. For example, they are among the few search methods that are able to find suboptimal plans for the twenty-four puzzle, a sliding-tile puzzle with more than 7 × 10^24 states (Korf 1993). They have also been used to solve STRIPS-type planning tasks (Bonet, Loerincs, & Geffner 1997). The reason why they are efficient in deterministic domains is that they trade off minimizing planning time and minimizing plan-execution time. However, many domains from robotics, control, and scheduling are non-deterministic. We demonstrate that real-time heuristic search methods can also be used to speed up problem solving in non-deterministic domains and that there is an additional reason why they are efficient in these domains, namely that they allow agents to gather information early. This information can be used to resolve some of the uncertainty caused by non-determinism and thus reduce the amount of planning done for unencountered situations. Third, this paper demonstrates an advantage of Min-Max LRTA* over most previous planning methods that interleave planning and plan execution, namely that it improves its plan-execution time as it solves similar planning tasks. As recognized in both (Dean et al. 1995) and (Stentz 1995), this is an important property because no planning method that executes actions before it has found a complete plan can guarantee a good plan-execution time right away, and methods that do not improve their plan-execution time do not behave efficiently in the long run in case similar planning tasks unexpectedly repeat.

[Figure 2: Goal-Directed Navigation Task #2]

The Robot Navigation Problem

We study goal-directed navigation tasks with initial pose uncertainty (Nourbakhsh 1996), an example of which is shown in Figure 1. A robot knows the maze, but is uncertain about its start pose, where a pose is a location (square) and orientation (north, east, south, west). We assume that there is no uncertainty in actuation and sensing. (In our mazes, it is indeed possible to make actuation and sensing approximately 100 percent reliable.) The sensors on-board the robot tell it in every pose whether there are walls immediately adjacent to it in the four directions (front, left, behind, right). The task of the robot is to navigate to any of the given goal poses and stop. Since there might be many poses that produce the same sensor reports as the goal poses, solving the goal-directed navigation task includes localizing the robot sufficiently so that it knows that it is at a goal pose when it stops. We use the following notation: P is the finite set of possible robot poses (pairs of location and orientation). A(p) is the set of possible actions that the robot can execute in pose p ∈ P: left, right, and possibly forward. succ(p, a) is the pose that results from the execution of action a ∈ A(p) in pose p ∈ P. o(p) is the observation that the robot makes in pose p ∈ P: whether or not there are walls immediately adjacent to it in the four directions (front, left, behind, right). The robot starts in pose p_start ∈ P and then repeatedly makes an observation and executes an action until it decides to stop. It knows the maze, but is uncertain about its start pose. It could be in any pose in P_start ⊆ P. We require only that o(p) = o(p') for all p, p' ∈ P_start, which automatically holds after the first observation, and p_start ∈ P_start, which automatically holds for P_start = {p : p ∈ P ∧ o(p) = o(p_start)}. The robot has to navigate to any pose in P_goal ⊆ P and stop.

Even if the robot is not certain about its pose, it can maintain a belief about its current pose. We assume that the robot cannot associate probabilities or other likelihood estimates with the poses. Then, all it can do is maintain a set of possible poses ("belief"). For example, if the robot has no knowledge of its start pose for the goal-directed navigation task from Figure 1, but observes walls around it except in its front, then the start belief of the robot contains the seven possible start poses shown in the figure. We use the following notation: B is the set of beliefs, b_start the start belief, and B_goal the set of goal beliefs. A(b) is the set of actions that can be executed when the belief is b. O(b, a) is the set of possible observations that can be made after the execution of action a when the belief was b. succ(b, a, o) is the successor belief that results if observation o is made after the execution of action a when the belief was b. Then,

B = {b : b ⊆ P ∧ o(p) = o(p') for all p, p' ∈ b}
b_start = P_start
B_goal = {b : b ⊆ P_goal ∧ o(p) = o(p') for all p, p' ∈ b}
A(b) = A(p) for any p ∈ b
O(b, a) = {o(succ(p, a)) : p ∈ b}
succ(b, a, o) = {succ(p, a) : p ∈ b ∧ o(succ(p, a)) = o}

To understand the definition of A(b), notice that A(p)
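The belief-space definitions above can be made concrete with a short sketch. The Python below is illustrative only: the T-shaped toy maze, the pose encoding (x, y, heading), and all function names are our assumptions, not from the paper. It implements the pose-level primitives o(p), succ(p, a), and A(p), and from them the belief-level A(b), O(b, a), and succ(b, a, o) exactly as defined above.

```python
# Poses are (x, y, heading) with headings 0=N, 1=E, 2=S, 3=W (clockwise);
# a belief is a frozenset of poses. The T-shaped toy maze below is an
# illustrative assumption, not a maze from the paper.

WALLS = {  # WALLS[(x, y)] = headings blocked by a wall in that cell
    (0, 0): {0, 2, 3},  # west end of corridor: open only to the east
    (1, 0): {0},        # junction: open east, south, and west
    (2, 0): {0, 1, 2},  # east end of corridor: open only to the west
    (1, 1): {1, 2, 3},  # stem of the T: open only to the north
}
DELTAS = [(0, -1), (1, 0), (0, 1), (-1, 0)]  # N, E, S, W (y grows south)

def succ(pose, action):
    """succ(p, a): deterministic effect of 'left', 'right', 'forward'."""
    x, y, heading = pose
    if action == "left":
        return (x, y, (heading - 1) % 4)
    if action == "right":
        return (x, y, (heading + 1) % 4)
    dx, dy = DELTAS[heading]
    return (x + dx, y + dy, heading)

def observe(pose):
    """o(p): walls (front, left, behind, right) relative to the robot."""
    x, y, heading = pose
    return tuple((heading + off) % 4 in WALLS[(x, y)] for off in (0, 3, 2, 1))

def actions(pose):
    """A(p): turning is always possible, forward only if unblocked."""
    x, y, heading = pose
    return ["left", "right"] + (["forward"] if heading not in WALLS[(x, y)] else [])

def belief_actions(belief):            # A(b) = A(p) for any p in b
    return actions(next(iter(belief)))

def observations(belief, action):      # O(b, a) = {o(succ(p, a)) : p in b}
    return {observe(succ(p, action)) for p in belief}

def belief_succ(belief, action, obs):  # succ(b, a, o)
    return frozenset(succ(p, action) for p in belief
                     if observe(succ(p, action)) == obs)

# All poses that see walls everywhere except in front are possible starts,
# mirroring P_start = {p : p in P and o(p) = o(p_start)}.
b_start = frozenset(p for cell in WALLS for h in range(4)
                    if observe(p := (*cell, h)) == (False, True, True, True))
assert len(b_start) == 3 and "forward" in belief_actions(b_start)

# In this maze, one 'forward' yields a different observation for every
# possible pose, so each successor belief is a singleton.
for obs in observations(b_start, "forward"):
    print(obs, "->", set(belief_succ(b_start, "forward", obs)))
```

Because every pose in the start belief produces a distinct observation after one forward action, each succ(b_start, forward, o) is a singleton: a single executed action plus its observation fully localizes the robot, which is the information-gathering effect the paper exploits.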