On Dead-End Detectors, the Traps They Set, and Trap Learning

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17)

Search and Learn: On Dead-End Detectors, the Traps they Set, and Trap Learning

Marcel Steinmetz and Jörg Hoffmann
Saarland University, Saarland Informatics Campus, Saarbrücken, Germany

Abstract

A key technique for proving unsolvability in classical planning is the use of dead-end detectors ∆: effectively testable criteria sufficient for unsolvability, pruning (some) unsolvable states during search. Related to this, a recent proposal is the identification of traps prior to search, compact representations of non-goal state sets T that cannot be escaped. Here, we create new synergy across these ideas. We define a generalized concept of traps, relative to a given dead-end detector ∆, where T can be escaped, but only into dead-end states detected by ∆. We show how to learn compact representations of such T during search, extending the reach of ∆. Our experiments show that this can be quite beneficial. It improves coverage for many unsolvable benchmark planning domains and dead-end detectors ∆, in particular on resource-constrained domains where it outperforms the state of the art.

1 Introduction

Classical planning is concerned with the analysis of goal reachability in large state spaces, compactly described in terms of planning tasks specifying a vector of state variables, an initial state, a set of actions, and a goal condition. Planning research has traditionally been concerned with solvable tasks, reflected for example in the benchmarks used in the International Planning Competition (IPC) up to the year 2014 [Bacchus, 2001; Long and Fox, 2003; Hoffmann and Edelkamp, 2005; Gerevini et al., 2009; Coles et al., 2012]. However, proving planning tasks unsolvable is also quite relevant in practice. Unsolvable tasks occur, for example, in over-subscription planning [Smith, 2004; Gerevini et al., 2009; Domshlak and Mirkis, 2015] and in directed model checking [Edelkamp et al., 2004; Kupferschmid et al., 2006; 2008]. Furthermore, even solvable planning tasks often contain unsolvable states, called dead-ends, for example when dealing with limited resources [Laborie and Ghallab, 1995; Nakhost et al., 2012; Coles et al., 2013].

Research in classical planning has recently seen a surge of work addressing these issues, designing effective techniques for proving unsolvability. After initial works [Bäckström et al., 2013; Hoffmann et al., 2014], a wealth of techniques participated in the inaugural Unsolvability International Planning Competition (UIPC'16) (e.g., [Torralba and Alcázar, 2013; Domshlak et al., 2015; Torralba et al., 2016; Pommerening and Seipp, 2016; Seipp et al., 2016; Steinmetz and Hoffmann, 2016a; Torralba, 2016; Gnad et al., 2016]). One major strand of these works designs what we will refer to as dead-end detectors ∆: effectively testable criteria sufficient for unsolvability, designed to be called on every state during search, serving to prune the dead-end states detected. Such ∆ were designed based on suitable variants of heuristic functions, namely pattern databases [Edelkamp, 2001], merge-and-shrink heuristics [Helmert et al., 2014; Hoffmann et al., 2014], potential heuristics [Pommerening et al., 2015], and critical-path heuristics [Haslum and Geffner, 2000; Steinmetz and Hoffmann, 2016b; 2017]. These detect a state s to be a dead-end if s is unsolvable in the approximation underlying the heuristic function.

A recent related proposal is the identification of traps [Lipovetzky et al., 2016]: compact representations of non-goal state sets T that cannot be escaped, i.e., where from any state s ∈ T, all states s' reachable from s are also contained in T. Such traps can be identified through an offline analysis, prior to search. Here we extend the trap idea in two ways:

(i) We observe that traps can be combined for synergistic effect with arbitrary dead-end detectors ∆.
(ii) We observe that traps can be learned online during search, from the dead-end states encountered.

By (i), the trap Θ extends the reach of ∆, avoiding "the traps set for the search by ∆". By (ii), this is done dynamically from information that becomes available during search.

Notably, our technique can also be run without any other dead-end detector ∆ (technically: a trivial ∆ not detecting any dead-ends). In this case, (i) is moot, and (ii) turns our technique into an online-learning variant of the original traps proposal [Lipovetzky et al., 2016].

In the ability to learn sound and generalizable knowledge ("nogoods") about dead-ends during search, our work is rivaled only by recent methods for the online refinement of a critical-path heuristic dead-end detector ∆^C [Steinmetz and Hoffmann, 2016b; 2017] (see footnote 1). In the ability to exploit synergy with another dead-end detector ∆, our technique is unique in the following sense: if s is a state all of whose successor states s' are detected to be dead-ends by ∆, then we can learn to detect s without also having to detect the states s'. This is in contrast to all other known dead-end detectors: when learning to detect s, these necessarily, and redundantly with the given ∆, also learn to detect all s'. The latter is because all known dead-end detectors are transitive, i.e., when they detect a state s, they also detect all states reachable from s. Transitivity is a natural property, as, after all, dead-end detectors need to reason about all possible descendant states; for dead-end detectors based on a heuristic function, transitivity follows from consistency. Steinmetz and Hoffmann [2017] explore combinations of ∆^C-learning with other dead-end detectors δ, yet find that these suffer from having to learn to subsume δ. Our notion of ∆-traps does not have that issue, and is empirically synergistic with several δ.

Footnote 1: Most works on nogood learning in state space search assume a plan length bound [Blum and Furst, 1997; Long and Fox, 1999; [...]

We implemented our techniques in combination with essentially all known dead-end detectors, in particular those run in UIPC'16. We also enhanced the UIPC'16 winner, the Aidos portfolio [Seipp et al., 2016], in this manner. Our experiments show that online ∆-trap learning can be quite beneficial. It is competitive on its own, run without any other dead-end detector. Combined with a variety of previous dead-end detectors ∆, it improves coverage for many unsolvable benchmark planning domains, in particular on resource-constrained domains where it outperforms the state of the art.

2 Background

We use the finite-domain representation (FDR) framework. A planning task is a tuple Π = ⟨V, A, I, G⟩. V is a set of state variables, each v ∈ V associated with a finite domain D(v). A is a set of actions, each a pair ⟨pre_a, eff_a⟩ of partial assignments to V. The initial state I is a complete assignment to V, the goal G is a partial assignment to V. A state s is a complete assignment to V. An action a is applicable in s if pre_a ⊆ s, and applying such a in s results in the state s⟦a⟧, overwriting s with eff_a where eff_a is defined. A plan for s is an action sequence π whose iterative application leads to a state s_G where G ⊆ s_G; s is a dead-end if no such π exists.

[...] identify a compact representation Θ of such a T offline, prior to search, and to use Θ to detect and prune dead-ends during search. This compact representation is determined from partial states, i.e., partial variable assignments t of size up to k, where k is a parameter. The states induced by such Θ are given by T^Θ := {s ∈ S | ∃t ∈ Θ: t ⊆ s}. Verifying whether T^Θ is a trap can be done equivalently on Θ through progression over partial states. Say that a is applicable to a partial state t if pre_a|V(t) ⊆ t (pre_a restricted to the variables V(t) on which t is defined), and if applicable, define the progression of t over a as the partial variable assignment t⟦a⟧ where

\[
t\llbracket a \rrbracket(v) :=
\begin{cases}
\mathit{eff}_a(v) & \text{if } \mathit{eff}_a(v) \neq \bot; \\
t(v) & \text{if } \mathit{eff}_a(v) = \bot \text{ and } t(v) \neq \bot; \\
\mathit{pre}_a(v) & \text{if } \mathit{eff}_a(v) = t(v) = \bot \text{ and } \mathit{pre}_a(v) \neq \bot; \\
\bot & \text{otherwise.}
\end{cases}
\]

In words, t is extended by pre_a and the resulting (partial) variable assignment is overwritten by eff_a. By definition, t⟦a⟧ ⊆ s⟦a⟧ for the application of a in any state s where a is applicable and t ⊆ s. It is easy to show that T^Θ constitutes a trap if and only if (a) every t ∈ Θ disagrees with the goal on some v, and (b) Θ is closed under progression, i.e., for all t ∈ Θ and for all actions a applicable to t, there is t' ∈ Θ so that t' ⊆ t⟦a⟧.

3 Dead-End Detectors and the Traps they Set

We show that trap identification can be combined with arbitrary dead-end detectors. To this end, we consider a generic notion of dead-end detectors, and we introduce an accordingly modified notion of traps.

A dead-end detector is a function ∆ : S → {0, 1} where ∆(s) = 1 only if s is a dead-end state. As with heuristic functions, the intention is to call ∆ on every state during search, so ∆ will typically be effectively computable. As a baseline, we will use the naïve dead-end detector, denoted ∆_0, which returns 0 for all states (i.e., does not recognize any dead-end). More elaborate known [...]
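The background definitions (progression over partial states, the two-part trap criterion, and a dead-end detector as a 0/1 function on states) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration, not the authors' implementation: partial assignments are modelled as dictionaries in which a missing variable plays the role of ⊥, and the function names (`progress`, `is_trap`, `trap_detector`) and the small fuel/position task are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical encoding: a partial assignment is a dict; a missing key means "undefined".
Assignment = Dict[str, int]
Action = Tuple[Assignment, Assignment]  # (pre_a, eff_a)


def subsumes(t: Assignment, s: Assignment) -> bool:
    """t ⊆ s: every value fixed by t is fixed identically by s."""
    return all(v in s and s[v] == x for v, x in t.items())


def applicable_to_partial(t: Assignment, a: Action) -> bool:
    """pre_a restricted to the variables of t must agree with t."""
    pre, _ = a
    return all(t[v] == x for v, x in pre.items() if v in t)


def progress(t: Assignment, a: Action) -> Assignment:
    """Progression of t over a: extend t by pre_a, then overwrite with eff_a.

    Later dicts win in the merge, so eff_a takes priority over t, and t over pre_a,
    which matches the case distinction in the definition.
    """
    pre, eff = a
    return {**pre, **t, **eff}


def is_trap(theta: List[Assignment], actions: List[Action], goal: Assignment) -> bool:
    """Check conditions (a) and (b) on the set of partial states theta."""
    for t in theta:
        # (a) t must disagree with the goal on some variable.
        if not any(v in t and t[v] != x for v, x in goal.items()):
            return False
        # (b) closure under progression.
        for a in actions:
            if applicable_to_partial(t, a):
                ta = progress(t, a)
                if not any(subsumes(t2, ta) for t2 in theta):
                    return False
    return True


def trap_detector(theta: List[Assignment]) -> Callable[[Assignment], int]:
    """Dead-end detector induced by theta: 1 iff some t in theta is contained in s."""
    return lambda s: int(any(subsumes(t, s) for t in theta))


def naive_detector(s: Assignment) -> int:
    """The baseline detector that never recognizes a dead-end."""
    return 0


# Hypothetical toy task: reach pos = 2; moving forward consumes fuel.
goal = {"pos": 2}
move01: Action = ({"pos": 0, "fuel": 2}, {"pos": 1, "fuel": 1})
move12: Action = ({"pos": 1, "fuel": 1}, {"pos": 2, "fuel": 0})
burn: Action = ({"fuel": 1}, {"fuel": 0})
back: Action = ({"pos": 1}, {"pos": 0})
actions = [move01, move12, burn, back]

theta_closed = [{"fuel": 0, "pos": 0}, {"fuel": 0, "pos": 1}, {"fuel": 1, "pos": 0}]
theta_open = [{"fuel": 1, "pos": 0}]   # burn leads outside theta_open
theta_goal = [{"fuel": 0}]             # contains the goal state fuel=0, pos=2

detect = trap_detector(theta_closed)
print(is_trap(theta_closed, actions, goal), detect({"fuel": 0, "pos": 1}))
```

In this toy task, `theta_open` fails condition (b) because burning the last unit of fuel leaves the set, and `theta_goal` fails condition (a) because it does not exclude a goal state; only `theta_closed` passes both checks. The size bound k on partial states is not enforced in the sketch.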
