Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17)

Search and Learn: On Dead-End Detectors, the Traps they Set, and Trap Learning

Marcel Steinmetz and Jörg Hoffmann
Saarland University, Saarland Informatics Campus, Saarbrücken, Germany

Abstract

A key technique for proving unsolvability in classical planning are dead-end detectors $\Delta$: effectively testable criteria sufficient for unsolvability, pruning (some) unsolvable states during search. Related to this, a recent proposal is the identification of traps prior to search, compact representations of non-goal state sets $T$ that cannot be escaped. Here, we create new synergy across these ideas. We define a generalized concept of traps, relative to a given dead-end detector $\Delta$, where $T$ can be escaped, but only into dead-end states detected by $\Delta$. We show how to learn compact representations of such $T$ during search, extending the reach of $\Delta$. Our experiments show that this can be quite beneficial. It improves coverage for many unsolvable benchmark planning domains and dead-end detectors $\Delta$, in particular on resource-constrained domains where it outperforms the state of the art.

1 Introduction

Classical planning is concerned with the analysis of goal reachability in large state spaces, compactly described in terms of planning tasks specifying a vector of state variables, an initial state, a set of actions, and a goal condition. Planning research has traditionally been concerned with solvable tasks, reflected for example in the benchmarks used in the International Planning Competition (IPC) up to the year 2014 [Bacchus, 2001; Long and Fox, 2003; Hoffmann and Edelkamp, 2005; Gerevini et al., 2009; Coles et al., 2012]. However, proving planning tasks unsolvable is also quite relevant in practice. Unsolvable tasks occur, for example, in over-subscription planning [Smith, 2004; Gerevini et al., 2009; Domshlak and Mirkis, 2015] and in directed model checking [Edelkamp et al., 2004; Kupferschmid et al., 2006; 2008]. Furthermore, even solvable planning tasks often contain unsolvable (dead-end) states, for example when dealing with limited resources [Laborie and Ghallab, 1995; Nakhost et al., 2012; Coles et al., 2013].

Research in classical planning has recently seen a surge of techniques addressing these issues, designing effective techniques for proving unsolvability. After initial works [Bäckström et al., 2013; Hoffmann et al., 2014], a wealth of techniques participated in the inaugural Unsolvability International Planning Competition (UIPC'16) (e.g., [Torralba and Alcázar, 2013; Domshlak et al., 2015; Torralba et al., 2016; Pommerening and Seipp, 2016; Seipp et al., 2016; Steinmetz and Hoffmann, 2016a; Torralba, 2016; Gnad et al., 2016]). One major strand of these works designs what we will refer to as dead-end detectors $\Delta$: effectively testable criteria sufficient for unsolvability, designed to be called on every state during search, serving to prune those dead-end states detected. Such $\Delta$ were designed based on suitable variants of heuristic functions, namely pattern databases [Edelkamp, 2001], merge-and-shrink heuristics [Helmert et al., 2014; Hoffmann et al., 2014], potential heuristics [Pommerening et al., 2015], and critical-path heuristics [Haslum and Geffner, 2000; Steinmetz and Hoffmann, 2016b; 2017]. These detect a state $s$ to be a dead-end if $s$ is unsolvable in the approximation underlying the heuristic function.

A recent related proposal is the identification of traps [Lipovetzky et al., 2016]: compact representations of non-goal state sets $T$ that cannot be escaped, i.e., where from any state $s \in T$, all states $s'$ reachable from $s$ are also contained in $T$. Such traps can be identified through an offline analysis, prior to search. Here we extend the trap idea in two ways:

(i) We observe that traps can be combined for synergistic effect with arbitrary dead-end detectors $\Delta$.
(ii) We observe that traps can be learned online during search, from the dead-end states encountered.

By (i), the trap $\Theta$ extends the reach of $\Delta$, avoiding "the traps set for the search by $\Delta$". By (ii), this is done dynamically from information that becomes available during search.

Notably, our technique can also be run without any other dead-end detector $\Delta$ (technically: a trivial $\Delta$ not detecting any dead-ends). In this case, (i) is moot, and (ii) turns our technique into an online-learning variant of the original traps proposal [Lipovetzky et al., 2016].

In the ability to learn sound and generalizable knowledge ("nogoods") about dead-ends during search, our work is rivaled only by recent methods for the online refinement of a critical-path heuristic dead-end detector $\Delta^C$ [Steinmetz and Hoffmann, 2016b; 2017].¹ In the ability to exploit synergy with another dead-end detector $\Delta$, our technique is unique in the following sense: if $s$ is a state all of whose successor states $s'$ are detected to be dead-ends by $\Delta$, then we can learn to detect $s$ without having to detect also the states $s'$. This is in contrast to all other known dead-end detectors: when learning to detect $s$, these necessarily, and redundantly with the given $\Delta$, also learn to detect all $s'$. The latter is because all known dead-end detectors are transitive, i.e., when they detect a state $s$, they also detect all states reachable from $s$. Transitivity is a natural property, as, after all, dead-end detectors need to reason about all possible descendant states; for dead-end detectors based on a heuristic function, transitivity follows from consistency. Steinmetz and Hoffmann [2017] explore combinations of $\Delta^C$-learning with other dead-end detectors $\delta$, yet find that these suffer from having to learn to subsume $\delta$. Our notion of $\Delta$-traps does not have that issue, and is empirically synergistic with several $\delta$.

¹ Most works on nogood learning in state space search assume a plan length bound [Blum and Furst, 1997; Long and Fox, 1999; ...

We implemented our techniques in combination with essentially all known dead-end detectors, in particular those run in UIPC'16. We also enhanced the UIPC'16 winner, the Aidos portfolio [Seipp et al., 2016], in this manner. Our experiments show that online $\Delta$-trap learning can be quite beneficial. It is competitive on its own, run without any other dead-end detector. Combined with a variety of previous dead-end detectors $\Delta$, it improves coverage for many unsolvable benchmark planning domains, in particular on resource-constrained domains where it outperforms the state of the art.

2 Background

We use the finite-domain representation (FDR) framework. A planning task is a tuple $\Pi = \langle V, A, I, G \rangle$. $V$ is a set of state variables, each $v \in V$ associated with a finite domain $D(v)$. $A$ is a set of actions $a$, each a pair $\langle \mathit{pre}_a, \mathit{eff}_a \rangle$ of partial assignments to $V$. The initial state $I$ is a complete assignment to $V$, the goal $G$ is a partial assignment to $V$. A state $s$ is a complete assignment to $V$. An action $a$ is applicable in $s$ if $\mathit{pre}_a \subseteq s$, and applying such $a$ results in the state $s\llbracket a \rrbracket$ overwriting $s$ with $\mathit{eff}_a$ where $\mathit{eff}_a$ is defined. A plan for $s$ is an action sequence $\pi$ whose iterative application leads to $s_G$ where $G \subseteq s_G$; $s$ is a dead-end if no such $\pi$ exists.

[...] identify a compact representation $\Theta$ of such a $T$ offline, prior to search, and to use $\Theta$ to detect and prune dead-ends during search. This compact representation is determined from partial states, partial variable assignments $t$ of size up to $k$, where $k$ is a parameter. The states induced by such $\Theta$ are given by $T^\Theta := \{ s \in S \mid \exists t \in \Theta : t \subseteq s \}$. Verifying whether $T^\Theta$ is a trap can be done equivalently on $\Theta$ through progression over partial states. Say that $a$ is applicable to a partial state $t$ if $\mathit{pre}_a|_{V(t)} \subseteq t$, and if applicable, define the progression of $t$ over $a$ as the partial variable assignment $t\llbracket a \rrbracket$ where

\[
t\llbracket a \rrbracket(v) :=
\begin{cases}
\mathit{eff}_a(v) & \text{if } \mathit{eff}_a(v) \neq \bot; \\
t(v) & \text{if } \mathit{eff}_a(v) = \bot \text{ and } t(v) \neq \bot; \\
\mathit{pre}_a(v) & \text{if } \mathit{eff}_a(v) = t(v) = \bot \text{ and } \mathit{pre}_a(v) \neq \bot; \\
\bot & \text{otherwise.}
\end{cases}
\]

In words, $t$ is extended by $\mathit{pre}_a$ and the resulting (partial) variable assignment is overwritten by $\mathit{eff}_a$. By definition, $t\llbracket a \rrbracket \subseteq s\llbracket a \rrbracket$ for the application of $a$ in any state $s$ where $a$ is applicable and $t \subseteq s$. It is easy to show that $T^\Theta$ constitutes a trap if and only if (a) every $t \in \Theta$ disagrees with the goal on some $v$, and (b) $\Theta$ is closed under progression, i.e., for all $t \in \Theta$ and for all actions $a$ applicable to $t$, there is $t' \in \Theta$ so that $t' \subseteq t\llbracket a \rrbracket$.

3 Dead-End Detectors and the Traps they Set

We show that trap identification can be combined with arbitrary dead-end detectors. To this end, we consider a generic notion of dead-end detectors, and we introduce an accordingly modified notion of traps.

A dead-end detector is a function $\Delta : S \mapsto \{0, 1\}$ where $\Delta(s) = 1$ only if $s$ is a dead-end state. As with heuristic functions, the intention is to call $\Delta$ on every state during search, so $\Delta$ will typically be effectively computable. As a baseline, we will use the naïve dead-end detector, denoted $\Delta_0$, which returns 0 for all states (i.e., does not recognize any dead-end). More elaborate known