Subset Selection of Search Heuristics

Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence

Chris Rayner (University of Alberta, Edmonton, AB, Canada T6G 2E8)
Nathan Sturtevant (University of Denver, Denver, CO, USA 80208)
Michael Bowling (University of Alberta, Edmonton, AB, Canada T6G 2E8)
[email protected], [email protected], [email protected]

Abstract

Constructing a strong heuristic function is a central problem in heuristic search. A common approach is to combine a number of heuristics by maximizing over the values from each. If a limit is placed on this number, then a subset selection problem arises. We treat this as an optimization problem, and proceed by translating a natural loss function into a submodular and monotonic utility function under which greedy selection is guaranteed to be near-optimal. We then extend this approach with a sampling scheme that retains provable optimality. Our empirical results show large improvements over existing methods, and give new insight into building heuristics for directed domains.
1 Introduction

The heuristic function – which we also simply call the heuristic – is an estimate of the cost or distance between two states. When used in search, a strong heuristic can improve solution quality and speed, and can render intractable problems tractable. The literature describes several approaches to heuristic construction, including pattern databases [Culberson and Schaeffer, 1998], memory-based heuristics [Sturtevant et al., 2009], regressors [Ernandes and Gori, 2004], and metric embeddings [Rayner et al., 2011], each capable of generating multiple different heuristic functions based on input parameters. When multiple heuristics are available, it is common to query each and somehow combine the resulting values into a better estimate. However, under constraints on lookup time or memory, it may be that only a subset of a much larger pool of candidate heuristics can be used during search. Groups of heuristics can interact with each other in subtle ways, which makes selecting the best subset among them challenging.

In this paper, we formulate selecting heuristic subsets as an optimization problem. The loss function we aim to minimize is inspired by search effort, and is independent of the type of heuristics in the candidate set. When the candidate heuristics are admissible, this loss can be translated into a submodular and monotonic utility function; these properties imply that greedy selection is near-optimal. We also introduce a sample utility function under which greedy selection retains provable optimality if the heuristics are consistent. An empirical evaluation of our approach shows it to be capable of outperforming existing methods. We further use our approach to accurately benchmark a promising new type of true-distance heuristic, which gives valuable insight into the problem of constructing heuristics for highly directed search graphs.

2 Background

Research spanning the past two decades shows a progression toward building heuristics automatically, often in ways that enforce the key properties of admissibility and consistency. When heuristics are so readily constructed, an accompanying subset selection problem is inevitable.

One common motivation for heuristic construction is to solve otherwise intractable search problems. Here, pattern databases (PDBs) have arguably made the greatest impact [Culberson and Schaeffer, 1998]. The problem of selecting heuristic subsets is prevalent here, with it having been observed that many small PDBs can be more effective together than a single monolithic PDB [Holte et al., 2006].

In other cases, the heuristic function is constructed to expedite the solution of arbitrarily many future problems. These cases are a motivating force behind our work, with example applications including GPS navigation and video game pathfinding. Several algorithms have been proposed in this context for selecting heuristic subsets, but most lack an optimality criterion or are tied to a specific type of heuristic function. For example, one recent approach draws a connection between heuristic construction and manifold learning [Weinberger et al., 2005] and represents heuristic information as distances between points in Euclidean space [Rayner et al., 2011]. Principal components analysis can be used to select an optimal, variance-preserving subset of this distance information, but that approach is exclusive to Euclidean heuristics.
Another popular approach is to precompute true distances to a landmark [Goldberg and Werneck, 2003], which can be thought of as selecting a subset of all distance information [Sturtevant et al., 2009]. These methods can be viewed as special cases of Lipschitz embeddings [Bourgain, 1985] in which distances are computed to the nearest of a set of landmarks. Many selection algorithms have been devised for these, both as heuristics [Goldberg and Harrelson, 2005; Fuchs, 2010] and as metric embeddings [Linial et al., 1995], but these too cannot be applied to other types of heuristics.

3 Subset Selection of Search Heuristics

We consider the problem of choosing a good subset H of a set of candidate heuristics C = \{h_1, \ldots, h_{|C|}\}. We assume the heuristics in H are to be combined with a set of default heuristics D by maximizing over the values across both H and D. For states i and j, we denote this heuristic lookup as:

    h^H(i, j) = \max_{h_x \in H \cup D} h_x(i, j)    (1)

In the simplest case, D contains only the zero heuristic, which gives 0 for any pair of states queried. We further assume any default or candidate heuristic h_x \in D \cup C is non-negative and admissible (i.e., never overestimating):

    \forall i, j, \quad 0 \le h_x(i, j) \le \delta(i, j),    (2)

where \delta(i, j) is the true distance between states i and j.

3.1 Optimization Problem

We formalize the heuristic subset selection problem as an optimization problem:

    \min_{H \in 2^C} L(H) \quad \text{subject to} \quad |H| = d    (3)

The constraint |H| = d simply limits the capacity of H to a fixed-size subset of C, and the loss L(H) is a scalar quantity summarizing the quality of a given subset H.

We relate loss to eventual search effort. An optimal search algorithm must expand any state encountered whose heuristic is low enough to suggest it may be on an optimal path to the goal [Bagchi and Mahanti, 1983], so a well-suited loss is the weighted sum of the errors between the resulting heuristic values and the true distances, for all pairs across n states:

    L(H) = \sum_{i=1}^{n} \sum_{j=1}^{n} W_{ij} \, |\delta(i, j) - h^H(i, j)|    (4)

The non-negative weight matrix W \in \mathbb{R}^{n \times n} is a free parameter, and it can be flexibly defined to specify the relative importance of each pair of states, perhaps based on knowledge of frequent start and goal locations.

We can rewrite this loss as an equivalent utility function U. First, all of the heuristics in H \cup D are admissible, so for all states i and j, h^H(i, j) \le \delta(i, j). Therefore we can remove the absolute value from line 4 and split the sum:

    L(H) = \sum_{i,j} W_{ij} \left( \delta(i, j) - h^H(i, j) \right)    (5)
         = \sum_{i,j} W_{ij} \, \delta(i, j) - \sum_{i,j} W_{ij} \, h^H(i, j)    (6)

The leftmost term on line 6 does not depend on H, so minimizing L(H) is equivalent to maximizing the term on the right, which defines the utility U(H) = \sum_{i,j} W_{ij} \, h^H(i, j).

All told, we arrive at a specific utility maximization problem:

    \max_{H \in 2^C} U(H) \quad \text{subject to} \quad |H| = d    (9)

Unfortunately, there is unlikely to be an efficient algorithm to find a globally optimal solution to this problem.

Proposition 1 The optimization problem (9) is NP-hard.

Proof. We sketch a reduction from the NP-complete Vertex Cover problem over an undirected graph (V, E). This is the problem of finding a subset of d graph vertices T ⊆ V such that all edges in E are incident to at least one vertex in T. By definition, heuristics describe values between pairs of vertices in a search graph. A special case of this is a function that only returns 1 between a vertex and its neighbors, and 0 for any other query. Thus, to reduce vertex cover, we define C as a set of heuristics C = \{h_v : v \in V\} where each h_v \in C gives a value of 1 between vertex v and its neighbors, and 0 otherwise. If a subset of d such heuristics captures all edge costs, then there is a vertex cover of size d as well.

3.2 Approximation Algorithm

Despite Proposition 1, greedy selection will yield a solution to the optimization problem (9) with a strong near-optimality guarantee. This is because U is submodular and monotonic.

Submodularity is a diminishing returns property. It aptly describes settings where marginal gains in utility start to diminish due to saturation of the objective, such as with sensor placement and monetary gain. Let A ⊆ B ⊆ S, let x \in S \setminus B, and let \phi be a function over 2^S. \phi is submodular if:

    \phi(A \cup \{x\}) - \phi(A) \ge \phi(B \cup \{x\}) - \phi(B)    (10)

That is, the same element newly added to a subset and its superset will lead the subset to gain at least as much in value as the superset. Intuitively, the utility function U(H) and submodularity are a good fit, since adding a new heuristic to H will not newly cover any of the heuristic values that H already covers. We prove this formally in the following lemma.

Lemma 1 U is submodular.

Proof. Let A and B be sets of heuristics with A ⊆ B, and let h_c \in C be a particular but arbitrary candidate heuristic function which is in neither A nor B (i.e., h_c \in C \setminus B). We can reproduce the inequality on line 10 as follows:

    U(A \cup \{h_c\}) - U(A)    (11)
    = \sum_{i,j} W_{ij} \, h^{A \cup \{h_c\}}(i, j) - \sum_{i,j} W_{ij} \, h^A(i, j)    (12)
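The loss-to-utility translation in Eqs. (4)–(6) can be spot-checked numerically. The following is a minimal sketch, assuming heuristics are stored as dictionaries mapping state pairs (i, j) to values; the function and variable names are illustrative, not taken from the authors' implementation:

```python
# Spot-check of Eqs. (4)-(6): for admissible heuristics,
# L(H) = sum_ij W_ij * delta(i,j) - U(H), with U(H) = sum_ij W_ij * h^H(i,j).
# Heuristics are dicts mapping state pairs (i, j) to values; a missing key
# means the heuristic returns 0 for that pair (like the zero heuristic).

def h_H(heuristics, i, j):
    """Eq. (1): maximize over the selected and default heuristics."""
    return max((h.get((i, j), 0.0) for h in heuristics), default=0.0)

def loss(H_and_D, pairs, W, delta):
    """Eq. (4); admissibility (Eq. 2) lets us drop the absolute value."""
    return sum(W[p] * (delta[p] - h_H(H_and_D, *p)) for p in pairs)

def utility(H_and_D, pairs, W):
    """The rightmost term of Eq. (6), i.e., the utility U(H)."""
    return sum(W[p] * h_H(H_and_D, *p) for p in pairs)
```

On any toy instance, loss equals the constant \sum_{i,j} W_{ij} \delta(i, j) minus utility, confirming that minimizing L is the same as maximizing U.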
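The greedy procedure for problem (9) can be sketched as follows. Because U is submodular and monotonic, the classic result of Nemhauser et al. (1978) guarantees the greedy subset attains at least a (1 − 1/e) fraction of the optimal utility. This sketch again assumes heuristics are dictionaries mapping state pairs to values; the names are hypothetical, not the authors' code:

```python
# Greedy selection for the utility-maximization problem (9): repeatedly add
# the candidate heuristic with the largest marginal gain in U. Heuristics
# are dicts mapping state pairs (i, j) to values (missing key = 0).

def greedy_select(candidates, defaults, pairs, W, d):
    """Pick d candidates, each maximizing the marginal gain in utility."""

    def U(selected):
        # U(H) = sum_ij W_ij * h^H(i,j), where h^H maximizes over the
        # selected and default heuristics (Eq. 1).
        return sum(
            W[p] * max((h.get(p, 0.0) for h in selected + defaults), default=0.0)
            for p in pairs
        )

    selected, remaining = [], list(candidates)
    for _ in range(d):
        best = max(remaining, key=lambda h: U(selected + [h]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Each of the d rounds evaluates U once per remaining candidate; a lazy-evaluation priority queue is a common practical speedup for submodular greedy selection, omitted here for clarity.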
