Predicting the Size of Depth-First Search Trees

Levi H. S. Lelis
Computing Science Dept.
University of Alberta
Edmonton, Canada T6G 2E8
[email protected]

Lars Otten
Dept. of Computer Science
University of California, Irvine
Irvine, USA 92697-3435
[email protected]

Rina Dechter
Dept. of Computer Science
University of California, Irvine
Irvine, USA 92697-3435
[email protected]

Abstract

This paper provides algorithms for predicting the size of the Expanded Search Tree (EST) of Depth-first Branch and Bound algorithms (DFBnB) for optimization tasks. The prediction algorithm is implemented and evaluated in the context of solving combinatorial optimization problems over graphical models such as Bayesian and Markov networks. Our methods extend to DFBnB the approaches provided by Knuth-Chen schemes that were designed and applied for predicting the EST size of backtracking search algorithms. Our empirical results demonstrate good predictions which are superior to competing schemes.

1 Introduction

A frequently used heuristic search algorithm for solving combinatorial optimization problems is Depth-First Branch-and-Bound (DFBnB) [Balas and Toth, 1985]. DFBnB explores the search space in a depth-first manner while keeping track of the current best-known solution cost, denoted cb. It uses an admissible heuristic function h(·), i.e., a function that never overestimates the optimal cost-to-go for every node, and is guided by an evaluation function f(n) = g(n) + h(n), where g(n) is the cost of the path from the root node to n. Since f(n) is an underestimate of the cost of an optimal solution that goes through n, whenever f(n) ≥ cb, n is pruned. (A minimal code sketch of this scheme is given at the end of this section.)

In practice DFBnB, especially if guided by an effective heuristic, explores only a small fraction of the usually exponentially large search space, and this fraction varies greatly from one problem instance to the next. However, predicting the size of this Expanded Search Tree, or EST for short, is hard. It depends on intrinsic features of the problem instance that are not visible a priori (e.g., the number of dead ends that may be encountered). The size of the EST may also depend on parameters of the algorithm and in particular on the strength of its guiding heuristic function. Available worst-case complexity analysis is blind to these hidden features and often provides uninformative, even useless, upper bounds.

Predicting the EST size could facilitate the choice of a heuristic on an instance-by-instance basis. Or, in the context of parallelizing search, a prediction scheme could facilitate load-balancing by partitioning the problem into subproblems of similar EST sizes [Otten and Dechter, 2012b].

Related work. Several methods have been developed for predicting the size of the search tree of backtracking and heuristic search algorithms such as IDA* [Korf, 1985]. See for instance initial work by Knuth [1975], Partial Backtracking by Purdom [1978], Stratified Sampling by Chen [1992], and other contributions [Korf et al., 2001; Zahavi et al., 2010; Burns and Ruml, 2012; Lelis et al., 2013]. These schemes work by sampling a small part of the EST and extrapolating from it. The challenge in applying these sampling techniques to DFBnB lies in their implicit assumption of the "stable children" property: namely, that for every node in the EST, the set of EST children can be determined at the time of sampling. In the case of DFBnB, however, the set of children in the EST depends on cb, which impacts the pruning but is generally not known at prediction time.

Contributions. In this paper we present Two-step Stratified Sampling (TSS), a novel algorithm for predicting the EST size of DFBnB that extends the work by Knuth [1975] and Chen [1992]. The algorithm performs multiple "Stratified Sampling runs" followed by a constrained DFBnB execution and exploits memory to cope with the stable children issue. We show that, given sufficient time and memory, the prediction produced by TSS converges to the actual EST size.

We apply our prediction scheme to optimization queries over graphical models, such as finding the most likely explanation in Bayesian networks [Pearl, 1988] (known as MPE or MAP). In particular, we are interested in predicting the size of the search tree expanded by Branch and Bound with mini-bucket heuristic (BBMB) [Kask and Dechter, 2001], which has been extended into a competition-winning solver [Marinescu and Dechter, 2009; Otten and Dechter, 2012a]. In addition to comparing against pure SS, we compare TSS to a prediction method presented by Kilby et al. [2006]. Empirical results show that, if memory allows, our prediction is effective and overall far superior to earlier schemes.
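To make the pruning rule concrete, here is a minimal sketch of DFBnB in Python. It is ours, not the authors' implementation; children, g, h, and is_solution are assumed callbacks, and the returned counter corresponds to the EST size discussed above.

    import math

    def dfbnb(root, children, g, h, is_solution, cb=math.inf):
        """Depth-first branch and bound: returns (best cost found, EST size)."""
        expanded = 0

        def visit(n):
            nonlocal cb, expanded
            expanded += 1
            if is_solution(n) and g(n) < cb:
                cb = g(n)                # tighten the incumbent solution cost
            for c in children(n):
                if g(c) + h(c) < cb:     # expand c only if f(c) = g(c) + h(c) < cb
                    visit(c)

        visit(root)
        return cb, expanded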

2 Formulation and Background

Given a directed, full search tree representing a state-space problem [Nilsson, 1980], we are interested in estimating the size of the subtree which is expanded by a search algorithm while seeking an optimal solution. We call the former the underlying search tree (UST) and the latter the Expanded Search Tree (EST).

Problem formulation. Let S = (N, E) be a tree representing an EST, where N is its set of nodes and for each n ∈ N, child(n) = {n′ | (n, n′) ∈ E} is its set of child nodes. Our task is to estimate the size of N without fully generating S.

Definition 1 (General prediction task). Given any numerical function z over N, the general task is to approximate a function over the EST S = (N, E) of the form

    ϕ(S) = Σ_{s∈N} z(s).

If z(s) = 1 for all s ∈ N, then ϕ(S) is the size of S.

Stratified Sampling. Knuth [1975] showed a method to estimate the size of a search tree S by repeatedly performing a random walk from the start state. Under the assumption that all branches have a structure equal to that of the path visited by the random walk, one branch is enough to predict the structure of the entire tree. Knuth observed that his method was not effective when the EST is unbalanced. Chen [1992] addressed this problem with a stratification of the EST through a type system to reduce the variance of the sampling process. We call Chen's method Stratified Sampling (SS).

Definition 2 (Type system). Let S = (N, E) be an EST. T = {t1, . . . , tn} is a type system for S if it is a disjoint partitioning of N. If s ∈ N and t ∈ T with s ∈ t, we also write T(s) = t.

Definition 3 (Perfect type system). A type system T is perfect for a tree S if for any two nodes n1 and n2 in S with T(n1) = T(n2), the two subtrees of S rooted at n1 and n2 have the same value of ϕ.

Definition 4 (Monotonic type system). [Chen, 1992] A type system is monotonic for S if it is partially ordered such that a node's type must be strictly greater than its parent's type.

SS's prediction scheme for ϕ(S) generates samples from S, called probes. Each probe p is described by a set A_p of representative/weight pairs ⟨s, w⟩, where s is a representative for the type T(s) and w captures the estimated number of nodes of that type in S. For each probe p and its associated set A_p a prediction can be computed as:

    ϕ̂^(p)(S) = Σ_{⟨s,w⟩ ∈ A_p} w · z(s).

Algorithm 1 Stratified Sampling, a single probe
Input: root s∗ of a tree, type system T, and initial upper bound cb.
Output: A sampled tree ST and an array of sets A, where A[i] is the set of pairs ⟨s, w⟩ for the nodes s ∈ ST expanded at level i.
1: initialize A[0] ← {⟨s∗, 1⟩}
2: i ← 0
3: while i is less than search depth do
4:   for each element ⟨s, w⟩ in A[i] do
5:     for each child s″ of s do
6:       if h(s″) + g(s″) < cb then
7:         if A[i + 1] contains an element ⟨s′, w′⟩ with T(s′) = T(s″) then
8:           w′ ← w′ + w
9:           with probability w/w′, replace ⟨s′, w′⟩ in A[i + 1] by ⟨s″, w′⟩
10:        else
11:          insert new element ⟨s″, w⟩ in A[i + 1]
12:   i ← i + 1

Algorithm 1 describes SS for a single probe. The set A is organized into "layers", where A[i] is the subset of node/weight pairs at level i of the search tree; processing level i fully before i + 1 forces the type system to be monotonic. A[0] is initialized to contain only the root node with weight 1 (Line 1). In each iteration (Lines 4 through 11), all nodes in A[i] are expanded. The children of each node in A[i] are considered for inclusion in A[i + 1] if they are not to be pruned by the search algorithm based on upper bound cb (Line 6). If a child s″ of node s has a type t that is already represented in A[i + 1] by node s′, then a merge action on s″ and s′ is performed: we increase the weight in the corresponding representative-weight pair of type t by the weight w of s″, and s″ replaces s′ according to the probability shown in Line 9. Chen [1992] proved that this scheme reduces the variance of the estimation. The nodes in A form a sampled subtree denoted ST. (A minimal code sketch of a single probe is given at the end of this section.)

Clearly, SS using a perfect type system would produce an exact prediction in a single probe. In the absence of that, we treat ϕ̂(S) as a random variable; then, if E[ϕ̂(S)] = ϕ(S), we can approximate ϕ(S) by averaging ϕ̂^(p)(S) over multiple sampled probes. And indeed, Chen [1992] proved the following theorem.

Theorem 1. [Chen, 1992] Given a set of independent samples (probes) p1, . . . , pm from a search tree S, and given a monotonic type system T, the average (1/m) Σ_{j=1}^{m} ϕ̂^(pj)(S) converges to ϕ(S).

The stable children property. A hidden assumption made by SS is that it can access the child nodes in the EST of every node in the EST. SS assumes that child nodes are pruned only if their f-value is greater than or equal to the initial upper bound cb, which is accurate for algorithms such as IDA* [Lelis et al., 2013]. However, as acknowledged by Knuth [1975], this sampling scheme would not produce accurate predictions of the EST generated by DFBnB, because the exact child nodes of nodes generated by DFBnB are not known unless we fully run the algorithm.

Definition 5 (Stable children property). Given an EST S = (N, E) implicitly specified, the stable children property is satisfied iff for every n ∈ N in S along a path leading from the root, the set child(n) in N can be determined based only on the information along the path.

Proposition 1. The stable children property for the EST of DFBnB is guaranteed if and only if the initial upper bound cb is already optimal.

Figure 1: Example of an underlying search tree.

Proof. By counterexample. Assume a fixed bound cb is input to SS for estimating the search tree expanded by DFBnB exploring the search space in Figure 1. We assume DFBnB follows a left-first node ordering. Let cleft be the cost of the best solution found when exploring the left-hand side of this search tree. When the search algorithm considers child y of the root, then if f(y) ≥ cleft, y will not be expanded, and otherwise it will. In the former case the EST will have only x as a child of the root, and in the latter the EST will have both x and y as child nodes of the root; distinguishing between these two cases is not possible without actually executing DFBnB on the left-hand side of the search tree.
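For reference, here is a minimal Python sketch of a single SS probe (Algorithm 1). It is ours, not the authors' code; children, g, h, the type function T, and max_depth are assumed inputs, and each layer is kept as a dictionary from type to a [representative, weight] pair.

    import random

    def ss_probe(root, children, g, h, T, cb, max_depth):
        """One SS probe: A[i] maps each type to its [representative, weight]."""
        A = [dict() for _ in range(max_depth + 1)]
        A[0][T(root)] = [root, 1]
        for i in range(max_depth):
            for s, w in list(A[i].values()):
                for c in children(s):
                    if g(c) + h(c) >= cb:        # Line 6: same f-test as DFBnB
                        continue
                    t = T(c)
                    if t in A[i + 1]:            # merge: accumulate the weight...
                        rep = A[i + 1][t]
                        rep[1] += w
                        if random.random() < w / rep[1]:
                            rep[0] = c           # ...and swap representatives w.p. w/w'
                    else:
                        A[i + 1][t] = [c, w]
        return A

    # The probe's size prediction (z(s) = 1) is the total weight:
    # sum(w for layer in A for _, w in layer.values())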

3 Two-Step Stratified Sampling

In order to cope with the lack of the stable children property, we extend SS to approximate the actual set of child nodes in DFBnB's EST using a two-step algorithm. In the first step we use SS as in Algorithm 1 to generate a sampled tree ST, assuming a fixed upper bound cb which is derived by some preprocessing (e.g., local search); in the second step we prune ST by simulating a run of DFBnB restricted to nodes in ST. This yields our Two-step Stratified Sampling algorithm (TSS). In contrast to SS, TSS is guaranteed to converge to correct predictions of the size of the EST of DFBnB. For clarity, we describe a naive version of the second step of TSS first, and then present an improved version.

3.1 Naive Second Step

Algorithm 2 SecondStep(s∗, T, ST, cb), naive version
Input: root of a tree s∗, type system T, sampled tree ST, initial upper bound cb
Output: estimated size of the search tree rooted at s∗
1: Initialize: ∀t ∈ T, SIZE[t] ← 0.
2: if s∗ is a solution node with cost better than cb then
3:   update upper bound cb
4: SIZE[T(s∗)] ← 1
5: for each child s′ of s∗ do
6:   if g(s′) + h(s′) < cb then
7:     s ← node representing T(s′) ∈ ST
8:     if SIZE[T(s)] = 0 then
9:       SecondStep(s, T, ST, cb)
10:    SIZE[T(s∗)] ← SIZE[T(s∗)] + SIZE[T(s)]
11: return SIZE[T(s∗)]

Algorithm 2 describes a naive second step of TSS. It uses an array SIZE[·] indexed by types, where SIZE[t] is the estimated size of the subtree rooted at the node s representing type t in ST. Like DFBnB, we explore the UST in a depth-first manner, pruning nodes based on cb (Line 6) and updating the value of cb (Line 3). However, in contrast to DFBnB, we restrict the algorithm to only expand the node s representing the type of child node s′ in ST (Line 7). Hence Algorithm 2 explores only one node of each type in a depth-first fashion while keeping track of the value of the upper bound cb. If the value of SIZE[T(s)] has not been computed, i.e., SIZE[T(s)] = 0, then TSS computes it by calling SecondStep recursively (Line 9).

Limitation of the naive approach. Algorithm 2 corrects (to some extent) for SS's lack of access to the actual child nodes in the EST, as it searches depth-first and updates the upper bound cb. However, Algorithm 2 is not sufficiently sensitive to the bound updates when estimating the size of subtrees rooted at nodes of the same type. For example, assume two nodes n1 and n2 of type T(n1) = T(n2) that root identical subtrees in the UST; given an initial upper bound cb, DFBnB might expand fewer nodes when exploring n2's subtree than when exploring n1's subtree if n1 is explored first, because it might find a solution that yields a tighter upper bound c′. Thus, when exploring n2's subtree the value of c′ found in n1's subtree will allow extra pruning.

3.2 Using Histograms

In order to handle the above problem and to obtain more accurate estimates, we propose to collect more information as follows. We consider the distribution of f-values in the subtree rooted at the nodes representing types in our prediction scheme. The distribution of f-values under a node n is stored as a histogram, which will initially be indexed by types.

Definition 6 (Histogram). Let ST be the sampled tree generated by SS based on type system T and initial upper bound cb. The histogram of a type t, hist_t, is a set of pairs (k, r) for each observed f-value k in the subtree rooted at the node n ∈ ST whose type is t, where r is the estimated number of nodes whose f-value is k in n's subtree.

Example 1. Consider the UST shown in Figure 1 and a type system where T(x) = T(y), f(x) = f(y) = 1, f(w) = f(q) = 3, f(z) = f(p) = 2, and an initial upper bound cb = 4. SS produces a sampled tree ST with either the right or the left branch, since T(x) = T(y). Assume SS produces the left branch with nodes x, w, and z. After exploring the subtree rooted at x, we store the histogram hist_{T(x)} = {(1, 1), (2, 1), (3, 1)}, as the subtree contains one node whose f-value is 1 (node x), one with f-value of 2 (node z), and one with f-value of 3 (node w). Let us ignore how the histogram is computed for now (this is explained in detail in Algorithm 3 below). We also update the upper bound cb to 2, which is the value of the solution found in node z. When we backtrack to the subtree rooted at y, we use the histogram hist_{T(x)} to estimate the subtree of y because T(x) = T(y). This is done by summing up all the r-values of the entries of hist_{T(x)} whose f-value is less than or equal to 2, the current upper bound. Thus, we estimate that the subtree rooted at node y has two nodes (the entry (3, 1) of hist_{T(x)} is pruned), which is exactly DFBnB's EST size. Using Algorithm 2 would yield SIZE[T(x)] of 3, however.
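The arithmetic of Example 1 can be checked in a few lines of Python (our sketch; the dict-from-f-value-to-count representation is an assumption). Note that the example keeps entries whose f-value is at most the bound, whereas the prune(·, ·) of Algorithm 3 below uses a strict inequality.

    hist_x = {1: 1, 2: 1, 3: 1}   # f-values 1 (x), 2 (z), 3 (w), one node each
    cb = 2                        # upper bound after the solution found at z

    # Sum the r-values of the entries whose f-value is within the bound;
    # the entry (3, 1) is pruned away.
    estimate_y = sum(r for k, r in hist_x.items() if k <= cb)
    assert estimate_y == 2        # matches DFBnB's EST size under y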

3.3 Union over Multiple Sampled Trees

Before describing the full second step of TSS, we propose another extension of SS that is orthogonal to the use of histograms. Instead of applying the second step to a single subtree generated by SS, we propose to run m independent probes of SS and take the union of the m subtrees into a single tree called UnionST. For every path in UnionST there exists a path in at least one of the sampled trees. We are aiming at providing the second step of TSS with a subtree that is closer to the UST. The number of probes of SS is a parameter that controls the first step. In the second step we compute histograms for nodes in UnionST. UnionST could have multiple nodes representing the same type. Thus, in contrast with Example 1, in the second step of TSS we are going to index the histograms by nodes in UnionST instead of types and allow multiple histograms for the same type. We now describe the actual second step of TSS.

3.4 Overall TSS

Algorithm 3 SecondStepHist(s∗, T, UnionST, cb)
Input: root of a tree s∗, type system T, UnionST ← union of m sampled trees, initial upper bound cb
Output: histogram hist_{s∗} and estimate of the EST's size.
1: Initialize: ∀s ∈ UnionST, hist_s ← ∅.
2: hist_{s∗} ← {(f(s∗), 1)}
3: if s∗ is a solution node with cost better than cb then
4:   update upper bound cb
5: for each child s of s∗ do
6:   if g(s) + h(s) < cb then
7:     if s is not in UnionST then
8:       s ← random n ∈ UnionST with T(n) = T(s)
9:     if hist_s ≠ ∅ then
10:      histPruned_s ← prune(hist_s, cb)
11:      hist_{s∗} ← hist_{s∗} + histPruned_s
12:    else
13:      hist_s ← SecondStepHist(s, T, UnionST, cb)
14:      hist_{s∗} ← hist_{s∗} + hist_s
15: return hist_{s∗} and the sum of the r-values of hist_{s∗}

Algorithm 3 describes the second step of TSS. It is a recursive procedure that computes the histograms for the nodes in UnionST that are generated in the first step. The sum of the r-values of the histogram of the start state is the TSS prediction of the size of the EST.

TSS first generates UnionST, which is the union of m trees ST generated by SS. It then applies DFBnB restricted to the nodes in UnionST while computing the histograms and doing the appropriate pruning. The histogram of the start state, hist_{s∗}, initially contains only one entry with value of one for the f-value of s∗ (Line 2) and is computed recursively by computing the histogram of each child s of s∗ and combining these histograms' entries into a single histogram (Lines 11 and 14). The combined histogram hist3 of two histograms hist1 and hist2 has one entry for each k-value found in hist1 and hist2; the r-value of each k in hist3 is the sum of the corresponding r-values of hist1 and hist2. (A small code sketch of these operations is given at the end of this section.)

If the histogram of s was already computed and stored in memory, i.e., hist_s ≠ ∅, then we "prune" hist_s according to the current cb (Line 10), as shown in Example 1. The function prune(·, ·) receives as parameters the histogram hist_s and the current upper bound cb. It returns a histogram, histPruned_s, that contains all entries in hist_s for which the k-value is lower than cb, as illustrated in Example 1.

If TSS generates a node s that is in UnionST, then we use the histogram of s itself as an estimate of the size of the subtree rooted at s. If s is not in UnionST, we use the histogram of any node representing T(s) in UnionST, chosen randomly (Line 8).

TSS is efficient because it explores only one node of each type in the search tree sampled by a single probe of SS.

Theorem 2 (Complexity). The memory complexity of TSS is O(m × |T|²), where |T| is the size of the type system being employed and also bounds the size of the histograms stored in memory. The time complexity of TSS is O(m × |T|² × b), where b is the branching factor of the UST.

One could run TSS with m = 1 multiple times and average the results, like SS does with its probes. However, this approach is not guaranteed to converge to the correct value. As the number of probes m goes to infinity, TSS converges properly to the correct value, simply because the subtree accumulated in UnionST approaches a superset of the nodes in the actual DFBnB EST in the limit.

Lemma 1. If UnionST ⊇ EST, Algorithm 3 computes the DFBnB EST size exactly.

Proof. Because UnionST has all nodes in the EST, Algorithm 3 expands the same nodes, and in the same order, as DFBnB. Thus, Algorithm 3 stores one histogram for each node in the EST and we know exactly the size of the EST.

Consequently, we have the following.

Theorem 3. Let S be an EST generated by DFBnB using any heuristic function. TSS perfectly predicts the size of S as the number of probes m goes to infinity.

In practice, however, we are interested in evaluating predictions using a small m for time efficiency reasons.
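The histogram operations of Algorithm 3 reduce to simple dictionary arithmetic. Below is a minimal sketch of ours; the dict-from-k-to-r representation and the helper name combine are assumptions, not the paper's code.

    from collections import Counter

    def combine(hist1, hist2):
        # One entry per k-value; r-values of shared k-values add up
        # (the '+' of Lines 11 and 14).
        return dict(Counter(hist1) + Counter(hist2))

    def prune(hist, cb):
        # Keep the entries whose k-value is below the current bound (Line 10).
        return {k: r for k, r in hist.items() if k < cb}

    # For example, combine({1: 1, 2: 3}, {2: 1, 4: 2}) == {1: 1, 2: 4, 4: 2}.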

4 Experimental Results

We evaluate TSS and competing schemes by predicting the EST size of DFBnB when using the mini-bucket heuristic (BBMB) [Kask and Dechter, 2001] on three domains: protein side-chain prediction (pdb), computing haplotypes in genetic analysis (pedigree), and randomly generated grid networks. For each domain we experiment with different mini-bucket i-bounds, yielding different strengths of the heuristic. Every pair of problem instance and i-bound represents a different prediction problem. In total, we have 14, 26, and 54 problems for pdb, pedigree, and grids, respectively. All experiments are run on 2.6 GHz Intel CPUs with a 10 GB memory limit. We present individual results on selected single problem instances and summary statistics over the entire set.

Weighted Backtrack Estimator. In addition to SS, we also compare TSS to WBE (Weighted Backtrack Estimator) [Kilby et al., 2006]. WBE runs alongside DFBnB search with minimal computational overhead and uses the explored branches to predict unvisited ones and, through that, the EST's size. WBE produces perfect predictions when the search finishes. We implemented WBE in the context of BBMB. Kilby et al. presented another prediction algorithm, the Recursive Estimator (RE), whose performance was similar to WBE's in their experiments. Both WBE and RE were developed to predict the size of binary trees, but in contrast to WBE it is not clear how to generalize RE to non-binary search trees.

Type Systems. A type system can be defined based on any feature of the nodes. In our experiments we use the f-value of the nodes as types, as well as their depth level (cf. Algorithm 1). Namely, nodes n and n′ are of the same type if they are at the same depth and have the same f-value.

Secondly, it is worth noting that optimization problems over graphical models present an additional difficulty not commonly considered in the context of SS and other IDA*-related prediction schemes [Lelis et al., 2013]. Namely, the cost function input and the derived heuristic bounds are real-valued. As a result, a type system built on floating-point equality might be far too large for producing efficient predictions. We address this issue by multiplying a heuristic value by a constant C and truncating it to the integer portion to compute a node's type; different values of C yield type systems of different sizes. Finally, we use the same strategy to control the size of the histograms, which are also determined by the number of different f-values; a small sketch of this quantization follows.
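A minimal sketch of this type computation (ours; depth and f are assumed accessors):

    def node_type(n, C, depth, f):
        # Nodes share a type iff they are at the same depth and their
        # f-values agree after scaling by C and truncating to an integer.
        return (depth(n), int(C * f(n)))

Larger values of C distinguish more f-values, yielding larger type systems and, likewise, larger histograms.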
Results on Single Instances. Figure 2 shows the prediction results for TSS, SS, and WBE on six problem instances. These instances are representative in the sense that they highlight different aspects of the prediction methods. The x-axis represents the runtime in seconds required to produce the predictions and the y-axis, in log-scale, the ratio between the predicted and the actual EST size. A perfect prediction has a ratio of 1, indicated in each plot by a horizontal line. For TSS and SS we show the average ratio over five independent runs, i.e., we run Algorithm 3 or Algorithm 1 five independent times and compute the average ratio as predicted/actual if predicted > actual, and as actual/predicted otherwise – that way we avoid overestimations and underestimations canceling each other out when averaging. Assuming the error follows a normal distribution, we show the 95% confidence interval with error bars (note that these are hardly noticeable in the plots). For each problem we first run a limited-discrepancy search [Harvey and Ginsberg, 1995] with a maximum discrepancy of 1 to find an initial bound cb, which is provided to both DFBnB and the prediction algorithms. We use C = 50 for pdb and C = 100 for pedigree and grid instances. We control the prediction runtime and accuracy for TSS and SS with different numbers of probes. WBE was set to produce a prediction every 5 seconds.

Figure 2: Comparison of TSS, SS, and WBE on six different problem instances. The y-axis shows the ratio between the predicted and the actual number of nodes expanded in log-scale, and the x-axis the runtime in seconds.

Results on the Entire Set of Problems. Table 1 presents summary results across the entire set of benchmarks. We display the average ratio of the results of TSS using different numbers of probes m and different values of C, averaged across different problems. Runtime is hard to show directly, so instead we report the average percentage of the DFBnB search time that the algorithm covered (column %). For instance, a %-value of 10 means that the prediction was produced in 10% of the full DFBnB runtime. For each prediction produced by TSS in x seconds we report one prediction produced by WBE in y seconds such that y is the lowest value greater than x (thereby giving WBE a slight advantage). Column n shows TSS's coverage, i.e., the number of problems for which TSS is able to produce predictions. Namely, TSS does not produce results either if it takes more time than the actual DFBnB search or if it runs out of memory.

pdb (total of 14 problems)
      TSS (C = 100)   WBE                 TSS (C = 50)    WBE                 TSS (C = 10)    WBE
 m    ratio     %     ratio      %    n   ratio     %     ratio      %    n   ratio     %     ratio      %    n
 1    66.4      3.07  8.05e+25   3.57 13  7.34e+03  2.6   2.31e+27   3.4  14  1.73e+04  0.383 9.02e+30   1.19 14
 5    14.3      8.26  4.37e+20   9.21 12  779       4.9   8.21e+25   5.6  13  5.19e+03  1.11  1.55e+30   1.49 14
 10   12.1      13.5  2.98e+20   14.3 10  78.3      7.57  4.39e+20   7.9  12  4.34e+03  1.87  1.1e+29    2.69 14
 50   1.1       12.2  8.12e+07   12.5 2   1.43      29.3  7.59e+13   29.8 4   174       4.96  9.75e+25   5.37 13
 100  -         -     -          -    0   1.26      9.78  1.34e+14   10.1 3   3.88      8.55  2.21e+25   9.31 11

pedigree (total of 26 problems)
      TSS (C = 1000)  WBE                 TSS (C = 100)   WBE                 TSS (C = 10)    WBE
 m    ratio     %     ratio      %    n   ratio     %     ratio      %    n   ratio     %     ratio      %    n
 1    1.21e+04  15.9  2.88e+04   29.8 12  4.2e+06   2.38  2.49e+08   14.5 15  7.45e+06  8.47  1.41e+09   11.4 21
 5    128       33.5  8.58e+03   47.3 10  1.34e+03  18.8  4.83e+04   28.3 12  3.17e+06  7.09  7.22e+08   17.3 19
 10   80.4      30.4  9.92e+03   36.7 8   1.07e+03  22.3  3.17e+04   28.8 12  3.92e+06  5.67  7.14e+08   14.7 18
 50   1.47      58.6  3.94       81.2 4   58.8      23.6  3.87e+04   27.1 8   4.37e+06  10.4  1.3e+08    21.4 15
 100  1.52      72.7  3.46       93.5 3   72.4      42.3  8.28e+03   54.3 8   6.46e+04  18.9  4.98e+04   32.2 13

grids (total of 54 problems)
      TSS (C = 100)   WBE                 TSS (C = 50)    WBE                 TSS (C = 10)    WBE
 m    ratio     %     ratio      %    n   ratio     %     ratio      %    n   ratio     %     ratio      %    n
 1    373       20.4  3.02e+07   34.2 50  9.95e+03  12.7  4.07e+07   29.6 51  2.61e+06  1.61  1.41e+10   14.9 53
 5    12.3      38.5  5.53e+06   46.1 45  196       25.4  1.85e+07   37.8 49  8.51e+05  4.39  4.81e+08   19.3 53
 10   3.47      41.2  1.23e+07   47.3 40  24.9      31.1  9.25e+06   40.7 46  3.02e+05  7.89  1.09e+08   26   53
 50   1.94      61.1  9.75e+04   66.2 20  3.09      44.8  3.93e+05   49.6 33  416       17.3  4.49e+07   30.1 50
 100  2.22      69.6  7.21e+04   71.4 12  1.93      57.3  1.26e+05   63   25  134       23    4.16e+07   33.4 48

Table 1: Summary of experiments on three domains. "ratio" denotes the average ratio between the predicted and the actual EST size; % is the corresponding average prediction time relative to the runtime of complete DFBnB; n is the number of problems for which TSS produced a prediction.

Instances for which TSS did not produce results are not included in the averages in Table 1, since neither SS nor WBE produced reasonable results on those instances either.

Finally, we highlight in Table 1 those entries where TSS produces more accurate predictions in less time than WBE. SS is not included in the table because it produced unreasonable predictions even when granted more time than the actual DFBnB search.

Discussion. The results in Figure 2 suggest that TSS is far superior to both WBE and SS. WBE often produces accurate predictions only when DFBnB is about to finish (DFBnB search time is implied by the plots' rightmost x-value). For instance, on the 75-17-6 grid instance with i = 13, WBE is accurate only after more than one hour of computation time, while TSS is able to quickly produce accurate predictions. As anticipated, SS also tends to produce poor predictions. However, if the initial upper bound cb is in fact the optimal cost, then SS will also produce accurate predictions – in this case the stable children property is satisfied because a full DFBnB run would never update cb. Although rare, this is depicted in the plot of pdb1jer with i = 3. Finally, there are a few instances for which no method was able to produce accurate predictions. An example is pedigree1 when DFBnB is used with i = 6.

Overall, we see that TSS is the only robust method which is able to produce reasonable predictions in a timely fashion. In particular, Table 1 shows that TSS can produce predictions orders of magnitude more accurate than WBE on average. WBE yields accurate predictions on the pedigree domain, providing average ratios of 3.94 and 3.46. These are produced, however, when DFBnB is about to finish exploring the whole EST: 81% and 94% of the search is complete, respectively. Table 1 also demonstrates the tradeoff between quality and runtime of TSS: as expected, predictions are more accurate with time, namely they get more accurate when m and C grow. A similar expected tradeoff is also observed between accuracy and coverage.

The main difference between WBE and TSS is that the former is restricted to sample the branches of the EST currently explored by DFBnB, while the latter has the freedom to sample different parts of the EST guided by the type system. In fact, WBE and TSS could have almost opposite sampling behaviors if DFBnB is implemented to explore the subtrees rooted at nodes with lower f-value first (which is standard, since nodes with lower f-value tend to be more promising). Thus, WBE tends to explore first the subtrees rooted at nodes of the same or "similar" type, while TSS explores subtrees rooted at nodes of different types. Our experimental results suggest that TSS's diversity in sampling can be effective in practice.

In summary, our empirical results show that, if memory allows, TSS produces very reasonable predictions, far superior to competing schemes. Yet, TSS is a memory-intensive method. With more time and memory it produces better results, with the memory bound effectively limiting its prediction power. Secondly, the UST sampled by SS in the first step of TSS is sometimes far larger than the actual DFBnB EST. In such a case even the first step could take a prohibitive amount of time. We observed this phenomenon especially when no initial upper bound cb was provided. Clearly, TSS thus works best in domains with high solution density.

5 Conclusion

We presented Two-Step Stratified Sampling, or TSS, a prediction algorithm able to produce good estimates of the size of the Expanded Search Tree (EST) of Depth-first Branch and Bound (DFBnB) for optimization problems over graphical models. Building upon Chen's Stratified Sampling (SS) [Chen, 1992], TSS modifies SS so it can handle the lack of the stable children property in the EST of DFBnB.

6 Acknowledgements

We thank the anonymous reviewers for their thoughtful comments and suggestions. This work was supported by LCD at the University of Regina, AICML at the University of Alberta, and by NSF grant IIS-1065618.

References

[Balas and Toth, 1985] E. Balas and P. Toth. Branch and bound methods. In E. L. Lawler, J. K. Lenstra, A. H. G. Rinnooy Kan, and D. B. Shmoys, editors, The Traveling Salesman Problem: A Guided Tour of Combinatorial Optimization. John Wiley & Sons, New York, 1985.

[Burns and Ruml, 2012] E. Burns and W. Ruml. Iterative-deepening search with on-line tree size prediction. In Proceedings of the Sixth International Conference on Learning and Intelligent Optimization (LION 2012), pages 1–15, 2012.

[Chen, 1992] P.-C. Chen. Heuristic sampling: A method for predicting the performance of tree searching programs. SIAM Journal on Computing, 21:295–315, 1992.

[Harvey and Ginsberg, 1995] W. D. Harvey and M. L. Ginsberg. Limited discrepancy search. In Proceedings of the 13th International Joint Conference on Artificial Intelligence, pages 607–613, 1995.

[Kask and Dechter, 2001] K. Kask and R. Dechter. A general scheme for automatic generation of search heuristics from specification dependencies. Artificial Intelligence, pages 91–131, 2001.

[Kilby et al., 2006] P. Kilby, J. K. Slaney, S. Thiébaux, and T. Walsh. Estimating search tree size. In Proceedings of the 21st Conference on Artificial Intelligence (AAAI 2006), pages 1014–1019, 2006.

[Knuth, 1975] D. E. Knuth. Estimating the efficiency of backtrack programs. Mathematics of Computation, 29:121–136, 1975.

[Korf et al., 2001] R. E. Korf, M. Reid, and S. Edelkamp. Time complexity of Iterative-Deepening-A*. Artificial Intelligence, 129(1–2):199–218, 2001.

[Korf, 1985] R. E. Korf. Depth-first iterative-deepening: An optimal admissible tree search. Artificial Intelligence, 27(1):97–109, 1985.

[Lelis et al., 2013] L. H. S. Lelis, S. Zilles, and R. C. Holte. Predicting the size of IDA*'s search tree. Artificial Intelligence, 196:53–76, 2013.

[Marinescu and Dechter, 2009] R. Marinescu and R. Dechter. AND/OR branch-and-bound search for combinatorial optimization in graphical models. Artificial Intelligence, 173(16–17):1457–1491, 2009.

[Nilsson, 1980] N. Nilsson. Principles of Artificial Intelligence. Morgan Kaufmann, 1980.

[Otten and Dechter, 2012a] L. Otten and R. Dechter. Anytime AND/OR depth-first search for combinatorial optimization. AI Communications, 25(3):211–227, 2012.

[Otten and Dechter, 2012b] L. Otten and R. Dechter. A case study in complexity estimation: Towards parallel branch-and-bound over graphical models. In Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence, pages 665–674, 2012.

[Pearl, 1988] J. Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, 1988.

[Purdom, 1978] P. W. Purdom. Tree size by partial backtracking. SIAM Journal on Computing, 7(4):481–491, November 1978.

[Zahavi et al., 2010] U. Zahavi, A. Felner, N. Burch, and R. C. Holte. Predicting the performance of IDA* using conditional distributions. Journal of Artificial Intelligence Research, 37:41–83, 2010.
