Learning to Search in Branch-and-Bound Algorithms*

He He, Hal Daumé III
Department of Computer Science
University of Maryland
College Park, MD 20740
{hhe,hal}@cs.umd.edu

Jason Eisner
Department of Computer Science
Johns Hopkins University
Baltimore, MD 21218
[email protected]

* This material is based upon work supported by the National Science Foundation under Grant No. 0964681.

Abstract

Branch-and-bound is a widely used method in combinatorial optimization, including mixed integer programming, structured prediction and MAP inference. While most work has been focused on developing problem-specific techniques, little is known about how to systematically design the node searching strategy on a branch-and-bound tree. We address the key challenge of learning an adaptive node searching order for any class of problem solvable by branch-and-bound. Our strategies are learned by imitation learning. We apply our algorithm to linear programming based branch-and-bound for solving mixed integer programs (MIP). We compare our method with one of the fastest open-source solvers, SCIP; and a very efficient commercial solver, Gurobi. We demonstrate that our approach achieves better solutions faster on four MIP libraries.

1 Introduction

Branch-and-bound (B&B) [1] is a systematic enumerative method for global optimization of non-convex and combinatorial problems. In the machine learning community, B&B has been used as an inference tool in MAP estimation [2, 3]. In applied domains, it has been applied to the "inference" stage of structured prediction problems (e.g., dependency parsing [4, 5], scene understanding [6], ancestral sequence reconstruction [7]). B&B recursively divides the feasible set of a problem into disjoint subsets, organized in a tree structure, where each node represents a subproblem that searches only the subset at that node. If computing bounds on a subproblem does not rule out the possibility that its subset contains the optimal solution, the subset can be further partitioned ("branched") as needed.

A crucial question in B&B is how to specify the order in which nodes are considered. An effective node ordering strategy guides the search to promising areas in the tree and improves the chance of quickly finding a good incumbent solution, which can be used to rule out other nodes. Unfortunately, no theoretically guaranteed general solution for node ordering is currently known. Instead of designing node ordering heuristics manually for each problem type, we propose to speed up B&B search by automatically learning search heuristics that are adapted to a family of problems.

• Non-problem-dependent learning. While our approach learns problem-specific policies, it can be applied to any family of problems solvable by the B&B framework. We use imitation learning to automatically learn the heuristics, free of the trial-and-error tuning and rule design by domain experts in most B&B algorithms.

• Dynamic decision-making. Our decision-making process is adaptive on three scales. First, it learns different strategies for different problem types. Second, within a problem type, it can evaluate the hardness of a problem instance based on features describing the solving progress. Third, within a problem instance, it adapts the searching strategy to different levels of the B&B tree and makes decisions based on node-specific features.
• Easy incorporation of heuristics. Most hand-designed strategies handle only a few heuristics, and they set weights on different heuristics by domain knowledge or manual experimentation. In our model, multiple heuristics can simply be plugged in as state features for the policy, allowing a hybrid "heuristic" to be learned effectively.

We assume that a small set of solved problems is given at training time and that the problems to be solved at test time are of the same type. We learn a node selection policy and a node pruning policy from solving the training problems. The node selection policy repeatedly picks a node from the queue of all unexplored nodes, and the node pruning policy decides if the popped node is worth expanding. We formulate B&B search as a sequential decision-making process. We design a simple oracle that knows the optimal solution in advance and only expands nodes containing the optimal solution. We then use imitation learning to learn policies that mimic the oracle's behavior without perfect information; these policies must even mimic how the oracle would act in states that the oracle would not itself reach, as such states may be encountered at test time. We apply our approach to linear programming (LP) based B&B for solving mixed integer linear programming (MILP) problems, and achieve better solutions faster on 4 MILP problem libraries than Gurobi, a recent fast commercial solver competitive with Cplex, and SCIP, one of the fastest open-source solvers [8].

2 The Branch-and-Bound Framework: An Application in Mixed Integer Linear Programming

Consider an optimization problem of minimizing f over a feasible set F, where F is usually discrete. B&B uses a divide-and-conquer strategy: F is recursively divided into subsets F_1, F_2, ..., F_p such that F = F_1 ∪ F_2 ∪ ··· ∪ F_p. The recursion tree is an enumeration tree of all feasible solutions, whose nodes are subproblems and whose edges are the partition conditions. Slightly abusing notation, we will use F_i to refer to both the subset and its corresponding B&B node from now on. A (convex) relaxation of each subproblem is solved to provide an upper/lower bound for that node and its descendants. We denote the upper and lower bounds at node i by ℓ_ub(F_i) and ℓ_lb(F_i) respectively, where ℓ_ub and ℓ_lb are bounding functions.

A common setting where B&B is ubiquitously applied is MILP. A MILP problem has a linear objective and linear constraints, and additionally requires specified variables to be integer. We assume from now on that the MILP objective is being minimized. At each node, we drop the integrality constraints and solve the node's LP relaxation. We present a concrete example in Figure 1. The optimization problem is shown in the lower right corner of the figure. At node F_i, a local lower bound (shown in the lower half of each circle) is found by the LP solver. A local upper bound (shown in the upper half of the circle) is available if a feasible solution is found at this node.

Figure 1: Using branch-and-bound to solve an integer linear programming minimization: min −2x − y subject to 3x − 5y ≤ 0, 3x + 5y ≤ 15, x ≥ 0, y ≥ 0, x, y ∈ Z. Each circle shows a node's local upper bound (top) and local lower bound (bottom); branching conditions label the edges, and the global lower and upper bounds are updated as nodes are expanded.
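To make the bounding step concrete, the snippet below solves the root LP relaxation of the Figure 1 example. SciPy's linprog is used here only as an illustrative stand-in for whatever LP solver a B&B implementation calls; it is not tied to the paper's setup.

```python
from scipy.optimize import linprog

# Root LP relaxation of the Figure 1 example: drop x, y ∈ Z and solve
#   min  -2x - y   s.t.  3x - 5y <= 0,  3x + 5y <= 15,  x, y >= 0
res = linprog(
    c=[-2, -1],                     # objective: minimize -2x - y
    A_ub=[[3, -5], [3, 5]],         # left-hand sides of the two <= constraints
    b_ub=[0, 15],                   # right-hand sides
    bounds=[(0, None), (0, None)],  # x >= 0, y >= 0
    method="highs",
)
print(res.x, res.fun)  # [2.5 1.5] -6.5, i.e. x = 5/2, y = 3/2, lower bound -13/2
```

The relaxed optimum is fractional, so the root cannot be fathomed; it is branched on y (y ≤ 1 versus y ≥ 2 in Figure 1), and each child's LP relaxation is solved in the same way.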
We automatically get an upper bound if the LP solution happens to be integer feasible, or we may obtain one by heuristics. B&B maintains a queue L of active nodes, starting with a single root node on it. At each step, we pop a node F_i from L using a node selection strategy and compute its bounds. A node F_i is fathomed (i.e., its subtree is not explored further) if one of the following cases holds: (a) ℓ_lb(F_i) is larger than the current global upper bound, which means no solution in its subtree can possibly be better than the incumbent; (b) ℓ_lb(F_i) = ℓ_ub(F_i), at which point B&B has found the best solution in the current subtree; (c) the subproblem is infeasible. In Figure 1, fathomed nodes are shown in double circles and infeasible nodes are labeled "INF". If a node is not fathomed, it is branched into children of F_i that are pushed onto L. Branching conditions are shown next to each edge in Figure 1. The algorithm terminates when L is empty or when the gap between the global upper bound and lower bound reaches a specified tolerance level. In the example in Figure 1, we follow a DFS order: starting from the root node, the blue arrows point to the next node popped from L to be branched, and the updated global lower and upper bounds after a node expansion are shown on the board under each branched node.

Algorithm 1 Policy Learning(π*_S, π*_P)
  π_S^(1) ← π*_S, π_P^(1) ← π*_P, D_S ← {}, D_P ← {}
  for k = 1 to N do
    for Q in the problem set do
      D_S^(Q), D_P^(Q) ← COLLECTEXAMPLE(Q, π_S^(k), π_P^(k))
      D_S ← D_S ∪ D_S^(Q), D_P ← D_P ∪ D_P^(Q)
    π_S^(k+1), π_P^(k+1) ← train classifiers using D_S and D_P
  return the best π_S^(k), π_P^(k) on the dev set

Figure 2: Our method at runtime (left) and the policy learning algorithm (right). Left: our policy-guided branch-and-bound search, which pops a node from the queue, decides whether to fathom or prune it, and otherwise ranks its children and pushes them onto the queue, until the queue is empty; procedures in the rounded rectangles (shown in blue) are executed by policies. Right: the DAgger learning algorithm. We start by using oracle policies π*_S and π*_P to solve the training problems and collect examples along oracle trajectories. In each iteration, we retrain our policies on all examples collected so far (training sets D_S and D_P), then collect additional examples by running the newly learned policies. The COLLECTEXAMPLE procedure is described in Algorithm 2.
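To make explicit where the two learned policies enter the runtime loop of Figure 2 (left), here is a minimal sketch in Python. The select_policy, prune_policy, bound, and branch callables are hypothetical interfaces standing in for the learned node selection policy, the learned node pruning policy, LP-relaxation bounding, and branching; the fathoming tests follow rules (a)-(c) above. This is a sketch under those assumptions, not the paper's implementation.

```python
import math

def policy_guided_bnb(root, select_policy, prune_policy, bound, branch):
    """Sketch of the policy-guided branch-and-bound loop in Figure 2 (left).

    Hypothetical interfaces (not the paper's code):
      select_policy(queue) -> index of the queued node to pop          (pi_S)
      prune_policy(node)   -> True if the node is not worth expanding  (pi_P)
      bound(node)          -> (lb, ub); ub = +inf if no feasible solution is
                              found at the node, lb = +inf if it is infeasible
      branch(node)         -> list of child subproblems
    """
    queue = [root]
    global_ub, incumbent = math.inf, None
    while queue:
        node = queue.pop(select_policy(queue))   # node selection policy
        lb, ub = bound(node)                     # e.g., solve the LP relaxation
        if ub < global_ub:                       # a better incumbent was found here
            global_ub, incumbent = ub, node
        # Fathoming: (a) lb no better than the incumbent, (b) subtree solved
        # exactly (lb == ub); an infeasible node (c) reports lb = +inf and is
        # therefore caught by case (a) as well.
        if lb >= global_ub or lb == ub:
            continue
        if prune_policy(node):                   # node pruning policy
            continue
        queue.extend(branch(node))               # push children onto the queue
    return incumbent, global_ub
```

In the paper, the two policies score nodes using state features (into which existing heuristics can be plugged); everything else in the loop is standard B&B bookkeeping.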

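Algorithm 1 is a DAgger-style imitation learning loop: the first pass collects examples along oracle trajectories, later passes collect examples while running the newly learned policies, and every retraining step uses all examples aggregated so far. The sketch below mirrors that loop; collect_example, train_classifier, and evaluate are hypothetical stand-ins for the paper's COLLECTEXAMPLE procedure, its classifier training, and its dev-set selection.

```python
def learn_policies(problems, oracle_select, oracle_prune,
                   collect_example, train_classifier, evaluate, dev_set,
                   n_iterations):
    """Schematic of Algorithm 1 (DAgger-style policy learning).

    All callables are hypothetical stand-ins, not the paper's code:
      collect_example(Q, pi_S, pi_P) -> (selection examples, pruning examples)
        gathered by running B&B on problem Q with the current policies and
        labeling each visited state with the oracle's action (COLLECTEXAMPLE);
      train_classifier(examples)     -> a policy fit to all examples so far;
      evaluate((pi_S, pi_P), dev)    -> score used to pick the best iteration.
    """
    pi_select, pi_prune = oracle_select, oracle_prune   # first pass follows the oracle
    data_select, data_prune = [], []                    # aggregated sets D_S, D_P
    learned = []
    for _ in range(n_iterations):
        for q in problems:
            ex_s, ex_p = collect_example(q, pi_select, pi_prune)
            data_select.extend(ex_s)
            data_prune.extend(ex_p)
        pi_select = train_classifier(data_select)        # retrain on all of D_S
        pi_prune = train_classifier(data_prune)          # retrain on all of D_P
        learned.append((pi_select, pi_prune))
    # Keep the policies from the iteration that score best on the dev set.
    return max(learned, key=lambda pis: evaluate(pis, dev_set))
```

Aggregating examples across iterations, rather than training only on the newest batch, is what equips the learned policies to act in states the oracle itself would never reach.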