Learning Variable Ordering Heuristics for Solving Constraint Satisfaction Problems

Wen Song, Zhiguang Cao, Jie Zhang, and Andrew Lim

arXiv:1912.10762v2 [cs.AI] 12 Nov 2020

W. Song is with the Institute of Marine Science and Technology. Email: [email protected]. Z. Cao and A. Lim are with the Department of Industrial Systems Engineering and Management, National University of Singapore. Email: isecaoz, [email protected]. J. Zhang is with the School of Computer Science and Engineering, Nanyang Technological University, Singapore. Email: [email protected].

Abstract—Backtracking search algorithms are often used to solve the Constraint Satisfaction Problem (CSP). The efficiency of backtracking search depends greatly on the variable ordering heuristics. Currently, the most commonly used heuristics are hand-crafted based on expert knowledge. In this paper, we propose a deep reinforcement learning based approach to automatically discover new variable ordering heuristics that are better adapted for a given class of CSP instances. We show that directly optimizing the search cost is hard for bootstrapping, and propose instead to optimize the expected cost of reaching a leaf node in the search tree. To capture the complex relations among the variables and constraints, we design a representation scheme based on Graph Neural Network that can process CSP instances with different sizes and constraint arities. Experimental results on random CSP instances show that the learned policies outperform classical hand-crafted heuristics in terms of minimizing the search tree size, and can effectively generalize to instances that are larger than those used in training.

Index Terms—Constraint Satisfaction Problem, variable ordering, deep reinforcement learning, Graph Neural Network

I. INTRODUCTION

Combinatorial problems [1] are widely studied in many research areas, and have numerous real-world applications in domains such as planning and scheduling [2], [3], vehicle routing [4], [5], graph problems [6], [7], and computational biology [8], [9]. As one of the most widely studied combinatorial problems in computer science and artificial intelligence, the Constraint Satisfaction Problem (CSP) provides a general framework for modeling and solving combinatorial problems. A CSP instance involves a set of variables and constraints. To solve it, one needs to find a value assignment for all variables such that all constraints are satisfied, or to prove that no such assignment exists. Despite its ubiquitous applications, CSP is unfortunately known to be NP-complete in general [10]. To solve CSPs efficiently, backtracking search algorithms are often employed; these are exact algorithms that guarantee a solution will be found if one exists. Though their worst-case complexity is still exponential, with the help of constraint propagation [11], backtracking search algorithms often perform reasonably well in practice.
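To make the definition concrete, the following is a minimal sketch (not from the paper) of how a binary CSP instance can be represented in Python, together with a check of whether a complete assignment satisfies all constraints. The toy instance, a two-color graph-coloring problem, is purely illustrative.

```python
from itertools import product

# A toy binary CSP: color three mutually adjacent nodes with two colors.
# Each variable has a finite domain; each constraint is a relation over
# the values of two variables (here: the values must differ).
variables = ["x1", "x2", "x3"]
domains = {v: ["red", "blue"] for v in variables}
constraints = {
    ("x1", "x2"): lambda a, b: a != b,
    ("x2", "x3"): lambda a, b: a != b,
    ("x1", "x3"): lambda a, b: a != b,
}

def satisfies_all(assignment):
    """Return True iff a complete assignment satisfies every constraint."""
    return all(check(assignment[u], assignment[v])
               for (u, v), check in constraints.items())

# Brute-force enumeration: exponential in the number of variables,
# which is why backtracking search with propagation is used instead.
solutions = [dict(zip(variables, values))
             for values in product(*(domains[v] for v in variables))
             if satisfies_all(dict(zip(variables, values)))]
print(solutions or "unsatisfiable")  # this instance prints: unsatisfiable
```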
In general, a backtracking search algorithm performs a depth-first traversal of the search tree, and tries to find a solution by iteratively selecting a variable and applying a certain branching strategy. The decision of which variable to select next is referred to as variable ordering. It is well acknowledged that the choice of variable ordering has a critical impact on the efficiency of backtracking search algorithms [12]. However, finding an optimal ordering, i.e. one that minimizes the search effort (in terms of the number of search nodes, total solving time, etc.), is at least as hard as solving the CSP itself [13]. Therefore, current practice mainly relies on hand-crafted variable ordering heuristics obtained from the experience of human experts, such as MinDom [14], Dom/Ddeg [15], and the impact-based heuristic [16]. Though these heuristics are easy to use and widely adopted, they carry no formal guarantees of optimality. In addition, they are designed to work on any CSP instance, without considering domain-specific features that could be exploited to achieve much better efficiency; incorporating such features, however, requires substantial experience and deep domain knowledge, which are hard to obtain in reality [17].
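As an illustration of where variable ordering enters the search, below is a minimal backtracking solver (a sketch under our own representation above, not the paper's implementation) that reuses the `variables`, `domains`, and `constraints` from the previous example and takes the ordering heuristic as a pluggable function. `min_dom` picks the variable with the fewest remaining values, as MinDom does; a Dom/Ddeg-style rule would additionally divide the domain size by the number of constraints involving unassigned variables.

```python
def min_dom(unassigned, domains):
    # MinDom: branch on the variable with the smallest current domain.
    # (Most informative when constraint propagation shrinks domains
    # during search; this bare sketch omits propagation.)
    return min(unassigned, key=lambda v: len(domains[v]))

def consistent(var, value, assignment):
    # Check only the constraints linking var to already-assigned variables.
    for (u, w), check in constraints.items():
        if u == var and w in assignment and not check(value, assignment[w]):
            return False
        if w == var and u in assignment and not check(assignment[u], value):
            return False
    return True

def backtrack(assignment, domains, select_var=min_dom):
    """Depth-first backtracking search; returns a solution dict or None."""
    if len(assignment) == len(variables):
        return assignment
    unassigned = [v for v in variables if v not in assignment]
    var = select_var(unassigned, domains)   # <-- the variable ordering decision
    for value in domains[var]:
        if consistent(var, value, assignment):
            result = backtrack({**assignment, var: value}, domains, select_var)
            if result is not None:
                return result
    return None  # dead end: the caller backtracks and tries another value

print(backtrack({}, domains))  # prints None for the unsatisfiable toy instance
```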
Recently, Deep Neural Networks (DNNs) have been shown to be promising in learning algorithms for solving NP-hard problems, such as the Traveling Salesman Problem (TSP), the Propositional Satisfiability Problem (SAT), and the Capacitated Vehicle Routing Problem (CVRP) [18], [19], [20], [21], [22], [23], [24], [25], [26], [27]. The effectiveness comes from the fact that, given a class of problem instances (e.g. drawn from a distribution), a DNN can be trained through supervised or reinforcement learning (RL) to discover useful patterns that may not be known to, or are hard to specify by, human experts.

In this paper, we ask the following question: can we use DNNs to discover better variable ordering heuristics for a class of CSPs? This is not a trivial task, due to two challenges. Firstly, given the exponential (worst-case) complexity of CSP, it is not practical to obtain a large amount of labeled training data (e.g. optimal search paths), so it is hard to apply supervised learning methods. Secondly, CSP instances have different sizes and features (e.g. number of variables and constraints, domain of each variable, tightness and arity of each constraint); it is therefore crucial to design a representation scheme that can effectively process any CSP instance.

To address these challenges, we design a reinforcement learning agent that tries to make the optimal variable ordering decision at each decision point so as to minimize the search cost. Note that our objective here is to minimize the search tree size, which is more suitable for evaluating different ordering heuristics [28]. More specifically, variable ordering in backtracking search is modeled as a Markov Decision Process (MDP), where the optimal policy is to select, at each decision point, the variable with the minimum expected search cost. The RL agent can optimize its policy for this MDP by learning from its own experience of solving CSP instances drawn from a distribution, without the need for supervision. However, this direct formulation is not convenient for bootstrapping, since learning must be delayed until the search backtracks from a node. To resolve this issue, we consider the search paths originating from a node as separate trajectories, and opt to minimize the expected cost of reaching a leaf node. We represent the internal states of the search process based on a Graph Neural Network (GNN) [29], which can process CSP instances of any size and constraint arity, and effectively capture the relationships between the variables and constraints. We use Double Deep Q-Network (DDQN) [30] to train the GNN-based RL agent. Experimental results on random CSP instances generated by the well-known model RB [31] show that the RL agent can discover policies that are better than traditional hand-crafted variable ordering heuristics in terms of minimizing the search tree size. More importantly, the learned policies can effectively generalize to larger instances that have never been seen during training.

II. RELATED WORK

Recently, there has been increasing attention on using deep (reinforcement) learning to tackle hard combinatorial (optimization or satisfaction) problems. Quite a few works in this direction focus on solving specific types of problems, including routing (e.g. TSP and CVRP) [18], [19], [20], [21], [22], [23], graph optimization [24], [25], packing problems [32], [33], and scheduling [34], [35]. Instead of solving specific problems, we focus on CSP, which is a general representation of combinatorial problems.

In the literature, a number of methods try to tackle satisfaction problems such as CSP and SAT in an end-to-end fashion, i.e. by training a DNN to directly output a solution for a given instance. Xu et al. [36] represent binary CSP as a matrix and train a Convolutional Neural Network (CNN) to predict its satisfiability, but cannot give the solution for satisfiable instances. […] respectively). A more promising way is to apply machine learning within the framework of exact algorithms, such that feasibility and solution quality can be guaranteed [38]. A typical exact framework is the branch-and-bound algorithm for solving Mixed Integer Linear Programs (MILPs). He et al. [39] use imitation learning to learn a control policy for selecting branches in the branch-and-bound process. Khalil et al. [40] achieve a similar purpose by solving a learning-to-rank task to mimic the behaviors of strong branching. Khalil et al. [41] also develop a machine learning model to decide whether primal heuristics should be run at a given branch-and-bound node. These methods are based on linear models with static and dynamic features describing the current branch-and-bound status. More recently, Gasse et al. [42] use imitation learning to mimic strong branching, where the underlying states are represented using GNN. Similarly, a GNN-based network is designed in [43], which is trained in a supervised way to predict the values of binary variables in MILP. Though sharing a similar GNN structure, our work differs from [42], [43] in two main aspects. First, our method does not require labels that are costly to obtain but necessary for imitation or supervised learning. Second, as will be shown later, we use only 4 simple raw features, while [42] and [43] rely on 19 and 22 complex MILP features, respectively.

Another exact framework is backtracking search for solving satisfaction problems. Balafrej et al. [44] use a bandit model to learn a policy that adaptively selects the right constraint propagation level at each node of a CSP search tree. Closer to our work, several methods use traditional machine learning to choose the branching heuristics for solving CSP and some special cases. Lagoudakis and Littman [45] use RL to learn a branching rule selection policy for the #DPLL algorithm for solving SAT, which requires finding all solutions of a satisfiable instance; however, as will be discussed in Section IV, this RL formulation is not directly applicable for learning in our case. Samulowitz and Memisevic [46] study the heuristic selection task for solving Quantified Boolean Formulas (QBF).
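In contrast to the heuristic-selection methods above, the approach described in Section I learns the variable ordering decision itself. As a rough illustration only (the network, feature names, and features below are hypothetical stand-ins, not the paper's GNN architecture or its 4 raw features, which are defined later in the paper), a learned policy could replace `min_dom` in the solver sketch above by scoring each candidate variable with a trained Q-network and greedily selecting the one with minimum estimated cost-to-go:

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Stand-in for the paper's GNN: scores one variable from a small
    feature vector. (The real model embeds the whole constraint graph.)"""
    def __init__(self, n_features=4, hidden=32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x):
        return self.mlp(x).squeeze(-1)  # one Q-value per variable

def variable_features(var, unassigned, domains):
    # Hypothetical raw features of a variable at the current search state.
    degree = sum(var in pair for pair in constraints)
    return [len(domains[var]), degree, len(unassigned), 1.0]

def learned_ordering(qnet):
    """Build a select_var function for the backtracking solver above."""
    def select_var(unassigned, domains):
        feats = torch.tensor([variable_features(v, unassigned, domains)
                              for v in unassigned], dtype=torch.float32)
        with torch.no_grad():
            q = qnet(feats)                 # estimated cost-to-go per variable
        return unassigned[int(q.argmin())]  # greedy: minimum expected cost
    return select_var

# Usage with an (untrained) network; DDQN training would fit qnet so that
# q approximates the expected cost of reaching a leaf after each choice.
qnet = QNet()
print(backtrack({}, domains, select_var=learned_ordering(qnet)))
```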
