
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17)

Getting More Out of the Exposed Structure in Constraint Programming Models of Combinatorial Problems

Gilles Pesant
École Polytechnique de Montréal, Montréal, Canada
CIRRELT, Université de Montréal, Montréal, Canada
[email protected]

Abstract

To solve combinatorial problems, Constraint Programming builds high-level models that expose much of the structure of the problem. The distinctive driving force of Constraint Programming has been this direct access to problem structure. This has been key to the design of powerful filtering algorithms, but we could do much more. Considering the set of solutions to each constraint as a multivariate discrete distribution opens the door to more structure-revealing computations that may significantly change this solving paradigm. As a result we could improve our ability to solve combinatorial problems and our understanding of the structure of practical problems.

Copyright © 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Introduction

Many efficient computational methodologies have been designed to solve combinatorial problems. Among those which proceed by building a formal model of the problem, SAT and MIP solvers express theirs in a restricted syntax (Boolean clauses, linear constraints) that allows them to apply powerful algorithms and data structures specialized for that homogeneous form. Constraint Programming (CP) solvers followed a different route, having developed over the years a very rich and heterogeneous modelling language (Beldiceanu et al. 2007) with an extensive set of inference algorithms, each dedicated to a primitive of that language. These high-level primitives, termed global constraints, expose much of the combinatorial structure of a problem. They have been identified over time as being complex enough to provide structural insight yet simple enough for some desired computing tasks to remain tractable. As a result one often finds that an otherwise challenging problem can be modelled using just a few such constraints. The distinctive driving force of CP has been this direct access to problem structure.

NP-hard problems often combine a few "easy" (i.e. polytime-solvable) problems: it is the combination that makes them hard to solve. Consider the well-known TSP: it can be seen as the combination of an assignment problem (minimum weight matching) and of subtour avoidance; or alternately as the combination of the minimum spanning tree problem (strictly speaking, the minimum one-tree) and of limiting the degree of vertices to be at most two. Taken separately, each of these parts is easy. The field has exploited such structural decomposition for a long time through the concept of problem relaxation, which provides bounds on the value of the cost function. Lagrangian decomposition provides another opportunity by working on these substructures separately while linking them in the cost function; recent work has adapted this approach to improve communication between constraints in CP (Bergman, Ciré, and van Hoeve 2015; Ha, Quimper, and Rousseau 2015).

A problem to be solved by CP is typically expressed as a Constraint Satisfaction Problem (X, D, C), where X = {x1, x2, ..., xn} is a finite set of variables, D = {D1, D2, ..., Dn} a corresponding finite set of domains specifying the possible values that each variable in X can take, and C = {c1, c2, ..., cm} a finite set of constraints restricting the combinations of values that the variables may take, which can be seen as relations over subsets of X. The exposed structure in CP models, corresponding to each constraint cj, has been key to the design of powerful filtering algorithms (van Hoeve and Katriel 2006) that identify some variable-value pair (xi, d) which, given the current domains of the other variables, cannot possibly satisfy cj, and hence filter out d from Di. To a lesser degree we also exploit that structure to guide search (Boussemart et al. 2004; Pesant, Quimper, and Zanarini 2012) and to explain failure and filtering during search (Stuckey 2010). But we could do much more.

There is a wealth of information inside each of these constraints. In this paper I present a unified perspective that generalizes some of the key features of CP and offers novel powerful ways to exploit combinatorial problem structure.

Solution Sets as Multivariate Distributions

Let's examine the combinatorial structure of individual constraints by considering the set of solutions to a constraint as a multivariate discrete distribution. If we wanted to design algorithms able to query the distribution of solutions to an instance of a constraint, would that be hard? It turns out that for several combinatorial structures we can achieve a lot in polynomial time. What sort of query would we be interested in? Consider the derived marginal distributions over individual variables.

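To fix notation, the (X, D, C) formalism and the solution-set-as-distribution view can be sketched in a few lines of Python. The representation below (variables as indices, constraints as scope/predicate pairs) and the brute-force enumeration are illustrative choices of mine, not any CP solver's API:

```python
from itertools import product

# A toy CSP (X, D, C) in the notation above: variables are the indices 0..n-1,
# D maps each variable to its finite domain, and each constraint pairs a scope
# (tuple of variable indices) with a predicate over those variables.
D = {0: {1, 2, 3}, 1: {1, 2, 3}}
C = [((0, 1), lambda x, y: x != y),      # the two variables must differ
     ((0, 1), lambda x, y: x + y <= 5)]  # and their sum is bounded

def solutions(D, C):
    """Enumerate the solution set by brute force (exponential; toy sizes only)."""
    variables = sorted(D)
    for assignment in product(*(sorted(D[v]) for v in variables)):
        if all(pred(*(assignment[v] for v in scope)) for scope, pred in C):
            yield assignment

sols = list(solutions(D, C))  # the solution set, viewed as a multivariate
                              # distribution where each tuple has mass 1/len(sols)
```

The point of the sketch is only the last comment: once the solution set is in hand, every query discussed below (marginals, supports, modes) is a question about this discrete distribution.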
Fig. 1 depicts simple examples of constraints on two variables. In each grid, every row corresponds to a possible value for the first variable, every column to a possible value for the second variable, and the dark cells represent the pairs of values that satisfy the constraint (its solutions). On the left of and below each grid are depicted the marginal distributions over the first and the second variable respectively (i.e. the projection of the set of solutions onto each axis). The core CP concept of support for variable-value pairs precisely corresponds to these distributions: the subset of domain elements with non-zero frequency (represented as thicker segments on each axis). This concept forms the basis of the definition of filtering algorithms. If we can compute such marginal distributions efficiently and exactly, then we achieve domain consistency through their supports. A marginal distribution whose support is a single value identifies a backbone variable (Kilby et al. 2005) in a satisfiable CSP, i.e. a variable that takes the same value in all solutions. The concept of mode, i.e. a distribution's most frequent value, corresponds to the branching decision implemented by the maxSD heuristic used in counting-based search (Pesant, Quimper, and Zanarini 2012).

Figure 1: Simple examples of constraints on two variables.

Hence some features or statistics of these multivariate distributions already correspond to existing concepts in CP. And more discriminating information is readily available. Returning to the figure, the constraints depicted in the top and middle grids have very different solution sets, which is reflected in the respective number of solutions (22 vs 16), in the marginal distributions, and in their modes for each variable, but not in their supports, which are identical: so if we rely solely on domain consistency the constraints appear equivalent, which they are not. The bottom grid shows even more of the structure. In the context of an optimization problem each solution has a cost; the middle and bottom grids depict the same set of solutions but, because we now add a cost to each cell, only the solutions deemed of interest (say, whose cost is at most 2: the darker cells) are counted. This yields a very different marginal distribution for each variable, showing the importance of weighing solutions by their relative cost. I outline below further opportunities to exploit the distributions.

Consistency

Domain consistency is the strongest level of consistency we can achieve if we operate on the individual domains of variables. Weaker levels of consistency such as bounds consistency identify a superset of the support of the marginal distributions over individual variables. Stronger levels of consistency have also been proposed, such as path consistency and k-consistency (Bessiere 2006), but they cannot be reflected solely on individual domains. Now if we consider marginal distributions over subsets of variables (two or more), we may define stronger levels of consistency within individual constraints, akin to path and k-consistency, that can be computed efficiently on our chosen combinatorial structures. Hence such an investigation may give rise to new efficient algorithms to achieve already-defined consistency levels or possibly even to new forms of consistency. But information from constraints is traditionally propagated through shared variables' domains, which cannot represent forbidden value combinations for pairs (or larger subsets) of variables. We will come back to this important issue of how to reflect stronger levels of consistency that cannot be captured in individual domains.

Search

For search we can investigate which features of a distribution should be used to rank potential branching decisions. Previous work on counting-based search showed that the maxSD heuristic, which simply recommends the highest mode from the marginal distributions, works very well in many cases (Pesant, Quimper, and Zanarini 2012). Figures 2 and 3 show the relative performance of that heuristic with respect to other generic heuristics on two reference problems (see the above reference for a description of these and other state-of-the-art generic search heuristics). If even this limited exploitation of distributions can improve on such heuristics, then a deeper analysis of the distributions may lead to even more robust search.

Figure 2: Percentage of Magic Square Completion instances (Problem 19 of the CSPLib) solved after a given time per instance, for a few heuristics (adapted from (Pesant, Quimper, and Zanarini 2012)).

Figure 3: Percentage of Traveling Tournament Problem with Predefined Venues instances (Problem 68 of the CSPLib) solved after a given time per instance, for a few heuristics (adapted from (Pesant, Quimper, and Zanarini 2012)).

Solving combinatorial optimization problems presents the additional technical challenge of handling distributions over weighted solutions. (Pesant 2016) proposes a way to do this by restricting the distribution to "good" solutions, i.e. solutions that are within some epsilon of a minimum-weight solution. Figure 4 shows the relative performance of a weight-aware variant of the maxSD heuristic with respect to other generic heuristics on one problem (see the above reference for details). Another option is to apply some damping to solutions according to their distance from the optimum, making better solutions count more. There is considerable potential here when we are confronted with a well-understood combinatorial structure with some side constraints: we can recommend branching decisions that correspond to solution fragments that frequently appear in near-optimal solutions to the "pure" problem (i.e. without the side constraints).

Figure 4: Percentage of Balanced Academic Curriculum Problem instances (Problem 30 of the CSPLib) solved to optimality after a given time per instance, for a few heuristics (adapted from (Pesant 2016)).

Model Counting

Determining the number of solutions to a CSP (model counting) has a number of applications. Work in enumerative combinatorics is typically limited to "clean" combinatorial structures, while work in AI tackles model counting by considering the problem as a whole, e.g. (Ivrii et al. 2016; Gogate and Dechter 2008). Another option here is to count exactly, or at least closely, for each constraint and then to combine the results, for example by selecting a subset of the constraints forming a partition of the variables and multiplying the results, thus obtaining an upper bound (Pesant 2005).

Uniform Sampling

The ability to sample a complex combinatorial space uniformly at random has important applications such as in test case (or stimuli) generation for software and hardware validation. Knowledge of the marginal distributions over individual variables allows us to draw samples from the set of solutions uniformly at random by considering one variable at a time, deciding which value to assign according to its marginal distribution (viewed as a probability), and then adjusting the marginal distributions to take this assignment into account. If the combinatorial space is described using several structures, each with its marginal distributions, the challenge is to adapt this sampling approach so as to preserve the uniformity property as much as possible.

Structural Insights

Starting from the multivariate distribution, which describes the whole structure of the solution set of a constraint, we can take more or less insightful peeks at that structure depending on the nature of our queries. Variable domains are the very flat projections of that distribution on each individual variable. Here are a couple of more revealing ideas.

Learning whether a variable's value is dependent on another variable's value is useful information for search and is related to the concept of backdoor (Kilby et al. 2005). While one can sometimes distinguish between main and auxiliary variables by simple inspection of the CP model, in general identifying such relationships proves much harder. We can quantify the amount of dependency between a pair of variables by comparing the marginal distribution over that pair to the product of the marginal distributions over each variable (a perfect match would indicate variable independence).

We already talked about backbone variables, an interesting structural feature, but the concept may prove too strict for CP models with non-binary variables. We can relax that concept to quasi-backbone variables, for which the marginal distribution has a very strong mode or a very narrow head. This may prove more useful in practice.

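Before turning to how such distributions are computed without enumeration, the sequential scheme from the Uniform Sampling section can be sketched for a single constraint whose solution set we can enumerate; the function name and test set are illustrative assumptions:

```python
import random
from collections import Counter

def sample_uniform(solutions, rng):
    """Draw one solution uniformly: assign variables one at a time, choosing
    each value according to the current marginal (viewed as a probability),
    then condition the remaining solution set on that assignment."""
    pool = list(solutions)
    chosen = []
    for i in range(len(pool[0])):
        counts = Counter(s[i] for s in pool)
        values, weights = zip(*counts.items())
        v = rng.choices(values, weights=weights)[0]
        chosen.append(v)
        pool = [s for s in pool if s[i] == v]  # adjust marginals to x_i = v
    return tuple(chosen)

sols = [(1, 2), (1, 3), (2, 3), (3, 1)]
rng = random.Random(0)
draws = Counter(sample_uniform(sols, rng) for _ in range(4000))
# Each of the four solutions is drawn with frequency close to 1/4.
```

With exact conditional marginals the product of the per-variable probabilities telescopes to 1/|solutions|, so the draw is exactly uniform; the open question raised above is what happens when several constraints each supply only their own marginals.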
Querying Distributions

All the above may sound like nice lofty goals that we won't achieve. Typical constraints in CP models involve many more variables and exhibit much more intricate structure than the simple examples of Fig. 1. But because we work with "easy" structures, quite a few tasks can, and indeed have, been accomplished efficiently. These structures have been selected in CP because the inference task of reaching certain levels of consistency is computationally tractable. The gamble here is that some of the other queries we are interested in are tractable as well. We review next what has been accomplished so far for a few common combinatorial structures.

alldifferent constraint. This constraint restricts a set of variables to take pairwise distinct values (Régin 1994). Simply counting the number of solutions to this constraint is intractable (#P-complete) because it corresponds to computing the permanent of an associated 0-1 matrix, a well-studied problem for which estimators based on sampling and fast upper bounds have been proposed. (Pesant, Quimper, and Zanarini 2012) describe an adaptation of the latter to compute very quickly close approximations of the marginal distributions. This idea can be extended to global cardinality constraints (Régin 1996). (Pesant 2016) gives a polytime algorithm, based on the above and on solving a minimum-weight bipartite matching problem, to compute marginal distributions of good solutions for combinatorial optimization problems.

regular constraint. This constraint expresses patterns that must be exhibited in a sequence of variables, as specified by an automaton (Pesant 2004). A slight extension of the data structure used by its domain consistency algorithm suffices to compute exact marginal distributions in polynomial time (Pesant, Quimper, and Zanarini 2012). (Pesant 2016) further extends the latter to compute marginal distributions of good solutions for optimization problems.
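The layered-graph computation behind the regular constraint can be sketched as follows, under the assumption of a small hand-coded DFA (the automaton, its state names, and all variable names here are illustrative, not taken from the cited papers). Forward and backward path counts at each state yield exact marginals:

```python
from collections import defaultdict
from fractions import Fraction

# Toy DFA over values {0, 1} accepting sequences with an even number of 1s.
states = {'e', 'o'}
delta = {('e', 0): 'e', ('e', 1): 'o', ('o', 0): 'o', ('o', 1): 'e'}
start, accepting, n, domain = 'e', {'e'}, 3, [0, 1]

# forward[i][q]: number of length-i prefixes of solutions reaching state q.
forward = [defaultdict(int) for _ in range(n + 1)]
forward[0][start] = 1
for i in range(n):
    for q, c in list(forward[i].items()):
        for d in domain:
            forward[i + 1][delta[q, d]] += c

# backward[i][q]: number of accepted suffixes of length n-i starting from q.
backward = [defaultdict(int) for _ in range(n + 1)]
for q in accepting:
    backward[n][q] = 1
for i in range(n - 1, -1, -1):
    for q in states:
        for d in domain:
            backward[i][q] += backward[i + 1][delta[q, d]]

total = sum(forward[n][q] for q in accepting)  # number of solutions
# Exact marginal of (x_i, d): every layer-i transition labeled d accounts
# for forward[i][q] * backward[i+1][delta[q, d]] solutions.
marginal = {(i, d): Fraction(sum(forward[i][q] * backward[i + 1][delta[q, d]]
                                 for q in list(forward[i])), total)
            for i in range(n) for d in domain}
```

This is the one-to-one correspondence between paths and solutions mentioned below: each pass is linear in the size of the layered graph, so the marginals come essentially for free once the domain consistency structure is in place.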
knapsack constraint. This one expresses linear equalities and inequalities. Exact marginal distributions are computed very similarly to the above but in pseudo-polynomial time; alternatively, approximate marginal distributions based on a relaxation take low polynomial time to compute (Pesant, Quimper, and Zanarini 2012).

spanning tree constraint. The identified subset of the edges of a given graph must form a spanning tree of it (Dooms and Katriel 2007). Starting from the Matrix-Tree Theorem, the computation of exact marginal distributions amounts to matrix inversion (Brockbank, Pesant, and Rousseau 2013).

dispersion constraint. In order to express balance among a set of variables, the collection of values they take has its mean and its deviation from that mean constrained. It is a generalization of the spread and deviation constraints. Exact marginal distributions can be computed in polynomial time under reasonable assumptions about the range of the domains (Pesant 2015).

Out of the above a few design patterns have emerged so far. Some constraints already maintain a compact representation of the solution set for inference, typically a layered graph in which there is a one-to-one correspondence between paths and solutions. All that is additionally required to compute marginal distributions over single variables is to maintain the number of incoming and outgoing paths at every vertex of that graph. A bigger challenge in this case is how to compute on that graph marginal distributions over pairs of variables or even larger subsets. For constraints on which the counting problem is intractable, sampling interleaved with constraint propagation has produced very accurate estimates within reasonable time. Some existing upper bounds from the literature, though crude for counting, proved to be quite accurate for marginal distributions (since we take the ratio of two upper bounds) and much faster than sampling. For these two design patterns, computing marginal distributions over subsets of variables will be more time-consuming but should not pose additional technical challenges. Lastly, for others a transformation to discrete random variables uniformly distributed over the range of their domain provides approximate but fast algorithms.

Combining Information

So we have evidence that for several common combinatorial structures the underlying multivariate distribution can be queried efficiently. For some uses, such as consistency, this is sufficient since any value filtering from one structure is valid for the whole CSP. However for others, such as search and sampling, a remaining challenge is how best to combine the information from each structure. As an example, the successful maxSD branching heuristic for search simply takes the maximum recommendation over all constraints. Should we devise more refined combinations, e.g. by considering the way constraints overlap? How can our ability to perform uniform sampling on component structures be used to draw samples from the whole CSP, and what would be the statistical properties of that procedure?

Information from constraints is traditionally propagated through shared variables' domains. A promising avenue to investigate is a richer propagation medium. One way to view a variable's domain with respect to a constraint is as a set of variable-value pairs with non-zero frequency, which is rather flat information since all values are on an equal footing. If instead we share marginal distributions over individual variables, we can discriminate between values from the perspective of each constraint. And if we go one step further and allow marginal distributions over pairs of variables, it becomes easier to propagate stronger levels of consistency. Connections to Belief Propagation (Pearl 1982) should be explored since the nature of the information we propose to share is similar to its messages.

Conclusion

Breaking down a complex combinatorial problem into more manageable and better understood combinatorial structures is foundational in CP. Therefore this solving methodology is very well positioned to exploit these structures. Focusing on multivariate discrete distributions offers many opportunities. I expect that this deeper investigation will have a significant impact on our ability to solve combinatorial problems, getting better solutions faster. It should also provide new insights into the combinatorial structure of practical problems.

Acknowledgements

Financial support for this research was provided by Discovery Grant 218028/2012 from the Natural Sciences and Engineering Research Council of Canada.

References

Beldiceanu, N.; Carlsson, M.; Demassey, S.; and Petit, T. 2007. Global Constraint Catalogue: Past, Present and Future. Constraints 12(1):21–62.

Bergman, D.; Ciré, A. A.; and van Hoeve, W. 2015. Improved Constraint Propagation via Lagrangian Decomposition. In Principles and Practice of Constraint Programming - 21st International Conference, CP 2015, Proceedings, volume 9255 of Lecture Notes in Computer Science, 30–38. Springer.

Bessiere, C. 2006. Constraint Propagation. In Handbook of Constraint Programming. Elsevier. 29–83.

Boussemart, F.; Hemery, F.; Lecoutre, C.; and Sais, L. 2004. Boosting Systematic Search by Weighting Constraints. In de Mántaras, R. L., and Saitta, L., eds., ECAI, 146–150. IOS Press.

Brockbank, S.; Pesant, G.; and Rousseau, L. 2013. Counting Spanning Trees to Guide Search in Constrained Spanning Tree Problems. In Principles and Practice of Constraint Programming - 19th International Conference, CP 2013, Proceedings, volume 8124 of Lecture Notes in Computer Science, 175–183. Springer.

Dooms, G., and Katriel, I. 2007. The "Not-Too-Heavy Spanning Tree" Constraint. In Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems, 4th International Conference, CPAIOR 2007, Proceedings, volume 4510 of Lecture Notes in Computer Science, 59–70. Springer.

Gogate, V., and Dechter, R. 2008. Approximate Solution Sampling (and Counting) on AND/OR Spaces. In Principles and Practice of Constraint Programming, 14th International Conference, CP 2008, Proceedings, volume 5202 of Lecture Notes in Computer Science, 534–538. Springer.

Ha, M. H.; Quimper, C.; and Rousseau, L. 2015. General Bounding Mechanism for Constraint Programs. In Principles and Practice of Constraint Programming - 21st International Conference, CP 2015, Proceedings, volume 9255 of Lecture Notes in Computer Science, 158–172. Springer.

Ivrii, A.; Malik, S.; Meel, K. S.; and Vardi, M. Y. 2016. On Computing Minimal Independent Support and its Applications to Sampling and Counting. Constraints 21(1):41–58.

Kilby, P.; Slaney, J. K.; Thiébaux, S.; and Walsh, T. 2005. Backbones and Backdoors in Satisfiability. In Proceedings, The Twentieth National Conference on Artificial Intelligence and the Seventeenth Innovative Applications of Artificial Intelligence Conference, 1368–1373. AAAI Press / The MIT Press.

Pearl, J. 1982. Reverend Bayes on Inference Engines: A Distributed Hierarchical Approach. In Proceedings of the National Conference on Artificial Intelligence, 133–136. AAAI Press.

Pesant, G.; Quimper, C.-G.; and Zanarini, A. 2012. Counting-Based Search: Branching Heuristics for Constraint Satisfaction Problems. J. Artif. Intell. Res. (JAIR) 43:173–210.

Pesant, G. 2004. A Regular Language Membership Constraint for Finite Sequences of Variables. In Principles and Practice of Constraint Programming - CP 2004, 10th International Conference, Proceedings, volume 3258 of Lecture Notes in Computer Science, 482–495. Springer.

Pesant, G. 2005. Counting Solutions of CSPs: A Structural Approach. In IJCAI-05, Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence, 260–265. Professional Book Center.

Pesant, G. 2015. Achieving Domain Consistency and Counting Solutions for Dispersion Constraints. INFORMS Journal on Computing 27(4):690–703.

Pesant, G. 2016. Counting-Based Search for Constraint Optimization Problems. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 3441–3448. AAAI Press.

Régin, J. 1994. A Filtering Algorithm for Constraints of Difference in CSPs. In Proceedings of the 12th National Conference on Artificial Intelligence, 362–367. AAAI Press / The MIT Press.

Régin, J. 1996. Generalized Arc Consistency for Global Cardinality Constraint. In Proceedings of the Thirteenth National Conference on Artificial Intelligence and Eighth Innovative Applications of Artificial Intelligence Conference, AAAI 96, IAAI 96, 209–215. AAAI Press / The MIT Press.

Stuckey, P. J. 2010. Lazy Clause Generation: Combining the Power of SAT and CP (and MIP?) Solving. In Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems, 7th International Conference, CPAIOR 2010, Proceedings, volume 6140 of Lecture Notes in Computer Science, 5–9. Springer.
