Finding Diverse and Similar Solutions in Constraint Programming

Emmanuel Hebrard, NICTA and UNSW, Sydney, Australia
Brahim Hnich, University College Cork, Ireland
Barry O'Sullivan, University College Cork, Ireland
Toby Walsh, NICTA and UNSW, Sydney, Australia

AAAI-05. Copyright © 2005, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

It is useful in a wide range of situations to find solutions which are diverse (or similar) to each other. We therefore define a number of different classes of diversity and similarity problems. For example, what is the most diverse set of solutions of a constraint satisfaction problem with a given cardinality? We first determine the computational complexity of these problems. We then propose a number of practical solution methods, some of which use global constraints for enforcing diversity (or similarity) between solutions. Empirical evaluation on a number of problems shows promising results.

Introduction

Computational complexity deals with a variety of different problems including decision problems (e.g. "Is there a solution?"), function problems (e.g. "Return a solution"), and counting problems (e.g. "How many solutions exist?"). However, much less has been said about problems where we wish to find a set of solutions that are diverse or similar to each other. For brevity, we shall call these diversity and similarity problems. Existing work in this area mostly focuses on finding pairs of solutions that satisfy some distance constraint (Bailleux & Marquis 1999; Crescenzi & Rossi 2002; Angelsmark & Thapper 2004). However, there is a wider variety of problems that one may wish to consider.

In product configuration, similarity and diversity problems arise as preferences are elicited and suggestions are presented to the user. Suppose you want to buy a car. There are various constraints on what you want and what is available for sale. For example, you cannot buy a cheap Ferrari nor a convertible with a sunroof. You begin by asking for a set of solutions as diverse as possible. You then pick the most preferred car from this set. However, as not all the details are quite right, you ask to see a set of solutions as similar as possible to this car.

As a second example, suppose you are scheduling staff in a hospital. Unfortunately, the problem is very dynamic and uncertain. People are sure to phone in ill, and unforeseen operations are sure to be required. To ensure that the schedule is robust to such changes and can be repaired with minor disruption, you might look for a schedule which has many similar solutions nearby. Supermodels are based on this idea (Ginsberg, Parkes, & Roy 1998). Finally, suppose you are trying to acquire constraints interactively. You ask the user to look at solutions and propose additional constraints to rule out infeasible or undesirable solutions. To speed up this process, it may help if each solution is as different as possible from previously seen solutions.

In this paper, we introduce several new problem classes, each focused on a similarity or diversity problem associated with an NP-hard problem such as constraint satisfaction. We distinguish between offline diversity and similarity problems (where we compute the whole set of solutions at once) and their online counterparts (where we compute solutions incrementally). We determine the computational complexity of these problems, and propose practical solution methods. Finally, we present some promising experimental results.

Formal background

A binary relation $R$ over strings is polynomial-time decidable iff there is a deterministic Turing machine deciding the language $\{x; y \mid (x, y) \in R\}$ in polynomial time (Papadimitriou 1994). A binary relation $R$ is polynomially balanced iff $(x, y) \in R$ implies $|y| < |x|^k$ for some $k$. A language $L$ belongs to NP iff there is a polynomial-time decidable and polynomially balanced relation $R$ such that $L = \{x \mid (x, y) \in R \text{ for some } y\}$. We let $\mathrm{Sol}(x) = \{y \mid (x, y) \in R\}$.

We assume that we have some symmetric, reflexive, total and polynomially bounded distance function, $\delta$, between strings. For example, if $L$ is SAT, this might be the Hamming distance between truth assignments. Note, however, that most of our complexity results will hold for any function that is polynomially bounded. Finally, we use 0 to denote FALSE and 1 to denote TRUE.

Offline diversity and similarity

We define two decision problems at the core of diversity and similarity problems. These ask if there is a subset of solutions of size $k$ whose members are at least (or at most) distance $d$ apart. For brevity, we define $\max(\delta, S) = \max_{y,z \in S} \delta(y, z)$ and $\min(\delta, S) = \min_{y,z \in S} \delta(y, z)$.

dDISTANTkSET (resp. dCLOSEkSET)
Instance. Given a polynomial-time decidable and polynomially balanced relation $R$, a distance function $\delta$ and some string $x$.
Question. Does there exist $S$ with $S \subseteq \mathrm{Sol}(x)$, $|S| = k$ and $\min(\delta, S) \geq d$ (resp. $\max(\delta, S) \leq d$)?

We might also consider the average distance between solutions instead of the minimum or maximum. As we have two parameters, $d$ and $k$, we can choose to fix one and optimize the other. That is, we can fix the size of the set of solutions and maximize their diversity (similarity). Alternatively, we can fix the diversity (similarity) and maximize the size of the set of solutions returned. We could also look for a Pareto optimal set or combine $d$ and $k$ into a single metric.

MAXDIVERSEkSET (resp. MAXSIMILARkSET)
Instance. Given a polynomial-time decidable and polynomially balanced relation $R$, a distance function $\delta$ and some string $x$.
Question. Find $S$ with $S \subseteq \mathrm{Sol}(x)$, $|S| = k$, and for all $S'$ with $S' \subseteq \mathrm{Sol}(x)$, $|S'| = k$, $\min(\delta, S) \geq \min(\delta, S')$ (resp. $\max(\delta, S) \leq \max(\delta, S')$).

MAXdDISTANTSET (resp. MAXdCLOSESET)
Instance. Given a polynomial-time decidable and polynomially balanced relation $R$, a distance function $\delta$ and some string $x$.
Question. Find $S$ with $S \subseteq \mathrm{Sol}(x)$, $\min(\delta, S) \geq d$ (resp. $\max(\delta, S) \leq d$), and for all $S'$ with $S' \subseteq \mathrm{Sol}(x)$ and $\min(\delta, S') \geq d$ (resp. $\max(\delta, S') \leq d$), $|S| \geq |S'|$.

For example, in the product configuration problem, we might want to see the 10 most diverse solutions. This is an instance of the MAXDIVERSEkSET problem. On the other hand, we might want to see all solutions that differ from our current best solution in at most 2 features. This is an instance of the MAXdCLOSESET problem.

Online diversity and similarity

In some situations, we may be computing solutions one by one. For example, suppose we show the user several possible cars to buy but none of them are appealing. We would now like to find a car for sale which is as diverse from these as possible. This suggests the following two online problems in which we find a solution most distant from (closest to) a set of solutions. For brevity, we define $\max(\delta, S, y) = \max_{z \in S} \delta(y, z)$ and $\min(\delta, S, y) = \min_{z \in S} \delta(y, z)$.

MOSTDISTANT (resp. MOSTCLOSE)
Instance. Given a polynomial-time decidable and

Computational complexity

For some of these problem classes, specifically those for which the CSP is an input (e.g. dDISTANT/CLOSEkSET), the size of the output (a set of solutions) may not be polynomially bounded by the size of the input. Therefore we will always make the assumption that $k$ is polynomial in the size of the input. With such an assumption, dDISTANTkSET and dCLOSEkSET are in NP. Since dDISTANTkSET and dCLOSEkSET decide membership in $R$ when $k = 1$, both problems are trivially NP-complete.

We now show that both MAXDIVERSEkSET and MAXSIMILARkSET are $FP^{NP[\log n]}$-complete. This is the class of all functions computed by a polynomial-time oracle machine which, on an input of size $n$, asks a total of $O(\log n)$ NP queries (Papadimitriou 1994).

Theorem 1 MAXDIVERSEkSET and MAXSIMILARkSET are $FP^{NP[\log n]}$-complete.

Proof. MAXDIVERSEkSET (resp. MAXSIMILARkSET) $\in FP^{NP[\log n]}$: The distance function is polynomially bounded in the size of the input, $n$. Using binary search, we call dDISTANTkSET (resp. dCLOSEkSET) to determine the exact value of $d$. This requires only logarithmically many adaptive NP queries.

Completeness: We sketch a reduction from MAX CLIQUE SIZE, which is $FP^{NP[\log n]}$-complete (Papadimitriou 1994). Given a graph $G = \langle V, E \rangle$, with set of nodes $V$ and set of edges $E$, MAX CLIQUE SIZE determines the size of the largest clique in $G$. We define a CSP as follows. We introduce a Boolean variable $B_i$, called a graph-node variable, for each node $i$ in $V$. For each $a, b \in V$, if $\langle a, b \rangle \notin E$, we enforce that $B_a$ and $B_b$ cannot both be 1. We add an additional $\lceil n/2 \rceil + 1$ Boolean variables, called distance variables, and enforce that they are 0 iff every graph-node variable is 0. This CSP always admits the solution with 0 assigned to all Boolean variables. So that MAXDIVERSEkSET finds the clique of maximum size, we define the distance between two solutions, $\delta(s_a, s_b)$, as the Hamming distance defined over all Boolean variables. Thus, MAXDIVERSEkSET($P$) with $k = 2$ will always have one of the solutions assigning 0 to each Boolean variable and the other solution assigning 1 to those graph-node variables representing nodes in the largest clique possible, and 1 to all the additional distance variables (which the constraint above forces to 1 as soon as some graph-node variable is 1), since these two solutions are maximally diverse with respect to Hamming distance. Suppose this is not the case, i.e., both solutions assign some node variables to 0 and some others to 1.
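
The membership half of the proof of Theorem 1 (binary search over d, with dDISTANTkSET as the decision oracle) can be illustrated with a small sketch. This is not the authors' solution method: the NP oracle is replaced by brute-force enumeration over an explicitly listed solution set, which is feasible only for toy instances. The function names, the toy constraint and the requirement k >= 2 are all assumptions made for illustration.

```python
from itertools import combinations, product


def hamming(y, z):
    """Hamming distance between two equal-length assignments."""
    return sum(a != b for a, b in zip(y, z))


def min_distance(subset, dist=hamming):
    """min(delta, S): smallest pairwise distance inside S (needs |S| >= 2)."""
    return min(dist(y, z) for y, z in combinations(subset, 2))


def d_distant_k_set(solutions, k, d, dist=hamming):
    """Decision oracle for dDISTANTkSET.

    Returns a size-k subset whose minimum pairwise distance is >= d,
    or None if no such subset exists. Brute force stands in for the
    NP oracle used in the proof (an assumption of this sketch).
    """
    for subset in combinations(solutions, k):
        if min_distance(subset, dist) >= d:
            return subset
    return None


def max_diverse_k_set(solutions, k, max_d, dist=hamming):
    """MAXDIVERSEkSET by binary search on d.

    max_d is an upper bound on the distance function; the loop makes
    O(log max_d) oracle calls, mirroring the membership argument.
    """
    if k < 2:
        raise ValueError("this sketch assumes k >= 2")
    best = d_distant_k_set(solutions, k, 0, dist)
    if best is None:  # fewer than k solutions exist
        return None, None
    lo, hi = 0, max_d
    while lo < hi:
        mid = (lo + hi + 1) // 2
        witness = d_distant_k_set(solutions, k, mid, dist)
        if witness is not None:
            lo, best = mid, witness  # distance mid is achievable
        else:
            hi = mid - 1  # distance mid is not achievable
    return lo, best


# Hypothetical toy CSP: three Boolean variables, x0 and x1 not both 1.
solutions = [s for s in product((0, 1), repeat=3) if not (s[0] and s[1])]
d, S = max_diverse_k_set(solutions, k=2, max_d=3)
print(d, S)
```

Because the oracle is monotone in d (any set that is d-distant is also (d-1)-distant), the binary search is sound. The same skeleton would apply to MAXSIMILARkSET by swapping in a dCLOSEkSET oracle and searching for the smallest feasible d.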
