ConjunctiveQuery Containment and Constraint

Satisfaction

y z

Phokion G Kolaitis Moshe Y Vardi

Computer Science Department Department of Computer Science

University of California Santa Cruz Rice University

Santa Cruz CA USA Houston TX USA

kolaitiscseucscedu vardicsriceedu

Abstract

Conjunctivequery containment is recognized as a fundamental problem in database

query evaluation and optimization At the same time constraint satisfaction is recog

nized as a fundamental problem in articial intelligence What do conjunctivequery

containment and constraint satisfaction have in common Our main conceptual con

tribution in this pap er is to p oint out that despite their very dierent formulation

conjunctivequery containment and constraint satisfaction are essentially the same

problem The reason is that they can b e recast as the following fundamental algebraic

problem given two nite relational structures A and B is there a homomorphism

h A ! B As formulated ab ove the homomorphism problem is uniform in the sense

that b oth relational structures A and B are part of the input By xing the structure

B one obtains the following nonuniform problem given a nite relational structure

A is there a homomorphism h A ! B In general nonuniform tractability results

do not uniformize Thus it is natural to ask which tractable cases of nonuniform

tractability results for constraint satisfaction and conjunctivequery containment do

uniformize

Our main technical contribution in this pap er is to show that several cases of

tractable nonuniform constraint satisfaction problems do indeed uniformize We ex

hibit three nonuniform tractability results that uniformize and thus give rise to

p olynomialtime solvable cases of constraint satisfaction and conjunctivequery con

tainment We b egin by examining the tractable cases of Bo olean constraintsatisfaction

problems and show that they do uniformize This can b e applied to conjunctivequery

containment via Booleanization in particular it yields one of the known tractable

cases of conjunctive query containment After this we show that tractability results

for constraintsatisfaction problems that can b e expressed using programs

A preliminary version of this pap er app eared in Proc th ACM Symp on Principles of Database

Systems June pp

y

Partially supp orted by NSF grant CCR

z

Partially supp orted by NSF grants CCR and CCR URL

httpwwwcsriceedu vardi

with b ounded number of distinct variables also uniformize Finally we establish that

tractability results for queries with b ounded treewidth uniformize as well via a con

nection with rstorder logic a with b ounded number of distinct variables

Introduction

Conjunctive queries have had a conspicuous presence in b oth the theory and practice of

database systems since the s Conjunctive queries constitute a broad class of frequently

used queries b ecause their expressive p ower is equivalent to that of the SelectJoinPro ject

queries in AHV Ull For this reason several algorithmic problems

concerning conjunctive queries have b een investigated in depth In particular conjunctive

query containment was recognized fairly early as a fundamental problem in database query

evaluation and optimization Indeed conjunctivequery containment is essentially the same

problem as conjunctive query evaluation moreover conjunctivequery containment can b e

used as a to ol in query optimization since query equivalence is reducible to query contain

ment

Chandra and Merlin CM studied the computational complexity of conjunctive query

containment and showed that it is an NPcomplete problem In recent years there has

b een renewed interest in the study of conjunctive query containment b ecause of its close

relationship to the problem of answering queries using materialized views LMSS RSU

The latter has emerged as a central problem in integrating information from heterogeneous

sources an area that lately has b een the fo cus of concentrated research eorts see Ull for

survey Since conjunctivequery containment is intractable in its full generality researchers

have embarked on a search for tractable cases These are obtained by imp osing syntactic

or structural restrictions on the conjunctive queries Q and Q that serve as input to the

1 2

problem is Q Q In particular Saraiya Sar showed that conjunctivequery

1 2

containment can b e solved in linear time if every database predicate o ccurs at most twice

in the b o dy of Q More recently Chekuri and Ra jaraman CR CR showed that for

1

every k conjunctivequery containment can b e solved in p olynomial time if Q has

2

querywidth at most k and a query decomposition of Q of width k is available The concept

2

of querywidth is closely related to the wellstudied concept of treewidth of a graph see

vL Bo d It should b e noted that queries of width are precisely the acyclic queries

thus Chekuri and Ra jaramans results extend the earlier work of Yannakakis Yan and

Qian Qia on query evaluation and containment for acyclic queries

Starting with the pioneering work of Montanari Mos researchers in articial intelli

gence have investigated a class of combinatorial problems that b ecame known as constraint

satisfaction problems CSP The input to such a problem consists of a set of variables

a set of p ossible values for the variables and a set of constraints b etween the variables

the question is to determine whether there is an assignment of values to the variables that

satises the given constraints The study of constraint satisfaction o ccupies a prominent

place in articial intelligence b ecause many problems that arise in dierent areas can b e

mo deled as constraintsatisfaction problems in a natural way these areas include Bo olean

satisability temp oral reasoning b elief maintenance machine vision and scheduling see

Dec Kum Mes Tsa In its full generality constraint satisfaction is a NP

complete problem For this reason researchers in articial intelligence have pursued b oth

heuristics for constraintsatisfaction problems and tractable cases obtained by imp osing re

strictions on the constraints see MF Dec PJ

What do conjunctivequery contaiment and constraint satisfaction have in common De

spite their very dierent formulation it turns out that conjunctivequery containment and

constraint satisfaction are essentially the same problem The reason is that they can b e recast

as the following fundamental algebraic problem given two nite relational structures A and

B is there a homomorphism h A B Indeed on the side of conjunctivequery contain

D D

ment it is well known that Q Q if and only if there is a homomorphism h Q Q

1 2

2 1

D

is the canonical database asso ciated with the query Q i CM On where Q

i

i

the side of constraint satisfaction a p erusal of the literature reveals that all constraint

satisfaction problems studied can b e viewed as sp ecial cases of the ab ove homomorphism

problem FV FV see also Jea It should b e noted that several researchers includ

ing Bib Dec GJC PJ have observed that there are tight connections b etween

constraintsatisfaction problems and certain problems in relational databases In particular

Gyssens Jeavons and Cohen GJC p ointed out that the set of all solutions to a constraint

satisfaction problem coincides with the of certain relations extracted from the given

constraintsatisfaction problem Thus solving constraintsatisfaction problems and evaluat

ing joins are interreducible In turn conjunctivequery evaluation and join valuation are also

reducible to each other Ull Since Chandra and Merlin CM showed that conjunctive

query evaluation and conjunctive query containment are equivalent problems this provides

a dierent although less direct way to establish the tight connection b etween conjunctive

query containment and constraint satisfaction Nonetheless it is fair to say that overall there

is little interaction b etween the community that pursues tractable cases of conjunctive query

containment and the community that pursues tractable cases of constraint satisfaction One

of our main goals in this pap er is to make the connection b etween conjunctive query contain

ment and constraint satisfaction explicit bring it to front stage and thus further enhance

the interaction b etween and articial intelligence

As formulated ab ove the homomorphism problem is uniform in the sense that b oth re

lational structures A and B are part of the input By xing the structure B one obtains

the following nonuniform problem CSPB given a nite relational structure A is there

a homomorphism h A B Over the past twenty years researchers in computational

complexity have studied such nonuniform problems in an attempt to determine for which

structures B the asso ciated CSPB problem is tractable and for which it is intractable

The rst remarkable success on this front was obtained by Schaefer Sch who pinp ointed

the computational complexity of Boolean CSP B problems in which the structure B is

Bo olean ie has the set f g as its universe Schaefer established a dichotomy theorem

for Bo olean CSPB problems Sp ecically he identied six classes of Bo olean structures

and showed that CSP B is solvable in p olynomial time if B is in one of these classes but

CSP B is NPcomplete in all other cases Note that each Bo olean CSPB problem can

b e viewed as a generalized satisability problem In particular Schaefers Sch dichotomy

theorem provides a coherent explanation for the computational complexity of Horn Sat

isability Satisability OneinThree Satisability and other such Bo olean satisability

problems After this Hell and NesetrilHN established a dichotomy theorem for CSPB

problems in which B is an undirected graph if B is colorable then CSP B is solvable

in p olynomial time otherwise CSP B is NPcomplete Observe that if K is a clique with

k

k no des then CSP K is the k Colorability problem k Thus Hell and Nesetrils

k

dichotomy theorem generalizes the results concerning the computational complexity of the k

Colorability problem for each k Motivated by these dichotomy results Feder and Vardi

FV raised the question is every CSPB problem either solvable in p olynomial time

or NPcomplete Although they did not settle this question Feder and Vardi FV were

able to isolate two conditions that imply p olynomialtime solvability of CSPB problems

moreover they argued that all known p olynomially solvable CSP B problems satisfy one

of these conditions The rst condition asserts that the complement of the CSPB problem

at hand is expressible in Datalog CSP B itself cannot b e expressible in Datalog b ecause

it is not a monotone problem this condition covers such known tractable cases as Horn

Satisability Satisability and Colorability The second condition is grouptheoretic

and covers Schaefers Sch tractable class of ane satisability problems

In general nonuniform tractability results do not uniformize Thus tractability results

for each problem in a collection of nonuniform CSPB problems do not necessarily yield

a tractable case of the uniform constraint satisfaction problem or of the conjunctive query

containment problem The reason is that b oth structures A and B are part of the input to

the constraint satisfaction problem and the running times of the p olynomialtime algorithms

for CSPB may very well b e exp onential in the size of B Thus it is natural to ask

which tractable cases of nonuniform CSP B problems uniformize and give rise to uniform

tractable cases of constraint satisfaction and equivalently to conjunctivequery containment

Our main technical contribution in this pap er is to show that several cases of tractable

nonuniform CSP B problems do indeed uniformize We b egin by examining the main

tractable cases of Bo olean CSP B problems considered by Schaefer Sch These are the

cases where CSPB corresp onds to a satisability problem a Horn satisability problem

a dual Horn satisability problem or an ane satisability problem For all these cases

uniform p olynomialtime algorithms can b e obtained by combining p olynomialtime algo

rithms that detect membership in these cases build a corresp onding Bo olean formula and

apply the p olynomialtime algorithm for satisability of such formulas It should b e p ointed

out however that the formulabuilding algorithms for Satisability Horn satisability

and dual Horn satisability are in the worst case quadratic in the size of B In turn this

yields cubictime algorithms for the corresp onding uniform constraintsatisfaction problems

We show here that a b etter b ound can b e achieved by designing algorithms that skip the

formulabuilding phase Although these results are ab out Bo olean constraintsatisfaction

problems they turn out to have applications to conjunctivequery containment For this we

show that conjunctivequery containment problems can b e binarized and reduced to Bo olean

constraintsatisfaction problems As a concrete application we show that Sarayias Sar

tractable case of conjunctivequery containment can b e derived using this technique

After this we fo cus on the connections b etween Datalog and constraint satisfaction As

mentioned earlier Feder and Vardi FV FV realized that the tractability of many non

uniform CSPB problems can b e globally explained by the fact that the complement of

each of these problems is expressible in Datalog Using p ebblegame techniques introduced

in KV we show here that such nonuniform tractability results uniformize as long as

Datalog programs with a b ounded number of distinct variables are considered Sp ecically

we establish that for every k there is a p olynomialtime algorithm for testing whether

there is a homomorphism h A B where A and B are two given relational structures

such that the complement of CSPB is expressible by a Datalog program with at most k

distinct variables in each rule

Up to this p oint we have obtained tractable cases of the constraint satisfaction problem

is there a homomorphism h A B by imp osing restrictions on the structure B At

the level of conjunctivequery containment Q Q this amounts to imp osing restrictions

1 2

on the query Q Our last result yields a tractable case of conjunctivequery containment

1

by imp osing restrictions on the query Q it uniformizes another nonuniform tractability

2

result for CSPB problems in FV Sp ecically we consider queries of b ounded treewidth

and establish that for every k there is a p olynomialtime algorithm for testing whether

Q Q where Q is a conjunctive query of treewidth at most k For this we show that

1 2 2

k

every conjunctive query of treewidth k is expressible in FO the fragment of rstorder

logic with at most k distinct variables and then apply a p olynomialtime algorithm for

k

evaluating FO queries See Section for a discussion of related work

Preliminaries

Formally a nary conjunctive query Q is a query denable by a p ositive existential rst

order formula X X having conjunction as its only Bo olean connective that is by

1 n

a formula of the form

Z Z X X Z Z

1 m 1 n 1 m

where X X Z Z is a conjunction of extensional database predicates The

1 n 1 m

free variables X X of the denining formula are called the distinguished variables of

1 n

Q Such a conjunctive query is usually written as rule whose head is QX X and

1 n

whose b o dy is X X Z Z For example the formula

1 n 1 m

Z Z P X Z Z RZ Z RZ X

1 2 1 1 2 2 3 3 2

denes a conjunctive query Q which as a rule b ecomes

QX X P X Z Z RZ Z RZ X

1 2 1 1 2 2 3 3 2

If D is a database then QD is the nary on D obtained by evaluating the query Q

on D that is the collection of all ntuples from D that satisfy the query Note that we need

to choose an order for the free variables In the example ab ove we chose the order X X

1 2

but the order X X is also acceptable For example we can write the query as the rule

2 1

QX X P X Z Z RZ Z RZ X

2 1 1 1 2 2 3 3 2

Let Q and Q b e two nary queries having the same tuple of distinguished variables

1 2

If Q D Q D for every database D we say that Q is contained in Q and write

1 2 1 2

Q Q The conjunctivequery containment problem asks given two conjunctive queries

1 2

Q and Q is Q Q

1 2 1 2

It is well known that conjunctivequery containment can b e reformulated as a conjunctive

query evaluation problem and also as a homomorphism problem The link to these two other

Q

problems is via the canonical database D asso ciated with Q This database is dened as

Q

follows Each variable o ccurring in Q is considered a distinct element in D Every predicate

Q

in the b o dy of Q is a predicate of D as well moreover for every distinguished variable X

i

of Q there is a distinct unary predicate P not o ccurring in Q As regards the facts of

i

Q

D every subgoal in the b o dy of Q gives rise to a tuple in the corresp onding predicate

Q Q

of D and if X is a distinguished variable of Q then P X is a fact of D Thus in

i i i

the example ab ove the canonical database consists of the facts P X Z Z RZ Z

1 1 2 2 3

RZ X P X P X Recall that a homomorphism b etween two relational structures

3 2 1 1 2 2

A

A and B over the same vocabulary is a mapping h A B such that if c c P

1 k

B A

then hc hc P where P is any predicate symbol in the vocabulary and P

1 k

B

and P are the interpretations of P on A and B The relationship b etween conjunctive

query containment conjunctive query evaluation and homomorphisms is provided by the

following theorem

Theorem CM Let Q and Q be two nary conjunctive queries having the same

1 2

tuple of distinguished variables Then the fol lowing statements are equivalent

Q Q

1 2

Q

1

where X X is the tuple of the distinguished variables X X Q D

1 n 1 n 2

of Q

1

Q Q

2 1

There is a homomorphism h D D

D

Note that every database D gives rise to a Bo olean conjunctive query Q whose b o dy con

sists of the conjunction of all facts in D where we view the elements of the databases as

existentially quantied variables In turn this makes it p ossible to show that b oth con

junctive query evaluation and the existence of homomorphism b etween two nite relational

structures are reducible to conjunctive query containment In particular there is a homo

B A

morphism h A B if and only if Q Q

Let us now fo cus on the constraintsatisfaction problem As mentioned earlier this

problem is usually formulated as the question do es there exists an assignment of p ossible

values to given variables so that certain constraints are satised Instead we will consider

an alternate elegant formulation in terms of homomorphisms Let A and B b e two classes

of nite relational structures The uniform constraintsatisfaction problem CSPA B

is the following decision problem given a structure A A and a structure B B is

there a homomorphism h A B Note that by its very denition each CSPA B

problem is in NP We write CSPB for the sp ecial uniform case CSP A B in which A is

the class of all nite relational structures over the vocabulary of B If B consists of a single

structure B then we write CSPA B instead of CSPA fB g We refer to such problems

as nonuniform constraint satisfaction problems b ecause the inputs are just structures A

in A We also write CSPB for the sp ecial nonuniform case CSP A B in which A is

the class of all nite relational structures over the vocabulary of B Note that if B is a

Boolean structure ie it has f g as its universe then CSP B is a generalized satisability

problem in the sense of Schaefer Sch see also GJ LO page For example if

B f g f g then CSPB is equivalent to Positive OneinThree

SAT Thus CSP B may very well b e an NPcomplete problem

We are interested in identifying classes A and B such that CSPA B is solvable in

p olynomial time Such classes give rise to tractable cases of the constraint satisfaction

problem and hence of the conjunctive query containment problem as well For the past

twenty years researchers in computational complexity have investigated CSPB problems

and have discovered several p olynomialtime cases As a general rule however nonuniform

tractable results do not uniformize Indeed it is not hard to construct classes A and B of

nite relational structures such that CSP A B is NPcomplete but for each B B the

nonuniform CSP A B problem is solvable in p olynomial time For example let K b e the

class of all nite cliques and let G b e the class of all nite undirected graphs It is clear that

CSP K G is NPcomplete since it is equivalent to the Clique problem For every xed

nite undirected graph G however one can determine in a constant number of steps whether

G has a clique of size k This example is not isolated since other NPcomplete problems

can b e viewed this way In particular if P is the class of all nite paths then CSP P G

is equivalent to the Hamiltonian Path Problem whereas for every nite graph G there is a

lineartime algorithm for CSPP G These negative results notwithstanding in the sequel

we will establish that several interesting nonuniform tractable cases do uniformize and give

rise to tractable cases of constraint satisfaction and conjunctive query containment

Bo olean Constraint Satisfaction

Schaefer studied the computational complexity of Bo olean CSP B problems for which

he established a dichotomy Sch More sp ecically he identied six classes of Bo olean

structures and showed that CSP B is solvable in p olynomial time if B is in one of these

classes but CSPB is NPcomplete in all other cases This classication is in terms of

dening formulas A k ary Bo olean relation R can b e viewed as a set of truth assignments

on the prop ositional variables p p Thus for each k ary Bo olean relation R there is

1 k

a prop ositional formula over the variables p p such that R model s We call

R 1 k R

a dening formula of R and we say that R is denable by Schaefer showed that for

R R

a Bo olean structure B we have that CSPB is in PTIME if one of the following six cases

holds

each relation in B contains the tuple h i

each relation in B contains the tuple h i

each relation in B is Horn ie denable by a CNF formula with at most one p ositive

literal p er clause

each relation in B is dual Horn ie denable by a CNF formula with at most one

negative literal p er clause

each relation in B is bijunctive ie denable by a CNF formula

1

each relation in B is ane ie denable by a conjunction of linear equations

Furthermore Schaefer established that if B is not in any of these six classes then CSPB

is NPcomplete

We say that a Bo olean structure B is a Schaefer structure if B is in at least one of

the ab ove six classes in which case CSP B is solvable in p olynomial time We call the

class of all Schaefer structures Schaefers class denoted SC Our main result in this section

is that CSP SC is solvable in p olynomial time which means that Schaefers tractability

results completely uniformize As a rst step we need to show that structures in SC can b e

recognized in p olynomial time This follows from results in DP Sch

Theorem The class SC is recognizable in polynomial time

Pro of The rst two cases are trivially recognizable Schaefer showed that a Bo olean

relation R is bijunctive if and only if the following condition holds if t t t R then

1 2 3

t t t t t t R here Bo olean op erations are applied to tuples comp onentwise

1 2 2 3 1 3

In addition Schaefer showed that a Bo olean relation R is ane if and only if the following

condition holds if t t t R then t t t R Finally Dechter and Pearl DP

1 2 3 1 2 3

showed that a Bo olean relation R is Horn resp dual Horn if and only if the following

2

condition holds if t t R then t t R resp t t R Clearly each of these

1 2 1 2 1 2

conditions can b e checked in p olynomial time

We say that a relation R is a trivial Schaefer relation if it is covered by the rst two cases

of Schaefers classication and we say that R is a nontrivial Schaefer relation if it is covered

by the four interesting cases of Schaefers classication ie Horn dual Horn bijunctive

and ane In the latter cases the relation R is denable by a formula with a certain

R

syntactical structure The next step is to show that given a nontrivial Schaefer relation R

we can construct a dening formula in p olynomial time

R

Theorem There is a polynomial algorithm that constructs for each nontrivial Schaefer

relation R a dening formula

R

Pro of There are four cases to consider Dechter and Pearl DP showed how to construct

in p olynomial time when R is Horn or dual Horn It remains to deal with the cases that

R

R is bijunctive or ane

Let R b e a k ary bijunctive relation Then there is a CNF formula over the prop osi

tional variables fp p g such that R model s If c is a clause over p p

1 k 1 k

we say that R satises c denoted R j c if R model sc Consider the formula

V

c where the conjunction ranges over all clauses c over fp p g We

1 k R

Rj=c

claim that R model s and consequently is a dening formula of R Clearly

R R

R model s Moreover if c is a conjunct in then R satises c Thus c is also a

R

conjunct of and so model s model s R which implies that R model s

R R R

2

Clearly can b e constructed in time O jjRjj k

R

1

true p p false or p p p A linear equation is a formula of the form p

i i i i i i

2 1 l 2 1

2

For precursors of this result see McK Riv

Let R b e a k ary ane relation Note that every linear formula p p p false

i i i

1 2

l

resp true can b e viewed as the equation p p p resp over the

i i i

1 2

l

Bo olean eld Let R ft j t Rg Each linear equation satised by R corresp onds

to a Bo olean k vector a a a such that a t a t for each

1 k +1 1 1 k +1 k +1

0

of R that is t t t R Thus the set of such vectors a is the nullspace N

1 k +1 R

the vector space of solutions to the homogeneous linear equation system R a over the

Bo olean eld where R is viewed as a Bo olean jRj k matrix note that jRj jR j

0

is at By the Fundamental Theorem of Linear Algebra the dimension of the space N

R

most mink jRj By Gaussian elimination we can convert R to a rowechelon matrix in

0

whose size is at most mink jRj KW Each p olynomial time and obtain a basis of N

R

vector a a a in the basis corresp onds to a linear formula p p p

1 k +1 i i i

1 2

l

false or true that is satised by R We claim that the conjunction of these formulas

R

constitutes a dening formula of R Clearly R model s Moreover we already observed

R

0

Thus a can that each linear equation e satised by R corresp onds to a vector a in N

e e R

b e obtained as a linear combination of basic vectors in other words e is a consequence of

and so model s R

R R

We can now prove the main result of this section

Theorem CSPSC is solvable in polynomial time

Pro of Supp ose we are given a pair A B of relational structures where B S C We have

to determine whether there is a homomorphism from A to B By Theorem we can

determine in p olynomial time which of the six tractable cases in Schaefers classication

describ es B If B is a trivial Schaefer structure then there is a homomorphism from A to

B so we can assume that B is a nontrivial Schaefer structure For each k ary relation Q in

A let Q b e the corresp onding relation in B ie Q and Q are the interpretations of the

0 0

is a formula over recall that same relation symbol Apply Theorem to construct

Q Q

fp p g

1 k

We can view each element of A as a prop ositional variable For a tuple t t t

1 k

0 0

by substituting t for p i k Let t b e the formula obtained from Q let

i i Q Q

V V

0 0

j Let t Note that the length of is O jQjj We

Q A Q Q Q Q

tQ QA

claim that there is a homomorphism from A to B precisely when is satisable Indeed

A

supp ose that there is a homorphism h A B Consider the truth assignment dened

by t ht for each element t of a tuple t Q Cho ose a sp ecic tuple t Q As

i i i

0

so ht Q the truth assignment dened by p ht satises the formula

i i Q

0

t It follows that satises Conversely supp ose that the truth assignment satises

Q Q

satises Dene the homomorphism ht p for each element t of a tuple t Q

Q i i i

0

t the truth assignment dened by Cho ose a sp ecic tuple t Q As satises

Q

p t satises It follows that ht Q Note however that is a conjunction

i i A

Q

of Horn clauses dual Horn clauses clauses or linear formulas dep ending on the type of

B Thus satisability of can b e checked in time that is linear in the length of in

A A

the rst three cases BB DG Pap and cubic in the length of in the fourth case

A

Sch

When R is an ane relation the length of the dening formula constructed ab ove is

R

b ounded by the size of R In contrast if R is bijunctive Horn or dual Horn then the length

2

of is prop ortional to O k where k is the arity of R which can b e quite larger than

R

the size of R Thus the complexity of our algorithm is cubic in these cases It is p ossible

however to skip the formulabuilding stage of our algorithm and design a direct algorithm

that essentially tests for satisability of without explicitly constructing it

A

Theorem Let B be the class of Horn dual Horn or bijunctive structures Then CSP B

is solvable in quadratic time

Pro of We rst describ e the algorithm for the Horn case an analogous algorithm works for

the dual Horn case Let R b e a k ary Horn relation Take k f k g For X k and

V

p p To determine if there is j k we say that R satises X j if R model s

i j

iX

a homomorphism from a structure A to a Horn structure B the algorithm maintains a set

O ne of elements of A that have to b e mapp ed to Initially O ne is empty Let t b e a tuple

in a relation Q of A We dene O net fi j t O neg The algorithm rep eatedly selects a

i

tuple t in a relation Q of A and then adds t to O ne if Q satises O net j where Q is

j

the relation in B that corresp onds to Q When O ne cannot b e enlarged anymore there is

a homomorphism from A to B if and only if for each tuple t in a relation Q of A there is

a tuple t in the corresp onding relation Q of B such that O net O net To prove this

claim note that every element in O ne clearly has to b e mapp ed to Thus this condition

is necessary To see that it is also sucient consider the homomorphism h such ht

i

if t O ne and ht if t O ne We claim that ht Q for each t Q Indeed

i i i

consider the collection T of all tuples t Q such that O net O net We know that T is

V

not empty Let u T ie the conjunction of all tuples in T Since Q is a Horn relation it

is closed under conjunction so u T Q see pro of of Theorem If O net O neu

we are done Otherwise there is some j k such that j O neu O net But then Q

satises O net j which means that the algorithm would have added t to O ne in which

j

case we would have j O net contradiction

We now claim that this algorithm can b e implemented to run in time O jjAjj jjB jj A

2

naive implementation would take time O jjAjj jjB jj since O ne can b e extended at most

jjAjj times and each extension of O ne takes time O jjAjj jjB jj as we have to nd a tuple

t Q requiring an external lo op over all tuples of A and add t to O ne if Q satises

j

O net j requiring an internal lo op over all tuples of B A more ecient implementation

would fo cus on the elements of A that are to b e added to O ne In the prepro cessing stage

we build linked lists that link all o ccurrences in A of an element a When a is added to O ne

we traverse the list for a and pro cess all tuples t in which a o ccur After this we up date

O net and then by scanning B we check whether this triggers the addition of another

element to O ne Thus every o ccurrence of an element of A is visited at most once resulting

in a running time of O jjAjj jjB jj This implementation is inspired by the lineartime

algorithms for Horn satisability BB DG

Consider now the bijunctive case A lineartime algorithm for CNF formulas pro ceeds

in phases LP In each phase we choose an unassigned variable u and assign an arbitrary

truth value to it We then use the binary clauses in the formula to propagate the assignment

If x is assigned and we have a clause x y then y is assigned and if we have a clause

x y then y is assigned Similarly if x is assigned and we have a clause x y then y

is assigned and if we have a clause x y then y is assigned If this results in a variable

z assigned b oth and then we undo all assignments of this phase and we try to assign

to u the other truth value If b oth attempts fail then the formula is unsatisable If either

the rst or the second attempt is successful then we pro ceed to the next phase As each

variable is assigned a truth value at most twice the algorithm is linear

Given the pair A B of structures where B is bijunctive we can emulate the ab ove

algorithm The variables are the elements of A The clauses are implied by the structure B

see pro ofs of Theorems and The algorithm pro ceeds in phases In each phase we

choose an unassigned element a of A and assign to it a value i f g We then use the

structure B to propagate the assignment Supp ose that a t for a tuple t in a relation Q

k

0

of A Let T b e the set of all tuples t in the corresp onding relation Q of B such that

Q k i

0

t i Supp ose now that for some j f g we have that t j for all t T in this

Q k i

k l

case we know that the element t must b e assigned the value j If this propagation results in

l

an element b of A assigned b oth and then we undo all assignments of this phase and we

try to assign the value i to a If b oth attempts fail then there is no homomorphism from

A to B If the rst or second attempt are successful then we pro ceed to the next phase

Note that each element is assigned value at most twice but propagating a value requires

scanning the pairs t t of all tuples t Q Listing comp onents of a tuple without listing

k l

the whole tuple requires prepro cessing the structures to construct the appropriate linked

lists Thus the complexity of our algorithm is O jjAjj jB j jjB jj Note that jjB jj is the

size of the enco ding of B while jB j is the number of tuples in B

What are the implications of Theorem for conjunctivequery containment At rst

sight it seems that its applicability is limited since Bo olean constraintsatisfaction problems

corresp ond to testing whether Q Q where Q uses only two variables corresp onding to

1 2 1

the Bo olean values and and thus seems very restricted Nonetheless the critical obser

vation is that every instance A B of a constraintsatisfaction problem can b e converted

with a small blowup to a Bo olean constraintsatisfaction problem A B by enco ding all

b b

elements of B in binary notation Sp ecically if n is the number of elements in B then we

can enco de every element of B by a bit vector of length m dlog ne Thus a k ary relation

of B Note that since one needs ndlog ne Q of B b ecomes a k mary Bo olean relation Q

b

b

bits to enco de n elements there is essentially no blowup in this conversion We then replace

every element a in A by an mvector ha a i consisting of m distinct copies of a For

1 m

each relation Q of A this yields a k mary relation Q This conversion blows up the size of

b

the instance by a factor of dlog ne where n jB j

Lemma There is a homomorphism from A to B if and only if there is a homomorphism

from A to B

b b

Pro of We can assume that the elements of B are n Supp ose rst that there is a

homomorphism h A B For each element a of A if ha j then dene h a to b e

b i

the ith bit of j for i m It is easy to see that h is a homomorphism from A to

b b

B Supp ose now that there is a homomorphism h A B For each element a of A

b b b b

dene ha to b e the number whose binary notation is hh a h a i It is easy to see

b 1 b m

that h is a homomorphism from A to B

We refer to the pro cess of converting a constraintsatisfaction problem to a Bo olean constraint

satisfaction problem as Booleanization

We now present an application of this technique A twoatom conjunctive query is one

in which every database predicate o ccurs at most twice in the b o dy

Prop osition Sar Testing whether a twoatom conjunctive query Q is contained in

1

a conjunctive query Q can be done in polynomial time

2

Pro of By Lemma we can Bo oleanize the problem and reduce it to testing the existence

of a homomorphism from a structure A to a Bo olean structure B where every relation in

B has at most two tuples Recall that if B has n elements then the conversion increases

the arity of the relations in A by a factor of dlog ne By the criterion for bijunctivity see

the pro of of Theorem every relation in B is indeed bijunctive By Theorem and

Theorem the test can b e done in time O jjQ jj log jjQ jj jjQ jj

2 1 1

It is worth noting that the pro of in Sar yields a slightly b etter upp er b ound as it is shown

there that testing whether a twoatom conjunctive query Q is contained in a conjunctive

1

query Q can b e done in time O jjQ jj jjQ jj

2 1 2

We conclude this section by presenting two examples that provide additional evidence

for the p ower of Bo oleanization

Example Colorability

Let B b e a graph consisting of two no des and a single undirected edge b etween them

It is easy to see that CSP B is the class of all colorable graphs and thus a tractable

constraintsatisfaction problem We now show that this well known tractability result can

b e derived via Bo oleanization Indeed B gives rise to the Bo olean structure B f g R

where R f g This structure is b oth bijunctive since R has cardinality and

ane since R is the set of solutions of x y true Thus Bo oleanization provides two

dierent explanations as to why Colorability is solvable in p olynomial time

Example CSP C

4

Let C b e a directed cycle with four no des that is C fa b c dg E where E

4 4

fa b b c c d d ag If we Bo oleanize C using the lab eling

4

a b c d

then we obtain the Bo olean structure C f g E where

4

E f g

Clearly E is neither valid nor valid Using the criteria in the pro of of Theorem it

can b e easily veried that E is not Horn dual Horn or bijunctive but it is an ane Bo olean

relation For instance E is not Horn resp dual Horn b ecause the comp onentwise resp

of the rst two tuples of E is resp which is not in E Similarly

E is not bijunctive b ecause the comp onentwise ma jority of the rst three tuples of E

is which is not in E Finally E is ane b ecause it is closed by taking the

comp onentwise of arbitrary triples in E Alternatively E can b e seen to b e ane by

observing that E is the set of solutions of the system

x y z false y w true

It follows that CSPC is solvable in p olynomial time Naturally this could also have b een

4

seen directly by observing that CSP C is Colorability in disguise Indeed since homomor

4

phisms comp ose and since C is colorable it is easy to see that there is a homomorphism

4

from a given a directed graph G to C if and only if G is colorable HN

4

It should b e p ointed out that the way Bo oleanization is carried out may give rise to a

Schaefer structure of dierent type Sp ecically we claim that there is a lab eling of C that

4

results into a Bo olean structure that is b oth ane and bijunctive To see this consider the

lab eling

a b c d

The resulting Bo olean structure is B f g E where

E f

We leave it as an exercise for the reader to verify using the criteria in the pro of of Theorem

that E is neither Horn nor dual Horn but it is b oth bijunctive and ane

Datalog and Constraint Satisfaction

A Datalog program is a nite set of rules of the form

t t t

0 1 m

where each t is an atomic formula Rx x The relational predicates that o ccur in the

i 1 n

heads of the rules are the intensional database predicates IDBs while all others are the

extensional database predicates EDBs One of the IDBs is designated as the goal of the

program Note that IDBs may o ccur in the b o dies of rules and thus a Datalog program is a

recursive sp ecication of the IDBs with semantics obtained via least xedp oints of monotone

op erators see Ull Each Datalog program denes a query which given a set of EDB

predicates returns the value of the goal predicate Moreover this query is computable in

p olynomial time since the b ottomup evaluation of the least xedp oint of the program

terminates within a p olynomial number of steps in the size of the given EDBs see Ull

Thus expressibility in Datalog is a sucient condition for tractability of a query

If B is a nite relational structure and A is a class of structures then we write CSP A B

for the complement of CSP A B that is the class of structures A such that there is no

homomorphism h A B Feder and Vardi FV provided a unifying explanation for

the tractability of many nonuniform CSP B problems by showing that the complement

of each of these problems is expressible in Datalog Our aim in this section is to obtain

stronger uniform tractability results for the collections of constraint satisfaction problems

whose complements are expressible in Datalog with a b ounded number of distinct variables

For every p ositive integer k let k Datalog b e the collection of all Datalog programs in

which the b o dy of every rule has at most k distinct variables and also the head of every rule

has at most k variables the variables of the b o dy may b e dierent from the variables of

the head For example the query NonColorability is expressible in Datalog since it is

denable by the goal predicate Q of the following Datalog program

P X Y E X Y

P X Y P X Z E Z W E W Y

Q P X X

It is well known that Datalog can b e viewed as a fragment of least xedp oint logic LFP

see CH AHV In turn on the class of all nite structures LFP is subsumed by

S

k k

L where L is the innitary logic with the nitevariable innitary logic L

k

arbitrary disjunctions and conjunctions but with at most k distinct variables see KV

k

In the present pap er we are interested in fragments of L and L that are suitable for

k

the study of Datalog For every k let L b e the existential positive fragment of L

with k variables that is the collection of all formulas that have at most k distinct variables

and are obtained from atomic formulas using innitary disjunction innitary conjunction

and existential quantication only Let Q b e a query on the class of all nite structures over

a xed vocabulary In KV it was shown that if Q is expressible in k Datalog then Q

0

k

is also denable in L for some k k Moreover in KV it was shown that if Q is

k

expressible in LFP least xedp oint logic with k variables then Q is also expressible in

k

As a matter of fact the pro of can b e adapted to yield the following result which is L

optimal as regards the number of distinct variables used

k

Theorem Let k be a positive integer Every k Datalog query is expressible in L

k

Thus k Datalog L

In what follows we present a selfcontained pro of of the preceding Theorem For this

we rst have to give precise denitions of the concepts involved and establish a number of

intermediate results

k

Let b e a xed relational vocabulary For every k we write FO for the collec

k

tion of all rstorder formulas with at most k distinct variables We also write FO for

k

the existential p ositive fragment of FO ie the collection of all rstorder formulas that

have at most k distinct variables and are obtained from atomic formulas using disjunction

conjunction and existential quantication only

A system of rstorder formulas is a nite sequence

x x S S x x S S

1 1 n 1 l l 1 n 1 l

1

l

of rstorder formulas such that each S is a relation symbol of arity n i l not in the

i i

vocabulary If A is a structure then every such system gives rise to an op erator from

sequences R R of relations R of arity n i l on the universe A to sequences

1 l i i

of relations on the universe of A of the same arities More precisely

R R R R R R

1 l 1 1 l l 1 l

where for every i l

R R fa a A j x a x a S R S R g

i 1 l 1 n i 1 1 n n 1 1 l l

i i i

m m m

The stages m of on a structure A are dened by the

1 l

following induction on m simultaneously for all i l

1 m+1 m m

i l m

i i

i i 1 l

If each formula x x S S i l of a system is p ositive in the relation

i 1 n 1 l

i

symbols S S then the asso ciated op erator is monotone in each of its arguments

1 l

and as a result the sequence of its stages is increasing in each comp onent Thus for every

nite structure A the sequence of stages of converges after nitely many iterations ie

m m

0

there is a p ositive integer m such that for every m m Moreover the

0 0

m m

m

0 0

0

sequence is the least xedp oint of the op erator on A ie the

1

l

smallest sequence R R of relations on A such that R R R R see

1 l 1 l 1 l

AHV We call this sequence the least xedpoint of the system and denote it

1 l

but in Usually one is interested not in the entire sequence by

l 1 l 1

only one of its comp onents for instance in the last comp onent

l

Least xedp oint logic LFP is the extension of rstorder logic that has as formulas the

of systems of p ositive rstorder formulas For every k let comp onents

1 l

i

k

LFP b e the fragment of LFP obtained by taking the comp onents of least xedp oints of

k k

systems of p ositive FO formulas Similarly LFP is the fragment of LFP obtained by

k

taking the comp onets of least xedp oints of systems of p ositive FO formulas

Chandra and Harel CH showed that Datalog has the same expressive p ower as the

existential fragment of LFP More precisely a query is expressible in k Datalog if and only

k

if it is LFP denable In fact every k Datalog program can b e simulated by a system

k

of p ositive FO formulas and vice versa Intuitively every IDB predicate P of gives rise

k

to an FO formula that is the disjunction of the p ositive existential formulas that dene

k

the b o dies of the rules having the IDB predicate P as head The resulting system of FO

formulas simulates the k Datalog program stepbystep that is to say each stage of the

system corresp onds to a stage in the b ottomup evaluation of Consequently to prove

k

k

Theorem suces to establish that LFP which amounts to establishing that L

k

if is a system of p ositive FO formulas then each comp onent of the least

1 l

i

k

xedp oint of this system is L denable

In the sequel we assume that for every k the variables x x are the k distinct

1 k

k

k

variables of the logics FO and L

Lemma Let k be a positive integer let f k g f k g be a function and

let Q be a query

k k

If Q is FO denable then the query Q is also FO denable where for every nite

structure A and every sequence a a of elements from the universe of A

1 k

a a Q A a a QA

1 k

(1) (k )

k k

If Q is L denable then the query Q is also L denable

Pro of We will show that for every function f k g f k g and for every

k k

k

formula x x of FO resp L there is a formula x x of FO resp

1 k 1 k

k

L such that for every structure A and every sequence a a of elements from

1 k

the universe of A

A j x a x a A j x a x a

1 1 k k 1 (1) k (k )

k

k

The pro of is by induction on the construction of FO formulas resp L formulas

simultaneously for all functions

If x x is the formula x x for some i j with i j k then

1 k i j

x x is the formula x x

1 k (i) (j )

If q f r g f k g is a function R is a relation symbol in of arity r

and x x is the atomic formula Rx x then x x is the

1 k q (1) q (r ) 1 k

formula Rx x

(q (1)) (q (r ))

If x x is of the form x x x x then x x is the

1 k 1 k 1 k 1 k

V

x x formula x x x x If x x is of the form

1 k 1 k 1 k 1 k

V

x x The case of disjunction then then x x is the formula

1 k 1 k

is handled in a similar manner

Finally assume that x x is a formula of the form x x x for some

1 k j 1 k

j k There are two cases to consider Supp ose rst that there is no j such that

j k j j and j j Then the desired formula x x is the

1 k

formula x x x Supp ose on the other hand that there is some j such

(j ) 1 k

that j k j j and j j Then there is some j k such that j is not in

the range of Let f k g f k g b e the function such that i i

if i j and j j By applying the induction hypothesis to the function

k

0

of FO such that for all and to the formula x x we obtain a formula

1 k

structures A and all sequences of elements a a from the universe of A

1 k

0 0 0

x a x a A j x a x a A j

1 1 k k 1 k

(1) (k )

0 00

x x Then the desired formula x x is the formula x

1 k 1 k j

k

k

We are now ready to show that LFP L for every k which will imply that

k

k Datalog L

Theorem Let k n n be positive integers such that n k for every i l let

1 l i

S S be relation symbols not in the vocabulary and having arities n n and let

1 l 1 l

x x S S x x S S

1 1 n 1 l l 1 n 1 l

1

l

k

be a system of positive FO formulas over the vocabulary fS S g Then the fol lowing

1 l

are true for the above system and for the operator associated with it

m m m m

For every m each component i l of the stage is

i 1 l

k

denable by an FO formula on al l structures nite or innite

Each component i l of the least xedpoint of the system is

i 1 l

k

denable by an L formula on al l structures nite or innite

Pro of Assume rst that n k for all i l which means that each S is a k ary relation

i i

k

symbol not in and each x x S S i k is a formula of FO over

i 1 k 1 l

the vocabulary fS S g By induction on m simultaneously for all i l we will

1 l

m m m

show that each comp onent of every stage is denable by a formula x x of

1 k

i i

k

1 1

FO The claim is obvious for m since each comp onent of the stage is denable

i

k

by the FO formula x x S S i l Assume now that there are

i 1 k 1 l

k

m m m

FO formulas x x that dene the comp onents i m of the stage

1 k

i i

which means that for every structure A and every sequence a a of elements from the

1 k

universe of A

m m

x a x a i l j a a

1 1 k k 1 k

i i

m+1

m+1

Let us consider the comp onents of the stage which are dened by

i

m m m+1

S A j x a x a S a a

l i 1 1 k k 1 1 k

l 1 i

Every o ccurrence of each relation symbol S j l in the formulas of the system is in

j

a subformula of the form S x x for some function f k g f k g

j

(1) (k )

Since each relation symbol S S has only p ositive o ccurrences in the formulas of the

1 l

system by using the induction hypothesis and rep eatedly applying Lemma for each

k

m

j l and each such function we obtain a formula x x of FO such that

1 k

j

m

a a A j x a x a

j 1 1 k k

(1) (k )

j

m+1

x x b e the formula obtained from x x S S by sub For i l let

1 k i 1 k 1 l

i

m

stituting each subformula S x x by the corresp onding formula x x

j 1 k

(1) (k )

j

m m

x x x x instead of the formula Note that we are using the formulas

1 k 1 k

j j

so that these substitutions can b e carried out without renaming variables or introducing

m+1 k

new variables Thus for every i m the formula x x is an FO formula that

1 k

i

m m+1

denes the comp onent of the stage

i

Consider next the case that n k for at least one i l For every i l let T b e a k ary

i i

relation symbol not in the vocabulary and let x x x x T T b e the

i 1 n n +1 k 1 l

i i

k

FO formula over the vocabulary fT T g obtained from x x S S

1 l i 1 n 1 l

i

as follows if n k then we replace each subformula S x x by the formula

j j

(1) (n )

j

x x T x x x x

n +1 k j n +1 k

(1) (n )

j j

j

while if n k then we replace S x x by T x x A straightfor

j j j

(1) (n ) (1) (n )

j j

m m

ward induction on m simultaneously for all i m shows that if n k then

i

i i

while if n k then for every structure A and every sequence a a of elements from

i 1 k

the universe of the structure

m m

a a A j a a a a a a

1 n n +1 k 1 n n +1 k

i i i i i i

k

m

For every m and i l let x x b e the FO formula that denes the comp o

1 k

i

m m m

nent of the stage If n k then we let x x b e the formula

i 1 n

i i i

m

x x x x x x

n +1 k 1 n n +1 k

i i i i

m m m

while if n k then we let x x b e the formula x x Thus x x

i 1 k 1 k 1 n

i i i i

k

m m

is an FO formula that denes the comp onent of the stage i l

i

x x of the least xedp oin of the system Finally each comp onent

1 n 1 l

i i

W

m k

x x is denable on all structures by the L formula

1 n

m=1 i i

As explained earlier Theorem follows immediately from the preceding Theorem

k

k

and consequently Corollary Let k be a positive integer Then LFP L

k

k Datalog L

It should b e p ointed out that on the class of all nite structures k Datalog is prop erly

k

contained in L since the latter can express noncomputable queries

Next we describ e certain combinatorial games that will play an imp ortant role in the

sequel Let A and B b e two relational structures over a common relational vocabulary

The existential k pebble game on A and B is played b etween two players the Spoiler and

the Duplicator The Sp oiler places k p ebbles one at a time on elements of A after each

move of the Sp oiler the Duplicator resp onds by placing a p ebble on an element of B Once

all p ebbles have b een placed the Sp oiler wins if one of the following two conditions holds

for the elements a and b i k of A and B that have b een p ebbled in the ith move

i i

of the Sp oiler and the Duplicator

the corresp ondence a b i k is not a mapping that is to say there exists i

i i 1

b and b a and i such that i i a

i i i 2 1 2 i

2 1 2 1

the corresp ondence a b i k is a mapping but it is not a a homomorphism

i i

from the substructure of A with universe fa a g to the substructure of B with

1 k

universe fb b g

1 k

If neither of the ab ove two conditions holds then the Sp oiler removes one or more p ebbles

and the game resumes We say that the Duplicator wins the existential k pebble game on

A and B if he has a strategy that allows him to continue playing forever that is the

Sp oiler can never win a round of the game A more formal denition of this concept can b e

given using families of partial homomorphisms with the forth property up to k see KV

for details

If a a is a k tuple of elements of A and b b is a k tuple of elements of

1 k 1 k

B then we say that the Duplicator wins the existential k pebble game on A a a and

1 k

B b b if a a and b b is a winning conguration for the Duplicator

1 k 1 k 1 k

that is the Duplicator can win the game if the ith p ebble of the Sp oiler has b een placed

on a and the ith p ebble of the Duplicator has b een placed on b i k The following

i i

k

result from KV shows that expressibility in L can b e characterized in terms of the

existential k p ebble games

Theorem Let k be a positive integer and Q a k ary query on a class C of nite

structures Then the fol lowing two statements are equivalent

k

Q is expressible in L on C

If A B are two structures in C and a a b b are two k tuples of elements

1 k 1 k

of A and B such that A j Qa a and the Duplicator wins the existential k pebble

1 k

game on A a a and B b b then B j Qb b

1 k 1 k 1 k

Corollary Let k be a positive integer and Q a Boolean query on a class C of nite

structures Then the fol lowing two statements are equivalent

k

on C Q is expressible in L

If A and B are two structures in C such that A j Q and the Duplicator wins the

existential k pebble game on A and B then B j Q

Let and b e two disjoint copies of the vocabulary that is for each relation symbol

1 2

R of and for i the vocabulary contains a relation symbol R of the same arity as

i i

R We write for the vocabulary fD D g where D and D are two new

1 2 1 2 1 2 1 2

unary relation symbols Using the vocabulary we can enco de a pair A B of two

1 2

structures A and B by a single structure A B dened as follows the universe

1 2

of A B is the union of the universes of A and B the interpretation of D resp ectively

1

D is the universe of A resp ectively B and the interpretation of each relation symbol R

2 1

resp ectively R is the interpretation of the relation symbol R on A resp ectively on B

2

This enco ding makes it p ossible to formally view queries on pairs of structures as queries

on single structures

1 2

Our next result concerns the computational and descriptive complexity of existential

k p ebble games

Theorem Let be a relational vocabulary and let k be a positive integer

There is a sentence of least xedpoint logic LFP over the vocabulary that

1 2

expresses the query Given two structures A and B does the Spoiler win the exis

tential k pebble on A and B

As a result there is a polynomialtime algorithm such that given two nite structures

A and B it determines whether the Spoiler wins the existential k pebble game on A

and B

For every nite structure B there is a k Datalog program that expresses the

B

query Given a structure A does the Spoiler win the existential k pebble game on A

and B

Pro of Let x x y y b e a quantierfree formula over the vocabulary

1 k 1 k 1 2

asserting that the corresp ondence x y i k is not a mapping or it is a mapping

i i

that is not a homomorphism from the substructure induced by x x over the vocabulary

1 k

to the substructure induced by y y over the vocabulary Sp ecically is the

1 1 k 2

disjunction of the following formulas

x x y y for every i j k such that i j

i j i j

R x x R x x for every mary relation symbol R in and every

1 i i 2 i i

m m

1 1

mary tuple of variables from the set fx x g

1 k

Let T b e a kary relation symbol not in and let x x y y T b e the

1 2 1 k 1 k

following p ositive rstorder formula over the vocabulary fT g

1 2

k

x y D x D y T x x y y x x y y

j j 1 j 2 j 1 k 1 k 1 k 1 k

j =1

It is easy to verify that if A B are structures and a a b b are k tuples of

1 k 1 k

elements of A and B resp ectively then the following statements are equivalent

A B j a a b b

1 k 1 k

The Sp oiler wins the existential k p ebble on A a a and B b b

1 k 1 k

Let b e the sentence x x y y x x y y of least xed

1 k 1 k 1 k 1 k

p oint logic LFP Consequently for every structure A and every structure B the following

statements are equivalent

A B j

The Sp oiler wins the existential k p ebble game on A and B

Since every LFPexpressible query is computable in p olynomial time it follows that there is

a p olynomial time algorithm that given two nite structures A and B tells whether the

Sp oiler wins the existential k p ebble game on A and B

Note that the p ositive rstorder formula ab ove involves existential quantiers that

are interpreted over the elements of A and universal quantiers that are interpreted over

the elements of B Consequently if B is a xed nite structure then the universal quan

tiers can b e replaced by nitary conjunctions over the elements of B and thus can b e

transformed to a k Datalog program that expresses the query Given a nite structure

p

A do es the Sp oiler win the existential k p ebble game on A and B In what follows we

describ e this Datalog program in some detail The goal of is a ary predicate S Let

B

b b b b e a k tuple of elements of B For each such k tuple we introduce a k ary

1 k

relation symbol T and the following rules

b

For every i and j such that b b we have a rule

i j

T x x

b

1 k

where x x x and x x for s i j

i s

i j s

For every mary relation symbol R of and every mary tuple i i such that

1 m

B b b j Rx x

i i i i

m m

1 1

we have a rule

T x x Rx x

b 1 k i i

m

1

For every j with j k we have a rule

T x x T x x y x x

1 k 1 j 1 j +1 k

b[j c]

cB

where bj c b b c b b and y is a new variable note however that

1 j 1 j +1 k

the b o dy of the rule has k variables

For the goal predicate S we have the rule

T x x S

b 1 k

k

bB

We now have all the necessary notation and machinery to establish the main results of

this section

Theorem Let k be a positive integer B a nite relational structure and A a class of

nite relational structures such that B A Then the fol lowing statements are equivalent

CSP A B is expressible in k Datalog on A

k

CSP A B is expressible in L on A

CSP A B fA A The Spoiler wins the existential k pebble game on A and B g

Pro of The implication follows from Theorem To show that

k

assume that is an L sentence that denes CSP A B on A Let A b e a nite

relational structure in A If A CSP A B then A j and hence the Sp oiler wins

the existential k p ebble game on A and B Indeed if the Duplicator wins this game then

by Theorem B j which means that there is no homomorphism from B to B a

contradiction Conversely if the Sp oiler wins the existential k p ebble game on A and B

then A CSP A B Indeed otherwise there is a homomorphism h A B which

will give the Duplicator a winning strategy for the existential k p ebble game on A and

B whenever the Sp oiler places a p ebble on an element a of A the Duplicator resp onds

by placing the corresp onding p ebble on the element ha of B Finally the implication

follows from Theorem

By combining Theorems and we obtain the following uniform tractability result

for classes of constraint satisfaction problems expressible in Datalog

Theorem Let k be a positive integer A a class of nite relational structures and

B fB A CSP A B is expressible in k Datalog g

Then the uniform constraint satisfaction problem CSP A B is solvable in polynomial time

2k

Moreover the running time of the algorithm is O n where n is the maximum of the sizes

of the input structures A and B

We note that it is an op en problem whether the class

fB CSP A B is expressible in k Datalog g

is recursive In contrast Schaefers class SC which was the basis for the tractability result

of Theorem is recursive p er Theorem

Remark Some remarks concerning the results of this section are in order now

Feder and Vardi FV showed that for every nonuniform CSPB problem there is

a certain k Datalog program such that if the complement of CSPB is expressible

B

in k Datalog then expresses it The preceding Theorems and give an

B

alternative pro of of this result moreover they reveal that as we can take the k

B

Datalog program that expresses the query Given A do es the Sp oiler win the

B

existential k p ebble game on A and B

To illustrate an application of Theorem consider a k ary Horn Bo olean structure

B Then it is easy to verify that CSP B is expressible in k Datalog Consequently

Theorem yields a p olynomialtime algorithm for CSPF B where F is the class

of all nite structures and B is the class of k ary Horn Bo olean structures

Bounded Treewidth and Constraint Satisfaction

Up to this p oint we found tractable cases of the uniform constraint satisfaction problem

CSP A B by imp osing restrictions on the class B In this section we exhibit tractable cases

of CSP A B that are obtained by imp osing restrictions on the class A For this we consider

the concept of treewidth of a relational structure this concept was introduced by Feder and

Vardi FV and generalizes the concept of treewidth of a graph see vL Bo d

A tree decomposition of a nite relational structure A is a lab eled tree T such that the

following conditions hold

every no de of T is lab eled by a nonempty subset of the universe V of A

for every relation R of A and every tuple a a in R there is a no de of T whose

1 n

lab el contains fa a g

1 n

for every a V the set of no des X of T whose lab els include a forms a subtree of T

The width of a tree decomp osition T is the maximum cardinality of a lab el of a no de in T

minus Finally we say that a structure A is of treewidth k if k is the smallest p ositive

integer such that A has a tree decomp osition of width k

An alternative way to dene the treewidth of a structure A is in terms of the treewidth of

the Gaifman graph of A Gai that is the graph that has the elements of the universe of A

as no des and is such that there is an edge b etween two no des if and only if the corresp onding

elements app ear in a tuple in one of the relations of A We call the treewidth of the Gaifman

graph of A the Gaifman treewidth of A We now show that the two concepts coincide

Lemma T is a tree decomposition of a structure A i it is also a tree decomposition

tree of the Gaifman graph of A

Pro of It is easy to see that if T is a tree decomp osition of A then T is also a tree

decomp osition of the Gaifman graph of A Supp ose now that T is a tree decomp osition of

the Gaifman graph of A Consider a tuple a a in a relation R of A The elements

1 n

fa a g form a clique in the Gaifman graph of A By Lemma of DF there is a

1 k

no de x of T such that fa a g is contained in the lab el of x It follows that T is also a

1 k

tree decomp osition of A

For every k let Ak b e the class of all nite relational structures of treewidth

k Bo dlaender Bo d showed that for every k there is a p olynomialtime algorithm

that tests whether a given graph is of treewidth k It follows that for every k there

is a p olynomialtime algorithm that tests whether a given nite relational structure is of

treewidth k in other words each class Ak is recognizable in p olynomial time

Feder and Vardi FV showed for every nite relational structure B there is a p olynomial

time algorithm for the nonuniform constraint satisfaction problem CSP Ak B This also

follows from the fact that the class CSPB is known to b e expressible in existential monadic

secondorder logic FV and it is also known that membership in classes of graphs denable

in monadic secondorder logic is decidable in p olynomial time for graphs of b oundedtree

width DF Here we show that these nonuniform tractability results do uniformize

Let A and B b e two nite relational structure From Theorem and the accompanied

remarks it follows that the existence of a homomorphism h A B is equivalent to

A A

whether Q B is true where Q is the Bo olean conjunctive query whose b o dy consist of

the conjunction of all facts in A We show that if A is a nite relational structure of treewidth

k +1 k +1

A

k then the conjunctive query Q is expressible in FO moreover an FO formula

A

equivalent to Q can b e found in time p olynomial in the size of A

k +1

A

Lemma Let A be a structure of treewidth k then Q is expressible in FO

Pro of The key idea underlying the pro of is that structures of b ounded treewidth have

parse trees DF Chapter which can b e constructed in p olynomial time from tree

decomp ositions Such parse trees are constructed from k boundaried structures which are

structures with k distinguished no des lab eled k Such structured can b e combined

to form larger structures For example two k b oundaried structures A and B we assume

a xed underlying vocabulary can b e glued to obtain a structure A B by taking their

disjoint union and then identifying the two no des lab eled i for i k

A more general way of combining k b oundaried graps is dened as follows Let

A f f consist of a k b oundaried structure A with domain D and injective mappings

1 n

f f k g D We view as an nary op erator of cardinality jD j on k b oundaried

i

structures Given k b oundaried structures A A we construct A A in the

1 n 1 n

following manner We take the disjoint union of A A A and identify the j th distin

1 n

guished no de of A with the no de f j of A for i n j k The distinguished

i i

no des of the result are the distinguished no des of A lab els on other no des are erased

Note that the glue op erator is in essence the binary op erator where is the

k k k k

k b oundaried structure with k elements and empty relations and is the identity function

k

on the set f k g

It is shown in DF that there is a p olynomialtime time algorithm that converts a tree

decomp osition of a structure C with at least k elements of treewidth k to a parse tree

ie an expression in terms of a nite set of unary and binary k b oundaried op ertors of

cardinality k or k starting with the constant structure A k b oundaried structure A

k

A

can b e viewed as a k ary conjunctive query Q whose b o dy consists of the conjunction of all

facts in A but where only the nondistinguished variables are existentially quantied ie

the distinguished elements are viewed as free variables We now show by induction that if C

is k b oundaried structure expressed as a parse tree of k b oundaried op erators of cardinality

k +1

C

k or k starting with then Q can b e expressed in FO In fact we prove the

k

k +1

C

stronger claim that Q can b e expressed in FO where the tuples of free variables is an

arbitrary k tuple of distinct variables from fx x g

1 k +1

The claim clearly holds for whose query is the k variable conjunctive query with empty

k

b o dy Consider the expression A A where A f f is a k b oundaried

1 n 1 n

nary op erator of cardinality k or k and supp ose that we have already constructed

k +1

A A A

n

1

FO conjunctive queries for the queries Q Q Q resp ectively We

1 n

can take the domain D of Q to b e fX X g or fX X g Recall that f is an

1 k 1 k +1 i

injective mapping from f k g into D By the induction hypothesis we can assume that

the tuple of free variabes of is X X We can also assume that is of the

i

f (1) f (k )

form

X B QX

j j

1

k

X X is some p ermutation of X X resp X resp X where X

j j 1 k j j j

1 1

k k +1 k

X X X Observe that there is an implicit existential quantier when jD j has

1 k k +1

(A A )

n

1

k elements We can then express Q by

QX X B

j j 1 n

1

k

In pro of note that there is a homomorphism from the structure A A to a structure

1 n

B i there are homomorphisms h A B and h A B for i n such

i i

that if a is a distinguished element of A lab eled j and f j b then we must have that

i i

hb h a In other words to get a homomorphism from A A to B we need

i 1 n

to nd homomorphisms from A A B that meet the compatibility conditions required

1

k +1

by the mappings f f The FO queries give us the required n

1 n 1 n

homomorphisms and the compatibility b etween the dierent homomorphisms is guaranteed

by lab eling distinct elements that must b e mapp ed identically by the same variable

Remark The fragment FO is more general than the fragment of conjunctive queries

as it also allows for negations and disjunctions Consider the fragment FO that allows

+

no negative formulas and no conjunctions This fragment has the same expressive p ower as

k +1

conjunctive queries It can b e shown that the fragment FO can express precisely the

+

A

queries Q where A is a structure of treewidth k Thus the relationship b etween treewidth

and number of variables is tight

We can now derive the main result of this section

Theorem Let k be a positive integer Ak the class of nite relational structures of

treewidth k and F the class of al l nite relational structures Then the uniform constraint

satisfaction problem CSP Ak F is solvable in polynomial time

Pro of We showed in Lemma that if A is a nite relational structure of treewidth k then

k +1

A

an FO formula equivalent to Q can b e found in time p olynomial in the size of A Thus

in this case checking the existence of a homomorphism h A B reduces to the evaluation

k +1 k +1

of an FO query on the structure B As shown in Var FO has p olynomialtime

combined complexity which implies that CSP Ak F is solvable in p olynomial time

A precise complexity analysis of CSPAk F is provided in GLS where it is shown

that the problem is LOGFCLcomplete LOGCFL is the class of decision problems that are

logspacereducible to a contextfree language

We note that another way to dene the treewidth of a structure A is in terms of its

incidence graph CR The incidence graph of A is a bipartite graph that has all the tuples

in relations of A as no des in one part the elements of the universe of A as no des in the other

part and there is an edge from a no de t to a no de a i t is a tuple in A and a is an element

that o ccurs in t We call the treewidth of the incidence graph of A the incidence treewidth

of A Given a tree decomp osition T of A we can convert it into a tree decomp osition T

of the incidence graph of A as follows T has the same graph structure as T For every

relation R of A and every tuple t a a in R there is a no de x of T whose lab el

1 n

contains fa a g We simply add t which is a no de of the incidence graph of A to the

1 n

lab el of x in T It is easy to see that T is a tree decomp osition Thus if A has treewidth k

then it has incidence treewidth at most k In the other direction we can convert a tree

decomp osition T of the incidence graph of A to a tree decomp osition T of A by replacing

a tuple t a a in the lab el of a no de x of T by the set of elements fa a g

1 n 1 n

Thus if the incidence treewidth is k then the treewidth is at most k n where n

is the maximal arity of a relation in A As an example of the gap b etween the two notions

consider structure A with a single tuple a a It is easy to see that its treewidth is

1 n

n since its Gaifman graph is an nclique while its incidence treewidth is since its

incidence graph is a tree

As mentioned in the introduction Chekuri and Rama jaran CR showed that the uni

form constraint satisfaction problem CSPQk F is solvable in p olynomial time where

Qk is the class of structure of querywidth k Chekuri and Rama jaran actually studied the

conjunctivequery containment problem which explains the term querywidth In CR

only vocabularies of b ounded arities were considered but the result was extended in CR

to general vocabularies They also showed that the incidence treewidth of a structure A

provides a strict upp er b ound for its query width by showing that a tree decomp osition

of the incidence graph is also what they called query decomposition Note however that

the prop erty of having treewidth k can b e tested in linear time Bo d while the prop

erty of having querywidth is NPcomplete GLS Thus the p olynomial tractability of

CSP Ak F follows also from the results in CR Gottlob Leone and Scarcello GLS

dene yet another notion of width called hypertree width They showed that the querywidth

of a structure A provides a strict upp er b ound for the hypertree width of A but that the

class Hk of structures of hypertree width at most k as well as the class CSPHk F are

eciently recognizable

As this discussion shows the treewidth of a structure A is at least the arity of its widest

relation more precisely it is at least the number of distinct elements o ccuring in a tuple

of A minus It is desirable therefore to decrease the arity of the relations in A This

can b e done by enco ding the structures A and B by binary structures ie structures with

binary relations only We refer to the binary enco ding of a structure A by binar y A The

vocabulary of binar y A contains a binary relation symbol E for each pairs of not

P Qij

necessarily distinct relations symbols P Q of A and each pair of argument p ositions i j of

P Q resp ectively The domain of binar y A is the set of tuples o ccurring in the relations of

A The relation E contains the pair s t if the ith element of s and the j th element of

P Qij

t are identical Note that in binar y A we have that the relation E contains all tuple in

P P ii

P that if s t is in E then t s the relation Qq and that if s t in the relation

P Qij QP ji

E and t u is in the relation E then s u is in the relation E We refer to

P Qij QRjk P Rik

this as the reexivesymmetrictransitive closure of the E relations

Lemma There is a homomorphism from A to B i there is a homomorphism from

binar y A to binar y B

Pro of Assume that there a homomorphism h from A to B For a tuple t dene ht

comp onentwise It is easy to see that h is a homomorphism from binar y A to binar y B

Supp ose that E contain the pair s t in binar y A then the elements s and t are

P Qij i j

identical which implies that hs ht It follows that the pair hs ht is in the

i j

relation E in binar y B

P Qij

Conversely supp ose that h is a homomorphism from binar y A to binar y B Let a b e

the ith element of a tuple t in a relation P of A and let b b e the ith element of ht We

dene ha to b e b It is easy to see that h is a homomorphism from A to B provided it

is well dened Supp ose a is also the j th element of a tuple u in a relation Q in A Then

t u is in the relation E in binar y A But then we must have that ht hu is in

P Qij

the relation E in binar y B which implies that b is also the j th element of the tuple

P Qij

hu

It is worth noting that in binar y A it is not necessary to enco de all coincidence relations

in A It suces to put enough tuples there so that the reexivesymmetrictransitive closure

enco des all such coincidence relations For example that if s t in the relation E and

P Qij

t u is in the relation E then it is not necessary to store s u is in the relation E

QRjk P Rik

It is not dicult to prove that Lemma still holds The reason for this optimization is

that to minimize the treewidth of binar y A it is desirable to minimize the number of tuples

of binar y A It is p ossible to show that under such an optimized enco ding acyclic join

queries AHV Ull can b e enco ded by structures of b ounded treewidth

A class that is more general than the class of b ounded treewidth graphs is the class of

b ounded cliquewidth graphs It is shown in CO than if a graph has treewidth k then its

k +1

cliquewidth is b ounded from ab ove by Thus a class of graphs that has b ounded

treewidth also has b ounded cliquewidth Courcelle Makowsky and Rotics CMR showed

that if a class C is eectively of b ounded cliquewidth then every monadic secondorder

prop erty on C is p olynomial It follows that CSP A B is in PTIME for each B if A is of

b ounded cliquewidth On the other hand every clique has cliquewidth CO and we

have observed ab ove that the the class CSP K G is NPcomplete Thus while the tractablty

result of constraint satisfaction for b ounded treewidth structures do es uniformize this is not

the case for b ounded cliquewidth structures

Concluding Remarks

In this pap er we brought into center stage the close connection b etween conjunctivequery

containment and constraint satisfaction Moreover we showed that several tractable cases

of nonuniform CSP B problems uniformize and thus yield tractable cases of uniform

constraint satisfaction and conjunctive query containment

During the past several years a group of researchers has pursued tractable cases of

constraint satisfaction CSPA B by investigating the class of functions under which the

relations in the structures in B are closed JC JCG JCG Jea see PJ for a

survey In FV FV a preliminary investigation has b een carried out on the connection

b etween expressibility of CSP B problems in Datalog and closure of the relations in B under

certain functions In a forthcoming pap er we will elab orate further on this connection and

delineate the relationship b etween the two approaches

Acknowledgement Wed like to thank Georg Gottlob Leonid Libkin and Werner Nutt

for useful discussions In particular Georg Gottlob help ed us clarify the relationship b etween

treewidth and incidence treewidth in Section

References

AHV S Abiteb oul R Hull and V Vianu Foundations of databases AddisonWesley

BB C Beeri and PA Bernstein Computational problems related to the design of

normal form relational schemas ACM Trans on Database Systems

Bib W Bib el Constraint satisfaction from a deductive viewp oint Articial Intel li

gence

Bo d HL Bo dlaender A lineartime algorithm for nding treedecomp ositions of small

treewidth In Proc th ACM Symp on Theory of Computing pages

CH A Chandra and D Harel queries and generalizations Journal of

Logic Programming

CM AK Chandra and PM Merlin Optimal implementation of conjunctive queries

in relational databases In Proc th ACM Symp on Theory of Computing pages

CMR B Courcelle JA Makowsky and U Rotics Linear time solvable optimization

problems on certain structured graph families Technical rep ort Technion

CO B Courcelle and S Olariu Upp er b ounds to the cliquewidth of graphs Technical

rep ort University of Bordeaux

CR C Chekuri and A Ra jaraman Conjunctive query containment revisited In

PhG Kolaitis and F Afrati editors Database Theory ICDT volume

of Lecture Notes in Computer Science pages SpringerVerlag

CR C Chekuri and A Rama jaran Conjunctive query containment revisited Tech

nical rep ort Stanford University November

Dec R Dechter Decomp osing a relation into a tree of binary relations Journal of

Computer and System Sciences

Dec R Dechter Constraint networks In SC Shapiro editor Encyclopedia of Arti

cial Intel ligence pages Wiley New York

DF RG Downey and MR Fellows Parametrized Complexity SpringerVerlag

DG WF Dowling and JH Gallier Lineartime algorithms for testing the satisability

of prop ositional horn formulae Journal of Logic Programming

DP R Dechter and J Pearl Structure identication in relational data Articial

Intel ligence

FV TA Feder and MY Vardi Monotone monadic SNP and constraint satisfaction

In Proc th ACM Symp on Theory of Computing pages

FV TA Feder and MY Vardi The computational structure of monotone monadic

SNP and constraint satisfaction a study through Datalog and group theory

SIAM J on Computing

Gai H Gaifman On lo cal and nonlo cal prop erties In J Stern editor Logic Col lo

quium pages North Holland

GJ M R Garey and D S Johnson Computers and Intractability A Guide to the

Theory of NPCompleteness W H Freeman and Co

GJC M Gyssens PG Jeavons and DA Cohen Decomp osition constraint satisfaction

problems using database techniques Articial Intel ligence

GLS G Gottlob N Leone and F Scarcello The complexity of acyclic conjunctive

queries In Proc th IEEE Symp on Foundation of Computer Science pages

GLS G Gottlob N Leone and F Scarcello Hyp ertree decomp ositions and tractable

queries In Proc th ACM Symp on Principles of Database Systems pages

HN P Hell and J NesetrilOn the complexity of H coloring Journal of Combinatorial

Theory Series B

JC P Jeavons and D Cohen An algebraic characterization of tractable constraints

In DZ Du and M Li editors Proceedings of First Annual International Confer

ence on Computing and Combinatorics COCOON pages Springer

Verlag

JCG P Jeavons D Cohen and M Gyssens A unifying framework for tractable con

straints In U Montanari and F Rossi editors Proceedings of st International

Conference on Principles and Practice of Constraint Programming CP pages

SpringerVerlag

JCG P Jeavons D Cohen and M Gyssens A test for tractability In EC Freuder

editor Proceedings of nd International Conference on Principles and Practice of

Constraint Programming CP pages SpringerVerlag

Jea P Jeavons On the algebraic structure of combinatorial problems Theoretical

Computer Science To app ear

Kum V Kumar Algorithms for constraintsatisfaction problems AI Magazine

KV Ph G Kolaitis and M Y Vardi Innitary logic and laws Information and

Computation Sp ecial issue Selections from the Fifth Annual

IEEE Symp osium on Logic in Computer Science

KV Ph G Kolaitis and M Y Vardi On the expressive p ower of Datalog to ols and

a case study Journal of Computer and System Sciences August

Sp ecial Issue Selections from Ninth Annual ACM SIGACTSIGMOD

SIGART Symp osium on Principles of Database Systems PODS Nashville TN

USA April

KV PhG Kolaitis and MY Vardi On the expressive p ower of variableconned

logics In Proceedings of th Annual IEEE Symposium on Logic in Computer

Science LICS pages

KW RW Kaye and R Wilson Linear Algebra Oxford Univ Pr

LMSS AY Levy AO Mendelzon Y Sagiv and D Srivastava Answering queries

using views In Proc th ACM Symp on Principles of Database Systems pages

LP HR Lewis and C Papadimitriou Elements of the Theory of Computation Pren

tice Hall

McK J McKinsey The decision problem for some classes of sentence without quanti

ers Journal of Symbolic Logic

Mes P Meseguer Constraint satisfaction problems an overview AICOM

MF AK Mackworth and EC Freuder The complexity of constraint satisfaction

revisited Articial Intel ligen ce

Mos Y N Moschovakis Elementary Induction on Abstract Structures North Holland

Pap C H Papadimitriou Computational Complexity AddisonWesley Publishing

Company

PJ J Pearson and P Jeavons A survey of tractable constraint satisfaction problems

Technical Rep ort CSDTR Royal Holloway University of London

Qia X Qian Query folding In Proc th Intl Conf on Data Engineering pages

Riv I Rival Maximal sublattices of nite distributive lattices Proc AMS

RSU A Ra jamaran Y Sagiv and JD Ullman Answering queries using templates

with binding patterns In Proc th ACM Symp on Principles of Database

Systems pages

Sar Y Saraiya Subtree elimination algorithms in deductive databases PhD thesis

Department of Computer Science Stanford University

Sch TJ Schaefer The complexity of satisability problems In Proc th ACM

Symp on Theory of Computing pages

Tsa EPK Tsang Foundations of Constraint Satisfaction Academic Press

Ull J D Ullman Database and Know ledgeBase Systems Volumes I and II Com

puter Science Press

Ull JD Ullman Information integration using logical views In PhG Kolaitis and

F Afrati editors Database Theory ICDT volume of Lecture Notes in

Computer Science pages SpringerVerlag

Var MY Vardi On the complexity of b oundedvariable queries In Proc th ACM

Symp on Principles of Database Systems pages

vL J van Leeuwen Graph algorithms In J van Leeuwen editor Handbook of

Theoretical Computer Science A Algorithms and Complexity chapter pages

Elsevier Amsterdam

Yan M Yannakakis No dedeletion problems on bipartite graphs SIAM Journal of

Computing