On the Equivalence of Recursive and

Nonrecursive Programs

y

Sura jit Chaudhuri Moshe Y Vardi

Database Tech Department Department of Computer Science

Hewlett Packard Lab oratories Rice University

Page Mill Road Houston TX

Palo Alto CA Email vardicsriceedu

Email chaudhurihplhpcom URL httpwwwcsriceedu vardi

Abstract We study the problem of determining whether a given recursive Datalog

program is equivalent to a given nonrecursive Datalog program Since nonrecursive Dat

alog programs are equivalent to unions of conjunctive queries we study also the problem

of determining whether a given recursive Datalog program is contained in a union of con

junctive queries For this problem we prove doubly exp onential upp er and lower time

b ounds For the equivalence problem we prove triply exp onential upp er and lower time

b ounds

Intro duction

It has b een recognized for some time that rstorder database query languages are lacking

in expressive p ower AU GM Zl Since then many higherorder query languages

have b een investigated AV CH Ch CH Im Va A query language that

has received considerable attention recently is Datalog the language of logic programs

known also as Hornclause programs without function symb ols K Ul which is

essentially a fragment of xp oint logic CH Mo See Ul for a detailed discussion

of Datalog

The gain in expressive p ower do es not however come for free evaluating Data

log programs is harder than evaluating rstorder queries Va Recent works have

addressed the problems of nding ecient evaluation metho ds for Datalog programs

BR is a go o d survey on this topic and developing optimization techniques for Dat

alog see MP Nab NRSU The techniques to optimize evaluation of queries

Work done while this author was at Stanford University and was supp orted by ARO grant DAAL

G NSF grant IRI Air Force grant AFOSR and a grant of IBM Corp

y

Work done while this author was at the IBM Almaden Research Center

are often based on the ability to transform a query into an equivalent one that can b e

evaluated more eciently RSUV Therefore determining equivalence of queries is one

of the most fundamental optimization problems Naturally the problem of determining

equivalence of Datalog programs has received attention Unfortunately Datalog program

equivalence is undecidable Shm

Since the source of the diculty in evaluating Datalog programs is their recursive

nature the rst line of attack in trying to optimize such programs is to eliminate the

recursion The following example is from Naa

Example Consider the following Datalog program

1

buys X Y likes X Y

buys X Y trendy X buys Z Y

It can b e shown that is equivalent to the following nonrecursive program

1

buys X Y likes X Y

buys X Y trendy X likes Z Y

Consider on the other hand the following Datalog program

2

buys X Y likes X Y

buys X Y knows X Z buys Z Y

It can b e shown that is not equivalent to the following nonrecursive program

2

buys X Y likes X Y

buys X Y knows X Z likes Z Y

In fact is inherently recursive ie it is not equivalent to any nonrecursive program

2

Thus a problem of sp ecial interest is that of determining the equivalence of a given

recursive Datalog program to a given nonrecursive program ie a Datalog program

where the dep endency graph among the predicates is acyclic

This problem is the main fo cus of this pap er Note that this problem is dierent from

that of determining whether a given recursive Datalog program is equivalent to some

nonrecursive program The latter problem called the boundedness problem is known

to b e undecidable GMSV and has b een studied extensively see KA for a survey

and HKMV HKMV for recent results

A nonrecursive program can b e rewritten as a union of conjunctive queries Thus

containment of a nonrecursive program in a recursive program can b e reduced to the

containment of a conjunctive query in a recursive program The latter problem was

shown to b e decidable in fact it is EXPTIMEcomplete CK CLM Sab Thus

what was left op en is the other direction ie the problem of determining whether a

recursive program is contained in a nonrecursive program We attack this problem by

investigating the containment of recursive programs in unions of conjunctive queries Our

main result is that containment of recursive programs in unions of conjunctive queries

is decidable Therefore iIt follows that equivalence of two given programs is decidable

when one is recursive and the other nonrecursive

We rst prove that the decidability of the containment problem follows from a p ower

ful general decidability result due to Courcelle Cou Unfortunately while Courcelles

result yields the decidability of the containment problem it provides only nonelementary

timeb ounds Cou The main b o dy of the pap er is dedicated to a detailed study of

the computational complexity of containment and equivalence

For upp er b ounds we use the automatontheoretic approach advo cated in Va The

key idea is that a recursive program can b e viewed as an innite union of conjunctive

queries These conjunctive queries can b e represented by pro of trees and the set of

pro of trees corresp onding to a given recursive program can b e represented by a tree

automaton This representation enables us to reduce containment of recursive programs

in unions of conjunctive queries to containment of tree automata which is known to

b e decidable in exp onential time Se The size of the tree automata obtained in

the reduction is exp onential in the size of the input as a result we obtain a doubly

exp onential time upp er b ound for containment in unions of conjunctive queries These

b ounds turn out to b e optimal by a succinct enco ding of alternating exp onentialspace

Turing machines we prove a matching doublyexp onential time lower b ound A case

of sp ecial interest is that of linear programs ie programs in which each rule contains

at most one recursive subgoal CK UV In this case the corresp onding set of

pro of trees can b e represented by word automata for which containment is known to

b e decidable in p olynomial space MS As a result we obtain an exp onential space

upp er b ound for the containment problem for linear programs which is also matched by

1

a lower b ound

We then note that expressing a nonrecursive program as a union of conjunctive queries

may involve an exp onential blowup in size Thus our upp erb ound technique for con

tainment yields a triplyexp onential time upp er b ound for containment in nonrecursive

programs doublyexp onential space upp er b ound for linear programs We show that

the succinctness of nonrecursive programs is inherent by proving a matching triply

exp onential time lower b ound doublyexp onential space lower b ound for linear pro

grams Finally we observe that these results also yield the same complexity b ounds for

equivalence to nonrecursive programs Thus while equivalence to nonrecursive programs

is decidable it is highly intractable We note that one has to b e careful in interpret

ing lower b ounds for query containment While containment of conjunctive queries in

recursive program is complete for EXPTIME CK CLM Sab this complexity

1

Our upp er b ounds follow also from van der Meydens results on recursively indenite databases

Mey Theorem

is simply the expression complexity of evaluation Datalog programs Va In fact if

attention is restricted to programs of b ounded arity we get NPcompleteness instead of

EXPTIMEcompleteness In contrast our lower b ounds here imply real intractability

and they hold even for programs of b ounded arity

Preliminaries

Conjunctive Queries and Datalog

A conjunctive query is a p ositive existential conjunctive rstorder formula ie the only

prop ositional connective allowed is and the only quantier allowed is Without loss

of generality we can assume that conjunctive queries are given as formulas x x

1 k

of the form y y a a with free variables among x x where the

1 m 1 n 1 k

a s are atomic formulas of the form pz z over the variables x x y y

i 1 1 k 1 m

For example the conjunctive query y E x y E y z is satised by all pairs hx z i

such that there is a path of length b etween x and z The free variables are also called

distinguished variables We distinguish b etween variables and occurrences of variables in a

conjunctive query but we only consider o ccurrences of variables in the atomic formulas of

the query For example the variables x and y have each two o ccurrences in y E x y

E y z An o ccurrence of a distinguished variable in a conjunctive query is called a

distinguished occurrence A union of conjunctive queries is a disjunction

s

x x

i 1 k

i=1

of conjunctive queries

A union of conjunctive queries x x can b e applied to a database D The

1 k

result

D fa a jD j a a g

1 k 1 k

is the set of k ary tuples that satisfy in D If has no distinguished variables then

it is viewed as a Bo olean query the result is either the empty corresp onding to

false or the relation containing the ary tuple corresp onding to true

A Datalog program consists of a set of Horn rules A Horn rule consists of a single

atom in the head of the rule and a conjunction of atoms in the b o dy where an atom is a

formula of the form pz z where p is a predicate symb ol and z z are variables The

1 l 1 l

predicates that o ccur in head of rules are called intensional IDB predicates The rest

of the predicates are called extensional EDB predicates Let b e a Datalog program

i

Let Q D b e the collection of facts ab out an IDB predicate Q that can b e deduced from



1

a database D by at most i applications of the rules in and let Q D b e the collection



of facts ab out Q that can b e deduced from D by any numb er of applications of the rules

in that is

1 i

Q D Q D

 

i0

We say that the program with goal predicate Q is contained in a union of conjunctive

1

queries if Q D D for each database D It is known cf MUV Naa



1

that the relation dened by an IDB predicate in a Datalog program ie Q D can



b e dened by an innite union of conjunctive queries That is for each IDB predicate Q

there is an innite sequence of conjunctive queries such that for every database

0 1

S

1

1

D we have Q D D The s are called the expansions of Q

i i

 i=0

A predicate P depends on a predicate Q in a program if Q o ccurs in the b o dy

of a rule r of and P is the predicate at the head of r The dependence graph of

is a directed graph whose no des are the predicates of and whose edges capture the

dep endence relation ie there is an edge from Q to P if P dep ends on P A program

is nonrecursive if its dep endence graph is acyclic ie no predicate dep ends recursively

on itself It is wellknown that a nonrecursive program has only nitely many expansions

up to renaming of variables Thus a nonrecursive program is equivalent to a union of

conjunctive queries

Containment of Conjunctive Queries

Let x x and x x are two conjunctive queries with the same vector of

1 k 1 k

distinguished variables We say that is contained in if D D for each database

D ie if the following implication is valid

x x x x x x

1 k 1 k 1 k

Denition A containment mapping from a conjunctive query to a conjunctive

query is a renaming of variables sub ject to the following constraints a every distin

guished variable must map to itself and b after renaming every literal in must b e

among the literals of

Conjunctivequery containment can b e characterized in terms of containment map

pings cf Ul

Theorem A conjunctive query x x is contained in a conjunctive query

1 k

x x i there is a containment mapping from to

1 k

It will b e convenient to view a containment mapping h from to as a mapping

from o ccurrences of variables in to o ccurrences to variables in Such a mapping has

the prop erty that v and v are o ccurrences of the same variable in then hv and

1 2 1

hv are o ccurrences of the same variable in

2

Sagiv and Yannakakis SY extended Theorem to the case where queries are

unions of conjunctive queries

Theorem If and are union of conjunctive queries then is

i i i i

contained in ie D D for every database D i each is contained in

i

some ie there is a containment mapping from to

j j i

Expansion Trees

Expansions can b e describ ed in terms of expansion trees The no des of an expansion tree

for a Datalog program are lab eled by pairs of the form where is an IDB atom

and is an instance of a rule r of such that the head of is The atom lab eling a

no de x is denoted and the rule lab eling a no de x is denoted In an expansion tree

x x

for an IDB predicate Q the ro ot is lab eled by a Qatom Consider a no de x where

x

is the atom Rt is the rule

x

1 m

Rt R t R t

1 m

i i

1

l

and the IDB atoms in the b o dy of the rule are R t R t Then x has children

i i

1

l

i i

1

l

x xl lab eled with the atoms R t R t In particular if all atoms in are

i i x

1

l

EDB atoms then x must b e a leaf The query corresp onding to an expansion tree is the

conjunction of all EDB atoms in for all no des x in the tree with the variables in the

x

ro ot atom as the free variables Thus we can view an expansion tree as a conjunctive

query Let tr eesQ denote the set of expansion trees for an IDB predicate Q in

Note that tr eesQ is an innite set Then for every database D we have

1

Q D D



2tr ees(Q)

It follows that is contained in a conjunctive query if there is a containment mapping

from to each expansion tree in tr eesQ ie a mapping which maps distinguished

variables to distinguished variables and maps the atoms of to atoms in the b o dies of

rules lab eling no des of

Of particular interest are expansion trees that are obtained by unfolding the pro

gram

Denition An expansion tree of a Datalog program is an unfolding expansion

tree if it satises the following conditions a the atom lab eling the ro ot is the head of

a rule in and b if a no de x is lab eled by then the variables in the b o dy of

x x

either o ccur in or they do not o ccur in the lab el of any no de ab ove x

x x

Intuitively an unfolding expansion tree is obtained by starting with a head of a rule in

as the atom lab eling the ro ot and then creating children by unifying an atom lab eling

a no de with a fresh copy of a rule in Note that if a variable v o ccur in the atom

lab elling a no de x but not in the atoms lab eling the children of x then v will not o ccur

in the lab el of any descendant of x

We denote the collection of unfolding expansion trees for an IDB predicate Q in a

tr eesQ It is easy to see that every expansion tree can b e obtained program by u

by renaming variables in an unfolding expansion tree Thus every expansion tree viewed

as a conjunctive query is contained in an unfolding expansion tree

pXY pXY eXZ pZY pXY pXY eXZ pZY

pXY pXY eXZ pZY

?

?

pZY pZY eZX pXY pZY pZY eZX pXY

pZY pZY eZW pWY

?

?

pXY pXY eXY pXY pXY eXY

pWY pWY eWY

a b

Figure a Expansion Tree b Unfolding Expansion Tree

Example Figure shows expansion trees for the IDB predicate p in the following

transitive closure program

r pX Y eX Z pZ Y

0

r pX Y e X Y

Note that the variable X is reused in the child of the ro ot of the expansion tree while a

new variable W is used instead of X in the child of the ro ot of the unfolding expansion

tree

The following prop osition follows immediately

Prop osition Let be a program with a goal predicate Q For every database D

we have

1

D D Q



tr ees(Q) 2u

Decidability

We can view a conjunctive query x x with free variables among x x as

1 k 1 k

a sorted relational structure A The sorts V and F denote the set of variables and

atomic formulas in resp ectively For each l ary predicate symb ol P in the vo cabulary of

0 l

we have a predicate symb ol P in the vo cabulary of A of typ e F V The vo cabulary

of A also has constant symb ols x x These constant and predicate symb ols are

1 k

interpreted in A as follows First the constant symb ol x is interpreted as x Second

i i

if the atomic formula a is P z z in then we have a tuple ha z z i in the

i 1 l i 1 l

0

interpretation of P Note that can have multiple o ccurrences of the same atomic

formula which explains why we need the sort F in A

Since a conjunctive query can b e viewed as a sorted relational structure A we

can view u tr eesQ as a set of sorted relational structures which we denote as

str Q If Q is k ary then we can assume that all conjunctive queries in u tr eesQ

have free variables among x x Thus all structures in str Q have the same

1 k

vo cabulary denoted v ocabQ We can now express prop erties of Datalog program in

terms of prop erties of the asso ciated collection of sorted structures If is a storder

formula over v ocabQ then we say that the program with goal predicate Q satises

if holds in al l structures in str Q

As an example consider the prop erty of strong nonredundancy We say that a Datalog

program with goal predicate Q is strongly nonredundant if no unfolding expansion

tree contains two distinct o ccurrences of the same EDB atom It is easy to see that this

prop erty can b e expressed as a rstorder prop erty of the structures in str Q For

simplicity assume that there is a single EDB predicate P which happ ens to b e k ary

Then the desired prop erty holds if the program with goal predicate Q satises the

sentence

0

x x F y y V P x y y

1 2 1 k 1 1 k

0

P x y y x x

2 1 k 1 2

Firstorder logic gives us a very p owerful language to describ e prop erties of Datalog

queries in terms of the asso ciated set of structures It is not clear a priori whether such

prop erties can b e eectively tested After all to check whether a Datalog program

with a goal Q satises a rstorder sentence we have to check in principle the innitely

many structures in str Q The following p owerful result by Courcelle asserts that

2

nevertheless rstorder prop erties of Datalog programs can b e eectively tested

Theorem Cou Cou There is an algorithm to decide given a Datalog program

with goal predicate Q and a rstorder sentence over v ocabQ whether satises

The decidability of containment in nonrecursive programs follows now from Theo

rem

Theorem Containment of recursive Datalog programs in nonrecursive Datalog pro

grams is decidable

Pro of Let us assume that is a recursive Datalog program with the goal predicate Q

Let b e an arbitrary nonrecursive program Assume that has already b een rewritten

as a nite union

s

x x

i 1 k

i=1

2

Courcelles result applies also to monadic secondorder logic which is a p owerful extension of rst

order logic

of conjunctive queries Let x x b e y y a a with free vari

i 1 k 1 m 1 n

ables among x x where a is an atomic formula p z z over the variables

1 k i i 1 l

0

x x y y Dene to b e the sentence y y V a a

1 k 1 m 1 m 1 n

i

0 0 0 0 0 0 0 0

F a a where a is the atomic formula p a z z and z z are ob

i

1 n i i 1 l 1 l

tained from z z by substituting x for x

1 l j j

W

s

0 0 0

Assume We claim that is contained in i satises where is

i=1 i

that is contained in Then from Theorem it follows that for every expansion

x x u tr eesQ there exists some y y a a such that

1 k i 1 m 1 n

there is a containment mapping from to Let x x b e z z b b

i 1 k 1 r 1 s

0 0

Let the corresp onding tuples in A b e b b Consider any a where q n Since

q

1 s

there is a containment mapping from to it follows that a maps to some b where the

i q j

distinguished variables are preserved Therefore there is a substitution for the variables

0 0 0

in a such that it corresp onds to the literal b Therefore holds over A Thus

q j i

0

satises

0

Let us now assume that satises Let b e an expansion of that corresp onds

to an unfolding expansion tree and let A b e the corresp onding structure Since

0 0

satises it follows that some must hold over A Therefore there is an assignment

i

0 0 0

Moreover such a corresp onds to a tuple b of of variables such that a literal a

j i q

mapping ensures that the distinguished variables map to themselves Thus is contained

in This completes our pro of

Corollary Equivalence of Datalog programs to nonrecursive programs is decidable

Unfortunately Theorem yields a very high upp er b ound the algorithm describ ed

in Cou is of nonelementary time complexity ie its time complexity cannot b e

b ounded by any nite stack of exp onentials There is however a p ossible way around

this diculty While Courcelles algorithm for arbitrary rstorder prop erties of Datalog

programs has a nonelementary time complexity more ecient algorithms may exist for

sp ecic prop erties The crux of Courcelles result is the well known connection b etween

monadic secondorder logic and tree automata cf TW Ra It is conceivable

that by using automatatheoretic techniques directly we might b e able to obtain more

feasible algorithms for the equivalence problem A similar strategy of using automata

theory directly rather than monadic secondorder logic was demonstrated successfully for

decision problems in the area of program verication cf VW EJ

Automata on Words and Trees

In this section we review some of the relevant results from automata theory on emptiness

and containment of automata We will use these results for proving the upp erb ound on

the complexity of deciding containment of a Datalog predicate in a union of conjunctive

queries The material in this section is quoted from Va

Automata on Words

An automaton A is a tuple S S F where is a nite alphabet S is a nite set

0

of states S S is the set of initial states F S is the set of accepting states and

0

S

S is a transition function Note that the automaton is nondeterministic

since it may have many initial states and the transition function may sp ecify many

p ossible transitions for each state and letter

n n+1

A run r of A over a word w a a is a sequence s s S such

0 n1 0 n

that

s S

0 0

s s a for i n

i+1 i i

The run r is accepting if s F The word w is accepted by A if A has an accepting run

n

over w The language of A denoted LA is the set of words accepted by A

An imp ortant prop erty of automata is their closure under Bo olean op erations

Prop osition RS Let A A be a automata over an alphabet Then there are

1 2



automata A A and A such that LA LA LA LA LA and

3 4 5 3 1 4 1 2

LA LA LA

5 1 2

The constructions for union and intersection involve only a p olynomial blowup in the

size of the automata In contrast complementation may involve an exp onential blowup

in the size of the automaton MF

The nonemptiness problem for automata is to decide given an automaton A whether

LA is nonempty

Prop osition Jo RS The nonemptiness problem for automata is decidable in

nondeterministic logarithmic space

Pro of Let A S S F b e the given automaton Let s t b e states of S Say

0

that s is directly connected to t if there is a letter a such that t s a Say

that s is connected to t if there is a sequence s s m of states such that

1 m

s s s t and s is directly connected to s for i m It is easy to see that

1 n i i+1

LA is nonempty i there are states s S and t F such that s is connected to t

0

Thus automata nonemptiness is equivalent to graph reachability which can b e tested in

nondeterministic logarithmic space

A problem related to nonemptiness is the containment problem which is to decide

given automata A and A whether LA LA Note that LA LA i

1 2 1 2 1 2

LA LA Thus by Prop osition the containment problem is reducible to

1 2

the nonemptiness problem though the reduction may b e computationally exp ensive

Prop osition MS The containment problem for automata is PSPACEcomplete

Automata on Trees

Let N denote the set of p ositive integers The variables x and y denote elements of N

A tree is a nite subset of N such that if xi where x N and i N then also

x and and if i then also xi The elements of are called nodes If x

and xi are no des of then x is the parent of xi and xi is the child of x The no de x is a

leaf if it has no children By denition the empty sequence is a memb er of every tree

it is called the root

A labeled tree for a nite alphab et is a pair where is a tree and

assigns to every no de a lab el Labeled trees are often referred to as trees the intention

will b e clear from the context The set of lab eled trees is denoted tr ees

A tree automaton A is a tuple S S F where is a nite alphab et S is

0

a nite set of states S S is a set of initial states F S is a set of accepting

0

S

states and S is a transition function such that s a is nite for all

s S and a A run r S of A on a lab eled tree is a lab eling of

by states of A such that the ro ot is lab eled by an initial state and the transitions

ob ey the transition function that is r S and if x is not a leaf and x has k

0

children then hr x r xk i r x x If for every leaf x of there is a tuple

hs s i r x x such that fs s g F then r is accepting A accepts

1 l 1 l

if it has an accepting run on The tree language of A denoted T A is the

set of trees accepted by A

An imp ortant prop erty of tree automata is their closure under Bo olean op erations

Prop osition Cos Let A A be a automata over an alphabet Then there are

1 2



automata A A and A such that LA LA LA LA LA and

3 4 5 3 1 4 1 2

LA LA LA

5 1 2

As in word automata the constructions for union and intersection involve only a p oly

nomial blowup in the size of the automata while complementation may involve an ex

p onential blowup in the size of the automaton

The nonemptiness problem for tree automata is to decide given a tree automaton A

whether T A is nonempty

Prop osition Do TW The nonemptiness problem for tree automata is decid

able in polynomial time

Pro of Let A S S F b e the given tree automaton Let acceptA b e the

0

minimal set of states in S such that

F acceptA and

if s is a state such that there are a letter a and a transition hs s i

1 k



s a acceptA then s acceptA

It is easy to see that T A is nonempty i S acceptA Intuitively acceptA

0

is the set of all states that lab el the ro ots of accepting runs Thus T A is nonempty

precisely when some initial state is in acceptA The claim follows since acceptA can

b e computed b ottomup in p olynomial time

We note that using techinques such as in Be the nonemptiness problem for tree

automata is decidable in linear time

A problem related to nonemptiness is the containment problem which is to decide

given tree automata A and A whether T A T A As for word automata the

1 2 1 2

containment problem is reducible to the nonemptiness problem though the reduction

may b e computationally exp ensive

Prop osition Se The containment problem for tree automata is EXPTIME

complete

Containment in Union of Conjunctive Queries

Pro of Trees

The basic idea b ehind proof trees is to describ e expansion trees using a nite numb er

of lab els We b ound the numb er of lab els by b ounding the set of variables that can

o ccur in lab els of no des in the tree If r is a rule of a Datalog program then let

numr b e the numb er of variables o ccurring in IDB atoms in r head or b o dy Let v ar

v ar num b e twice the maximum of v ar numr for all rules r in Let v ar b e

the set fx x g A pro of tree for is simply an expansion tree for all of

1

v ar num()

whose variables are from v ar We denote the set of pro of trees for a predicate Q of a

tr eesQ program by p

The intuition b ehind pro of tree is that variables are reused In an unfolding expansion

tree when we unfold a no de x we take a fresh copy of a rule r in In a pro of tree

we take instead an instance of r over v ar Since the numb er of variables in v ar

is twice the numb er of variables in any rule of we can instantiate the variables in the

b o dy of r by variables dierent from those in the goal

x

Example Figure describ es an unfolding expansion tree and a pro of tree for the

IDB predicate p in the transitiveclosure program of Example In the pro of tree

instead of using a new variable W we reuse the variable X

A pro of tree represents an expansion tree where variables are reused In other words

the same variable is used to represent a set of distinct variables in the expansion tree

Intuitively to reconstruct an expansion tree for a given pro of tree we need to distinguish

among o ccurrences of variables

pXY pXY eXZ pZY pXY pXY eXZ pZY

pXY pXY eXZ pZY

?

?

pZY pZY eZW pWY pZY pZY eZW pWY

pZY pZY eZX pXY

?

?

pWY pWY eWY pWY pWY eWY

pXY pXY eXY

a b

Figure a Unfolding Expansion Tree b Pro of Tree

Denition Let x and x b e no des in a pro of tree with a lowest common ancestor

1 2

x and let v and v b e o ccurrences in x and x resp ectively of a variable v We say

1 2 1 2

that v and v are connected in if the goal of every no de except p erhaps for x on the

1 2

simple path connecting x and x has an o ccurrence of v We say that an o ccurrence v

1 2

of a variable v in is a distinguished occurrence if it is connected to an o ccurrence of v

in the atom lab eling the ro ot of

From the denition ab ove it follows that connectedness is an equivalence relation and

it partitions the o ccurrences of variables in the pro of tree We denote the equivalence

class of an o ccurrence v of a variable v in a pro of tree by v We will omit when it

is clear from the context

Example Consider the pro of tree in Figure The o ccurrences of the variable Y in

the ro ot and in the interior no de are connected Both o ccurrences of Y are distinguished

The o ccurrences of the variable X in the ro ot and in the leaf are not connected The

o ccurrence of X in the ro ot is distinguished but the o ccurrence of X in the leaf is not

distinguished

Every pro of tree corresp onds to an expansion tree and hence to an expansion We

want to dene containment mappings from conjunctive queries to pro of trees such that

there is a containment mapping from a conjunctive query to a pro of tree i there is a

containment mapping from the conjunctive query to the expansion corresp onding to the

pro of tree The denition should force a variable in the conjunctive query to map to a

unique variable in the expansion corresp onding to the pro of tree

Denition A strong containment mapping from a conjunctive query to a pro of

tree is a containment mapping h from to with the following prop erties

h maps distinguished o ccurrences in to distinguished o ccurrences in and

if v and v are two o ccurrences of a variable v in then the o ccurrences hv and

1 2 1

hv in are connected

2

We now relate containment of programs and strong containment mappings

Prop osition Let be a conjunctive query and and let be a program with goal

predicate Q If is contained in then for every proof tree p tr eeQ there is

a strong containment mapping from to

Pro of Assume that is contained in and let p tr eeQ Rename every

o ccurrence v of a variable v in by an o ccurrence of a new variable v ie connected

[v ]

0 0

of v note and v o ccurrences v and v of a variable v are replaced by o ccurrences v

1 2

[v ]

2 1

1

that v v Denote this renaming which is a mapping on o ccurrences not on

1 2

variables by It is easy to prove that the result of this renaming is an expansion tree

0 0 0

call it Since is contained in there is a containment mapping h from to

We now dene a containment mapping h from to as follows Let u b e an o ccurrence

0

of a variable u in and supp ose that h u is v where v is an o ccurrence of the variable

0 0

v in ie h u is an o ccurrence of v in Dene hu v

[v ]

We have to show that h is also a mapping on variables Consider now two o ccurrences

0 0

u and u of a variable u in Then h u and h u are o ccurrences of some variables

1 2 1 2

0 00 0 0 00 0 00

v and v in where v and v are o ccurrences in But v and v must b e the

0 00 0 00

[v ] [v ] [v ] [v ]

00 0 0 0

coincide and v same variable since h is a containment mapping from to so v

00 0

[v ] [v ]

0 00

as well as v and v It follows that hu and hu are connected o ccurrences of the

1 2

same variable A similar argument shows that h maps distinguished o ccurrences in to

distinguished o ccurrences in It follows that h is a strong containment mapping from

to

Prop osition Let be a conjunctive query and and let be a program with goal

tr eeQ there is a strong containment predicate Q If for every proof tree p

mapping from to then is contained in

0

tr eeQ there is a strong containment Pro of Assume that for every pro of tree p

0

mapping from to Let b e an unfolding expansion tree of We obtain a pro of

0

tree from by renaming of variables in a topdown fashion Let x b e a no de of that

was not yet relab eled and that is lab eled in by The variables in the b o dy of

x x

either o ccur in or they do not o ccur in the lab el of any no de ab ove x We rename

x x

the variables in the b o dy of that do not o ccur in by variables from v ar that do

x x

not o ccur in distinct variables in the b o dy of are renamed by distinct variables of

x x

v ar This renaming can b e done since the numb er of variables in v ar is at least

twice the numb er of variables in any rule of Denote this renaming which is a mapping

0

by Note that the distinguished variables of are not renamed by this pro cess It is

0

also easy to verify that if v u and v u are connected o ccurrences in

1 1 2 2

then u and u must b e o ccurrences of the same variable in

1 2

0 0

Since is a pro of tree by assumption there is a strong containment mapping h

0 0

from to We use h to dene a containment mapping h from to We dene h on

0

o ccurrences of variables Let u v and w b e o ccurrences of variables u v and w in

0

and resp ectively If h w v and v u then we take hw u We claim that h

can also b e viewed as a mapping on variables ie hw u Indeed supp ose that w

1

0 0

and w are b oth o ccurrences of w in Since h is a strong containment mapping h w

2 1

0 0

and h w must b e connected in But then as observed ab ove there are o ccurrences

2

0 0

u and u of u in such that h w u and h w u A similar argument

1 2 1 1 2 2

shows that h maps distinguished variables in to distinguished variables in Thus h

is indeed a containment mapping

The prop ositions ab ove yield the following characterization of containment

Corollary Let be a program with goal predicate Q and let be a conjunctive

query Then is contained in if and only if there are strong containment mappings

from to al l proof trees in p tr eesQ

Since we are also interested in containment in union of conjunctive queries we need

the following characterization

Theorem Let be a program with goal predicate Q and let be a union

i i

of conjunctive queries Then is contained in if and only if for every proof tree

p tr eesQ there is a strong containment mappings from some to

i

Pro of Theorem tells us that if and are union of conjunctive

i i i i

queries then is contained in ie D D for every database D i each is

i

contained in some It follows that is contained in i each expansion tree resp

j

unfolding expansion tree is contained in some The claim now follows by rep eating

i

the arguments in the pro ofs of Prop ositions and

We will use the characterization ab ove to obtain optimal upp er b ound for containment

of programs in conjunctive queries

Upp er Bounds

The main feature of pro of trees as opp osed to expansion trees is the fact that the

numb ers of p ossible lab els is nite it is actually exp onential in the size of Because

the set of lab els is nite the set of pro of trees p tr eesQ for an IDB predicate Q in

a program can b e describ ed by a tree automaton

Prop osition Let be a Datalog program with a goal predicate Q Then there is

p tr ees tr ees p

whose size is exponential in the size of such that T A an automaton A

Q Q

p tr eesQ

Pro of We describ e the construction of the automaton

p tr ees

A I facceptg I facceptg

Q

Q

The state set I is the set of all IDB atoms with variables among v ar The startstate

set is the set of all atoms Qs where the variables of s are in v ar The alphab et

I R where R is the set of instances of rules of over v ar The transition

function is constructed as follows

Let b e a rule instance

1 m

Rt R t R t

1 m

i i

1

l

in R where the IDB atoms in the b o dy of the rule are R t R t Then

i i

1

l

i i

1

l

hR t R t i Rt Rt

i i

1

l

Let b e a rule instance

1 m

Rt R t R t

1 m

in R where all atoms in the b o dy of the rule are EDB atoms Then haccepti

Rt Rt

tr ees p

p tr eesQ It is easy to see that the numb er of states It follows that T A

Q

and transitions in the automaton is exp onential in the size of

We now show that strong containment of pro of trees in a conjunctive query can b e

checked by tree automata as well

Prop osition Let be a Datalog program with goal predicate Q and let be

a conjunctive query Then there is an automaton A whose size is exponential in the

Q

is the set of proof trees in p size of and such that T A tr eesQ where there

Q

is a strong containment mapping from to

Pro of We describ e the construction of A and then prove its correctness

Q

We view as a set of atoms Every state of the automaton includes a subset of atoms

of that have not yet b een strongly mapp ed to Such unmapp ed atoms may share

variables with atoms that have already b een mapp ed Therefore also included in the

state description is a partial mapping that indicates the images of the mapp ed variables

A transition on an input symb ol results in mapping of zero or more unmapp ed

atoms to the b o dy of The remainder of the unmapp ed atoms are partitioned among

the sequence of states prescrib ed by the transition

The automaton A is S facceptg S facceptg The sets I and I R

Q

Q

are as in the pro of of Prop osition We assume that the conjunctive query has a set

V v ar ()

of variables V The state set S is the set I The second comp onent in

S represents the collection of subsets of atoms of and the nal comp onent contains

the set of partial mappings from V to v ar The startstate set S consists of all

Q

triples Qs M where the variable of s are in v ar and M is a mapping of the

s s

distinguished variables of into the variables of s The transition function is constructed

as follows

Let b e a rule instance

1 m

Rt R t R t

1 m

i i

1

l

R t Then t in R where the IDB atoms in the b o dy of the rule are R

i i

1

l

i 0 i 0

1

l

hR t M R t M i Rt M Rt

i 1 i l

1

l

if the following hold

0 0

can b e partitioned into where is mapp ed to atoms in the

1 l

0

that is consistent with M b o dy of by a mapping M

0

0

M is a partial mapping that extends M and is consistent with M

0

and can share a variable only if this variable is in the domain of M and

j k

i i

j

k

its image is in b oth t and t

0

If a variable o ccurs in and it is in the domain of M then its image is in

j

i

j

t

Let b e a rule instance

1 m

Rt R t R t

1 m

in R where all atoms in the b o dy of the rule are EDB atoms Then haccepti

Rt M Rt if there is a mapping that extends M and maps all literals

in to atoms in the b o dy of

It is easy to see that the numb er of states and transition in the automaton is exp o

nential in the size of and We now show the correctness of our construction First

we show that if there is a strong containment mapping h from to then is accepted

We prove acceptance by showing the existence of an accepting run r by A

Q

We show that our denition of r satises the inductive prop erty that if Rt is the

goal lab eling a no de x then r x Rt M where M is consistent with h and h

maps to atoms in b o dies of rules lab eling x or no des b elow x

The run starts with r Qs M where Qs b e the atom lab eling the ro ot

s

of and M is the restriction of h to the distinguished variables of the range of M

s s

are the variables of s Since h is a strong containment mapping from to it follows

that all literals in are mapp ed in Thus the sp ecication of the ro ot of satises the

inductive prop erty

Supp ose now that x is not a leaf no de and has l children Assume that x

Rt We know that must b e an instance of a recursive rule in R

1 m

Rt R t R t

1 m

i i

1

l

where the IDB atoms in the b o dy of the rule are R t R t Thus x has

i i

1

l

l children lab eled by the IDB atoms in the b o dy of By inductive hyp othesis let

r x Rt M where h maps to atoms in rules lab eling no des b elow x We can

0 0

partition into subsets where is mapp ed by h to atoms in the b o dy of

1 l

and is mapp ed by h to atoms in b o dies lab eling the no de xj or no des b elow xj

j

0 0

We obtain M from M by adding to M the pairs consisting of variables in and their

corresp onding images in h Also supp ose that and share a variable Since h is

j k

i

j

and strong it must maps the o ccurrences of this variable in and to o ccurrences in t

j k

i 0

k

t In that case we add the pair consisting of this variable and its image in h to M We

0

M Note that our construction ensures that r S t now dene r xj R

j 0 i i

j j

and if x is an internal no de with children x xl then hr x r xl i r x x

Finally if x is a leafno de then x Rt where is instance of a nonrecursive

rule Our inductive prop erty ensures that r x Rt M where all literals in map

to literals in that is consistent with h Therefore from the description of the automaton

it follows that accept r x x Thus r is an accepting run

Second we show that if is accepted by A then there is a strong containment

Q

mapping h from to Let r b e an accepting run The pro of is by b ottomup induction

on the tree The inductive hyp othesis is that if r x Rt M then Rt is the goal

lab eling x and there is a mapping h that is consistent with M and maps to atoms

x

in b o dies of rules lab eling x or no des b elow x Furthermore h is a strong mapping it

x

maps o ccurrences of the same variable in to connected o ccurrences in Supp ose rst

that x is a leaf Since r is an accepting run x is lab eled by Rt where is a rule

instance

1 m

Rt R t R t

1 m

in R and all atoms in the b o dy of the rule are EDB atoms and there is a mapping h

x

that extends M and maps all literals in to atoms in the b o dy of Thus the inductive

hyp othesis holds for leaves

Supp ose now that x is not a leaf and let x xl b e the children of x Then x is

lab eled by Rt where is a rule instance

1 m

Rt R t R t

1 m

i i

1

l

Thus we have in R and the IDB atoms in the b o dy of the rule are R t R t

i i

1

l

i 0 0

j

r xj Rt M j l where is a partition of that satises

j 1 l

the conditions in the denition of By the inductive hyp othesis for j l and

0

there is a mapping h that is consistent with M and maps to atoms in b o dies of rules

j j

lab eling xj or no des b elow xj The denition of guarantees that the h s are consistent

j

0

0

that maps to atoms in the b o dy of with each other and that there is a mapping M

0

0 0

is a partial mapping Thus the union of the h s with M and M is an extension of M

j

h that is consistent with M and maps to atoms in the b o dies of rule lab eling x or

x

no des b elow x Furthermore the denition of guarantees that h is strong

x

Now let r Qs M By the induction hyp othesis Qs is the goal lab eling

s

the ro ot of and there is a strong mapping h that extends M and maps to atoms

s

in b o dies of rules lab eling no des in Thus h is a strong containment mapping from

to

We can now reduce the containment problem for Datalog programs in unions of

conjunctive queries to an automatatheoretic problem

Theorem Let be a program with goal predicate Q and let be a union

i i

of conjunctive queries Then is contained in if and only if

tr ees p

i

T A T A

Q Q

i

Pro of By Theorem is contained in if and only if for every pro of tree

p tr eesQ there is a strong containment mappings from some to By Prop osi

i

tions and the latter condition is equivalent to

tr ees p

i

T A T A

Q Q

i

Theorem Containment of a recursive Datalog program in a union of conjunctive

queries is in EXPTIME EXPSPACE for linear programs



whose size is Pro of By Prop ositions and we can obtain an automaton A

Q

exp onential in the size of and such that



i

T A T A

Q Q

i

Thus by Theorem containment in a union of conjunctive queries can b e reduced

to containment of tree resp word automata of exp onential size Since containment of

tree automata can b e decided in exp onential time Prop osition and containment of

word automata can b e decided in p olynomial space Prop osition the result follows

Remark The automatatheoretic technique used here is closely related to the

automatatheoretic techniques used in CGKV to prove the decidability of b ounded

ness of monadic programs The result here however is more robust since it applies to

programs of arbitrary arity In contrast b oundedness is undecidable for binary programs

Va HKMV

Remark So far we have assumed that neither the recursive program nor the

union of conjunctive queries contain constants However this restriction is easily relaxed

by redening the containment mapping Dention The pro of of Theorem

then extends in a straightforward fashion In the presence of constants a containment

mapping from a conjunctive query to a conjunctive query is a renaming of variables

sub ject to the following constraints a every distinguished variable must map to itself

and b every nondistinguished variable must map to either a variable or a constant in

and c after renaming every literal in must b e among the literals of

Lower Bounds

Theorem provides a doubly exp onential time resp exp onential space upp er b ound

for containment of resp linear Datalog programs in a union of conjunctive queries We

now show that these b ounds are optimal We accomplish this via a succinct enco ding of

alternating resp deterministic exp onentialspace Turing machines It is known that

alternating exp onentialspace machines have the same computational p ower as doubly

exp onentialtime Turing machines CKS thus deciding if an alternating Turing ma

n

chine accepts the empty tap e using space is complete for doubly exp onential time

We fo cus rst on linear programs and exp onentialspace Turing machines A congu

ration of an exp onentialspace Turing machine M can b e describ ed by a string of length

n

The symb ols of the string are either symb ols of the input alphab et or composite

symb ols A comp osite symb ol is a pair s a where s is a state of M and a is an input

symb ol Such a comp osite symb ol denotes the fact that M is in state s and is scanning

the symb ol a An imp ortant feature of Turing machine computations is the locality of

the transitions ie the succession relation b etween congurations dep ends only on lo cal

constraints We can asso ciate with M a ary relation R on symb ols that characterizes

M

the transitions of M Supp ose that a a a and b b b are two congura

1 m 1 m

n

tions m Then b is a successor conguration of a only if a a a b R

i1 i i+1 i M

r l

on and R for i m We also need to asso ciate with M two ary relations R

M M

symb ols that characterize the transitions at the left and right end of the conguration

r l

and a a b R ie a a b R

m1 m m 1 2 1

M M

The idea of our enco ding is that the unfolding expansions of the recursive program

corresp ond to a sequence of congurations ending with an accepting conguration The

role of the union of conjunctive queries is to check whether the sequence corresp onds

to an accepting computation If an expansion do es not corresp ond to an accepting

computation then we will have that Thus we will have that if and

only if the machine M do es not accept In order to check that an expansion do es not

corresp ond to an accepting computation we have to compare corresp onding p ositions

on successive congurations To do that we address each p osition in a conguration

we need n bits for each address In our enco ding each rule unfolding will describ e one

address bit Thus each p osition in a conguration will b e enco ded by n rule unfoldings

n

If a and b are two nbit numb ers then b a mo d

n 1 n 1

precisely when the following hold for i n we have that i for some

i i j

j i Since this condition is not lo cal we enco de carry bits in addition to address

n

bits Now b a mo d if and only if there is an nbit carry c such

n 1

that precisely when and for i n and

1 i+1 i i i

precisely when either b oth and or b oth and for i n

i i i i

Thus succession of addresses also has the lo cality prop erty if the carry bits are available

We enco de congurations in the following manner Let B it B it b e ary IDB

1 n

predicates and let A A b e ary EDB predicates

1 n

The rst two arguments of A act as the constants and

i

the rd and th arguments of A enco de address and carry bits resp ectively

i

the th and th arguments of A link successive bits and

i

the th and th arguments of A link successive congurations

i

For i n we have in the following rules

0 0

B it x y z u v B it x y z u v A x y x x z z u v

i i+1 i

0 0

B it x y z u v B it x y z u v A x y x y z z u v

i i+1 i

0 0

B it x y z u v B it x y z u v A x y y x z z u v

i i+1 i

0 0

B it x y z u v B it x y z u v A x y y y z z u v

i i+1 i

The intuition is that each unfolding of a rule for a B it predicate describ es one address

i

0

bit The variable z can b e thought as a p ointer to an address bit while z p oints to

the next bit Note the four p ossible combinations of variables in the third and fourth

arguments of b o dy EDB predicate A Each combination enco des two bits of information

an address bit and a carry bit Intuitively x and y which are persistent variables ie

they app ear b oth in the head atom and in the recursive b o dy atom act as the constants

and That is the third argument b eing x or y corresp onds to the address bit b eing

or resp ectively Similarly the fourth argument b eing x or y corresp onds to the carry

bit b eing or resp ectively The carry bits enco de the carry obtained when the previous

address is incremented by Note that the variables u and v are also p ersistent in the

ab ove rules this p ersistence connects no des that b elong to the same conguration

The rules for B it enco de also the symb ol p ointed to by the nbit address For each

n

symb ol a of the machine M we have a unary EDB predicate Q The symb ol in a

a

conguration p osition is enco ded by rules of the form

0 0

B it x y z u v B it x y z u v A x y x x z z u v Q z

n 1 n a

The rd and th arguments of A could also b e the pair x y the pair y x or the pair

n

y y

So far the rules enco de a sequence of address bits and tap e symb ols To enco de the

start of the computation we use the ary goal predicate C a ary EDB predicate S tar t

and the rule

C B it x y z u v S tar tz

1

To enco de the end of the computation we use rules of the form

0

B it x y z u v A x y x x z z u v Q z

n n a

for symb ols a that corresp ond to accepting states The rd and th arguments in A

n

could also b e the pair x y the pair y x or the pair y y

Finally to enco de the transition from conguration to conguration we use rules of

the form

0 0 0

B it x y z u v B it x y z u u A x y x x z z u v Q z

n 1 n a

The rd and th arguments in A could also b e the pair x y the pair y x or the pair

n

y y Notice that u p ersists but changes p osition but v do es not p ersist only x and y

p ersist along nonsuccessive congurations Intuitively us role is to connect successive

congurations

We now have to show how the conjunctive queries in nd errors in enco ding of

computations The queries in have no distinguished variables We describ e each

disjunct of by listing its atomic formulas We use dots to denote variables with unique

o ccurrences

The rst thing that we need to check is that the address bits indeed act as an nbit

counter That is the rst address is and two adjacent addresses are successive

Thus one p ossible error is that the rst address is not Such an error can b e

found by the following conjunctive query

S tar tz A x y z z u v A x y y z z u v

1 1 1 2 i i i+1

Here the third argument of A is y expressing the fact that the ith address bit is

i

The other errors that can prevent adjacent addresses from b eing successive are

the rst carry bit is

the ith address bit and the ith carry bit are but the i st carry bit is

the ith address bit or the ith carry bit is but the i st carry bit is

the ith address bit and the ith carry bit are but the i st address bit is

the ith address bit and the ith carry bit are but the i st address bit is

the ith address bit is and the ith carry bit is but the i st address bit is

the ith address bit is and the ith carry bit is but the i st address bit is

We show how errors of for example typ e can b e discovered the other errors can

b e handled similarly Such errors are found by the following conjunctive query

A x y y z z

i i i+1

A x y z z A x y z z

i+1 i+1 i+2 n n n+1

A x y z z

1 n+1 n+2

A x y y z z

i n+i n+i+1

A x y x z z

i+1 n+i+1 n+i+2

The y s in the third argument on the rst A atom and the fourth argument of the second

i

A atom mean that the ith address bit and the ith carry bit are The x in the fourth

i

argument of the A atom means that the i st carry bit is Note that when the

i+1

rst n atoms refer to an address the following i atoms refer to the next address

n

Next we note that that every sequence of addresses starting with has

to describ e a single conguration that is we have to ensure that conguration change

exactly when the address is Thus there are two typ es of error here a

conguration change when the address is not and a conguration do es not

change when the address is For example error of the rst typ e are found by the

following conjunctive query

A x y x z z u u

i i i+1

A x y z z u v

n n n+1

0

A x y z z u u

1 n+1 n+2

The y s in the third argument of the rst A atom means that the ith address bit is

i

Thus the address is not The fact that the variable u moved from the th p osition

in the A atom to the th p osition in the second A atom indicates a conguration change

n i

n

We have so far ensured that we have a sequence of congurations of length with the

prop er sequence of addresses We now have to enure that these sequence of conguration

indeed represent a legal computation of the machine M The rst typ e p ossible error

here is when the rst conguration do es not corresp ond to the empty tap e with the head

scanning the leftmost symb ol If is the blank symb ol and s is the initial state then

n

2 1

the rst conguration is hs i To ensure that the rst symb ol is indeed hs i

the query

S tar tz A x y z z u v

1 1 1 2

A x y z z u v Q z

n n n+1 a n

checks whether the rst symb ols is not a for each a hs i Note that the variables

z z ensure that this query checks the rst symb ol in the rst conguration Sim

1 n

ilarly to ensure that the rest of the symb ols in the rst conguration are blank the

query

S tar tz A x y z u v

1

A x y y z z u v

i i i+1

A x y z z u v Q z

n n n+1 a n

checks that the rst symb ols is not a for each a Because of the variable z the rst

A atom must map to the rst conguration The variables u v then ensures that the

1

atoms A A also map to the rst conguration The fact that the rd argument of

1 n

A is y means that the query do es not check the rst symb ol since one of the address

i

bits is

Another typ e of errors is b etween corresp onding symb ols in two successive congura

tions ie when such symb ols do not ob ey the restrictions imp osed by the relations R

M

r l

For example a violation of R will b e found by conjunctive queries of the and R R

M

M M

following form

A x y s z z u v

1 1 1 2

A x y s z z u v Q z

n n n n+1 a n

A x y s z z u v

1 n+1 n+1 n+2

A x y s z z u v Q z

n 2n 2n 2n+1 b 2n

A x y s z z u v

1 2n+1 2n+1 2n+2

A x y s z z u v Q z

n 3n 3n 3n+1 c 3n

0

A x y s z z u u

1 n+1 4n+1 4n+2

0

A x y s z z u u Q z

n 2n 5n 5n+1 d 5n

The pattern of the z s variables and the u v variables ensures that the rst three blo cks

i

of A A atoms are mapp ed to three successive p ositions on the same conguration

1 n

0

The patern of the variables u v u enures that the last blo ck of A A atoms is

1 n

mapp ed to the next conguration Finally the reuse of the variables s s ensures

n+1 2n

the the second blo ck and the last blo ck refer to the same address and therefore are

mapp ed to corresp onding p ositions on successive congurations Here a b c d are such

that a b c d R

M

By adding to conjunctive queries corresp onding to all p ossible errors in the expan

sions of there are O n such errors we reduce the acceptance problem for exp onential

space Turing machines to containment of linear programs in unions of conjunctive queries

We now sketch how this enco ding can b e extended to alternating exp onentialspace

machines An alternating machine M has existential and universal states Without loss

of generality we can assume that the machine always alternates b etween existential

and universal states and every conguration of M have two p ossible successors a left

successor and a right successor The latter can b e captured by M having two transition

relations one for left successors and one for right successors An accepting computation

of M is a tree of congurations where each conguration is a successor of its parent a

universal conguration ie a conguration on which M is in a universal state has b oth

its successors as children and all leaves are accepting congurations

To enco de a computation tree we add to the B it and A predicates two additional

i i

arguments The rule

0 0

B it x y z u v B it x y z u v A x y x x z z u v

i i+1 i

will b e replaced by the rule

0 0

B it x y z u v w t B it x y z u v w t A x y x x z z u v w t

i i+1 i

The intuition is that t is either x when the conguration is existential or y when the

conguration is universal The pair u v was replaced by the triple u v w to account

for the fact that universal conguration has two successors The other rules for B it are

i

replaced analogously

We assume that the starting state is existential so the rule for C is

C B it x y z u v w x S tar tz

1

The rules that enco de transitions b etween congurations have to check whether the

source conguration is existential or universal For existential congurations we have

rules such as

0 0 0 0

B it x y z u v w x B it x y z u u w y A x y x x z z u v w x Q z

n 1 n a

0 0 0 0

B it x y z u v w x B it x y z u v u y A x y x x z z u v w x Q z

n 1 n a

The th argument of B it is x here since these rules are for existential congurations

n

Here u migrates either to the th argument or the th argument of B it Migration to

1

the th argument corresp onds to a transition to a left successor while migration to the

th argument corresp onds to a transition to the right successor

For universal congurations we have rules such as

0 0 0 0 0 0

B it x y z u v w y B it x y z u u w x B it x y z u v u x

n 1 1

0

A x y x x z z u v w y Q z

n a

Here the th argument of B it is y since this rule is for universal congurations This

n

rule is nonlinear the two o ccurrences of B it in the b o dy corresp ond to transitions to

1

b oth the left successor and the right successor

The conjunctive queries in also have to b e revised to account for the additional

arguments of the A s There are also additional errors to capture For example if a is

i

a comp osite symb ol with a universal state then we have the query

A x y z z x Q z

n n n+1 a n

This query nds universal congurations marked as existential

The queries that capture transition errors have to distinguish b etween left successors

and right successors For example the following query is directed at transitions to left

successors

A x y s z z u v w t

1 1 1 2

A x y s z z u v w t Q z

n n n n+1 a n

A x y s z z u v w t

1 n+1 n+1 n+2

A x y s z z u v w tQ z

n 2n 2n 2n+1 b 2n

A x y s z z u v w t

1 2n+1 2n+1 2n+2

A x y s z z u v w tQ z

n 3n 3n 3n+1 c 3n

0 0 0

A x y s z z u u w t

1 n+1 4n+1 4n+2

0 0 0

A x y s z z u u w t Q z

n 2n 5n 5n+1 d 5n

Here u migrates one p osition to the right For right successors we need queries where u

migrates two p ositions to the right

Theorem Containment of a recursive Datalog program in a union of conjunctive

queries is complete for EXPTIME EXPSPACE for linear programs

Equivalence to Nonrecursive Programs

In the previous section we studied the complexity of containment of recursive programs

in unions of conjunctive queries In this section we fo cus on containment of recursive

programs in nonrecursive programs ie we are given a recursive program and a

0 0

nonrecursive program and we have to decide whether is contained in The

0

straightforward approach is to rewrite as a union of conjunctive queries and apply the

results of the previous section Unfortunately rewriting nonrecursive programs as unions

of conjunctive queries may involve an exp onential blowup in size We now show several

examples the queries dened in this example will b e used later in the lowerb ound pro of

Example Let E b e a binary EDB predicate ie the database is a directed graph

Consider the nonrecursive program consisting of the rules for i n

dist x y dist x z dist z y

i i1 i1

and the rule

dist x y E x y

0

i

Clearly dist x y holds precisely when there is a path of length b etween x and y It

i

is easy to see that the smallest conjunctive query equivalent to dist is of exp onential

n

size

Example This example is a variant of the previous example Consider the nonre

cursive program consisting of the rules for i n

dist x y dist x z dist z y

i i1 i1

dist x y dist x z dist z y

i i1 i1

and the rules

dist x y E x y

0

dist x x

0

dist x x

0

Note the empty b o dies in the last two rules the convention is that an empty b o dy is

equivalent to true Here dist x y holds precisely when there is a path of length at

i

i

most b etween x and y and dist x y holds precisely when there is a path of length

i

i

at most b etween x and y

Example To the EDB predicate E of the previous example add unary EDB

predicates Z er o and O ne ie the database is a no delab eled directed graph Consider

the nonrecursive program consisting of the rules for i n

0 0 0 0

eq ual x y u v eq ual x x u u eq ual x y u v

i i1 i1

and the rules

eq ual x y u v E x y E u v Z er ox Z er ou

0

eq ual x y u v E x y E u v O nex O neu

0

i

Here eq ual x y u v holds precisely when there are paths length b etween x and y

i

and b etween u and v resp ectively and the paths have the same lab els with the p ossible

exception of the last p oints

In general the upp er b ounds of Theorem together with the exp onential blowup

involved in rewriting nonrecursive programs as union of conjunctive queries imply upp er

b ounds of triply exp onential time for containment of recursive programs in nonrecursive

programs and doubly exp onential space for containment of linear recursive programs

in nonrecursive programs It turns out that the succinctness of nonrecursive programs

is inherent and these b ounds are optimal We can enco de doublyexp onential space

n

b ounded computation by using bit counters We can discover errors among bits that

are doubly exp onentially far from each other using programs such as those in the ab ove

examples

We fo cus rst on linear programs and doubly exp onentialspace Turing machines A

n

2

conguration of such a machine M can b e describ ed by a string of length Thus we

n

need address bit to describ e a p osition in the conguration

As in Section each rule unfolding in our enco ding will describ e one address bit

n

Thus each p osition in a conguration will b e enco ded by rule unfoldings In Sec

tion we had the predicates B it B it to denote the n address bits We cannot

1 n

n

do this here since we have address bits Instead we use one ternary IDB predicate

B it and one binary EDB predicate A In addition we use several unary EDB predicates

The recursive program contains the following rules

0 0

B itz u v B itz u v Az u v Addr essz E z z Z er oz C ar r y z

0 0

B itz u v B itz u v Az u v Addr essz E z z Z er oz C ar r y z

0 0

B itz u v B itz u v Az u v Addr essz E z z O nez C ar r y z

0 0

B itz u v B itz u v Az u v Addr essz E z z O nez C ar r y z

The dierence from the rules for B it in Section is that we have additional EDB atoms

i

0

Addr essz E z z Z er oz O nez Carryz and C ar r y z Here the variable z

p oints to an address bit or to a symb ol So we call these variables points The atom

Addr essz says that these rules describ e address p oints so we call these rules address

0

rules The atoms E z z connects adjacent p oints The rest of the atoms enco de the ad

dress and carry bits Notice that these atoms enco de the information that was previously

carried by the rst four arguments of the A predicates

i

We have separate rules in for enco ding conguration symb ols

0 0

B itz u v B itz u v Az u v E z z S y mbol z Q z

a

The atom S y mbol z says that the rule describ es a symb ol so we call this rule a symbol

rule

To enco de the start of the computation we use the ary goal predicate C a ary

EDB predicate S tar t and the rule

C S tar tz B itz u v Az u v Addr essz Z er oz C ar r y z

which means that the rst bit of the rst conguration is the address bit

To enco de the end of the computation we put in rules of the form

B itz u v Az u v S y mbol z Q z

a

for symb ols a that corresp ond to accepting states That is the last symb ol of the last

conguration is an accepting symb ol

Finally to enco de the transition from conguration to conguration we put in

rules of the form

0 0 0

B itz u v B itz u u Az u v E z z S y mbol z Q z

a

0

We now describ e the construction of As in the previous lowerb ound pro of the idea

of our enco ding is that the unfolding expansions of the recursive program corresp ond to

a sequence of congurations ending with an accepting conguration of the machine M

0

The role of the nonrecursive program is to check whether the sequence corresp onds

to an accepting computation If an expansion do es not corresp ond to an accepting

0

computation then we will have that Thus we will have that is contained in

0

if and only if the machine M do es not accept

In Section unfolding expansions of the recursive program were guaranteed to

corresp ond to sequences of nbit addresses with a symb ol attached to each nth bit

n

Here we want unfolding expansions that corresp onds to sequences of bit address p oints

followed by a symb ol p oint This however is not guaranteed by the program since we

0

have a single B it predicate Instead we have rules in that lter out expansions that

are not of this format

All expansions start with the unfolding of the start rule We need to verify that this

n

is followed by unfoldings of address rules and then an unfolding of the symb ol

0

rule This is veried by putting the following rule in

0 0

C S tar tz dist z z S y mbol z

n

and the rule

0 0

C S tar tz dist z z Addr essz

n

n

The former rule nds expansions where one of the rst unfoldings is of a symb ol rule

n

The latter rule nds expansion where the st unfolding is of an address rule

n

The ab ove queries take care of the rst unfoldings To make sure that the

expansions has the right format we also need the rule

0 00

C S y mbol z dist z z S y mbol z

n

and the rule

0 0 00 00

C S y mbol z dist z z E z z Addr essz

n

n

This makes sure that a symb ol p oint is followed by address p oints followed by a symb ol

p oint

So far we have ensured that that we have ltered away all expansions to do not

n

corresp ond to sequences of blo cks of address p oints followed by a symb ol p oint Anal

n

ogously to Section we need to check is that the address bits indeed act as an bit

counter That is the rst address is and two adjacent addresses are successive

As in Section there are p ossible errors in the counter For example a p ossible

error is that the rst address is not Such an error can b e found by the following

query

0 0

C S tar tz dist z z O nez

n

Another p ossible addressing error is when the ith address bit and the ith carry bit are

but the i st carry bit is Such an error is found by the following rule

0 0 00

C Addr essz O nez dist z z E z z

n

00 00 000 000

C ar r y z E z z C ar r y z

This query is using the fact that we need only consider expansions in which the distance

n 00

b etween corresp onding address p oints is precisely Thus z and z p oint to address

000

p oints in corresp onding p ositions of successive addresses and z is the next address

p oint

n

So far we have ensured that our addresses indeed act as a bit counter We now

n

2

have to ensure that every sequence of addresses starting with describ e a

single conguration that is we have to ensure that conguration change exactly when

the address is Thus there are two typ es of error here a conguration

change when the address is not and a conguration do es not change when

the address is For example error of the rst typ e are found by the following

0

rule in

0 0 0 00 00 0

C Addr essz Az u v Z er oz dist z z S y mbol z E z z Az u u

n

The atom Z er oz indicates that we are referring to an the address that is not

The fact that the variable u moved from the nd p osition in the rst A atom to the th

p osition in the second A atom indicates a conguration change

n

2

We have so far ensured that we have a sequence of congurations of length

with the prop er sequence of addresses We now have to enure that this sequence of

congurations indeed represent a legal computation of the machine M For example we

have to detect errors b etween corresp onding symb ols in two successive congurations

l

ie when such symb ols do not ob ey the restrictions imp osed by the relations R R

M

M

0 r

A violation of R will b e found by rules in of the following form and R

M

M

C Az u v Q z E z t

1 a 1 1 1

At u v dist t z

1 n 1 2

0 0

z E z Az u v Q z dist z z

3 2 b 2 n 2

2 3

Az u v Q z

3 c 3

At u dist t z

2 n 2 4

A z u Q z

n 4 d 4

eq ual t z t t

n 1 2 2 4

Here the variables z z and z p oint to three consecutive symb ols a b and c The

1 2 3

variables t p oints to the rst address p oint preceding z The variable z p oints to the

1 2 4

symb ols d in the successor conguration notice that u migrates one p osition to the right

and t p oints to the rst address p oint preceding z The atom eq ual t z t z guar

2 4 n 1 2 2 4

antees that z and z have the same addresses so they p oint to symb ols in corresp onding

2 4

p ositions

0

By adding to rules corresp onding to all p ossible errors in the expansions of

there are O n such errors we reduce the acceptance problem for doubly exp onential

space Turing machines to containment of linear programs in nonrecursive programs We

deal with nonlinear programs as in Section that is by adding arguments to B it and

A and using nonlinear rule we can force universal congurations to have two successors

Theorem Containment of a recursive Datalog program in a nonrecursive Datalog

program is complete for EXPTIME EXPSPACE for linear programs

0

Before stating our nal result we note that if is a recursive program and is a

0

nonrecursive program such that b oth and have the same goal predicate Q and Q

0 0 0

o ccurs only in heads of rules than if and only if Thus containment

in nonrecursive programs is reducible to equivalence to nonrecursive programs

Theorem Equivalence of recursive Datalog programs to nonrecursive Datalog pro

grams is complete for EXPTIME EXPSPACE for linear recursive programs

We note that the blowup in the translation from nonrecursive programs to unions

of conjunctive queries is caused by the nonlinearity of the nonrecursive programs For

linear nonrecursive programs the translation yields a union of conjunctive queries where

each conjunctive query is of size that is linear in the size of the nonrecursive program

though the numb er of terms in the union could b e exp onential

Example Again we use the binary EDB predicate E and the unary EDB predicates

Z er o and O ne Consider the nonrecursive program consisting of the rules for i n

0 0

w or d x y w or d x x E x y Z er oy

i i1

0 0

w or d x y w or d x x E x y O ney

i i1

and the rules

w or d x y E x y Z er ox

1

w or d x y E x y O nex

1

It is easy to see that by unfolding w or d to a union of conjunctive queries we get exp o

n

nentially many terms in the union but each term is of size O n

Nevertheless the pro of of Theorem do es go through and the b ounds of the

theorem hold for linear nonrecursive programs

Theorem Equivalence of recursive Datalog programs to nonrecursive linear Datalog

programs is complete for EXPTIME EXPSPACE for linear recursive programs

0

Pro of Let b e a program with goal predicate Q and let b e a nonrecursive linear

0

program with goal Q As observed ab ove can b e unfolded to an exp onential union

0

of conjunctive queries each of size linear in the size of By Theorem

i i

is contained in if and only if

p tr ees

i

T A T A

Q Q

i



By Prop ositions and we can obtain an automaton A whose size is exp onential

Q

0

in the size of and such that



i

T A T A

Q Q

i

The upp er b ound then follows as in Theorem

Concluding Remarks

Courcelles Theorem yields the decidability of equivalence to nonrecursive program but

with very weak upp er b ounds Using the automatontheoretic technique we found tighter

upp er b ounds triplyexp onential time in general and doublyexp onential space for linear

programs and proved matching lower b ounds This shows that the problem is intractable

in fact to our knowledge this is the most intractable decidable problem is

We note that one has to b e careful in interpreting lower b ounds for query containment

While containment of conjunctive queries in recursive program is complete for EXPTIME

CK CLM Sab this complexity is simply the expression complexity of evaluation

Datalog programs Va In fact if attention is restricted to programs of b ounded

arity we get NPcompleteness instead of EXPTIMEcompleteness In contrast our lower

b ounds here imply real intractability and they hold even for programs of b ounded arity

For certain classes of recursive Datalog programs such as monadic linear programs

and singlerule programs the equivalence problem is less intractable than the general

case We discuss these cases in another pap er CV

Acknowledgement

We thank Shuki Sagiv for discussions on the complexity of query containment and Je

Ullman for useful comments that help ed improve a previous draft of the pap er

References

AU Aho AV Ullman JD Universality of data retrieval languages Proc th

ACM Symp on Principles of Programming Languages pp

AV Abiteb oul S Vianu V Fixp oint Extensions of FirstOrder Logic and

Dataloglike languages Proc th IEEE Symp on Logic in Computer Sci

ence pp

Be Beeri C On the memb ership problem for functional and multivalued

dep endencies in relational databases ACM Trans on Database Systems

pp

BR Bancilhon F Ramakrishnan R An amateurs intro duction to recursive

query pro cessing strategies Proc ACM Conf on Management of Data

Washington pp

Ch Chandra AK Programming primitives for database languages Proc th

ACM Symp on Principles of Programming Languages Williamsburg

pp

CH Chandra AK Harel D Computable queries for relational databases J

Computer and Systems Sciences pp

CH Chandra AK Harel D Structure and Complexity of Relational Queries

J Computer and Systems Sciences pp

CH Chandra AK Harel D Queries and Generalizations

J Logic Programming

CKS Chandra A Kozen A Sto ckmeyer L Alternation J ACM

pp

CLM Chandra A K H R Lewis and J A Makowsky Emb edded implicational

dep endencies and their inference problem Proc th ACM Symp on Theory

of Computing pp

CV Chaudhuri S Vardi MY On the complexity of equivalence b etween

recursive and nonrecursive Datalog programs Proc th ACM Symp on

Principles of Database Systems pp

CGKV Cosmadakis SS Gaifman H Kanellakis PC Vardi MY Decidable

optimization problems for database logic programs Proc th ACM Symp

on Theory of Computing pp

CK Cosmadakis SS Kanellakis P Parallel evaluation of recursive rule

queries Proc th ACM Symp on Principles of Database Systems Cam

bridge pp

Cos Costich OL A Medvedev characterization of sets recognized by general

ized nite automata Math System Theory pp

Cou Courcelle B The monadic secondorder theory of graphs I Recognizable

sets of nite graphs Information and Computation pp

Cou Courcelle B Recursive queries and contextfree graph grammars Theoret

ical Computer Science pp

Do Doner JE Tree acceptors and some of their applications J Computer

and System Sciences pp

EJ Emerson EA Jutla CS Complexity of tree automata and mo dal logics

of programs Proc th IEEE Symp on Foundations of Computer Science

pp

GM Gallaire H Minker J Logic and Databases Plenum Press

GMSV Gaifman H Mairson H Sagiv Y Vardi MY Undecidable optimization

problems for database logic programs J ACM pp

HKMV Hillebrand GG Kanellakis PC Mairson HG Vardi MY To ols for

Datalog b oundedness Proc th ACM Symp on Principles of Database

Systems May pp

HKMV Hillebrand GG Kanellakis PC Mairson HG Vardi MY Undecidable

boundedness problems for Datalog programs To app ear in J Logic Program

ming

Im Immerman N Relational queries computable in p olynomial time Infor

mation and Control pp

Jo Jones ND Spaceb ounded reducibility among combinatorial problems J

Computer and System Sciences pp

K Kanellakis PC Elements of Relational Database Theory Handbook of The

oretical Computer Science Vol B Chapter J van Leeuwen AR Meyer

N Nivat MS Paterson D Perrin ed NorthHolland

KA Kanellakis P Abiteb oul S Deciding b ounded recursion in database logic

programs SIGACT News Fall

MUV Maier D Ullman JD Vardi MY On the foundations of the universal

relation mo del ACM Trans on Database Systems pp

Mey van der Meyden R Recursively indenite databases Theoretical Computer

Science pp

MF Meyer AR Fischer MJ Economy of description by automata gram

mars and formal systems Proc st IEEE Symp on Switching and Au

tomata Theory pp

MS Meyer AR Sto ckmeyer LJ The equivalence problem for regular expres

sions with squaring requires exp onential time Proc th IEEE Symp on

Switching and Automata Theory pp

MP Mumick ISPirahesh H Overb ound and rightlinear queries Proc th

ACM Symp on Principles of Database Systems pp

Mo Moschovakis YN Elementary Induction on Abstract Structures North

Holland

Naa Naughton JF Data indep endent recursion in deductive databases J

Computer and System Sciences pp

Nab Naughton JF Minimizing functionfree recursive denitions J ACM

pp

NRSU Naughton JF Ramakrishnan R Sagiv Y Ullman JD Ecient evalu

ation of right left and multilinear rules Proc ACMSIGMOD Intl Conf

on Management of Data pp

Ra Rabin MO Decidability of secondorder theories and automata on innite

trees Trans AMS pp

RS Rabin MO Scott D Finite automata and their decision problems IBM

J Research and Development pp

RSUV Ramakrishnan R Sagiv Y Ullman JD Vardi MY Logical query op

timization by pro oftree transformation J Computer and System Sciences

pp

SY SagivY Yannakakis M Equivalences among Relational Expressions with

the union and dierence op erators JACM pp

Sab Sagiv Y Optimizing Datalog programs In Foundations of Deductive

Databases and Logic Programming J Minker ed Morgan Kaufmann Pub

lishers pp

Se Seidl H Deciding equivalence of nite tree automata SIAM J Computing

pp

Shm Shmueli O Decidability and expressiveness asp ects of logic queries Proc

th ACM Symp on Principles of Database Systems pp

TW Thatcher JW Wright JB Generalized nite automata theory with an

application to a decision problem of secondorder logic Mathematical System

Theory pp

Ul Ullman JD Principles of Database and Know ledge Base Systems Vol

Computer Science Press

Ul Ullman JD Principles of Database and Know ledge Base Systems Vol

Computer Science Press

UV Ullman JD Van Gelder A Parallel complexity of logical query programs

Algorithmica pp

Va Vardi MY The complexity of relational query languages Proc th ACM

Symp on Theory of Computing San Francisco pp

Va Vardi MY Decidability and undecidability results for b oundedness of

linear recursive queries Proc th ACM Symp on Principles of Database

Systems pp

Va Vardi MY Automata theory for database theoreticians In Theoretical

Studies in Computer Science JD Ullman ed Academic Press pp

VW Vardi MY Wolp er P Automatatheoretic techniques for mo dal logic of

programs J Computer and System Sciences pp

Zl Zlo of M QuerybyExample Operations on the Transitive Closure IBM

Research Rep ort RC