On the Equivalence of Recursive and Nonrecursive Datalog Programs

Surajit Chaudhuri*
Database Tech. Department
Hewlett-Packard Laboratories
Page Mill Road
Palo Alto, CA
Email: chaudhuri@hpl.hp.com

Moshe Y. Vardi†
Department of Computer Science
Rice University
Houston, TX
Email: vardi@cs.rice.edu
URL: http://www.cs.rice.edu/~vardi

* Work done while this author was at Stanford University and was supported by ARO grant DAAL G, NSF grant IRI, Air Force grant AFOSR, and a grant of IBM Corp.
† Work done while this author was at the IBM Almaden Research Center.

Abstract

We study the problem of determining whether a given recursive Datalog program is equivalent to a given nonrecursive Datalog program. Since nonrecursive Datalog programs are equivalent to unions of conjunctive queries, we study also the problem of determining whether a given recursive Datalog program is contained in a union of conjunctive queries. For this problem, we prove doubly exponential upper and lower time bounds. For the equivalence problem, we prove triply exponential upper and lower time bounds.

Introduction

It has been recognized for some time that first-order database query languages are lacking in expressive power [AU, GM, Zl]. Since then, many higher-order query languages have been investigated [AV, CH, Ch, CH, Im, Va]. A query language that has received considerable attention recently is Datalog, the language of logic programs (known also as Horn-clause programs) without function symbols [K, Ul], which is essentially a fragment of fixpoint logic [CH, Mo]. (See [Ul] for a detailed discussion of Datalog.)

The gain in expressive power does not, however, come for free; evaluating Datalog programs is harder than evaluating first-order queries [Va]. Recent works have addressed the problems of finding efficient evaluation methods for Datalog programs ([BR] is a good survey on this topic) and developing optimization techniques for Datalog (see [MP, Nab, NRSU]). The techniques to optimize evaluation of queries are often based on the ability to transform a query into an
equivalent one that can be evaluated more efficiently [RSUV]. Therefore, determining equivalence of queries is one of the most fundamental optimization problems. Naturally, the problem of determining equivalence of Datalog programs has received attention. Unfortunately, Datalog program equivalence is undecidable [Shm].

Since the source of the difficulty in evaluating Datalog programs is their recursive nature, the first line of attack in trying to optimize such programs is to eliminate the recursion. The following example is from [Naa].

Example. Consider the following Datalog program $\Pi_1$:

    buys(X, Y) :- likes(X, Y).
    buys(X, Y) :- trendy(X), buys(Z, Y).

It can be shown that $\Pi_1$ is equivalent to the following nonrecursive program $\Pi_1'$:

    buys(X, Y) :- likes(X, Y).
    buys(X, Y) :- trendy(X), likes(Z, Y).

Consider, on the other hand, the following Datalog program $\Pi_2$:

    buys(X, Y) :- likes(X, Y).
    buys(X, Y) :- knows(X, Z), buys(Z, Y).

It can be shown that $\Pi_2$ is not equivalent to the following nonrecursive program $\Pi_2'$:

    buys(X, Y) :- likes(X, Y).
    buys(X, Y) :- knows(X, Z), likes(Z, Y).

In fact, $\Pi_2$ is inherently recursive, i.e., it is not equivalent to any nonrecursive program.

Thus, a problem of special interest is that of determining the equivalence of a given recursive Datalog program to a given nonrecursive program, i.e., a Datalog program where the dependency graph among the predicates is acyclic. This problem is the main focus of this paper. Note that this problem is different from that of determining whether a given recursive Datalog program is equivalent to some nonrecursive program. The latter problem, called the boundedness problem, is known to be undecidable [GMSV] and has been studied extensively (see [KA] for a survey and [HKMV, HKMV] for recent results).

A nonrecursive program can be rewritten as a union of conjunctive queries. Thus, containment of a nonrecursive program in a recursive program can be reduced to the containment of a conjunctive query in a recursive program. The latter problem was shown to be decidable; in fact, it is EXPTIME-complete [CK, CLM, Sab]. Thus, what was left open is the other
direction, i.e., the problem of determining whether a recursive program is contained in a nonrecursive program. We attack this problem by investigating the containment of recursive programs in unions of conjunctive queries. Our main result is that containment of recursive programs in unions of conjunctive queries is decidable. It follows that equivalence of two given programs is decidable when one is recursive and the other nonrecursive.

We first prove that the decidability of the containment problem follows from a powerful general decidability result due to Courcelle [Cou]. Unfortunately, while Courcelle's result yields the decidability of the containment problem, it provides only nonelementary time bounds [Cou].

The main body of the paper is dedicated to a detailed study of the computational complexity of containment and equivalence. For upper bounds, we use the automaton-theoretic approach advocated in [Va]. The key idea is that a recursive program can be viewed as an infinite union of conjunctive queries. These conjunctive queries can be represented by proof trees, and the set of proof trees corresponding to a given recursive program can be represented by a tree automaton. This representation enables us to reduce containment of recursive programs in unions of conjunctive queries to containment of tree automata, which is known to be decidable in exponential time [Se]. The size of the tree automata obtained in the reduction is exponential in the size of the input; as a result, we obtain a doubly exponential time upper bound for containment in unions of conjunctive queries. These bounds turn out to be optimal; by a succinct encoding of alternating exponential-space Turing machines, we prove a matching doubly exponential time lower bound.

A case of special interest is that of linear programs, i.e., programs in which each rule contains at most one recursive subgoal [CK, UV]. In this case, the corresponding set of proof trees can be represented by word automata, for which
containment is known to be decidable in polynomial space [MS]. As a result, we obtain an exponential space upper bound for the containment problem for linear programs, which is also matched by a lower bound.¹

¹ Our upper bounds follow also from van der Meyden's results on recursively indefinite databases [Mey, Theorem].

We then note that expressing a nonrecursive program as a union of conjunctive queries may involve an exponential blowup in size. Thus, our upper-bound technique for containment yields a triply exponential time upper bound for containment in nonrecursive programs (doubly exponential space upper bound for linear programs). We show that the succinctness of nonrecursive programs is inherent by proving a matching triply exponential time lower bound (doubly exponential space lower bound for linear programs). Finally, we observe that these results also yield the same complexity bounds for equivalence to nonrecursive programs. Thus, while equivalence to nonrecursive programs is decidable, it is highly intractable.

We note that one has to be careful in interpreting lower bounds for query containment. While containment of conjunctive queries in recursive programs is complete for EXPTIME [CK, CLM, Sab], this complexity is simply the expression complexity of evaluating Datalog programs [Va]. In fact, if attention is restricted to programs of bounded arity, we get NP-completeness instead of EXPTIME-completeness. In contrast, our lower bounds here imply real intractability, and they hold even for programs of bounded arity.

Preliminaries

Conjunctive Queries and Datalog

A conjunctive query is a positive existential conjunctive first-order formula, i.e., the only propositional connective allowed is $\wedge$ and the only quantifier allowed is $\exists$. Without loss of generality, we can assume that conjunctive queries are given as formulas $\varphi(x_1, \ldots, x_k)$ of the form $(\exists y_1, \ldots, y_m)(a_1 \wedge \cdots \wedge a_n)$ with free variables among $x_1, \ldots, x_k$, where the $a_i$'s are atomic formulas of the form $p(z_1, \ldots, z_l)$ over the variables $x_1, \ldots, x_k, y_1, \ldots, y_m$. For example,
the conjunctive query $(\exists y)(E(x, y) \wedge E(y, z))$ is satisfied by all pairs $\langle x, z \rangle$ such that there is a path of length 2 between $x$ and $z$. The free variables are also called distinguished variables. We distinguish between variables and occurrences of variables in a conjunctive query, but we only consider occurrences of variables in the atomic formulas of the query. For example, the variable $y$ has two occurrences in $(\exists y)(E(x, y) \wedge E(y, z))$, while $x$ and $z$ have one occurrence each. An occurrence of a distinguished variable in a conjunctive query is called a distinguished occurrence.

A union of conjunctive queries is a disjunction $\bigvee_{i=1}^{s} \varphi_i(x_1, \ldots, x_k)$ of conjunctive queries.

A union of conjunctive queries $\Phi(x_1, \ldots, x_k)$ can be applied to a database $D$. The result

$\Phi(D) = \{(a_1, \ldots, a_k) \mid D \models \Phi(a_1, \ldots, a_k)\}$

is the set of $k$-ary tuples that satisfy $\Phi$ in $D$. If $\Phi$ has no distinguished variables, then it is viewed as a Boolean query: the result is either the empty relation (corresponding to false) or the relation containing the 0-ary tuple (corresponding to true).

A Datalog program consists of a set of Horn rules. A Horn rule consists of a single atom in the head of the rule and a conjunction of atoms in the body, where an atom is a formula of the form $p(z_1, \ldots, z_l)$, where $p$ is a predicate symbol and $z_1, \ldots, z_l$ are variables. The predicates that occur in heads of rules are called intensional (IDB) predicates. The rest of the predicates are called extensional (EDB) predicates. Let $\Pi$ be a Datalog program. Let $Q^i_\Pi(D)$ be the collection of facts about an
