On the Complexity of Database Queries
(Extended Abstract)

Christos H. Papadimitriou
Division of Computer Science, U.C. Berkeley, Berkeley, CA

Mihalis Yannakakis
Bell Laboratories, Lucent Technologies, Murray Hill, NJ

Abstract

We revisit the issue of the complexity of database queries in the light of the recent parametric refinement of complexity theory. We show that if the number of variables in the query or the query size is considered as a parameter, the familiar query languages (conjunctive, positive, first-order, Datalog) are classified at appropriate levels of the so-called W hierarchy of Downey and Fellows. These results strongly suggest that the query size is inherently in the exponent of the data complexity of any query evaluation algorithm, with the implication becoming stronger as the expressibility of the query language increases. On the positive side, we show that this exponential dependence can be avoided for the extension of acyclic queries with $\neq$ (but not $<$) inequalities.

1 Introduction

The complexity of query languages has been, next to expressibility, one of the main preoccupations of database theory ever since the paper by Chandra and Merlin twenty years ago (see [...] for extensive overviews of the subject). It was noted rather early that, when considering the complexity of evaluating a query on an instance, one has to distinguish between two kinds of complexity. Data complexity is the complexity of evaluating a query on a database instance when the query is fixed, and we express the complexity as a function of the size of the database. The other, called combined complexity, considers both the query and the database instance as input variables; the combined complexity of a query language is typically one exponential higher than its data complexity. (A third kind, expression complexity, assumes that the database instance is fixed; it is rarely differentiated from combined complexity.) Of the two, data complexity is widely regarded as more meaningful and relevant to database research, since the query is typically
of a size that can be productively assumed to be fixed, and is in any event much smaller than a typical database.

For a broad range of important query languages (relational languages like conjunctive queries, first-order queries, Datalog, and fixpoint logic, as well as constraint languages, i.e., extensions with constraints such as arithmetic comparisons, linear and polynomial inequalities, etc.), data complexity predicts that the query evaluation problem is perfectly tractable: the complexity classes spanned by these query languages range from $\mathrm{AC}^0$ to P, well within what is considered satisfactory in complexity theory. These tractability results are often quoted in the literature to suggest that the corresponding computational problems are tractable: well understood, solved, under control. This implication is based on the thesis, broadly accepted in the theory of algorithms, that as a rule polynomial algorithms that arise in practice are usually fast (practical, with tolerable constant coefficients and reasonable exponents).

Is this conclusion justified in the context of database query processing? It seems to us that neither of the two notions of complexity is completely satisfactory. On the one hand, combined complexity is rather restrictive, because it treats queries and databases as part of the input in the same way, even though the size $q$ of queries is typically orders of magnitude smaller than the size $n$ of the database. Indeed, it is for this reason that the study of the complexity of query languages has mostly concentrated on data complexity. On the other hand, however, polynomial time in the context of data complexity means time $n^{O(q)}$, and in fact the known algorithms that place the above-mentioned languages in P have precisely such a running time. Even though $q \ll n$, it is not reasonable to consider $q$ fixed, because even for small values of $q$ a running time of $n^q$ hardly qualifies as tractable, especially in view of the fact that $n$ is typically huge. What should the notion of complexity be, then? What we would like to have is a
running time in which $n$ is not raised to a power that depends on $q$; i.e., the dependence on $n$ is of the form $n^c$, where $c$ is a constant independent of the query, and hopefully very small.

Let us draw an analogy with the computer-aided verification area. The basic problem there is the model checking problem: does a given program $P$ (the model) satisfy a desired property $\varphi$ expressed in some specification language, such as LTL (propositional linear temporal logic)? There have been significant advances in recent years in the development of algorithms and tools in this area, especially for finite-state programs, which cover an important set of critical applications. The model checking problem for finite-state programs $P$ and LTL specifications is PSPACE-complete. However, specifications are usually rather small (like queries), and programs are quite large (like databases). Fortunately, it turns out that the model checking problem for an LTL specification $\varphi$ and a program $P$ can be solved in time exponential in $|\varphi|$ and linear in $|P|$. Can we hope for such algorithms in the evaluation of the important query languages, such as the ones mentioned above? What are natural classes of queries that possess this type of algorithm?

Parametric complexity provides the framework to examine these problems. We now know that there is a class of reasonably natural problems that do not fall into this mold: parametric problems such as "does graph $G$ have a clique of size $k$?" This problem, like many others like it, is currently solvable only by algorithms of complexity $n^{O(k)}$. Query evaluation problems lie ominously within the scope of this category, with query length being the obvious analog of $k$ in the parametric clique problem above. Researchers in complexity have recently developed a theory of limited nondeterminism and fixed-parameter tractability, which seeks to make important distinctions, along the lines suggested above, between problems below NP. In particular, parametric problems with input, say, $(G, k)$, which are solvable in polynomial
time when $k$ is fixed can be subdivided into two broad categories: those for which the polynomial is of the form $n^{f(k)}$, i.e., has $k$ in the exponent, and those for which it is of the form $g(k)\,n^c$ for some constant $c$. It is of great interest to distinguish between these two categories, and to develop rigorous tools that classify problems with respect to them. Downey and Fellows have introduced a sequence of complexity classes of parametric problems, collectively called the W hierarchy, which capture this important issue reasonably well. The classes of the W hierarchy are indexed by the positive integers (W[1], W[2], W[3], and so on), plus two limiting classes, W[SAT] and W[P]. These classes are quite rich in complete problems; the higher the W class of a problem, the less likely it is that the problem has a polynomial algorithm with time bound of the form $g(k)\,n^c$.

The main point of this paper is that parametric complexity theory is a productive framework for studying the complexity of query languages. In particular, our goal is to put the well-known tractability results for the query languages mentioned above under a different perspective, which renders them perhaps less confusing and misleading. Specifically, we prove that the parametric versions of the query evaluation problem for conjunctive queries, positive queries, first-order queries, and Datalog queries are hard for higher and higher levels of the W hierarchy. Therefore, it is likely that any algorithm for the corresponding query languages must have the parameter inherently in the exponent; furthermore, this likelihood increases measurably with the expressibility of the language.

We analyse the complexity for two types of parameters: the query size $q$ and the number of variables $v$ that appear in the query. The latter parameter is motivated by recent work of Vardi, who studied the complexity of queries assuming that the number of variables $v$ is fixed, while the size of the query can grow along with the database. He found that this assumption brings the combined complexity closer to data complexity,
namely, polynomial time for the above languages, although the polynomial now has $v$ in the exponent of $n$ instead of $q$. Our analysis for the two parameters yields generally similar results, with some subtle differences.

In the next section we give the necessary definitions from the evolving field of parametric complexity. In Section 3 we give the necessary definitions for applying this theory to query problems. In Section 4 we prove our classification results. Finally, in Section 5 we show a parametric tractability result which generalizes the main tractability result known so far in database theory, namely that acyclic queries can be evaluated efficiently even with respect to combined complexity. We show that acyclic conjunctive queries extended with inequalities (conjuncts of the form $x \neq y$) are parametrically tractable, in that they can be evaluated in time almost linear in the size of the database and the output, and exponential in the size of the query or the number of variables. This exponential dependence on the parameter is unavoidable, as the inequalities turn the combined complexity of the problem from polynomial to NP-complete. Trying to extend this further to $<$ constraints, however, leads to parametric hardness.

2 Parametric Complexity Theory

We introduce next the main concepts from the complexity theory of parametric problems. Our definitions generally follow [...]. A parametric problem is a set $L$ of pairs $(x, k)$, where $x$ is a string and $k$ an integer parameter. A parametric problem is called fixed-parameter (fp) tractable if there is an algorithm $A$ that determines whether $(x, k) \in L$ in time bounded by a