Proceedings of the Thirteenth International Conference on Principles of Knowledge Representation and Reasoning

Conjunctive Query Answering with OWL 2 QL

Stanislav Kikot and Roman Kontchakov and Michael Zakharyaschev
Department of Computer Science and Information Systems
Birkbeck, University of London, U.K.
{kikot,roman,michael}@dcs.bbk.ac.uk

Abstract

We present a novel rewriting technique for conjunctive query answering over OWL 2 QL ontologies. In general, the obtained rewritings are not necessarily correct and can be of exponential size in the length of the query. We argue, however, that in most, if not all, practical cases the rewritings are correct and of polynomial size. Moreover, we prove some sufficient conditions, imposed on queries and ontologies, that guarantee correctness and succinctness. We also support our claim by experimental results.

Introduction

OWL 2 QL, one of three profiles of the Web Ontology Language OWL 2, was designed with the aim of supporting ontology-based data access (OBDA). The key idea is that data, 'stored in a standard relational database management system (RDBMS), can be queried through an OWL 2 QL ontology via a simple rewriting mechanism, i.e., by rewriting the query into an SQL query that is then answered by the RDBMS, without any changes to the data' (www.w3.org/TR/owl2-profiles). The rewritability property ensures, in particular, that the data complexity of answering queries over OWL 2 QL ontologies matches the complexity of database query answering, which is in $\mathrm{AC}^0$.

It has been observed, however, that the available 'rewriting mechanisms' for OWL 2 QL (Calvanese et al. 2007a; Pérez-Urbina, Motik, and Horrocks 2009; Rosati and Almatelli 2010; Chortaras, Trivela, and Stamou 2011; Gottlob, Orsi, and Pieris 2011) are actually not so 'simple.' In fact, the rewritten queries are often too long to be executed by modern RDBMSs, and the question whether 'short' rewritings exist has attracted considerable attention over the last two years. For example, Kikot, Kontchakov, and Zakharyaschev (2011) showed that no polynomial algorithm can construct a 'pure rewriting' of a conjunctive query (CQ) $q$ over an OWL 2 QL ontology $\mathcal{T}$. Here by a pure rewriting we mean any first-order (FO) rewriting with the same signature (predicates and constants) as $q$ and $\mathcal{T}$, possibly with equality. On the other hand, Gottlob and Schwentick (2011) gave a polynomial-time ('impure') rewriting using additional constants and predicates. Optimised (but still exponential) pure rewritings to nonrecursive Datalog were suggested by Rosati and Almatelli (2010) and Gottlob, Orsi, and Pieris (2011). The combined approach of Kontchakov et al. (2010) was developed for OWL 2 QL without role inclusions; it uses a simple polynomial rewriting over the data expanded by applying the ontology axioms and introducing a small number of new individuals.

The diversity of approaches to query rewriting prompts another question: what is the type/shape/size of rewritings we should aim at to make OBDA with OWL 2 QL efficient? When trying to answer this question, we should bear in mind that (i) the OBDA paradigm relies on the proven efficiency of RDBMSs, but (ii) database query answering is not tractable in the size of queries (PSPACE-complete for FO queries and NP-complete for CQs). High efficiency of RDBMSs in practice appears to indicate only that answering real-world queries over real-world databases turns out to be tractable. As rewritings can turn a standard query into something 'out of this world,' a first rule of thumb could be as follows: the rewritten query should look similar to the original one. In this respect, as Gottlob and Schwentick (2011) remark, their polynomial rewriting is of rather 'theoretical nature' (it uses the extra constants, predicates and existentially quantified variables to encode, by making nondeterministic guesses, a relevant part of the chase, aka the canonical model; see the discussion in Conclusions for more details).

The aim of this paper is to investigate what causes exponentially long pure rewritings of CQs over OWL 2 QL ontologies and to check experimentally whether those 'bad guys' occur in real-world queries and ontologies. As a result of this analysis, we suggest some short (polynomial-size) rewritings that cover most, if not all, practical cases.

We think of a CQ as a labelled directed multigraph. For example, the query 'find $x_1, x_2, x_3$ for which

$$\exists y_1, y_2, y_3 \, \bigl( A_1(x_1) \land A_3(x_3) \land B_3(y_3) \land T(x_1, y_1) \land R(x_1, x_2) \land T(x_2, x_3) \land S(x_3, y_2) \land S(y_3, y_2) \bigr)$$

holds' can be represented as the graph

[Figure: the query drawn as a path through the nodes $y_1, x_1, x_2, x_3, y_2, y_3$, with edges labelled $T, R, T, S, S$ and node labels $A_1$ on $x_1$, $A_3$ on $x_3$, $B_3$ on $y_3$.]

Copyright © 2012, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
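The graph view of the example query can be made concrete with a small sketch. The tuple encoding of atoms and the helper names `build_multigraph` and `is_connected` are assumptions made purely for illustration; they are not part of the paper.

```python
from collections import defaultdict

# Atoms of the example query. The tuple encoding is an assumption made
# for this sketch; the paper only describes the graph informally.
UNARY_ATOMS = [("A1", "x1"), ("A3", "x3"), ("B3", "y3")]
BINARY_ATOMS = [
    ("T", "x1", "y1"), ("R", "x1", "x2"), ("T", "x2", "x3"),
    ("S", "x3", "y2"), ("S", "y3", "y2"),
]
ANSWER_VARS = {"x1", "x2", "x3"}


def build_multigraph(unary_atoms, binary_atoms):
    """Return (labels, edges): node -> set of concept names, and a list of
    role-labelled directed edges (source, target, role)."""
    labels = defaultdict(set)
    edges = []
    for concept, term in unary_atoms:
        labels[term].add(concept)
    for role, source, target in binary_atoms:
        labels[source]  # touching the key registers unlabelled nodes
        labels[target]
        edges.append((source, target, role))
    return dict(labels), edges


def is_connected(labels, edges):
    """Check that the underlying undirected graph of the query is connected."""
    neighbours = defaultdict(set)
    for source, target, _ in edges:
        neighbours[source].add(target)
        neighbours[target].add(source)
    start = next(iter(labels))
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbours[stack.pop()] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return seen == set(labels)


labels, edges = build_multigraph(UNARY_ATOMS, BINARY_ATOMS)
bound_vars = sorted(set(labels) - ANSWER_VARS)
```

Here unary atoms become node labels and binary atoms become labelled edges, so parallel edges between the same pair of terms are kept as separate entries of `edges`, as a multigraph requires.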

To answer a CQ $q(\vec{x})$ over data $\mathcal{A}$ and ontology $\mathcal{T}$, we seek homomorphisms from $q$ to the structure, called the canonical model and obtained by expanding $\mathcal{A}$ (extensional data) with knowledge in $\mathcal{T}$ (intensional data). Thus, we have to find possible cuts of $q$ into a number of pieces: some of them, say, of type (A), are mapped to individuals in $\mathcal{A}$, while the others, of type (B), may only have some of their terms ('roots') in $\mathcal{A}$, whereas the remaining part is implied by the knowledge in $\mathcal{T}$. For example, the query above can be cut, for a suitable $\mathcal{T}$, into 3 pieces of which only the middle one is of type (A), and the right piece has two roots $x_3$, $y_3$:

[Figure: the same query graph cut into three pieces.]

If $q(\vec{x})$ does not have existentially quantified variables then the whole $q$ forms the only possible cut of type (A), because the answer variables $\vec{x}$ must be mapped to individuals in $\mathcal{A}$. If $\mathcal{T}$ does not contain role inclusion axioms then, as shown by Kontchakov et al. (2010; 2011), every root determines a unique piece of type (B). However, in general, $q$ may have exponentially many pieces of type (B). One can encode the intuition above as a pure positive existential rewriting $q_e$, which is, roughly speaking, a disjunction over all possible cuts of $q$, and so is exponential in $|q|$ in the worst case.

A slightly different approach to checking possible cuts of $q$ is to consider, for every edge in $q$, whether it belongs to an (A) piece, or generates a (B) piece itself, or lies inside the (B) piece generated by some other edge. This 'local' view gives another rewriting, a conjunction of disjunctions $q_c$, which may still be exponential as the same edge may generate exponentially many distinct (B) pieces. (Two (B) pieces are different if their domains or roots are different.) Moreover, the new rewriting is not necessarily correct because some (B) pieces may not be realised together in the canonical model; we call such pieces conflicting.

We analyse, both theoretically and experimentally, conditions under which the number of (B) pieces is polynomial and they are not in conflict with each other. In particular, we develop techniques to ensure that rewritings are not affected by conflicting pieces. For example, one simple (but impure) approach involves a single fresh constant to represent all intensional objects in the canonical model (labelled nulls in the chase), but no extra variables.

We show that the number of (B) pieces generated by one edge in the query $q$ is largely determined by a sophisticated interaction between role inclusions and inverse roles in the ontology $\mathcal{T}$, which can produce canonical models with very complex intensional parts. We give a sufficient condition (on $q$ and $\mathcal{T}$) which guarantees that one edge in $q$ can generate at most one piece of type (B) (though the number of ways this piece can be matched in the canonical model can still be exponential). This leads to our shortest pure rewriting, $q_p$, which can be constructed in polynomial time, but is correct only if $q$ and $\mathcal{T}$ satisfy the sufficient condition, which can also be checked in polynomial time. Trivial examples where this condition holds are ontologies without inverse roles, ontologies without role inclusions and qualified existential quantification, or those without positive occurrences of existential quantifiers. Our experiments with a number of standard OWL 2 QL ontologies and queries demonstrate that the sufficient conditions always apply, the queries normally contain very few pieces of type (B), if any, and moreover, these pieces are never in conflict. Thus, in practice the rewritings $q_c$ and $q_p$ are both short and correct.

Omitted proofs can be found in the full version of the paper at www.dcs.bbk.ac.uk/~kikot.

OWL 2 QL

The language of OWL 2 QL contains individual names $a_i$, concept names $A_i$, and role names $P_i$ ($i \ge 1$). Roles $R$, basic concepts $B$ and concepts $C$ are defined by the grammar:

$$R ::= P_i \mid P_i^-, \qquad B ::= \bot \mid A_i \mid \exists R, \qquad C ::= B \mid \exists R.B$$

A TBox, $\mathcal{T}$, is a finite set of inclusions of the form

$$B \sqsubseteq C, \qquad R_1 \sqsubseteq R_2, \qquad B_1 \sqcap B_2 \sqsubseteq \bot, \qquad R_1 \sqcap R_2 \sqsubseteq \bot.$$

(Note that concepts of the form $\exists R.B$ can only occur in the right-hand side of concept inclusions in OWL 2 QL. An inclusion $B' \sqsubseteq \exists R.B$ can be regarded as an abbreviation for three inclusions: $B' \sqsubseteq \exists R_B$, $\exists R_B^- \sqsubseteq B$ and $R_B \sqsubseteq R$, where $R_B$ is a fresh role name.) An ABox, $\mathcal{A}$, is a finite set of assertions of the form $A_k(a_i)$ and $P_k(a_i, a_j)$. $\mathcal{T}$ and $\mathcal{A}$ together constitute the knowledge base (KB) $\mathcal{K} = (\mathcal{T}, \mathcal{A})$. The semantics for OWL 2 QL is defined in the usual way based on interpretations $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$; consult (Baader et al. 2003) for details. The set of individual names in $\mathcal{A}$ will be denoted by $\mathsf{ind}(\mathcal{A})$. For concepts or roles $E_1$ and $E_2$, we write $E_1 \sqsubseteq_{\mathcal{T}} E_2$ if $\mathcal{T} \models E_1 \sqsubseteq E_2$; and we set $[E] = \{E' \mid E \sqsubseteq_{\mathcal{T}} E' \text{ and } E' \sqsubseteq_{\mathcal{T}} E\}$.

A conjunctive query (CQ) $q(\vec{x})$ is a first-order formula $\exists \vec{y}\, \varphi(\vec{x}, \vec{y})$, where $\varphi$ is constructed, using $\land$, from atoms of the form $A_k(t_1)$ and $P_k(t_1, t_2)$, where each $t_i$ is a term (an individual or a variable from $\vec{x}$ or $\vec{y}$). Variables in $\vec{x}$ are called answer variables, and those in $\vec{y}$ bound variables. A tuple $\vec{a} \subseteq \mathsf{ind}(\mathcal{A})$ is a certain answer to $q(\vec{x})$ over $\mathcal{K} = (\mathcal{T}, \mathcal{A})$ if $\mathcal{I} \models q[\vec{a}]$ for all models $\mathcal{I}$ of $\mathcal{K}$; in this case we write $\mathcal{K} \models q[\vec{a}]$. To simplify notation, we will often identify $q$ with the set of its atoms and use $P^-(t, t') \in q$ as a synonym of $P(t', t) \in q$; $\mathsf{term}(q)$ is the set of terms in $q$. We call $q$ tree-shaped if its primal graph $(\mathsf{term}(q), \{\{t, t'\} \mid R(t, t') \in q\})$ is a tree.

Remark 1 Although the official OWL 2 QL contains the concept $\top$ (for the whole domain), we do not consider it here as $\top$ makes OBDA with OWL 2 QL domain dependent (Abiteboul, Hull, and Vianu 1995): take, for example, the query $A(x)$ over the ontology $\{\top \sqsubseteq A\}$. (But we shall use $\top$ as an auxiliary symbol as it is convenient to regard $\exists R$ as an abbreviation for $\exists R.\top$.) To simplify presentation, we omit data properties and (ir)reflexivity constraints for roles.

Query answering over OWL 2 QL KBs is based on the fact that, for any consistent KB $\mathcal{K} = (\mathcal{T}, \mathcal{A})$, there is an interpretation $\mathcal{C}_{\mathcal{K}}$ such that, for all CQs $q(\vec{x})$ and $\vec{a} \subseteq \mathsf{ind}(\mathcal{A})$,

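we have $\mathcal{K} \models q[\vec{a}]$ iff $\mathcal{C}_{\mathcal{K}} \models q[\vec{a}]$. The interpretation $\mathcal{C}_{\mathcal{K}}$, called the canonical model of $\mathcal{K}$, can be constructed as follows. For each pair $[R]$, $[B]$ with $\exists R.B$ in $\mathcal{T}$ (recall that $\exists R.\top$ is another way of writing $\exists R$), we introduce a fresh symbol $w_{[RB]}$ and call it the witness for $\exists R.B$. We write $\mathcal{K} \models C(w_{[RB]})$ if $\exists R^- \sqsubseteq_{\mathcal{T}} C$ or $B \sqsubseteq_{\mathcal{T}} C$. Define a generating relation, $\leadsto$, on the set of these witnesses together with $\mathsf{ind}(\mathcal{A})$ by taking:

– $a \leadsto w_{[RB]}$ if $a \in \mathsf{ind}(\mathcal{A})$, $[R]$ and $[B]$ are $\sqsubseteq_{\mathcal{T}}$-minimal such that $\mathcal{K} \models \exists R.B(a)$ and there is no $b \in \mathsf{ind}(\mathcal{A})$ with $\mathcal{K} \models R(a, b) \land B(b)$;

– $w_{[R'B']} \leadsto w_{[RB]}$ if $u \leadsto w_{[R'B']}$, for some $u$, $[R]$ and $[B]$ are $\sqsubseteq_{\mathcal{T}}$-minimal such that $\mathcal{K} \models \exists R.B(w_{[R'B']})$ and it is not the case that $R' \sqsubseteq_{\mathcal{T}} R^-$ and $\mathcal{K} \models B(u)$.

If $a \leadsto w_{[R_1B_1]} \leadsto \cdots \leadsto w_{[R_nB_n]}$, $n \ge 0$, then we say that $a$ generates the path $a\, w_{[R_1B_1]} \cdots w_{[R_nB_n]}$. Denote by $\mathsf{path}_{\mathcal{K}}(a)$ the set of paths generated by $a$, and by $\mathsf{tail}(\pi)$ the last element in $\pi \in \mathsf{path}_{\mathcal{K}}(a)$. $\mathcal{C}_{\mathcal{K}}$ is defined by taking:

$$\Delta^{\mathcal{C}_{\mathcal{K}}} = \bigcup_{a \in \mathsf{ind}(\mathcal{A})} \mathsf{path}_{\mathcal{K}}(a), \qquad a^{\mathcal{C}_{\mathcal{K}}} = a, \text{ for } a \in \mathsf{ind}(\mathcal{A}),$$

$$A^{\mathcal{C}_{\mathcal{K}}} = \{\pi \in \Delta^{\mathcal{C}_{\mathcal{K}}} \mid \mathcal{K} \models A(\mathsf{tail}(\pi))\},$$

$$P^{\mathcal{C}_{\mathcal{K}}} = \{(a, b) \in \mathsf{ind}(\mathcal{A}) \times \mathsf{ind}(\mathcal{A}) \mid \mathcal{K} \models P(a, b)\} \cup \{(\pi, \pi \cdot w_{[RB]}) \mid \mathsf{tail}(\pi) \leadsto w_{[RB]},\ R \sqsubseteq_{\mathcal{T}} P\} \cup \{(\pi \cdot w_{[RB]}, \pi) \mid \mathsf{tail}(\pi) \leadsto w_{[RB]},\ R \sqsubseteq_{\mathcal{T}} P^-\}.$$

$\Delta^{\mathcal{C}_{\mathcal{K}}} \setminus \mathsf{ind}(\mathcal{A})$ is called the tree part of $\mathcal{C}_{\mathcal{K}}$. The following result is proved in a standard way (Kontchakov et al. 2010):

Theorem 2 For every OWL 2 QL KB $\mathcal{K} = (\mathcal{T}, \mathcal{A})$, every CQ $q(\vec{x})$ and every $\vec{a} \subseteq \mathsf{ind}(\mathcal{A})$, $\mathcal{K} \models q[\vec{a}]$ iff $\mathcal{C}_{\mathcal{K}} \models q[\vec{a}]$.

Given a TBox $\mathcal{T}$, define a KB $\mathcal{K}_{\mathcal{T}} = (\mathcal{T}, \mathcal{A}_{\mathcal{T}})$ by taking

$$\mathcal{A}_{\mathcal{T}} = \{R(a_{RB}, b_{RB}),\ B(b_{RB}) \mid \exists R.B \text{ is a } \mathcal{T}\text{-consistent concept in } \mathcal{T}\}.$$

In other words, $\mathcal{C}_{\mathcal{K}_{\mathcal{T}}}$ is the disjoint union of the canonical models for consistent $(\mathcal{T}, \{R(a_{RB}, b_{RB}), B(b_{RB})\})$. To simplify notation, we use $\mathcal{C}_{\mathcal{T}}$ in place of $\mathcal{C}_{\mathcal{K}_{\mathcal{T}}}$. We extend the generating relation $\leadsto$ in $\mathcal{C}_{\mathcal{T}}$ by adding to it the pair $a_{RB} \leadsto b_{RB}$. The RB-subtree of $\mathcal{C}_{\mathcal{T}}$ has root $a_{RB}$ and consists of the full subtree of $\mathcal{C}_{\mathcal{T}}$ with root $b_{RB}$ extended with the edge $(a_{RB}, b_{RB})$. (Note that there may be other branches starting from $a_{RB}$ in $\mathcal{C}_{\mathcal{T}}$, for example, if $\exists R \sqsubseteq_{\mathcal{T}} \exists S$ and $R \not\sqsubseteq_{\mathcal{T}} S$.)

In the next section, we introduce the naïve exponential rewriting of CQs over OWL 2 QL ontologies sketched in the introduction, and analyse whether it can be made shorter.

Two Rewritings

Let $q(\vec{x})$ be a CQ and $\mathcal{T}$ an OWL 2 QL ontology. Our task is to construct an FO query $q'(\vec{x})$, using the predicates and individuals of $q$, $\mathcal{T}$ and $=$, such that, for any ABox $\mathcal{A}$ and any $\vec{a} \subseteq \mathsf{ind}(\mathcal{A})$, we have $(\mathcal{T}, \mathcal{A}) \models q[\vec{a}]$ iff $\mathcal{I}_{\mathcal{A}} \models q'[\vec{a}]$, where $\mathcal{I}_{\mathcal{A}}$ is the interpretation with domain $\mathsf{ind}(\mathcal{A})$ given by the atoms in $\mathcal{A}$. Such a query $q'$ is called a (pure) rewriting of $q$ and $\mathcal{T}$. As argued in the introduction, we are interested in rewritings $q'$ that (a) are as short as possible (ideally, polynomial in $|q|$ and $|\mathcal{T}|$), (b) look similar to the original CQ $q$ (in particular, are built from the same predicates and terms as $q$ and $\mathcal{T}$), and (c) can be constructed in reasonable time.

Without loss of generality, we will assume that the primal graph of $q$ is connected. (If this is not the case, we consider each of the connected components separately.)

To understand the ingredients required for such a rewriting, suppose $q(\vec{x}) = \exists \vec{y}\, \varphi(\vec{x}, \vec{y})$, $\mathcal{K} = (\mathcal{T}, \mathcal{A})$ and $\mathcal{C}_{\mathcal{K}} \models^{\mathfrak{a}} \varphi$, where $\mathfrak{a}$ is an assignment of elements of $\Delta^{\mathcal{C}_{\mathcal{K}}}$ to the variables in $\varphi$ under which $\mathfrak{a}(x) \in \mathsf{ind}(\mathcal{A})$ for all $x \in \vec{x}$.

Consider an atom $P(z, z') \in q$ with bound variables $z$, $z'$ and assume that $\mathfrak{a}(t) \in \mathsf{ind}(\mathcal{A})$, for some $t \in \mathsf{term}(q)$. The assignment $\mathfrak{a}$ can send $z$ and $z'$ to four different locations in $\mathcal{C}_{\mathcal{K}}$:

(A) $\mathfrak{a}(z), \mathfrak{a}(z') \in \mathsf{ind}(\mathcal{A})$;
(B) $\mathfrak{a}(z) \in \mathsf{ind}(\mathcal{A})$, $\mathfrak{a}(z') \notin \mathsf{ind}(\mathcal{A})$;
(B$^-$) $\mathfrak{a}(z) \notin \mathsf{ind}(\mathcal{A})$, $\mathfrak{a}(z') \in \mathsf{ind}(\mathcal{A})$;
(O) $\mathfrak{a}(z), \mathfrak{a}(z') \notin \mathsf{ind}(\mathcal{A})$.

Let us see how these alternatives can be reflected in our rewriting. In all of these cases, we need formulas of the form

$$\mathsf{ext}_P(x, y) = \bigvee_{R \sqsubseteq_{\mathcal{T}} P} R(x, y), \qquad \mathsf{ext}_C(x) = \bigvee_{A \sqsubseteq_{\mathcal{T}} C} A(x) \ \lor \bigvee_{\exists R \sqsubseteq_{\mathcal{T}} C} \exists y\, R(x, y).$$

In case (A) we have $\mathcal{C}_{\mathcal{K}} \models^{\mathfrak{a}} P(z, z')$ iff $\mathcal{A} \models^{\mathfrak{a}} \mathsf{ext}_P(z, z')$. Case (B) is possible only if, for some concept $\exists R.B$, we have $\mathfrak{a}(z) \leadsto w_{[RB]}$, $R \sqsubseteq_{\mathcal{T}} P$ and the atoms of $q$ 'linked to' $z'$ can be mapped into the RB-subtree of $\mathcal{C}_{\mathcal{T}}$. To illustrate, consider an example.

Example 3 Let $\mathcal{T} = \{A \sqsubseteq \exists S^-.B,\ B \sqsubseteq \exists S.A',\ A' \sqsubseteq \exists T\}$ and $q(x) = \{S(y, x), S(y, z), T(z, v)\}$. The answer variable $x$ must be mapped by $\mathfrak{a}$ to an ABox element. However, $y$ can be mapped either to an ABox element or to the point $\mathfrak{a}(x) \cdot w_{[S^-B]}$ provided that $\mathfrak{a}(x)$ is an instance of $A$ (and so of $\exists S^-.B$). In the latter case, we have two possibilities for mapping $z$: either to $\mathfrak{a}(x) \cdot w_{[S^-B]} \cdot w_{[SA']}$, in which case we must further set $\mathfrak{a}(v) = \mathfrak{a}(x) \cdot w_{[S^-B]} \cdot w_{[SA']} \cdot w_{[T\top]}$, or to $\mathfrak{a}(x)$, provided that $\mathfrak{a}(x)$ is an instance of $\exists T$. In the picture below, these two (partial) maps are denoted by $f$ and $g$.

[Figure: the query $q$, the $S^-B$-subtree of $\mathcal{C}_{\mathcal{T}}$, and the two partial maps $f$ and $g$.]

The observations made in Example 3 are formalised in our central definition. Given a pair $(t, t')$ of adjacent terms in (the graph of) $q$, we define a tree witness for $(t, t')$ to be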

a homomorphism $f$ (with domain $\mathsf{dom}\, f$) from the query

$$q_f = \{S(s, s') \in q \mid s, s' \in \mathsf{dom}\, f\} \cup \{A(s) \in q \mid s \in \mathsf{dom}\, f \setminus f^{-1}(r_f)\}$$

to the RB-subtree of $\mathcal{C}_{\mathcal{T}}$, for some $\exists R.B$, with root $r_f = a_{RB}$ such that the following conditions hold:

(t1) $\mathsf{dom}\, f$ is the smallest set containing $t$, $t'$ and such that if $s \in \mathsf{dom}\, f \setminus f^{-1}(r_f)$ and $S(s, s') \in q$ then $s' \in \mathsf{dom}\, f$,

(t2) $f(t) = r_f$ and if $s \in \mathsf{dom}\, f \setminus f^{-1}(r_f)$ then $s$ is bound.

Note that this notion of tree witness is different from the one used for the description logic $\textit{DL-Lite}^{\mathcal{N}}_{horn}$ by Kontchakov et al. (2010), where the structure of the canonical models ensured uniqueness of a tree witness if it existed. In Example 3, there are two tree witnesses, $f$ and $g$, for $(x, y)$. As we shall see below, there may be exponentially many of them.

Returning to case (B), we can say now that there must exist a tree witness $f$ for $(z, z')$ such that $\mathfrak{a}(z)$ satisfies all $A(s) \in q$ with $s \in f^{-1}(r_f)$. Case (B$^-$) is symmetric, and in case (O) there must exist $R(t, t') \in q$ for which (B) holds and $P(z, z')$ is 'covered' by a tree witness for $(t, t')$ (as we assumed that $\mathfrak{a}(t) \in \mathsf{ind}(\mathcal{A})$, for some $t \in \mathsf{term}(q)$).

This analysis suggests the following idea. Given a CQ $q$ and KB $\mathcal{K} = (\mathcal{T}, \mathcal{A})$, we guess pairs of adjacent terms $(t, t')$ in $q$ that will be mapped to edges of the tree part of $\mathcal{C}_{\mathcal{K}}$ starting from $\mathsf{ind}(\mathcal{A})$ (as in case (B) above) and, for each such pair $(t, t')$, we guess a tree witness for $(t, t')$. The part of the query that is not covered by the chosen tree witnesses will be mapped to $\mathsf{ind}(\mathcal{A})$ (as in case (A)). The query representing these guesses is then evaluated over $\mathcal{I}_{\mathcal{A}}$. If $q$ has no answer variables, then we also have to take account of the case where the whole $q$ is mapped into the tree part of $\mathcal{C}_{\mathcal{K}}$. Unfortunately, this idea cannot be implemented in a straightforward way, as shown by the following example.

Example 4 Let $\mathcal{K} = (\mathcal{T}, \{A(a)\})$, where $\mathcal{T} = \{A \sqsubseteq \exists R,\ A \sqsubseteq \exists R^-\}$. Consider the query

$$q(x_1, x_4) = \{R(x_1, y_2), R(y_3, y_2), R(y_3, x_4)\}$$

shown in the picture below alongside $\mathcal{C}_{\mathcal{K}}$.

[Figure: the query $q$ and the canonical model $\mathcal{C}_{\mathcal{K}}$, with the tree witnesses $f$ and $g$.]

We obviously have a tree witness $f$ for $(x_1, y_2)$ such that $\mathsf{dom}\, f = \{x_1, y_2, y_3\}$ and $f^{-1}(r_f) = \{x_1, y_3\}$, and also a tree witness $g$ for $(x_4, y_3)$ with $\mathsf{dom}\, g = \{x_4, y_3, y_2\}$ and $g^{-1}(r_g) = \{x_4, y_2\}$. Although these tree witnesses cover the whole query $q$, they are only 'realised' in $\mathcal{C}_{\mathcal{K}}$ under conflicting maps: $f$ sends $x_1, y_3$ to $a$ and $y_2$ to $a \cdot w_{[R\top]}$, while $g$ sends $x_4, y_2$ to $a$ and $y_3$ to $a \cdot w_{[R^-\top]}$; in fact, $\mathcal{K} \not\models q(a, a)$.

This example motivates the following definitions. Let $\Theta$ be the set of tree witnesses for $q$ and $\mathcal{T}$. We say that $f, g \in \Theta$ are compatible if $\mathsf{dom}\, f \cap \mathsf{dom}\, g \subseteq f^{-1}(r_f) \cap g^{-1}(r_g)$. If $f$ and $g$ are incompatible and neither $\mathsf{dom}\, f \subseteq \mathsf{dom}\, g$ nor $\mathsf{dom}\, g \subseteq \mathsf{dom}\, f$, then we call $f$ and $g$ conflicting. In Example 4, $\mathsf{dom}\, f \cap \mathsf{dom}\, g = \{y_2, y_3\}$, and so $f$ and $g$ are conflicting.

We are now in a position to formulate a rewriting based on the idea discussed above. Given a tree witness $f \in \Theta$ for $(t, t')$ with $r_f = a_{RB}$, we first define a tree-witness formula $\mathsf{tw}_f$ for $f$ by taking

$$\mathsf{tw}_f = \mathsf{ext}_{\exists R.B}(t) \land \bigwedge_{s \in f^{-1}(r_f)} \Bigl( (t = s) \land \bigwedge_{A(s) \in q} \mathsf{ext}_A(s) \Bigr).$$

A (possibly empty) subset $\Xi \subseteq \Theta$ is called consistent if all pairs of tree witnesses in $\Xi$ are compatible. Now, assuming that $\vec{y}$ is a list of all bound variables in $q(\vec{x})$, we set

$$q_e(\vec{x}) = \mathsf{detached}_q \lor \bigvee_{\substack{\Xi \subseteq \Theta \\ \Xi \text{ consistent}}} \exists \vec{y}\, \Bigl( \bigwedge_{f \in \Xi} \mathsf{tw}_f \ \land \bigwedge_{\substack{A(s) \in q \\ s \notin \mathsf{dom}\, f \text{ for all } f \in \Xi}} \mathsf{ext}_A(s) \ \land \bigwedge_{\substack{P(s, s') \in q \\ \{s, s'\} \not\subseteq \mathsf{dom}\, f \text{ for all } f \in \Xi}} \mathsf{ext}_P(s, s') \Bigr),$$

where

– $\mathsf{detached}_q = \bot$ if $q$ has at least one answer variable; otherwise $\mathsf{detached}_q$ is a disjunction of the sentences $\exists x\, \mathsf{ext}_{\exists R.B}(x)$ such that there is a homomorphism from $q$ to the RB-subtree of $\mathcal{C}_{\mathcal{T}}$.

Clearly, $q_e$ is a positive existential formula built from the atoms occurring in $q$ and $\mathcal{T}$ together with equality unifying certain terms; it contains the same variables and constants as the original CQ $q$. Moreover, if we consider the predicates $\mathsf{ext}_E$ for concepts and roles as primitive, then $q_e$ is a union of conjunctive queries (UCQ), where each subquery can be thought of as the result of folding the respective pieces of $q$ into the tree witness formulas.

Theorem 5 For every ABox $\mathcal{A}$ and every $\vec{a} \subseteq \mathsf{ind}(\mathcal{A})$, we have $(\mathcal{T}, \mathcal{A}) \models q[\vec{a}]$ iff $\mathcal{I}_{\mathcal{A}} \models q_e[\vec{a}]$.

We illustrate the rewriting $q_e$ by a simple example.

Example 6 Suppose that $q(x) = \{R_i(x, y_i) \mid i \le n\}$ and $\mathcal{T} = \{A_i \sqsubseteq \exists R_i \mid i \le n\}$. Each pair $(x, y_i)$ gives rise to one tree witness $f_i$ with $\mathsf{tw}_{f_i} = A_i(x) \lor \exists y\, R_i(x, y)$, and

$$q_e = \bigvee_{N \subseteq [0, n]} \exists \vec{y}\, \Bigl( \bigwedge_{i \in N} \mathsf{tw}_{f_i} \land \bigwedge_{j \notin N} R_j(x, y_j) \Bigr).$$

Like all other known pure rewritings for OWL 2 QL, $q_e$ is of exponential size: $O((n_{\mathcal{T},q} + 1)^{|q|} \cdot |\mathcal{T}| \cdot |q|^2)$, where $n_{\mathcal{T},q}$ is the maximum number of distinct tree witness formulas $\mathsf{tw}_f$ for tree witnesses containing a pair $(t, t')$ of adjacent terms in $q$. Two recent results may help to shed some light on whether this exponential blowup is unavoidable in OWL 2 QL.

One of them shows that no polynomial-time algorithm can construct pure rewritings for CQs over OWL 2 QL ontologies, unless P = NP (Kikot, Kontchakov, and Zakharyaschev 2011, Theorem 2). The idea of the proof is as follows. First, we encode any CNF $\chi = \bigwedge_{j=1}^{m} D_j$ over propositional variables $p_1, \dots, p_n$ as an OWL 2 QL TBox,

$\mathcal{T}_\chi$, containing the axioms, for $1 \le i \le n$, $1 \le j \le m$ and $k = 0, 1$,

$$A_{i-1} \sqsubseteq \exists P^-.X_i^k, \qquad X_i^k \sqsubseteq A_i, \qquad C_j \sqsubseteq \exists P.C_j,$$

$$X_i^0 \sqsubseteq \exists P.C_j \ \text{ if } \neg p_i \in D_j, \qquad X_i^1 \sqsubseteq \exists P.C_j \ \text{ if } p_i \in D_j,$$

and consider the CQ

$$q(y_0) = A_0(y_0) \land \bigwedge_{i=1}^{n} P(y_i, y_{i-1}) \land A_n(y_n) \land \bigwedge_{j=1}^{m} \Bigl( P(y_n, z_0^j) \land \bigwedge_{i=1}^{n} P(z_{i-1}^j, z_i^j) \land C_j(z_n^j) \Bigr).$$

Now, suppose $q'(y_0)$ is a rewriting of $q(y_0)$ and $\mathcal{T}_\chi$ that does not use any constants. Consider the ABox $\mathcal{A} = \{A_0(a)\}$. It is not hard to see that $(\mathcal{T}_\chi, \mathcal{A}) \models q[a]$ iff $\chi$ is satisfiable. On the other hand, checking whether $\mathcal{I}_{\mathcal{A}} \models q'[a]$ can be done in polynomial time in $|q'|$ because the domain of $\mathcal{I}_{\mathcal{A}}$ is a singleton. Thus, constructing the rewriting $q'$ must be at least as hard as deciding satisfiability of $\chi$. (See the Conclusions for a further discussion and results.)

The argument above does not go through if databases are assumed to have two special constants, say 0 and 1, which can be employed in rewritings $q'$. Indeed, as shown by Gottlob and Schwentick (2011), using 0 and 1, $\ne$ and fresh predicates of arity $O(\log(|q| \cdot |\mathcal{T}|))$, one can construct a nonrecursive Datalog rewriting $q'$ for any given CQ $q$ and OWL 2 QL ontology $\mathcal{T}$ in polynomial time. Intuitively, $q'$ uses the extra resources to encode a part of the canonical model, which is enough to provide all certain answers to $q$, to guess a map from $q$ into this part and then check whether it is a homomorphism. As argued in the introduction, RDBMSs do not appear to be best suited for tasks of that sort, although thorough experiments are required to confirm or refute this claim.

Very often, however, there are other ways to construct polynomial rewritings. Let us assume for a moment that the following condition holds:

(conf) there are no conflicting tree witnesses in $\Theta$.

In this case, the query $q_e$ can be transformed into the query

$$q_c(\vec{x}) = \mathsf{detached}_q \lor \exists \vec{y} \bigwedge_{\substack{\{t, t'\} \\ t \text{ and } t' \text{ adjacent}}} \Bigl[ \Bigl( \bigwedge_{\substack{A(s) \in q \\ s \in \{t, t'\}}} \mathsf{ext}_A(s) \land \bigwedge_{R(t, t') \in q} \mathsf{ext}_R(t, t') \Bigr) \lor \bigvee_{\substack{f \in \Theta \\ t, t' \in \mathsf{dom}\, f}} \mathsf{tw}_f \Bigr].$$

If $q$ does not contain binary predicates then there cannot be any tree witnesses and we set $q_c = q_e$.

Theorem 7 If $\mathcal{T}$ and $q$ satisfy (conf) then, for any ABox $\mathcal{A}$ and $\vec{a} \subseteq \mathsf{ind}(\mathcal{A})$, we have $(\mathcal{T}, \mathcal{A}) \models q[\vec{a}]$ iff $\mathcal{I}_{\mathcal{A}} \models q_c[\vec{a}]$.

The size of $q_c$ is $O(n_{\mathcal{T},q} \cdot |\mathcal{T}| \cdot |q|^2)$. Thus, if every pair $(t, t')$ of adjacent terms in $q$ gives rise to polynomially many distinct tree-witness formulas $\mathsf{tw}_f$ and condition (conf) holds then $q_c$ is a polynomial rewriting of $q$ and $\mathcal{T}$. (If (conf) does not hold then $q_c$ may return wrong answers.)

Example 8 The exponential query $q_e$ in Example 6 reduces to polynomial $q_c = \exists \vec{y} \bigwedge_{i \le n} (R_i(x, y_i) \lor \mathsf{tw}_{f_i})$.

A good illustrative example of a CQ with exponentially many tree witness formulas is $q(y_0)$ over the TBox $\mathcal{T}_\chi$ constructed above for a CNF $\chi$. One can show that the pairs $(z_i^j, z_{i-1}^j)$ give rise to exponentially many different tree witness formulas $\mathsf{tw}_f$ for a suitable $\chi$.

[Figure: the CQ $q(y_0)$ for $n = m = 2$, with terms $y_0, y_1, y_2$, $z_0^1, z_1^1, z_2^1$ and $z_0^2, z_1^2, z_2^2$ and labels $A_0$, $A_2$, $C_1$, $C_2$.]

To illustrate, consider the CQ $q(y_0)$ for $n = m = 2$ and suppose that $\chi$ is such that the $P^-X_1^1$-subtree of $\mathcal{C}_{\mathcal{T}_\chi}$ contains the fragment as shown in the picture below.

[Figure: fragment of the $P^-X_1^1$-subtree with points $r_0$, $r_1$, $r_2$ linked by $P^-$ and with $C_2$-branches.]

We can construct a tree witness $f$ for $(z_1^1, z_0^1)$ by taking $f(z_1^1) = r_0$, $f(z_0^1) = r_1$, $f(y_2) = r_2$, after which we have three different options for defining $f(z_1^2)$ and $f(z_0^2)$: go back to $r_0$, go to $r_1$ and then take the $C_2$-branch, or take the $C_2$-branch starting from $r_2$. The last two tree witnesses give the same tree witness formula, which is different from that given by the first tree witness. Imagine now some large $n$ and $m$. It is this fact that makes it 'hard' to construct a rewriting for $q(y_0)$ and $\mathcal{T}_\chi$.

The TBox $\mathcal{T}_\chi$ and CQ $q(y_0)$ involve a very complex interplay between role inclusions (or concepts of the form $\exists R.C$) and inverse roles, which appears to be rather artificial compared to how roles are used in real-world ontologies. The experiments to be reported later on in the paper demonstrate that real-world ontologies and queries generate very few tree witnesses, which are never in conflict, and so the rewriting $q_c$ is both short and correct.

In the next section we analyse conflicting tree witnesses in more detail.

Conflicting Tree Witnesses

One simple way to tackle the problem of conflicting tree witnesses, identified in Example 4, would be to introduce a fresh constant symbol, say $\nu$, representing all the non-ABox elements of the canonical models. We then use the following variant of the tree witness formula $\mathsf{tw}_f$ for $(t, t')$:

$$\mathsf{tw}_f^\nu = \mathsf{tw}_f \land \bigwedge_{s \in \mathsf{dom}\, f \setminus f^{-1}(r_f)} (s = \nu).$$

We denote the resulting 'impure' rewriting by $q_c^\nu$. Given an ABox $\mathcal{A}$, denote by $\mathcal{I}_{\mathcal{A}}^\nu$ the interpretation $\mathcal{I}_{\mathcal{A}}$ extended with a new domain element (interpreting) $\nu$. Thus, $\nu$ is not involved in the interpretation of any predicate in $\mathcal{I}_{\mathcal{A}}$, and so, of any predicate $\mathsf{ext}_E$.
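To make the definitions of compatible and conflicting tree witnesses concrete, here is a minimal sketch that replays Example 4 and lists the extra equalities with the fresh constant that the impure variant adds for non-root terms. The `TreeWitness` record and the function names are hypothetical; a tree witness is reduced here to its domain and its set of roots, which is all these two definitions use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TreeWitness:
    """Hypothetical record: only dom f and the roots f^{-1}(r_f) are kept."""
    name: str
    dom: frozenset
    roots: frozenset


def compatible(f, g):
    # dom f and dom g may overlap only on common roots
    return (f.dom & g.dom) <= (f.roots & g.roots)


def conflicting(f, g):
    # incompatible, and neither domain is contained in the other
    return (not compatible(f, g)
            and not f.dom <= g.dom
            and not g.dom <= f.dom)


def nu_conjuncts(f, nu="nu"):
    """Equalities (s = nu) added for every non-root term s of f."""
    return sorted(f"{s} = {nu}" for s in f.dom - f.roots)


# The two tree witnesses of Example 4.
f = TreeWitness("f", frozenset({"x1", "y2", "y3"}), frozenset({"x1", "y3"}))
g = TreeWitness("g", frozenset({"x4", "y3", "y2"}), frozenset({"x4", "y2"}))

is_conflict = conflicting(f, g)
extra_f = nu_conjuncts(f)
extra_g = nu_conjuncts(g)
```

For these inputs the pair is reported as conflicting, and the extra conjunct for f forces y2 to equal the fresh constant, while g treats y2 as a root.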

Example 9 In the context of Example 4, $\mathsf{tw}_g^\nu$ would contain the conjuncts $\mathsf{ext}_{\exists R^-.\top}(x_4)$ and $(y_2 = x_4)$, which cannot be satisfied in $\mathcal{I}_{\mathcal{A}}^\nu$ at the same time as the conjunct $(y_2 = \nu)$ of $\mathsf{tw}_f^\nu$. Thus, $\mathcal{I}_{\mathcal{A}}^\nu \not\models q_c^\nu(a, a)$.

The following result shows that $q_c^\nu$ is a correct rewriting over this interpretation.

Theorem 10 For any ABox $\mathcal{A}$ and any $\vec{a} \subseteq \mathsf{ind}(\mathcal{A})$, we have $(\mathcal{T}, \mathcal{A}) \models q[\vec{a}]$ iff $\mathcal{I}_{\mathcal{A}}^\nu \models q_c^\nu[\vec{a}]$.

The rewriting $q_c^\nu$ can be viewed as a step in the direction of the combined approach (Lutz, Toman, and Wolter 2009; Kontchakov et al. 2010), though without expanding the ABox by applying the TBox axioms. Roughly, in both cases the RDBMS has to guess whether a bound variable is mapped to $\mathsf{ind}(\mathcal{A})$ or to the tree part of the canonical model.

There is another way to 'suppress conflicts' in $q_c$ for the majority of practical cases, while keeping the resulting rewriting 'pure.' For a CQ $q$, let $q_c^+$ be the rewriting obtained from $q_c$ by replacing every $\mathsf{tw}_f$ in it with the following formula $\mathsf{tw}_f^+$:

$$\mathsf{tw}_f^+ = \mathsf{tw}_f \land \bigwedge_{f, g \text{ conflicting}} (q_{g \setminus f})_c^+,$$

where $q_{g \setminus f}$ denotes the restriction of the query $q$ to the set $\mathsf{dom}\, g \setminus (\mathsf{dom}\, f \setminus f^{-1}(r_f))$ (i.e., all terms in $g$ that are not non-root terms in $f$) and $(q_{g \setminus f})_c^+$, in turn, is the rewriting (as defined above) of the query $q_{g \setminus f}$ in which all the variables in $f^{-1}(r_f) \cup g^{-1}(r_g)$ are regarded to be answer variables.

Example 11 In the context of Example 4, the rewriting $q_c^+$ is constructed in two steps as follows: first, we obtain a representation of $q_c^+$ as the following formula:

$$\exists y_2, y_3\, \bigl[ (R(x_1, y_2) \lor \mathsf{tw}_f^+) \land (R(y_3, y_2) \lor \mathsf{tw}_f^+ \lor \mathsf{tw}_g^+) \land (R(y_3, x_4) \lor \mathsf{tw}_g^+) \bigr],$$

where

$$\mathsf{tw}_f^+ = \mathsf{ext}_{\exists R}(x_1) \land (x_1 = y_3) \land (R(y_3, x_4))_c^+,$$

$$\mathsf{tw}_g^+ = \mathsf{ext}_{\exists R^-}(x_4) \land (x_4 = y_2) \land (R(x_1, y_2))_c^+,$$

with $R(y_3, x_4)$ and $R(x_1, y_2)$ being two new queries, which have only answer variables. Then, the rewritings of these two queries coincide with the queries themselves: $R(y_3, x_4)$ and $R(x_1, y_2)$, respectively. So, for example, the modified tree witness formula $\mathsf{tw}_f^+$ for $f$ contains the conjunct $R(y_3, x_4)$, which cannot be satisfied in the interpretation $\mathcal{I}_{\mathcal{A}}$ with $\mathcal{A} = \{A(a)\}$.

Theorem 12 Suppose that $q$ and $\mathcal{T}$ satisfy the condition

(conf$_1$) for every $f \in \Theta$, if there are $g, h \in \Theta$ such that the pairs $f, g$ and $f, h$ are both conflicting, then either $\mathsf{dom}\, g \subseteq \mathsf{dom}\, h$ or $\mathsf{dom}\, h \subseteq \mathsf{dom}\, g$.

Then, for every ABox $\mathcal{A}$ and every $\vec{a} \subseteq \mathsf{ind}(\mathcal{A})$, we have $(\mathcal{T}, \mathcal{A}) \models q[\vec{a}]$ iff $\mathcal{I}_{\mathcal{A}} \models q_c^+[\vec{a}]$.

We do not know any cases where $q_c^+$ is not correct. It is to be noted, however, that the rewriting $q_c^+$ can be of exponential size even if every pair $(t, t')$ in $q$ gives rise to at most one tree witness.

Example 13 Given a word $R_1 \dots R_m$ over roles, define the CQ

$$q^{R_1 \dots R_m}(x_0, x_m) = \{R_i(x_{i-1}, x_i) \mid 1 \le i \le m\}.$$

Let $\sigma_n$ be the following sequence of words of roles:

$$\sigma_1 = T_1 T_1^-, \qquad \sigma_{n+1} = S_n T_{n+1} T_{n+1}^- S_n^- \sigma_n S_n, \ \text{ for } n \ge 1.$$

Now, consider the CQs $q_n = q^{\sigma_n}(x_0, x_n)$ and TBoxes $\mathcal{T}_n = \{A_n \sqsubseteq \exists R_n,\ A_{n+1} \sqsubseteq \exists S_n.A_n\}$. It is not hard to see that $q_n$ contains conflicting tree witnesses $f$ and $g$ as shown in the picture below, the query $q_{n-1}$ is a subquery of $q_{g \setminus f}$, and so $(q_{n-1})_c^+$ occurs in $(q_n)_c^+$ four times. On the other hand, there is a constant $c$ such that $|q_n| = |q_{n-1}| + c$.

[Figure: a query $q_n$ over the terms $x_0, y_1, \dots, y_{11}, x_{12}$ with edges labelled $S_2, T_3, T_3, S_2, S_1, T_2, T_2, S_1, T_1, T_1, S_1, S_2$, showing the conflicting tree witnesses $f$ and $g$ and the subquery $q_{n-1}$.]

Sufficient Conditions for Polynomial Rewriting

The rewriting $q_c$ is exponentially long if there are exponentially many tree witness formulas $\mathsf{tw}_f$ for some pair $(t, t')$ of adjacent terms in $q$. Each $\mathsf{tw}_f$ is determined by the RB-subtree of $\mathcal{C}_{\mathcal{T}}$ (containing the range of $f$), the domain $\mathsf{dom}\, f$ of $f$ and the set $f^{-1}(r_f)$; the terms in this set will be called the roots of $f$. Now we show that, for a large class of CQs $q$ and OWL 2 QL TBoxes $\mathcal{T}$, all tree witnesses for the same pair $(t, t')$ in $q$ have the same domain and roots. As the number of distinct RB-subtrees of $\mathcal{C}_{\mathcal{T}}$ is polynomial in the size of $\mathcal{T}$, the rewriting $q_c$ in this case is also polynomial; moreover, we show that it can be constructed in polynomial time in $|q|$ and $|\mathcal{T}|$. Roughly, Theorem 19 to be proved below demonstrates that this can be done if condition (conf) holds and there are no roles $T$, $S$ and a tree witness $f$ for which $f(q)$ and $\mathcal{C}_{\mathcal{T}}$ simultaneously contain fragments of the form

[Figure: the fragments of $f(q)$ and $\mathcal{C}_{\mathcal{T}}$ built from $T$- and $S$-edges.]

Here, by $f(q)$ we mean the quotient $q/{\sim_f}$ of $q$ modulo the equivalence relation $\sim_f$ defined by taking $s \sim_f s'$ iff $f(s) = f(s')$. Roles $T$ and $S$ are called adjacent in $f(q)$ if $T(s_1, s_2), S(s_2, s_3) \in f(q)$, for some $s_i \in \mathsf{term}\, f(q)$ such that $s_2 \notin f^{-1}(r_f)$.

We say that a role $S$ is forward in $\mathcal{T}$ if $u \leadsto v$ for all $(u, v) \in S^{\mathcal{C}_{\mathcal{T}}}$. If neither $S$ nor its inverse $S^-$ is forward then $S$ is said to be a twisty role in $\mathcal{T}$.

A tree witness $f$ is called perfect if, for every pair $T$, $S$ of adjacent roles in $f(q)$ such that $S$ is twisty in $\mathcal{T}$, we have

(perf) \[ \mathcal{C}_\mathcal{T} \not\models \mathsf{inv}(T, S) \land \mathsf{suc}(T, S), \]

where

\[ \mathsf{inv}(T, S) = \exists x, y\, \big(T(x, y) \land S(y, x)\big), \]
\[ \mathsf{suc}(T, S) = \exists x, y, z\, \big(T(x, y) \land S(y, z) \land (x \ne z)\big). \]

Example 14 Consider $\mathcal{T}_1 = \{A \sqsubseteq \exists T,\ T \sqsubseteq S^-\}$ and $q(x) = \{T(x, y), S(y, z)\}$. We have $\mathcal{C}_{\mathcal{T}_1} \models \mathsf{inv}(T, S)$ and $\mathcal{C}_{\mathcal{T}_1} \models \mathsf{inv}(R, R^-)$, for every role $R$. Both $T$ and $S^-$ are forward roles in $\mathcal{T}_1$. The homomorphism $f$ shown below is a perfect tree witness for $(x, y)$:

[Figure: the homomorphism $f$ from $q$ (terms $x$, $y$, $z$) into $\mathcal{C}_{\mathcal{T}_1}$.]

Consider now $\mathcal{T}_2 = \{A \sqsubseteq \exists T,\ T \sqsubseteq S^-,\ \exists T^- \sqsubseteq \exists S.A'\}$. As $\mathcal{C}_{\mathcal{T}_2}$ contains the following fragment

[Figure: fragment of $\mathcal{C}_{\mathcal{T}_2}$ over the roles $T$, $S$, $S^-$ and the concept $A'$.]

$S$ is a twisty role in $\mathcal{T}_2$, and $\mathcal{C}_{\mathcal{T}_2} \models \mathsf{inv}(T, S) \land \mathsf{suc}(T, S)$. Thus, there is no perfect tree witness for $(x, y)$ in $q$ and $\mathcal{T}_2$, though there are two 'imperfect' tree witnesses.

It turns out that if all tree witnesses $f$ for a pair of terms in $q$ are perfect then all of them have the same domain and roots, and so (i) the tree-witness formulas $\mathsf{tw}_f$ occur in the disjunctions for the same edges $(t, t')$ and (ii) the $\mathsf{tw}_f$ may differ only in their $\mathsf{ext}_{\exists R.B}(t)$ conjuncts. In the following example, exponentially many different tree witnesses give rise to the same tree witness formula.

Example 15 Let $q(x) = \{S(x, y), R(y, z_i) \mid 1 \le i \le n\}$ and $\mathcal{T} = \{A \sqsubseteq \exists S,\ \exists S^- \sqsubseteq \exists R.B_1,\ \exists S^- \sqsubseteq \exists R.B_2\}$. There are $2^n$ (perfect) tree witnesses for $(x, y)$, as each $z_i$ can be mapped either to a $B_1$- or a $B_2$-point in $\mathcal{C}_\mathcal{T}$. All these tree witnesses have the same domain (all terms in $q$) and only one root ($x$). They define the same tree-witness formula $\mathsf{ext}_{\exists S}(x)$.

The notion of universal tree witness we are about to introduce will serve as a compact representation for all such similar tree witnesses.

For a pair $(t, t')$ of adjacent terms in $q$, a tree-shaped CQ $\mathfrak{f}$ is called a universal tree witness for $(t, t')$ in $q$ and $\mathcal{T}$ if there is a partial homomorphism $f$ from $q$ onto $\mathfrak{f}$ such that, for every tree witness $g$ for $(t, t')$ in $q$ and $\mathcal{T}$,
– $\mathrm{dom}\, g = \mathrm{term}(\mathfrak{f})$, and
– there exists a forward homomorphism $f'$ from $\mathfrak{f}$ to $\mathcal{C}_\mathcal{T}$ such that $g(s) = f'(f(s))$, for every $s \in \mathrm{term}(\mathfrak{f})$,
where by a forward homomorphism we understand a homomorphism that preserves the distance from the root (recall that both $\mathfrak{f}$ and $\mathcal{C}_\mathcal{T}$ are tree-shaped).

Lemma 16 (i) If all tree witnesses for a pair $(t, t')$ of terms in $q$ and $\mathcal{T}$ are perfect, then there is a unique (up to isomorphism) universal tree witness for $(t, t')$ in $q$ and $\mathcal{T}$.
(ii) There is a polynomial-time algorithm which, given $q$, $\mathcal{T}$ and a pair $(t, t')$ of adjacent terms in $q$, checks whether all the tree witnesses for $(t, t')$ in $q$ and $\mathcal{T}$ are perfect, and if this is the case, returns a universal tree witness for $(t, t')$ in $q$ and $\mathcal{T}$.

As a consequence of this lemma we obtain that, if all tree witnesses for $(t, t')$ are perfect, then all of them share the same domain and roots.

Example 17 In Example 15, a universal tree witness for $(x, y)$ coincides with the whole query $q(x)$.

In the proof of Lemma 16, we construct a finite sequence $\mathfrak{f}_i$ of tree-shaped CQs such that the final $\mathfrak{f}_n$ is a universal tree witness for $(t, t')$. We begin by taking all the atoms with terms $t$, $t'$ and then try to satisfy the tree witness condition (t1) by adding atoms with new terms to the current $\mathfrak{f}_i$. Condition (perf), which is being checked during the construction, allows us to decide whether we have to add a new term to $\mathfrak{f}_i$ or reuse an existing one. To illustrate, we give one more example.

[Figure: the query $q$ with terms $x_0$, $y_1$, $y_2$, $y_3$, $y_4$ and a fragment of the canonical model $\mathcal{C}_\mathcal{T}$.]

Example 18 Consider $\mathcal{T} = \{A \sqsubseteq \exists S.C,\ C \sqsubseteq \exists T^-,\ \exists S^- \sqsubseteq \exists R,\ R \sqsubseteq R',\ B \sqsubseteq \exists T',\ T' \sqsubseteq T,\ T' \sqsubseteq S\}$ and $q(x_0) = \{S(x_0, y_1), R'(y_1, y_2), R(y_3, y_2), T(y_4, y_3)\}$ in the picture above. A universal tree witness for $(x_0, y_1)$ must contain $S(x_0, y_1)$. As the role in $R'(y_1, y_2)$ is forward, the universal tree witness must contain this atom too. The role $R$ in $R^-(y_2, y_3)$ is also forward, so we add this atom and identify $y_1$ with $y_3$. This gives three approximations of the universal tree witness we are constructing:

[Figure: the approximations $\mathfrak{f}_1$, $\mathfrak{f}_2$ and $\mathfrak{f}_3$.]

In $T(y_4, y_3)$, the role $T$ is twisty in $\mathcal{T}$. The canonical model $\mathcal{C}_\mathcal{T}$ suggests two alternative ways of treating $T(y_4, y_3)$: either to add it as it is, or to identify $y_4$ with root $x_0$. Thus, no universal tree witness for $(x_0, y_1)$ exists. This is also signalled by the fact that condition (perf) fails: by identifying $y_1$ and $y_3$, we make $S$ and $T$ adjacent, while both $\mathsf{suc}(S, T)$ and $\mathsf{inv}(S, T)$ hold in $\mathcal{C}_\mathcal{T}$. Note that $S$ and $T$ were not adjacent in the original query $q$; that is why (perf) is checked for the image $f(q)$ of every tree witness $f$. As the number of tree witnesses can be exponential, this complication might suggest that (perf) cannot be checked efficiently. The proof

of Lemma 16 shows that this check can be done in polynomial time without constructing all tree witnesses.

Strictly speaking, a universal tree witness is not a tree witness in the sense of our original definition, but rather a convenient structure representing all tree witnesses for $(t, t')$, even if there are exponentially many of them. Universal tree witnesses can be used to further simplify the rewriting $q_c$. Namely, all formulas $\mathsf{tw}_f$ for $(t, t')$ are merged into a single tree-witness formula $\mathsf{tw}_\mathfrak{f}$ for the universal tree witness $\mathfrak{f}$ for $(t, t')$, which is defined by taking:

\[ \mathsf{tw}_\mathfrak{f} = \Big[ \bigvee_{\substack{\exists R.B \text{ such that there is} \\ \text{a homomorphism } h \colon \mathfrak{f} \to \mathcal{C}_\mathcal{T} \\ h(t) = a_{RB},\ h(t') = b_{RB}}} \mathsf{ext}_{\exists R.B}(t) \Big] \land \bigwedge_{s \text{ is root of } \mathfrak{f}} (t = s) \land \bigwedge_{A(s) \in q} A(s). \]

As a universal tree witness is unique for each pair $(t, t')$ of adjacent terms, let us denote it by $q_{(t,t')}$. We are now in a position to define our polynomial rewriting $q_p$ for $q$ and $\mathcal{T}$:

\[ q_p(\vec{x}) = \mathsf{detached}_q \lor \exists \vec{y} \bigwedge_{\substack{\{t, t'\} \\ t \text{ and } t' \text{ adjacent}}} \Big[ \Big( \bigwedge_{\substack{A(s) \in q \\ s \in \{t, t'\}}} \mathsf{ext}_A(s) \land \bigwedge_{R(t, t') \in q} \mathsf{ext}_R(t, t') \Big) \lor \bigvee_{\substack{s \text{ and } s' \text{ adjacent} \\ t, t' \in \mathrm{term}(q_{(s,s')})}} \mathsf{tw}_{q_{(s,s')}} \Big]. \]

Theorem 19 If all tree witnesses for $q$ and $\mathcal{T}$ are perfect and condition (conf) is satisfied then, for any ABox $\mathcal{A}$ and any $\vec{a} \subseteq \mathrm{ind}(\mathcal{A})$, we have $(\mathcal{T}, \mathcal{A}) \models q[\vec{a}]$ iff $\mathcal{I}_\mathcal{A} \models q_p[\vec{a}]$. Moreover, $q_p$ is constructed in time polynomial in $|q|$, $|\mathcal{T}|$.

Theorem 19 can be substantially sharpened by taking account of the 'types' of terms in $q$.

Example 20 Consider again the TBox $\mathcal{T}_2$ from Example 14 and the CQ $q_1(x) = \{T(x, y), S(y, z), A''(z)\}$. This time we have only one tree witness for $(x, y)$. It is not perfect; yet, it is type-perfect because the 'typed' suc formula $\exists x, y, z\, \big(T(x, y) \land S(y, z) \land A''(z) \land (x \ne z)\big)$ does not hold in $\mathcal{C}_{\mathcal{T}_2}$.

If an ontology $\mathcal{T}$ does not contain any twisty roles, then all tree witnesses in any CQ $q$ over $\mathcal{T}$ are clearly perfect. On the other hand, all examples of conflicting tree witnesses given above involve twisty roles. The following theorem shows that this is no accident:

Theorem 21 Suppose that $q$ is a CQ and $\mathcal{T}$ an OWL 2 QL ontology without twisty roles. Then there are no conflicting tree witnesses for $q$ and $\mathcal{T}$. Thus, the rewriting $q_p$ for $q$ and $\mathcal{T}$ is correct and can be constructed in polynomial time.

It may be worth noting that OWL 2 EL (Baader, Brandt, and Lutz 2005; 2008) ontologies satisfy this sufficient condition, and so a polynomial rewriting similar to $q_p$ can also be used for conjunctive query answering over such ontologies provided that the ABoxes are complete (or saturated) with respect to the ontologies.

Experiments

To understand what the rewriting $q_p$ looks like in real-world practice, we have run experiments with three known OWL 2 QL ontologies, Adolena (A), University (U) and Stockexchange (S), using the same conjunctive queries as in (Pérez-Urbina, Motik, and Horrocks 2009; Rosati and Almatelli 2010; Gottlob, Orsi, and Pieris 2011) and two new ones (Q6 and Q7); the ontologies and queries are available at www.dcs.bbk.ac.uk/~roman/query-rewriting. The results of the experiments are collected in the table below.

          QuOnto    Requiem   Presto     Nyaya     tw/utw   ext     non-ext
          size of   size of   size of    size of            rules   rules
          UCQ       UCQ       Datalog    UCQ
A   Q1       783       402        69       249     8/1       53       3
    Q2     1,812       103        52        94     1         30       3
    Q3     4,763       104        55       104     0         30       1
    Q4     7,251       492        93       456     4/1       42       3
    Q5    78,885       624        71       624     0         36       1
U   Q1         5         2         6         2     0          2       1
    Q2       287       148         1         1     0          0       1
    Q3     1,260       224         8         4     0          4       1
    Q4     5,364     1,628         6         2     0          2       1
    Q5     9,245     2,960        11        10     0          7       1
    Q6       588        42        19        26     1         16       5
    Q7     5,950       630        37        39     2         16       7
S   Q1         6         6         7         6     0          6       1
    Q2       204       160         3         2     0          2       1
    Q3     1,194       480         5         4     0          4       1
    Q4     1,632       960         5         4     0          4       1
    Q5    11,487     2,880         7         8     0          6       1
    Q6       588       564        12        36     1         24       3
    Q7       118       106        14        29     1         19       5

The most important conclusion we can draw from these experiments is that, in practice, queries and ontologies generate very few tree witnesses, if any; they are never in conflict with each other, the sufficient condition of Theorem 19 is satisfied, and so the rewritings $q_c$ and $q_p$ are both short and correct.

To describe what the rewritten queries look like and explain the figures in the table, assume first that a given CQ $q$ contains no tree witnesses with respect to a given ontology $\mathcal{T}$. In this case, $q_p$ is obtained from $q$ by replacing each unary atom $A(s)$ with $\mathsf{ext}_A(s)$ and each binary atom $R(s, s')$ with $\mathsf{ext}_R(s, s')$. Roughly, the $\mathsf{ext}_E$ contain those concepts/roles that are located under $E$ in the classification of $\mathcal{T}$ by an ontology reasoner. It will be convenient for us to represent the $\mathsf{ext}_E$ as nonrecursive Datalog programs. For example, Adolena gives the rules:

    ext_affects(x, y) :- affects(x, y).
    ext_affects(x, y) :- isAffectedBy(y, x).
    ext_Device(x) :- Device(x).
    ...

The column 'ext rules' in the table refers to the number of such Datalog rules for the given CQ and ontology.

Consider now a query containing tree witnesses, for instance the following query Q4 over Adolena:

\[ q(x) = \exists y\, \big( \mathit{Device}(x) \land \mathit{assistsWith}(x, y) \land \mathit{PhysicalAbility}(x, y) \big). \]

This CQ contains a tree witness $f$ with root $f(x) = a_{\mathit{assistsWith},\mathit{MovementAbility}}$.
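The ext rules for roles quoted above can be generated mechanically from a role hierarchy. The following Python fragment is only a minimal sketch under stated assumptions: it handles role inclusions alone (not the concept hierarchy), it writes the inverse of a role `R` as the string `R-`, and the single inclusion it is run on is a hypothetical axiom chosen to reproduce the two rules for `affects`, not an excerpt from the actual Adolena ontology. The function names `inverse`, `subroles` and `ext_role_rules` are ours.

```python
def inverse(role):
    """Return the inverse of a role; 'R-' denotes the inverse of 'R'."""
    return role[:-1] if role.endswith("-") else role + "-"


def subroles(inclusions):
    """Map each role to the set of roles below it: the reflexive-transitive
    closure of the given inclusions, closed under taking inverses."""
    edges = set(inclusions) | {(inverse(a), inverse(b)) for a, b in inclusions}
    below = {r: {r} for edge in edges for r in edge}
    changed = True
    while changed:  # naive fixpoint iteration, fine for small hierarchies
        changed = False
        for sub, sup in edges:
            missing = below[sub] - below[sup]
            if missing:
                below[sup] |= missing
                changed = True
    return below


def ext_role_rules(role, inclusions):
    """Nonrecursive Datalog rules defining ext_<role>, as a list of strings."""
    below = subroles(inclusions).get(role, {role})
    rules = []
    for r in sorted(below):
        # An inverse role contributes the same atom with swapped arguments.
        body = f"{r[:-1]}(y, x)" if r.endswith("-") else f"{r}(x, y)"
        rules.append(f"ext_{role}(x, y) :- {body}.")
    return rules


# Hypothetical axiom: the inverse of isAffectedBy is included in affects.
adolena_roles = [("isAffectedBy-", "affects")]
for rule in ext_role_rules("affects", adolena_roles):
    print(rule)
```

The number of rules produced for a role equals the number of roles (and inverses) below it in the hierarchy, which is why the 'ext rules' column reflects the shape of the hierarchies rather than the structure of the query.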

In this case, for each pair $(s, s')$ of adjacent terms in $q$, we introduce a fresh predicate, say $\mathsf{edge}_{s,s'}$, and define it by means of a nonrecursive Datalog program such as

    edge_{x,y}(x, y) :- ext_Device(x), ext_assistsWith(x, y), ext_PhysicalAbility(x, y).
    edge_{x,y}(x, y) :- ext_Device(x), ext_{∃assistsWith.MovementAbility}(x).

where the first rule corresponds to choosing both $x$ and $y$ among the ABox individuals and the second rule comes from a single universal tree witness for $(x, y)$. Then we replace by $\mathsf{edge}_{s,s'}$ all the atoms in the CQ that are 'covered' by the terms $s$ and $s'$. In our running example, we obtain

    q(x) :- edge_{x,y}(x, y).

The column 'non-ext rules' in the table refers to the number of such rules (one rule for the whole $q$ plus the rules defining the $\mathsf{edge}_{s,s'}$); 'tw' gives the number of tree witnesses in the CQ, and 'utw' the number of universal tree witnesses.

In theory, a correct rewriting of a CQ $q$ and an OWL 2 QL ontology $\mathcal{T}$ can be exponential only if $q$ and $\mathcal{T}$ give rise to exponentially many tree witnesses, in which case the canonical model $\mathcal{C}_\mathcal{T}$ must be extremely complex. Our experiments indicate that, in practice, the contribution of tree witnesses does not look essential at all, especially in comparison with the contribution of the definitions of $\mathsf{ext}_E$, which reflects the depth and width of the concept and role hierarchies in $\mathcal{T}$ rather than the complexity of $\mathcal{C}_\mathcal{T}$. Note also that the same predicates $\mathsf{ext}_E$ are used in all queries, which makes these predicates an ideal target for optimisations. The ways to minimise the influence of these rules depend on how we store the data.

There are two main approaches to storing data in OBDA. Suppose first that an ABox $\mathcal{A}$ is stored in a local database and the system has a certain degree of control over the data. In this case, one can saturate $\mathcal{A}$ with the intensional data that is implied by the TBox axioms (more precisely, construct the ABox part of the canonical model). Having done so, we do not need the $\mathsf{ext}_E$ predicates any more and can replace them with the corresponding $E$. The ABox saturation (more precisely, a finite encoding of the whole canonical model) was suggested in the combined approach (Lutz, Toman, and Wolter 2009; Kontchakov et al. 2010). However, the downside of the ABox saturation is a significant increase of the storage space required for the data (and a slowdown of updates). One solution to this problem was found by Rodriguez Muro and Calvanese (2011a; 2011b). In a nutshell, the idea is to build a 'semantic index' by assigning numerical identifiers to the concept names, used in the ontology, in such a way that all subclasses of a given concept are associated with an interval (or a few intervals) of numbers. The semantic index allows one to encode the definitions of any of the $\mathsf{ext}_E$ predicates as a single query that selects all instances with concept identifiers falling into the respective interval(s). In this case, the classical database indexing techniques are employed to ensure efficiency of these interval queries.

In the other typical OBDA scenario, ABoxes do not come as sets of triples stored in a single database. Instead, the sets of individuals that belong to concepts and roles are defined by means of queries (mappings) to a number of (relational) data sources (Lenzerini 2002; Calvanese et al. 2007b). Consider, for instance, the rules for the role affects above. One can clearly expect mappings to be defined in such a way that $\mathit{affects}(a, b) \in \mathcal{A}$ iff $\mathit{isAffectedBy}(b, a) \in \mathcal{A}$ in every ABox $\mathcal{A}$. This suggests that, in fact, there is no need for two separate rules, so that the predicate $\mathsf{ext}_{\mathit{affects}}$ can be eliminated altogether (replaced by $\mathit{affects}(x, y)$, which halves the number of CQs produced). It is also quite feasible that information about all devices is stored in a single database relation and mappings for each of the 26 subclasses of the concept Device select appropriate devices from the same database relation. In such a case, every ABox $\mathcal{A}$ defined by these mappings will be complete for all subclasses $A$ of Device in the sense that $A(a) \in \mathcal{A}$ iff $(\mathcal{T}, \mathcal{A}) \models A(a)$. Therefore, with this information at hand, one can replace the 26 rules defining $\mathsf{ext}_{\mathit{Device}}(x)$ with just a single rule ext_Device(x) :- Device(x), or even eliminate the predicate $\mathsf{ext}_{\mathit{Device}}(x)$ altogether.

Conclusions

In this paper, we considered pure (positive existential) rewritings of conjunctive queries over OWL 2 QL ontologies and analysed why such rewritings can be lengthy. We showed that the length of a rewriting is related to the number of tree witnesses in the query, which reflect how various parts of the query can be homomorphically mapped to the tree ('intensional') part of the canonical model. Thus, a rewriting can be lengthy if the original query is sufficiently long and the intensional part of the canonical model for the ontology is sufficiently complex. We proved that by restricting the interaction between inverse roles and role inclusion axioms in ontologies and queries, we can guarantee transparent polynomial rewritings. Moreover, we also demonstrated that real-world ontologies and queries contain very few tree witnesses, satisfy the above mentioned restrictions, and so enjoy polynomial rewritings.

Remark 22 When the final version of this paper was ready for submission, we obtained some new results that shed more light on the size of pure rewritings. Below is a brief summary of these results; for details consult the preliminary report (Kikot, Kontchakov, Podolskii and Zakharyaschev 2012).

(1) An exponential blow-up is unavoidable for pure positive existential rewritings (PE) and pure nonrecursive Datalog (NDL) rewritings; pure FO-rewritings can blow-up superpolynomially unless NP ⊆ P/poly.
(2) Pure NDL-rewritings are in general exponentially more succinct than pure PE-rewritings.
(3) Pure FO-rewritings can be superpolynomially more succinct than pure PE-rewritings.
(4) Impure PE-rewritings can always be made polynomial, and so they are exponentially more succinct than pure PE-rewritings.

(1)–(3) are proved by first establishing connections between pure rewritings for CQs over OWL 2 QL ontologies and circuits for monotone Boolean functions, and then using known

lower bounds and separation results for the circuit complexity of such functions as CLIQUE(n, k) 'a graph with n nodes contains a k-clique' and MATCHING(2n) 'a bipartite graph with n vertices in each part has a perfect matching' (Razborov 1985; Borodin, von zur Gathen, and Hopcroft 1982; Raz and Wigderson 1992; Raz and McKenzie 1997).

The polynomial PE-rewriting in (4) is similar to the NDL-rewriting of Gottlob and Schwentick (2011): using two extra constants, = and (polynomially-many) new existentially quantified variables, one can encode a relevant part of the canonical model of $\mathcal{T}$ in the rewritten query. The difference between the resulting impure PE-rewritings and the exponential-size pure PE-rewritings is of the same kind as the difference between deterministic and nondeterministic Boolean circuits. As shown by Razborov (1985), no polynomial-size deterministic monotone circuit can compute CLIQUE(n, k); however, it can be computed by a polynomial-size nondeterministic circuit (or a QBF), where the existentially quantified variables guess k vertices and the circuit checks whether they form a k-clique in the given graph. In the polynomial impure PE-rewriting (4), the extra constants, variables and = are used to nondeterministically guess a part of the canonical model into which the query can be mapped. (We conjecture that there does not exist an impure polynomial-size rewriting if the number of new existentially quantified variables is bounded.)

Acknowledgments

The work on this paper was supported by the U.K. EPSRC grant EP/H05099X/1. We are grateful to G. Gottlob, G. Orsi, R. Rosati and D. Tsarkov who helped us to run the experiments.

References

Abiteboul, S.; Hull, R.; and Vianu, V. 1995. Foundations of Databases. Addison-Wesley.

Baader, F.; Calvanese, D.; McGuinness, D.; Nardi, D.; and Patel-Schneider, P., eds. 2003. The Description Logic Handbook: Theory, Implementation and Applications. Cambridge University Press.

Baader, F.; Brandt, S.; and Lutz, C. 2005. Pushing the EL envelope. In Proc. of the 19th Int. Joint Conf. on Artificial Intelligence, IJCAI-05, 364–369. Professional Book Center.

Baader, F.; Brandt, S.; and Lutz, C. 2008. Pushing the EL envelope further. In Clark, K., and Patel-Schneider, P. F., eds., Proc. of the OWLED 2008 DC Workshop on OWL: Experiences and Directions.

Borodin, A.; von zur Gathen, J.; and Hopcroft, J. E. 1982. Fast parallel matrix and GCD computations. In Proc. of the 23rd Annual Symp. on Foundations of Computer Science, FOCS'82, 65–71. IEEE Computer Society.

Calvanese, D.; De Giacomo, G.; Lembo, D.; Lenzerini, M.; and Rosati, R. 2007a. Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. of Automated Reasoning 39(3):385–429.

Calvanese, D.; De Giacomo, G.; Lembo, D.; Lenzerini, M.; Poggi, A.; and Rosati, R. 2007b. Ontology-based database access. In Proc. of the 15th Ital. Conf. on Database Systems, SEBD 2007, 324–331.

Chortaras, A.; Trivela, D.; and Stamou, G. 2011. Goal-oriented query rewriting for OWL 2 QL. In Proc. of the 24th Int. Workshop on Description Logics, DL 2011, vol. 745 of CEUR Workshop Proceedings. CEUR-WS.org.

Gottlob, G., and Schwentick, T. 2011. Rewriting ontological queries into small nonrecursive datalog programs. In Proc. of the 24th Int. Workshop on Description Logics, DL 2011, vol. 745 of CEUR Workshop Proceedings. CEUR-WS.org.

Gottlob, G.; Orsi, G.; and Pieris, A. 2011. Ontological queries: Rewriting and optimization. In Proc. of the 27th Int. Conf. on Data Engineering, ICDE 2011, 2–13. IEEE Computer Society.

Kikot, S.; Kontchakov, R.; and Zakharyaschev, M. 2011. On (In)Tractability of OBDA with OWL 2 QL. In Proc. of the 24th Int. Workshop on Description Logics, DL 2011, vol. 745 of CEUR Workshop Proceedings. CEUR-WS.org.

Kikot, S.; Kontchakov, R.; Podolskii, V.; and Zakharyaschev, M. 2012. Exponential lower bounds and separation for query rewriting. CoRR, arXiv:1202.4193.

Kontchakov, R.; Lutz, C.; Toman, D.; Wolter, F.; and Zakharyaschev, M. 2010. The combined approach to query answering in DL-Lite. In Proc. of the 12th Int. Conf. on Principles of Knowledge Representation and Reasoning, KR 2010. AAAI Press.

Kontchakov, R.; Lutz, C.; Toman, D.; Wolter, F.; and Zakharyaschev, M. 2011. The combined approach to ontology-based data access. In Proc. of the 20th Int. Joint Conf. on Artificial Intelligence, IJCAI-2011, 2656–2661. AAAI Press.

Lenzerini, M. 2002. Data integration: A theoretical perspective. In Proc. of the 21st ACM SIGACT SIGMOD SIGART Symp. on Principles of Database Systems, PODS 2002, 233–246.

Lutz, C.; Toman, D.; and Wolter, F. 2009. Conjunctive query answering in the description logic EL using a relational database system. In Proc. of the 21st Int. Joint Conf. on Artificial Intelligence, IJCAI 2009, 2070–2075. AAAI Press.

Pérez-Urbina, H.; Motik, B.; and Horrocks, I. 2009. A comparison of query rewriting techniques for DL-Lite. In Int. Workshop on Description Logics, DL 2009, vol. 477 of CEUR Workshop Proceedings. CEUR-WS.org.

Raz, R., and McKenzie, P. 1997. Separation of the monotone NC hierarchy. In Proc. of the 38th Annual Symp. on Foundations of Computer Science, FOCS'97, 234–243. IEEE Computer Society.

Raz, R., and Wigderson, A. 1992. Monotone circuits for matching require linear depth. J. ACM 39(3):736–744.

Razborov, A. 1985. Lower bounds for the monotone complexity of some Boolean functions. Dokl. Akad. Nauk SSSR 281(4):798–801.

Rodriguez Muro, M., and Calvanese, D. 2011a. Dependencies to optimize ontology based data access. In Proc. of the 24th Int. Workshop on Description Logics, DL 2011, vol. 745 of CEUR Workshop Proceedings. CEUR-WS.org.

Rodriguez Muro, M., and Calvanese, D. 2011b. Semantic index: Scalable query answering without forward chaining or exponential rewritings. In Proc. of the 10th Int. Semantic Web Conf., ISWC 2011.

Rosati, R., and Almatelli, A. 2010. Improving query answering over DL-Lite ontologies. In Proc. of the 12th Int. Conf. on Principles of Knowledge Representation and Reasoning, KR 2010. AAAI Press.
