Conjunctive Query Answering in the Description Logic EL Using a Relational Database System

Carsten Lutz David Toman Frank Wolter Universitat¨ Bremen, Germany University of Waterloo, Canada University of Liverpool, UK [email protected] [email protected] [email protected]

Abstract In this paper, we present a novel approach to using RDBMSs for CQ answering over DL ontologies that, in par- Conjunctive queries (CQ) are fundamental for ac- ticular, accommodates qualified existential restrictions. We cessing description logic (DL) knowledge bases. apply it to the EL family of DLs [Baader et al., 2008], whose We study CQ answering in (extensions of) the DL members are widely used as ontology languages for large- EL, which is popular for large-scale ontologies scale bio-medical ontologies such as SNOMED CT and (early and underlies the designated OWL2-EL profile of versions of) NCI. Our main result shows that CQ answering OWL2. Our main contribution is a novel approach dr in ELH⊥ , the extension of basic EL with the bottom con- to CQ answering that enables the use of standard re- cept, role inclusions, and domain and range restrictions, can lational database systems as the basis for query ex- be implemented using an RDBMS. This result is of particular ecution. We evaluate our approach using the IBM dr relevance as ELH⊥ can be viewed as the core of the desig- DB2 system, with encouraging results. nated OWL-EL profile of the upcoming OWL Version 2 on- tology language. We evaluate our approach using the IBM DB2 RDBMS and show that it scales to TBoxes with more 1 Introduction than 50.000 axioms where answer times typically range from One of the main applications of ontologies in computer sci- a fraction of a second to a few seconds. ence is in data access, where an ontology formalizes concep- The central idea of our approach is to incorporate the con- tual information about data that is stored in one or multiple sequences of the TBox T into the relational instance corre- data sources, and this information is used to derive answers to sponding to the given ABox A. To capture this formally, we queries over such sources. This general setup plays a central introduce the notion of combined first-order (FO) rewritabil- role in, e.g., ontology-based information integration and peer- ity. A DL enjoys combined FO rewritability if it is possible to to-peer data management. In these and similar applications, effectively rewrite (i) A and T into an FO structure (indepen- ∗ Description Logics (DLs) are popular ontology languages and dently of q) and (ii) q and (possibly) T into an FO query q conjunctive queries (CQs) are used as a fundamental querying (independently of A) such that query answers are preserved, ∗ tool. Hence, efficient and scalable approaches to CQ answer- i.e, the answer to q over the FO structure is the same as the ing over DL ontologies are of great interest. answer to q over A and T . The connection to RDBMSs then Calvanese et al. have argued that, in the short run, true relies on the well-known equivalence between FO structures scalability of conjunctive query answering over DL ontolo- and relational databases, and FO queries and SQL queries. gies can only be achieved by making use of standard re- The notion of combined FO rewritability generalizes the no- lational database management systems (RDBMSs) [2007b]. tion of FO reducibility, where the TBox is incorporated into Alas, this is not straightforward as RDBMSs are unaware the query q rather than into the ABox A while the ABox it- of TBoxes (the DL mechanism for storing conceptual infor- self is used as a relational instance without any modification mation) and adopt the closed-world semantics. In contrast, [Calvanese et al., 2007b]. Notable properties of our approach ABoxes (the DL mechanism for storing data) and the asso- include: ciated ontologies employ the open-world semantics. Existing 1. It applies to DLs for which data complexity of CQ an- dr approaches to overcome these differences have serious limita- swering is PTIME-complete, such as ELH⊥ . tions. For example, the approach of [Calvanese et al., 2007b] ELHdr applies only to DLs with data complexity of CQ answering in 2. For ⊥ , both rewriting steps can be carried out in LOGSPACE whereas for many DLs, this problem is complete polynomial time and produce only a polynomial blowup; for PTIME or co-NP. In particular, the above approach can- moreover, the query rewriting only depends on the input T not be directly used for DLs that admit qualified existential CQ and the role inclusions in (usually only few), but T restrictions, which play an important role in many ontologies. not on ’s concept inclusions (usually very many). This limitation is shared by the rule-based approach presented In contrast and to the best of our knowledge, all existing in [Wu et al., 2008]. approaches to (non-combined) FO reducibility generate a

2070 rewritten query of size mn, with m the number of symbols For I an interpretation, q = ϕ(v) a k-ary FO query, and in the query and the TBox and n the size of the query. a1,...,ak ∈ NI, we write I|= q[a1,...,ak] if I satis- I In addition, we analyze the limitations of our approach and fies q with vi assigned to ai , 1 ≤ i ≤ k.Acertain an- show that DLs with data complexity exceeding PTIME can- swer for a k-ary conjunctive query q and a knowledge base not enjoy polynomial combined FO rewritability, where the K is a tuple (a1,...,ak) such that a1,...,ak occur in K and expansion of the ABox due to rewriting (but not necessarily I|= q[a1,...,ak] for each model I of K. We use cert(q, K) of query rewriting) is bounded by a polynomial. We believe to denote the set of all certain answers for q and K. This that this is a significant result as combined FO rewritability defines the query answering problem studied in this paper: dr involving an exponential blowup of the data does not seem to given an ELH⊥ knowledge base K andaCQq, to compute be practical. In particular, the result implies that expressive cert(q, K). DLs such as ALC cannot enjoy polynomial FO rewritabil- Note that CQ answering generalizes instance checking,the EL dr ity. The same holds even for enriched with negated ABox problem of deciding whether K|= C(a), for C an ELH⊥ - assertions and negated query atoms. For the latter case, we concept: any such C can be easily unfolded into a conjunctive sketch an approach to query answering that involves only a query qC such that a ∈ cert(qC , K) iff K|= C(a). polynomial blowup of the ABox, but is incomplete (in a way In the remainder of this paper we use the unique name as- I I precisely characterized by an alternative semantics). sumption: a = b for all interpretations I and all a, b ∈ NI Due to space limitations, detailed proofs are given in the with a = b. It is not hard to see that this has no impact on cer- technical report [Lutz et al., 2009]. tain answers. We also assume, w..o.g., that (i) queries con- tain only individual names that occur in the KB against which 2 Preliminaries they are asked, (ii) TBoxes do not contain domain restric- tions, (iii) TBoxes contain exactly one range restriction per dr In ELH⊥ , concepts are built according to the syntax rule role name, (iv) if K|= r  s and ran(r)  C, ran(s)  D are in T , then C T D, and (v) there are no r, s ∈ NR with C ::= A ||⊥|C  D |∃r.C r = s, K|= r  s, and K|= s  r. Justifications for these assumptions are given in [Lutz et al., 2009]. where, here and in the remaining paper, A ranges over con- cept names taken from a countably infinite set NC, r ranges over role names taken from a countably infinite set NR, and 3 ABox Rewriting / Canonical Models C, D range over concepts. A TBox is a finite set of concept The rewriting of the ABox consists of an extension of the inclusions C  D, role inclusions r  s, domain restrictions ABox to a canonical model of the knowledge base. For the dom(r)  C ran(r)  C dr , and range restrictions . An ABox remainder of this section, we fix an ELH⊥ -KB K =(T , A). is a finite set of concept assertions A(a) and role assertions We use sub(T ) to denote the set of all subconcepts of con- r(a, b), where a, b range over a countably infinite set NI of cepts that occur in T , rol(T ) for the set of role names that individual names.Aknowledge base is a pair (T , A) with T occur in T , and Ind(A) for the set of individual names that A a TBox and an ABox. occur in A. We also use ranT (r) to denote the (unique) con- The semantics of concepts, inclusions, restrictions, and cept C with ran(r)  C ∈T, and set assertions is given as usual in terms of an interpretation I I I =(Δ, · ); we refer to [Baader et al., 2008] for details. ran(T ):={ranT (r) | r ∈ rol(T )} An interpretation I is a model of a TBox T (ABox A)ifit NIaux := {xC,D | C ∈ ran(T ) and D ∈ sub(T )}, satisfies all inclusions and restrictions in T (assertions in A). It is a model of a knowledge base K =(T , A) if it is a model assuming NI ∩NIaux = ∅. The canonical model IK of K is de- of T and A. A knowledge base that has a model is consis- fined in Figure 1. It is standard to show the following (similar tent. For a concept inclusion, role inclusion, or assertion α, to proofs in [Baader et al., 2005b; Lutz and Wolter, 2007]): we write K|= α if α is satisfied in all models of K. Proposition 1. If K is consistent, then IK is a model of K. Let NV be a countably infinite set of variables. Together, ΔIK the sets NV (of variables) and NI (of individual names) form Note that the cardinality of is only quadratic in the size K T the set NT of terms.Afirst-order (FO) query q is a first-order of , and linear if does not contain range restrictions. The I formula built from NT and the unary and binary predicates model K can be computed in polynomial time since sub- ELHdr from NC and NR. We write q = ϕ(v) to indicate that q is the sumption and instance checking in ⊥ can be decided in FO formula ϕ whose free variables are among the variables poly-time [Baader et al., 2008]. In fact, there is an easy rule- v = v1,...,vk. Variables in v are the answer variables of q based procedure for computing the canonical model and the and q is k-ary if there are k answer variables. A conjunctive rules can be implemented as standard database operations; query is an FO query q of the form ∃u.ψ(u,v), where ψ is see the workshop publication [Lutz et al., 2008] for details. a conjunction of concept atoms A(t) and role atoms r(t, t). Consistency of K can also be checked in polynomial time The variables in u are the quantified variables of q. We use [Baader et al., 2008]. IK can be used for instance check- var(q) to denote the set of all variables in u and v, qvar(q) for ing: it can be shown that K|= C(a) iff IK |= C(a), for dr the set of quantified variables, avar(q) for the answer vari- K consistent, C an ELH⊥ -concept, and a ∈ Ind(A). Un- ables, and term(q) for the terms in q. Abusing notation, we fortunately, an analogous statement for conjunctive queries, write α ∈ q if the concept or role atom α occurs in q. namely (a1,...,ak) ∈ cert(q, K) iff IK |= q[a1,...,qk],

2071 I I Δ K := Ind(A) NIaux and a K := a for all a ∈ Ind(A) I A K := {a ∈ Ind(A) |K|= A(a)}∪{xC,D ∈ NIaux |K|= C  D  A} rIK := {(a, b) ∈ Ind(A) × Ind(A) | s(a, b) ∈Aand K|= s  r}∪ {(a, xC,D) ∈ Ind(A) × NIaux |K|= ∃s.D(a), ranT (s)=C, and K|= s  r}∪   {(xC,D,xC,D ) ∈ NIaux × NIaux |K|= C  D ∃r.D , ranT (s)=C , and K|= s  r}

Figure 1: The canonical model IK. does not hold. This is due to two reasons, the first one being 4 Query Rewriting that IK may contain unnecessary elements: ∗ Our aim is to rewrite the original CQ q into an FO query qR U |= q[a ,...,a ] Ir |= q∗ [a ,...,a ] Example 2. Take K1 =(T1, A1) with T1 = {A  A} and such that K 1 k iff K R 1 k for A = {B(a)} q = ∃u.(B(v) ∧ A(u)) x ∈ all a1,...,ak ∈ Ind(A). By Proposition 4, we obtain the 1 , and 1 . Then ,A r I cert(q, K) I A A K1 I |= q [a] a ∈ cert(q , K ) desired answers by using K (the rewriting of ) and so K1 1 , but clearly 1 1 . ∗ as a relational database instance and replacing q with qR.We This deficiency of IK is easily repaired by restricting it to el- now formulate the main result of this paper. aIK a ∈ Ind(A) ements reachable from some with . Formally, Theorem 5. For every finite set of role inclusions R and k- Ind(A)I = {aI | a ∈ Ind(A)} we define for each interpre- ary CQ q, one can construct in polynomial time a k-ary FO tation I.Apath in I is a finite sequence d0r1d1 ···rndn, ∗ dr I I query qR such that for all ELH⊥ -KBs K =(T , A) with R n ≥ 0, where d0 ∈ Ind(A) and (di,di+1) ∈ r for all i+1 the set of role inclusions in T and all a1,...,ak ∈ Ind(A), i

2072 • Fork= is the set of variables v ∈ qvar(q) such that there v0 v1 v2 is no implicant of in([v]); r r r • Fork (I,ζ) pre(ζ) = ∅    H is the set of of pairs such that , v v v there is a prime implicant of in(ζ) that is not contained 3BB | 4BB | 5 BBr r || BBr r || in in(ζ), and I is the set of all prime implicants of in(ζ); BB || BB || B ~|| B ~|| • Cyc is the set of variables v ∈ qvar(q) such that there are v6 v7    r0(t0,t0),...,rm(tm,tm),...,rn(tn,tn) ∈ q, n, m ≥  0, with v ∼q ti for some i ≤ n, ti ∼q ti+1 for all i

2073 For rewriting ABoxes into canonical models, we have used Number of assertions in ABox of K a rule-based approach implemented via iterated querying; see Concept 100K 100K 100K 200K 200K 200K 400K 800K 1.6M [Lutz et al., 2008] for more details. As relational systems are Role 25K 50K 75K 40K 65K 90K 360K 1.5M 5.8M Ir not optimized to handle tens of thousands of relatively small Number of assertions in K relations, one for each concept and role name in the ontology, Concept 440K 440K 441K 683K 683K 684K 1.3M 2.6M 5.1M Ir Role 197K 237K 273K 323K 371K 414K 986K 2.7M 8.2M we have used only two relations to represent K: Query Execution Time in seconds acbox(conceptid,indid)and Q1 (2c1r) 0.19 0.19 0.20 0.23 0.25 0.24 0.27 0.46 0.59 arbox(roleid,domain-indid,range-indid), Q2 (3c2r) 0.23 0.22 0.23 0.52 0.25 0.56 0.33 0.42 0.69 Q3 (3c2r) 0.25 0.27 0.26 0.31 0.31 0.31 0.42 0.86 1.13 where conceptid and roleid are numerical iden- Q4 (4c3r) 0.24 0.23 0.23 0.25 0.26 0.25 0.31 0.42 1.44 tifiers for concept names and roles names, indid, Q5 (5c5r) 0.36 0.36 0.30 0.60 0.34 0.45 2.24 7.93 1281 domain-indid, and range-indid are numerical iden- tifiers for individuals from NI ∪ NIaux, acbox rep- resents concept memberships, and arbox role mem- Figure 3: Summary of Experimental Results. berships. Indexes were generated on the attributes conceptid,indid and on roleid,domain-indid the blowup of ABox rewriting (but not necessarily of query and roleid,range-indid, using B+trees. We distin- rewriting) is at most polynomial. Since ABoxes in realistic guish individuals from NI and NIaux by positive and nega- applications are large, combined FO rewritability that is not tive identifiers, and thus need not store Aux as a relation. polynomial in this sense does not seem to be of much use. As an example for this representation, take the query q = The following result gives a fundamental limitation of poly- Nerve(x)∧¬Aux(x), which translates into the SQL statement nomial combined FO rewritability. Note that ground CQ an- select indid from acbox swering, where a CQ may contain only individual names but where conceptid=141723 and indid > 0 not variables, is the decisional variant of CQ answering. Data sets (ABox instances) for our experiments were gener- Theorem 6. If the data complexity of ground CQ answering ated randomly. When generating concept assertions, we have in a DL L is not in PTIME, then L does not enjoy polynomial focused on most specific concept names, i.e., concept names combined FO rewritability. without any subsumees in the TBox. The generation of role Proof. We show the contrapositive. Assume that L en- assertions was guided by the domain and range restrictions in joys polynomial combined FO rewritability, i.e., there are NCI. The numbers of concept and role assertions in the initial effectively computable mappings δ that takes each L-TBox and rewritten ABoxes used in our experiments are reported in K =(T , A) to a first-order structure δ(K) and γ that takes Figure 3 in thousands/millions. Due to implementation par- each pair (q, T ), with q a k-ary conjunctive query and T an ticularities of existing DB systems, ABox rewriting took up L-TBox, to a first-order formula γ(q, T ) with k free variables to several hours. We point out that (i) this high time consump- such that δ can be computed in polynomial time, the size of tion is not inherent to our approach and (ii) ABox rewriting δ(K) is polynomial in the size of K for all K, and the follow- can be implemented as an offline task in typical applications ing condition holds: for all combined L-KBs K =(T , A), all such as online analytical processing (OLAP) and data ware- k-ary conjunctive queries q, and all tuples (a1,...,ak) ∈ NI, housing. K|= q[a1,...,ak] iff δ(K) |= γ(q, T )[a1,...,ak]. Figure 3 summarizes the running times (in seconds) for Then the data complexity of ground CQ answering in L is each of the test queries for varying sizes of data; for each in PTIME: to check whether K|= q with q ground, we can query we list the number of concept and role atoms in paren- compute δ(K) in polynomial time and γ(q, T ) in constant theses; the structure of the queries ranges from simple chains time (as the size of q and T is constant), and then use FO (Q1) and star queries (Q2,Q3,Q4) to queries with cycles in model checking to decide whether δ(K) |= γ(q, T ). The their bodies (Q5). We show only a few representative sam- latter can be done in LOGSPACE. J ples here. ALC SHIQ The experimental results can be interpreted as follows. For expressive DLs such as and , the data com- Firstly, the rewriting of moderately-sized ABoxes into the plexity of ground CQ answering is co-NP-hard, thus Theo- r rem 6 implies that these DLs do not enjoy polynomial com- canonical model IK is well within the storage and query ca- pabilities of existing relational technology. Secondly, query bined FO rewritability unless PTIME = co-NP. performance scales well even when the naive physical de- In many applications, it is natural to admit also negated sign described above is used. Thirdly, query rewriting does concept assertions in the ABox and to extend conjunctive not increase the query processing times; the performance of queries with negated concept atoms in order to query the rewritten queries is almost identical to the performance of the resulting ABoxes, see e.g. [Patel et al., 2007]. Let us call original queries over the completed ABox. ABoxes, queries, and KBs of this form literal. The result stated in Theorem 6 also applies to literal KBs and literal 6 Limitations / Negation queries. Since the data complexity of literal CQ answering EL EL ELHdr ⊥ Recall from Section 1 that a DL enjoys polynomial combined over literal -KBs (where is ⊥ without , role hi- FO rewritability if it has combined FO rewritability such that erarchies, and domain and range restrictions) is co-NP-hard [Schaerf, 1993], it follows that even this mild extension of EL 1This time is solely due to a large size of the result (60M tuples). does not enjoy polynomial combined FO rewritability.

2074 However, there is still a (pragmatic, yet formal) way to is polynomial but still considerable on large data sets. As fu- add negation to our approach. In the following, we sketch ture work, it might be interesting to reduce this blowup by an incomplete approach to literal CQ answering over literal incorporating the TBox partly into the data and partly into dr ELH⊥ -KBs. Its incompleteness can be precisely character- the query. We will also develop effective approaches to up- ized in terms of an epistemic semantics for negated concept date the canonical model/auxiliary data when assertions are atoms in literal CQs, inspired by [Calvanese et al., 2007a]. added to or deleted from the ABox. As a preliminary, we assume standard names, i.e., inter- I References pretation domains Δ have to be a subset of the (countably I infinite) set NI of individual names and a = a for all I and [Baader et al., 2005b] F. Baader, S. Brandt, and C. Lutz. Pushing dr the EL envelope. In Proc. of IJCAI05, pages 364–369. Profes- all a ∈ NI. For a literal ELH⊥ -KB K, an interpretation I, sional Book Center, 2005. and a literal CQ q = ∃u.ψ(u,v) with v = v1,...,vk, we set I|=e q[a1,...,ak] iff there exists a variable assignment π [Baader et al., 2008] F. Baader, S. Brandt, and C. Lutz. Pushing the EL with π(vi)=ai for 1 ≤ i ≤ k and envelope further. In In Proc. of OWLED08, 2008. [Calvanese et al., 2006] D. Calvanese, G. De Giacomo, D. Lembo, •I|=π α for all positive atoms α in ψ, M. Lenzerini, and R. Rosati. Data complexity of query answering •K|= ¬A(a) for all ¬A(a) in ψ with a ∈ NI, and in description logics. Proc. of KR06, pages 260–270, 2006.

•K|= ¬A(π(v)) for all ¬A(v) in ψ with v ∈ NV. [Calvanese et al., 2007a] D. Calvanese, G. De Giacomo, D. Lembo, K|= q[a ,...,a ] I|= q[a ,...,a ] M. Lenzerini, and R. Rosati. EQL-Lite: Effective first-order Now set e 1 k iff e 1 k for query processing in DLs. In Proc. of IJCAI07, pages 274–279. all models I of K with standard names. Thus, answers to AAAI press, 2007. negated atoms do not depend on a concrete interpretation I, [Calvanese et al., 2007b] D. Calvanese, G. De Giacomo, but only on deducibility. Epistemic semantics is sound: every D. Lembo, M. Lenzerini, and R. Rosati. Tractable reason- answer is also an answer under the standard semantics (but ing and efficient query answering in DLs: The DL-Lite family. not vice versa); it is also conservative: it yields the same an- J. of Automated Reasoning, 39(3):385–429, 2007. swers as standard semantics when either the KB or the query [Krisnadhi and Lutz, 2007] A. Krisnadhi and C. Lutz. Data com- does not contain negation. An example for which the epis- plexity in the EL family of DLs. In Proc. of LPAR07, volume temic semantics is different from the standard semantics is 4790 of LNCS, pages 333–347. Springer, 2007. given by A = {A(a), ¬A(a)}, T = {A ∃r., ∃r.B  [Krotzsch¨ et al., 2007] M. Krotzsch,¨ S. Rudolph, and P. Hitzler. A}, and q = ∃u.r(a, u) ∧¬B(u). Then K|= q but K |=e q. Conjunctive queries for a tractable fragment of OWL 1.1. In We give a simple (and poly-time) reduction of epistemic dr Proc. of ISWC07, volume 4825 of LNCS, pages 310–323. literal CQ answering over literal ELH⊥ -KBs to standard CQ Springer, 2007. dr answering over ELH⊥ -KBs. Via the rewritings presented in [Lutz and Wolter, 2007] C. Lutz and F. Wolter. Conservative ex- the main part of this paper, the reduction enables the use of tensions in the lightweight description logic EL.InProc. of dr RDBMSs also for the case of ELH⊥ with negation (under the CADE21, volume 4603 of LNAI, pages 84–99. Springer, 2007. epistemic semantics). [Lutz et al., 2008] C. Lutz, D. Toman, and F. Wolter Conjunctive dr Given a consistent literal ELH⊥ -KB K =(T , A) and a query answering in EL using a database system. In Proc. of literal query q, replace every literal ¬A in q with a fresh OWLED08, 2008. concept name A. Additionally, add A to T whenever [Lutz et al., 2009] C. Lutz, D. Toman, and F. Wolter Conjunctive T|= A ⊥, and A(a) to A for all a ∈ Ind(A) with query answering in the DL EL using an RDBMS Technical Re- K|= ¬A(a), and then remove all negated assertions from A. port, 2009. http://www.informatik.uni-bremen.de/∼clu/papers/     Call the result K =(T , A ) and q . [Patel et al., 2007] C. Patel, J. J. Cimino, J. Dolby, A. Fokoue, dr A. Kalyanpur, A. Kershenbaum, L. Ma, E. Schonberg, and Theorem 7. Let K be a consistent literal ELH⊥ -KB.  K. Srinivas. Matching patient records to clinical trials using on- Then K can be computed in polynomial time and K|=e q[a ,...,a ] K |= q[a ,...,a ] q tologies. In Proc. of ISWC07, volume 4825 of LNCS, pages 816– 1 k iff 1 k for all literal CQs and 829. Springer, 2007. all a1,...,ak ∈ Ind(A). [Rosati, 2007] Riccardo Rosati. On conjunctive query answering in EL.InProc. of DL07, volume 250 of CEUR-WS, 2007. 7 Conclusion [Schaerf, 1993] A. Schaerf. On the complexity of the instance checking problem in concept languages with existential quantifi- We have proposed a novel approach to CQ answering in cation. J. of Intelligent Information Systems, 2:265–278, 1993. DLs using RDBMSs. Unlike previous approaches, it can be used also for DLs for which the data complexity is between [Sioutos et al., 2006] N. Sioutos, S. de Coronado, M.W. Haber, F.W. Hartel, W.L. Shaiu, and L.W. Wright. NCI thesaurus: a LOGSPACE and PTIME. In particular, this includes DLs that semantic model integrating cancer-related clinical and molecular fully admit existential restrictions (both on the left- and right- information. J. of Biomedical Informatics, 40(1):30–43, 2006. hand side of concept inclusions; see [Calvanese et al., 2006]) and paves the way to using RDBMSs for CQ answering in the [Wu et al., 2008] Z. Wu, G. Eadon, S. Das, E. Chong, V. Kolovski, EL family of DLs. Our experiments exhibit a promising per- M. Annamalai, and J. Srinivasan. Implementing an inference en- gine for RDFS/OWL constructs and user-defined rules in oracle. formance even without a sophisticated physical design. One In Proc. of ICDE08, pages 1239–1248. IEEE, 2008. drawback of our approach is the blowup of the data, which

2075