Query Rewriting for Existential Rules with Compiled Preorder
Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015)

Mélanie König, Michel Leclère, Marie-Laure Mugnier
University of Montpellier, Inria, CNRS
Montpellier, France

Abstract

We address the issue of Ontology-Based Query Answering (OBQA), which seeks to exploit knowledge expressed in ontologies when querying data. Ontologies are represented in the framework of existential rules (aka Datalog±). A commonly used technique consists in rewriting queries into unions of conjunctive queries (UCQs). However, the obtained queries can be prohibitively large in practice. Very simple rules, typically expressing taxonomies and relation signatures, are a well-known source of combinatorial explosion. We propose a rewriting technique, which consists in compiling these rules into a preorder on atoms and embedding this preorder into the rewriting process. This makes it possible to compute compact rewritings that can be considered as "pivotal" representations, in the sense that they can be evaluated by different kinds of database systems. The provided algorithm computes a sound, complete and minimal UCQ rewriting, if one exists. Experiments show that this technique leads to substantial gains, in terms of size and runtime, and scales on very large ontologies. We also compare to other tools for OBQA with existential rules and related lightweight description logics.

1 Introduction

We address the issue of Ontology-Based Query Answering (OBQA), which seeks to exploit knowledge expressed in ontologies when querying data. We consider the novel framework of existential rules, also called Datalog± [Baget et al., 2011; Calì et al., 2012; Krötzsch and Rudolph, 2011]. Existential rules are an extension of function-free positive conjunctive rules that allows for existentially quantified variables in rule heads. These rules are able to assert the existence of unknown entities, a fundamental feature in an open-domain perspective, where it cannot be assumed that the description of data is complete. Interestingly, existential rules generalize Horn description logics (DLs), in particular lightweight DLs used in the context of OBQA, such as DL-Lite [Calvanese et al., 2005] and EL [Baader, 2003], which form the core of so-called tractable profiles of the Semantic Web ontological language OWL 2. They overcome some limitations of these DLs by allowing for non-tree structures as well as unbounded predicate arity. We consider here the basic database queries, i.e., conjunctive queries (CQs).

There are two main approaches to query answering in the presence of existential rules and Horn DLs. The first approach is related to forward chaining: it triggers the rules to build a finite representation of inferred data such that the answers can be computed by evaluating the query against this representation, e.g., [Calì et al., 2008; Thomazo et al., 2012]. The second approach, initiated by DL-Lite [Calvanese et al., 2005], is related to backward chaining: it rewrites the query using the rules such that the answers can be computed by evaluating the rewritten query against the data, e.g., [Baget et al., 2009; Gottlob et al., 2011] on existential rules. Some techniques combine both approaches [Kontchakov et al., 2011; Thomazo and Rudolph, 2014]. Each technique is applicable to a specific existential rule fragment.

In this paper, we focus on the query rewriting approach, which has the advantage of being independent of the data. This technique typically outputs a union of conjunctive queries (UCQ), with the aim of benefiting from optimized relational database management systems (RDBMS). However, despite the good theoretical data complexity, first experiments exhibited a serious problem related to the size of the rewritten query, e.g., [Rosati and Almatelli, 2010]. Indeed, the output query can be exponentially larger than the initial query, even with very simple ontologies, and even if the output is extended to arbitrary first-order queries [Kikot et al., 2011; 2012]. This led to a significant amount of work on rewriting into more compact queries, which may remain first-order queries (hence expressible in SQL) or go beyond them (like Datalog programs, e.g., [Gottlob and Schwentick, 2012; Stefanoni et al., 2012]).

Nowadays, mature systems have emerged for OWL 2 QL [Rodriguez-Muro et al., 2013; Civili et al., 2013], as well as prototypes for the EL family and more expressive DLs, e.g., [Eiter et al., 2012]. Existential rules are more complex to process. Entailment with general existential rules is even undecidable (e.g., [Beeri and Vardi, 1981]). However, expressive subclasses ensuring the existence of a UCQ rewriting are known, such as linear rules, which generalize most DL-Lite dialects, the sticky family, classes satisfying conditions expressed on a graph of rule dependencies, and weakly recursive rules [Calì et al., 2009; 2012; Baget et al., 2011; Civili and Rosati, 2012].

A well-known source of combinatorial explosion is the presence of some very simple rules that typically express taxonomies or relation signatures. These rules are at the core of any ontology. We propose a new technique to tame this source of complexity. It relies on compiling these rules into a preorder on atoms and embedding this preorder into the rewriting process. Intuitively, each atom "represents" all its "specializations". Hence, the query rewriting process avoids exploring many CQs and outputs a small UCQ. This UCQ can be seen as a "pivotal" representation, in the sense that it can be evaluated by different kinds of systems: it can be passed to a Datalog engine or unfolded into a positive existential query (a UCQ or a more compact form of query) to be processed by an RDBMS; it can also be directly processed if data is stored in main memory and the appropriate homomorphism operation is implemented.

First, we prove that the new rewriting operator is sound and complete, and provide an algorithm that, given any CQ and any set of existential rules, effectively outputs a minimal, sound and complete UCQ-rewriting, if one exists (which may not be the case due to the undecidability of the problem). Since the computation of the preorder is independent of any query, we can divide this algorithm into a compilation step performed offline, and a query rewriting step, which takes the preorder as input. Second, we report experiments, which show that this optimization leads to substantial gains, in terms of size of the output and query rewriting runtime, and is able to scale on very large ontologies. When the desired output is a "regular" UCQ, it remains faster for our rewriting tool to compute a pivotal UCQ and to unfold it than to directly compute a regular UCQ. Moreover, our prototype behaves well compared to other UCQ-rewriting tools tailored for DL-Lite ontologies, even though it does not exploit the specificities of these languages.

2 Preliminaries

An atom is of the form $p(t_1, \ldots, t_k)$ where $p$ is a predicate with arity $k$, and the $t_i$ are terms, i.e., variables or constants [...] answer to a CQ $Q$ in a fact $F$ is yes if $F \models Q$; it is well known that $F \models Q$ iff $Q$ maps to $F$.

Definition 1 (Existential rule) An existential rule (or simply rule hereafter) is a formula $R = \forall \mathbf{x} \forall \mathbf{y} (B[\mathbf{x}, \mathbf{y}] \rightarrow (\exists \mathbf{z}\, H[\mathbf{y}, \mathbf{z}]))$ where $B = body(R)$ and $H = head(R)$ are conjunctions of atoms, resp. called the body and the head of $R$. The frontier of $R$ is the set of variables $vars(B) \cap vars(H) = \mathbf{y}$. The set of existential variables in $R$ is $vars(H) \setminus vars(B) = \mathbf{z}$.

In the following, we will omit quantifiers in rules as there is no ambiguity. E.g., $p(x, y) \rightarrow p(y, z)$ denotes the formula $\forall x \forall y (p(x, y) \rightarrow \exists z\, p(y, z))$.

A knowledge base (KB) $\mathcal{K} = (F, \mathcal{R})$ is composed of a finite set of facts (which can be seen as a single fact) $F$ and a finite set of existential rules $\mathcal{R}$. The CQ entailment problem is the following: given a KB $\mathcal{K} = (F, \mathcal{R})$ and a CQ $Q$, does $F, \mathcal{R} \models Q$ hold? This problem is known to be undecidable in the general case [Beeri and Vardi, 1981]. It can be solved by computing a sound and complete rewritten query $\mathcal{Q}$ from $Q$ and $\mathcal{R}$, provided that such a finite $\mathcal{Q}$ exists, then asking if $F \models \mathcal{Q}$. Here, the target query $\mathcal{Q}$ is a union of CQs (UCQ), which we see as a set of CQs, called a rewriting set of $Q$.

Definition 2 [Sound and Complete (rewriting) set of CQs] Let $\mathcal{R}$ be a set of existential rules, $Q$ be a CQ, and $\mathcal{Q}$ be a set of CQs. $\mathcal{Q}$ is said to be sound w.r.t. $Q$ and $\mathcal{R}$ if for any fact $F$, for all $Q_i \in \mathcal{Q}$, if $Q_i$ maps to $F$ then $\mathcal{R}, F \models Q$. Reciprocally, $\mathcal{Q}$ is said to be complete w.r.t. $Q$ and $\mathcal{R}$ if for any fact $F$, if $\mathcal{R}, F \models Q$ then there is $Q_i \in \mathcal{Q}$ such that $Q_i$ maps to $F$.

A set of rules $\mathcal{R}$ for which any query $Q$ has a finite sound and complete set of rewritings (in other words, $Q$ is rewritable into a UCQ) is called a finite unification set (fus) [Baget et al., 2009]. Note that restricting a complete rewriting set $\mathcal{Q}$ to its most general elements preserves its completeness. A cover of $\mathcal{Q}$ is an inclusion-minimal subset $\mathcal{Q}'$ of $\mathcal{Q}$ such that each element $Q$ of $\mathcal{Q}$ is more specific than an element $Q'$ of $\mathcal{Q}'$ (i.e., in database terms: $Q$ is contained in $Q'$).
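
The notions of a CQ "mapping to" a fact and of a cover can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not the algorithm of the paper: CQs are taken to be Boolean and are encoded as sets of atoms, variables are strings starting with "?", and the names `maps_to` and `cover` are hypothetical. A CQ is treated as more specific than another when the latter maps to it by homomorphism.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Atom = Tuple[str, ...]   # (predicate, term_1, ..., term_k); encoding assumed here
CQ = FrozenSet[Atom]     # a Boolean CQ seen as a set of atoms (assumption)


def is_var(term: str) -> bool:
    # Illustrative convention: variables start with "?", all else is a constant.
    return term.startswith("?")


def maps_to(source: CQ, target: Iterable[Atom]) -> bool:
    """Return True if there is a homomorphism from `source` to `target`.

    Variables of `source` may be sent to any term of `target`;
    constants must be preserved. Terms of `target` are left untouched.
    """
    target_atoms = list(target)
    atoms = sorted(source)

    def extend(i: int, subst: Dict[str, str]) -> bool:
        if i == len(atoms):
            return True                      # every atom has an image
        pred, *args = atoms[i]
        for cand in target_atoms:
            if cand[0] != pred or len(cand) != len(atoms[i]):
                continue
            new = dict(subst)
            ok = True
            for s, t in zip(args, cand[1:]):
                if is_var(s):
                    if new.setdefault(s, t) != t:   # clashes with earlier binding
                        ok = False
                        break
                elif s != t:                        # constants map to themselves
                    ok = False
                    break
            if ok and extend(i + 1, new):
                return True
        return False                         # backtrack

    return extend(0, {})


def cover(queries: Iterable[CQ]) -> List[CQ]:
    """Keep only most general CQs: each input CQ is more specific than a kept one."""
    kept: List[CQ] = []
    for q in queries:
        if any(maps_to(k, q) for k in kept):
            continue                         # q is contained in a kept CQ
        kept = [k for k in kept if not maps_to(q, k)]   # drop CQs contained in q
        kept.append(q)
    return kept


facts = {("p", "a", "b"), ("p", "b", "c")}
path = frozenset({("p", "?x", "?y"), ("p", "?y", "?z")})
loop = frozenset({("p", "?x", "?x")})
print(maps_to(path, facts), maps_to(loop, facts))
```

With these facts, `path` maps to the fact set while `loop` does not. Applying `cover` to a set containing both `{r(?x, ?y)}` and the more specific `{r(?x, ?y), s(?y)}` keeps only the former. The backtracking search is exponential in the worst case, which is consistent with homomorphism checking being NP-complete in general.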