From SPARQL to Rules (And Back) ∗

WWW 2007 / Track: Semantic Web Session: Query Languages and DBs From SPARQL to Rules (and back) ∗ Axel Polleres Universidad Rey Juan Carlos Tulipán s/n, 28933 Móstoles, Madrid, Spain [email protected] ABSTRACT with defining aspects such as a formal semantics or layer- ing on top of OWL and RDFS. As for the second part, the As the data and ontology layers of the Semantic Web stack 1 have achieved a certain level of maturity in standard recom- RIF working group , who is responsible for the rules layer, mendations such as RDF and OWL, the current focus lies is just producing first concrete results. Besides aspects like on two related aspects. On the one hand, the definition of business rules exchange or reactive rules, deductive rules lan- a suitable query language for RDF, SPARQL, is close to guages on top of RDF and OWL are of special interest to recommendation status within the W3C. The establishment the RIF group. One such deductive rules language is Dat- of the rules layer on top of the existing stack on the other alog, which has been successfully applied in areas such as hand marks the next step to be taken, where languages with deductive databases and thus might be viewed as a query their roots in Logic Programming and Deductive Databases language itself. Let us briefly recap our starting points: are receiving considerable attention. The purpose of this Datalog and SQL. Analogies between Datalog and rela- paper is threefold. First, we discuss the formal semantics tional query languages such as SQL are well-known and - of SPARQL extending recent results in several ways. Sec- studied. Both formalisms cover UCQ (unions of conjunctive ond, we provide translations from SPARQL to Datalog with queries), where Datalog adds recursion, particularly unre- negation as failure. Third, we propose some useful and easy stricted recursion involving nonmonotonic negation (aka un- to implement extensions of SPARQL, based on this trans- stratified negation as failure). Still, SQL is often viewed to lation. As it turns out, the combination serves for direct be more powerful in several respects. On the one hand, the implementations of SPARQL on top of existing rules en- lack of recursion has been partly solved in the standard’s gines as well as a basis for more general rules and query 1999 version [20]. On the other hand, aggregates or ex- languages on top of RDF. ternal function calls are missing in pure Datalog. However, also developments on the Datalog side are evolving and with Categories and Subject Descriptors recent extensions of Datalog towards Answer Set Program- ming (ASP) – a logic programming paradigm extending and H.2.3 [Languages]: Query Languages; H.3.5 [Online In- building on top of Datalog – lots of these issues have been formation Services]: Web-based services solved, for instance by defining a declarative semantics for aggregates [9], external predicates [8]. General Terms The Semantic Web rules layer. Remarkably, logic pro- Languages, Standardization gramming dialects such as Datalog with nonmonotonic negation which are covered by Answer Set Programming are of- Keywords ten viewed as a natural basis for the Semantic Web rules layer [7]. Current ASP systems offer extensions for retriev- SPARQL, Datalog, Rules ing RDF data and querying OWL knowledge bases from the Web [8]. Particular concerns in the Semantic Web com- 1. INTRODUCTION munity exist with respect to adding rules including nonmonotonic negation [3] which involve a form of closed world After the data and ontology layers of the Semantic Web reasoning on top of RDF and OWL which both adopt an stack have achieved a certain level of maturity in standard open world assumption. Recent proposals for solving this recommendations such as RDF and OWL, the query and issue suggest a “safe” use of negation as failure over finite the rules layers seem to be the next building-blocks to be contexts only for the Web, also called scoped negation [17]. finalized. For the first part, SPARQL [18], W3C’s pro- posed query language, seems to be close to recommendation, The Semantic Web query layer – SPARQL. Since we though the Data Access working group is still struggling base our considerations in this paper on the assumption that similar correspondences as between SQL and Datalog can be ∗ An extended technical report of this article is available at established for SPARQL, we have to observe that SPARQL http://www.polleres.net/publications/. inherits a lot from SQL, but there also remain substantial Copyright is held by the International World Wide Web Conference Com- differences: On the one hand, SPARQL does not deal with mittee (IW3C2). Distribution of these papers is limited to classroom use, nested queries or recursion, a detail which is indeed surpris- and personal use by others. WWW 2007, May 8–12, 2007, Banff, Alberta, Canada. 1http://www.w3.org/2005/rules/wg ACM 978-1-59593-654-7/07/0005. 787 WWW 2007 / Track: Semantic Web Session: Query Languages and DBs ing by the fact that SPARQL is a graph query language We assume the pairwise disjoint, infinite sets I, B, L and on RDF where, typical recursive queries such as transitive V ar, which denote IRIs, Blank nodes, RDF literals, and closure of a property might seem very useful. Likewise, ag- variables respectively. In this paper, an RDF Graph is then gregation (such as count, average, etc.) of object values in a finite set, of triples from I ∪ B ∪ L × I × I ∪ B ∪ L,3 RDF triples which might appear useful have not yet been dereferenceable by an IRI. A SPARQL query is a quadru- included in the current standard. On the other hand, sub- ple Q = (V,P,DS,SM), where V is a result form, P is a tleties like blank nodes (aka bNodes), or optional graph pat- graph pattern, DS is a dataset, and SM is a set of solution terns, which are similar but (as we will see) different to outer modifiers. We refer to [18] for syntactical details and will joins in SQL or relational algebra, are not straightforwardly explain these in the following as far as necessary. In this translatable to Datalog. paper, we will ignore solution modifiers mostly, thus we will The goal of this paper is to shed light on the actual re- usually write queries as triples Q =(V,P,DS), and will use lation between declarative rules languages such as Datalog the syntax for graph patterns introduced below. and SPARQL, and by this also provide valuable input for the currently ongoing discussions on the Semantic Web rules Result Forms. Since we will, to a large extent, restrict our- layer, in particular its integration with SPARQL, taking the selves to SELECT queries, it is sufficient for our purposes to likely direction into account that LP style rules languages describe result forms by sets variables. Other result forms will play a significant role in this context. will be discussed in Sec. 5. For instance, let Q =(V,P,DS) Although the SPARQL specification does not seem 100% denote the query from Fig. 1, then V = {?X, ?Y }. Query stable at the current point, just having taken a step back results in SPARQL are given by partial, i.e. possibly in- from candidate recommendation to working draft, we think complete, substitutions of variables in V by RDF terms. In that it is not too early for this exercise, as we will gain valu- traditional relational query languages, such incompleteness able insights and positive side effects by our investigation. is usually expressed using null values. Using such null values More precisely, the contributions of the present work are: we will write solutions as tuples where the order of columns is determined by lexicographically ordering the variables in • We refine and extend a recent proposal to formalize the V . Given a set of variables V , let V denote the tuple ob- semantics of SPARQL from Pérez et al. [16], present- tained from lexicographically ordering V . ing three variants, namely c-joining, s-joining and b- The query from Fig. 1 with result form V = (?X, ?Y ) then joining semantics where the latter coincides with [16], has solution tuples (”Bob”, :a), (”Alice”, alice.org#me), and can thus be considered normative. We further (”Bob”, :c). We write substitutions in sqare brackets, so discuss how aspects such compositionality, or idempo- these tuples correspond to the substitutions [?X → ”Bob”, tency of joins are treated in these semantics. ?Y → : a], [?X → ”Alice”, ?Y → alice.org#me], and • Based on the three semantic variants, we provide trans- [?X → ”Bob”, ?Y → :c], respectively. lations from a large fragment of SPARQL queries to Datalog, which give rise to implementations of SPARQL Graph Patterns. We follow the recursive definition of graph on top of existing engines. patterns P from [16]: • We provide some straightforward extensions of SPARQL • a tuple (s,p,o) is a graph pattern where s, o ∈ I ∪ L ∪ such as a set difference operator MINUS, and nesting V ar and p ∈ I ∪ V ar.4 of ASK queries in FILTER expressions. • if P and P ′ are graph patterns then (P AND P ′), • Finally, we discuss an extension towards recursion by (P OPT P ′), (P UNION P ′), (P MINUS P ′) are graph CONSTRUCT allowing bNode-free- queries as part of patterns.5 the query dataset, which may be viewed as a light- weight, recursive rule language on top of of RDF. • if P is a graph pattern and i ∈ I∪V ar, then(GRAPH i P ) The remainder of this paper is structured as follows: In is a graph pattern. Sec. 2 we first overview SPARQL, discuss some issues in • if P is a graph pattern and R is a filter expression then the language (Sec.

Load more