Undefined 0 (0) 1 1 IOS Press

Ontop: Answering SPARQL Queries over Relational

Diego Calvanese a, Benjamin Cogrel a, Sarah Komla-Ebri a, Roman Kontchakov b, Davide Lanti a, Martin Rezk a, Mariano Rodriguez-Muro c, and Guohui Xiao a a Free University of Bozen-Bolzano {calvanese,bcogrel, sakomlaebri,dlanti,mrezk,xiao}@inf.unibz.it b Birkbeck, University of London [email protected] c IBM TJ Watson [email protected]

Abstract. In this paper we present Ontop, an open-source Ontology Based Data Access (OBDA) system that allows for querying relational data sources through a conceptual representation of the domain of interest, provided in terms of an ontology, to which the data sources are mapped. Key features of Ontop are its solid theoretical foundations, a virtual approach to OBDA that avoids materializing triples and that is implemented through query rewriting techniques, extensive optimizations exploiting all elements of the OBDA architecture, its compliance to all relevant W3C recommendations (including SPARQL queries, R2RML mappings, and OWL 2 QL and RDFS ontologies), and its support for all major relational databases.

Keywords: Ontop, OBDA, Databases, RDF, SPARQL, Ontologies, R2RML, OWL

1. Introduction vocabulary, models the domain, hides the structure of the data sources, and can enrich incomplete data with Over the past 20 years we have moved from a background knowledge. Then, queries are posed over world where most companies had one all-knowing this high-level conceptual view, and the users no longer self-contained central to a world where com- need an understanding of the data sources, the relation panies buy and sell their data, interact with several between them, or the encoding of the data. Queries are data sources, and analyze patterns and statistics com- translated by the OBDA system into queries over po- ing from all of them. The challenge is shifting from tentially very large (usually relational and federated) obtaining information to finding the right informa- data sources. The ontology is connected to the data tion. It has always been the case that information is sources through a declarative specification given in power but today attention rather than information be- terms of mappings that relate symbols in the ontol- comes the scarce resource, and those who can distin- ogy (classes and properties) to (SQL) views over data. guish valuable information from background clutter The W3C standard R2RML [12] was created with the gain power [16]. To separate the wheat from the chaff, goal of providing a language for the specification of the companies need a comprehensive understanding of mappings in the OBDA setting. The ontology together their data and the ability to cope with diversity in the with the mappings exposes a virtual RDF graph, which data. can be queried using SPARQL, the standard query lan- Since the mid 2000s, Ontology-Based Data Access guage in the community. These vir- (OBDA) has become a popular approach to tackling tual RDF graphs can be materialized, generating RDF this problem [27]. In OBDA, a conceptual layer is triples that can be used with RDF , or al- given in the form of an ontology that defines a shared ternatively they can be kept virtual and queried only

0000-0000/0-1900/$00.00 c 0 – IOS Press and the authors. All rights reserved 2 Ontop during query execution. The virtual approach avoids ment regime in SPARQL. The system is available as a the cost of materialization and can profit from more Protégé 4 plugin, a SPARQL endpoint through Sesame than 30 years’ maturity of relational systems (efficient Workbench, and a Java library supporting OWL API query answering, security, robust transaction support, and Sesame API. etc.). The structure of the paper is as follows. Section 2 To illustrate these concepts and the different notions presents a high-level overview of the architecture of in this paper, we will use the following running exam- Ontop. Section 3 describes the SPARQL query answer- ple. All the material needed to run this example in the ing techniques implemented in Ontop. Section 4 out- OBDA system Ontop (and a complementary tutorial) lines the applications of Ontop, in particular the Sta- can be found online1. toil and Siemens use cases in the context of the Op- tique project. Related SPARQL query answering sys- Example 1.1 (Hospital Database). We consider a hos- tems are surveyed in Section 5. Section 6 is a retro- pital database with a single table tbl_patient that spective on the development of Ontop over the past five contains information about lung cancer patients. The years. Finally, Section 7 concludes the paper. table has 4 attributes: the patient identifier (pid), his/her name, the type of cancer (tumor) and its stage. The lung cancer can be of two types: Non-Small Cell 2. Architecture of Ontop Lung Carcinoma (NSCLC) and Small Cell Lung Car- cinoma (SCLC), which are encoded in the table by a Ontop is an open-source3 OBDA system released boolean value type as follows: under the Apache license, developed at the Free Uni- – false for NSCLC and true for SCLC. versity of Bozen-Bolzano. The Ontop system exposes relational databases as virtual RDF graphs by linking The stage of the cancer is encoded by a positive integer the terms (classes and properties) in the ontology to value stage as follows: the data sources through mappings. This virtual RDF – 1–6 for NSCLC stages I, II, III, IIIa, IIIb and IV, graph can then be queried using SPARQL, by translat- – 7–8 for SCLC stages Limited and Extensive. ing the SPARQL queries into SQL queries over the re- lational databases. This translation process is transpar- Finally, our sample table contains the following data: ent to the user. pid name type stage The architecture of Ontop, which is illustrated in Fig. 1, can be divided in four different layers: (i) the 1 ’Mary’ false 4 inputs, i.e., the domain-specific artifacts such as the 2 ’John’ true 7 ontology, database, mappings, and queries; (ii) the core Suppose we need a simple piece of information from of the system in charge of query translation, optimiza- tion, and execution; (iii) the exposing standard this database: “Give me the names of patients with a Java interfaces to users of the system; and (iv) the tumor of stage IIIa”. Even this simple query in this applications that allow end-users to execute SPARQL tiny database already presents some challenges, since queries over databases. to create the query and to understand and analyze the We explore each of these components in turn. results we need to know how the information is en- coded in the data. In the following sections we describe 2.1. Inputs: Ontology, Mappings, Queries, and how to use the Ontop system to address this challenge Databases by enhancing the database with a semantic layer. In this paper we present the OBDA system Ontop2, To the best of our knowledge, Ontop is the first a mature open-source system, which is currently being OBDA system that supports all the W3C recom- used in a number of projects. Ontop supports all the mendations related to OBDA: OWL 2 QL, R2RML, W3C recommendations related to OBDA: OWL 2 QL, SPARQL, SWRL, and the OWL 2 QL entailment R2RML, SPARQL, SWRL, and the OWL 2 QL entail- regime in SPARQL4. In addition, it supports all

1https://github.com/ontop/ontop-examples/ 3http://github.com/ontop/ontop/ tree/master/swj-2015 4SWRL and the OWL 2 QL entailment regime are currently sup- 2http://ontop.inf.unibz.it/ ported experimentally. Ontop 3

Sesame Workbench & Optique Application Protege Layer SPARQL Endpoint Platform

API OWL-API Sesame Storage And Inference Layer (SAIL) API Layer

Ontop Ontop SPARQL Query Answering Engine (Quest) Core

OWL-API Sesame API R2RML API JDBC (OWL Parser) (SPARQL Parser)

OWL 2 QL R2RML Inputs Relational SPARQL Ontologies Mappings Databases Queries

Fig. 1. Architecture of the Ontop system the major commercial and open-source relational mapping language, which is easier to learn and use. databases. Ontop allows users to convert native mappings into R2RML mappings and vice-versa. Intuitively, a map- Ontology. Ontop allows for RDFS [6] and ping assertion consists of a source (an SQL query re- OWL 2 QL [24] as ontology languages. OWL 2 QL is trieving values from the database) and a target (defin- based on the DL-Lite family of lightweight description ing RDF triples with values from the source). logics [9,2], which guarantees that queries over the ontology can be rewritten into equivalent queries over Example 2.2. The ontology in Example 2.1 can be the databases. Recently Ontop has been extended to populated from the database in Example 1.1 by means support also a fragment of SWRL [43]. of the following mappings: :db1/{pid} rdf:type :Patient . Example 2.1. The following ontology captures the do- ← SELECT pid FROM tbl_patient main knowledge of our running example. It describes :db1/neoplasm/{pid} rdf:type :NSCLC . the concepts of cancer and cancer patient with the fol- ← SELECT pid, stage FROM tbl_patient lowing OWL axioms: WHERE type = false :NSCLC rdfs:subClassOf :LungCancer . :db1/neoplasm/{pid} rdf:type :SCLC . :SCLC rdfs:subClassOf :LungCancer . ← SELECT pid, stage FROM tbl_patient :LungCancer rdfs:subClassOf :Neoplasm . WHERE type = true :hasNeoplasm rdfs:domain :Patient . :db1/{pid} :hasName {name} . :hasNeoplasm rdfs:range :Neoplasm . ← SELECT pid, name FROM tbl_patient :hasName rdf:type owl:DatatypeProperty . :db1/{pid} :hasNeoplasm :db1/neoplasm/{pid} . :hasStage rdf:type owl:ObjectProperty . ← SELECT pid FROM tbl_patient In particular, classes :NSCLS and :SCLC are both :db1/neoplasm/{pid} :hasStage :stage-IIIa . ← SELECT pid FROM tbl_patient subclasses of :LungCancer (that is, they both are WHERE stage = 4 types of lung cancer), which in turn is a subclass of (To ease readability we use a slightly simplified :Neoplasm. The object property :hasNeoplasm has version of the Ontop native mappings syntax.) In this class :Patient as its domain and :Neoplasm as its example, IRIs like :hasStage and rdf:type repre- range (in other words, it relates patients to neoplasms). sent the constant components of the RDF triples. IRIs We also have a datatype property :hasName and an :db1/{pid} and :db1/neoplasm/{pid} are con- object property :hasStage. structed using values from the database: in both cases Mappings. Ontop supports two mapping languages: {pid} is the value of the attribute pid in the respec- the W3C RDB2RDF Mapping Language (R2RML), tive SQL query. Similarly, {name} is the literal whose which is a widely used standard; and the native Ontop value is taken from the attribute name in the SQL 4 Ontop query of the mapping. Note that there are individuals 2.3. API Layer that represent patients, :db1/{pid}, and individuals that represent their tumors, :db1/neoplasm/{pid}. To allow developers build systems using Ontop as a This allows for a better modeling of the domain and Java library, Ontop implements two widely used Java allows the user to query specific properties of the APIs, which are also available as Maven artifacts: tumor independently of the patient. – OWL API [15] is a reference implementation for Queries. Ontop supports essentially all features of creating, manipulating and serializing OWL on- SPARQL 1.0 and the OWL 2 QL entailment regime tologies. We extended the OWLReasoner inter- of SPARQL 1.1 [21]. Support for other features of face to support SPARQL query answering. SPARQL 1.1 (e.g., aggregates, property path queries – Sesame [7] is a de-facto standard framework and negation) is ongoing work. for processing RDF data. Ontop implements the Sesame Storage And Inference Layer (SAIL) API Example 2.3. Recall our information need in Exam- supporting inferencing and querying over rela- ple 1.1: the names of all the patients who have a neo- tional databases. plasm (tumor) at stage IIIa. This can be represented by the following SPARQL query: 2.4. Application Layer

Ontop is available through a simple command line SELECT ?name WHERE { ?p rdf:type :Patient . interface, but also through several applications access- ?p :hasName ?name . ing it via the above mentioned APIs. We describe three ?p :hasNeoplasm ?tumor . such applications, which we have been developing and ?tumor :hasStage :stage-IIIa .} maintaining together with Ontop over the past years. – Protégé. Ontop implements a Plugin for Pro- tégé 4 based on OWL API. The plugin provides a The query would return ’Mary’ on our sample graphical interface for various key functionalites re- database. Observe that the vocabulary is more domain- lated to OBDA: editing mappings, executing SPARQL oriented and independent of the encoding done in the queries, checking consistency of the ontology, auto- database, e.g., there is no need anymore to use the spe- matically bootstrapping ontologies and mappings from cific values that encode types or stages. the database, importing and exporting R2RML map- pings, materializing RDF triples, etc. Figure 2 shows Databases. Ontop supports many relational database two screenshots of the Ontop Protégé Plugin for cre- engines via JDBC. These include all major commer- ating mappings and answering SPARQL queries from cial relational databases (DB2, Oracle, and MS SQL the running examples. Server) and the most popular open-source databases – Sesame Workbench & SPARQL Endpoint. Sesame (PostgreSQL, MySQL, H2, and HSQL). In addition, OpenRDF Workbench is a web application for admin- Ontop can be used with federated databases (e.g., istrating Sesame repositories. We extended the Work- 5 Teiid or Exareme, formerly called ADP [42]) to sup- bench to create and manage Ontop repositories using port multiple data sources (e.g., relational databases, SAIL API. Such repositories can then be used as stan- XML, CSV, and Web Services). dard SPARQL endpoints. Figure 3 shows a screenshot of creating an Ontop repository in Sesame Workbench. 2.2. Ontop Core – Optique Platform. The Optique Platform com- plements Ontop by adding an intuitive visual query The core of Ontop is the SPARQL engine Quest, builder, tools for ontology and mapping management, which is in charge of rewriting SPARQL queries over a friendly query answering interface, and a database the virtual RDF graph into SQL queries over the rela- federation tool among other features [13]. Ontop is the tional database (see Section 3). core of the Optique Platform and is in charge of the query transformation module. The platform can query streaming data and exploit massive parallelism in the 5http://teiid.jboss.org/ backend whenever possible. Ontop 5

(a) Mapping Editor (b) SPARQL Query Answering

Fig. 2. Screenshots of Ontop Protégé Plugin

SPARQL Query Ontology Mappings

Classified Query Rewriter Reasoner Ontology

SPARQL to SQL Mapping- T -mapping Translator Optimiser ON-LINE OFF-LINE

Ontop

SQL query DB Integrity Constraints

Fig. 4. The Ontop workflow

the ontology into the mappings and generating the so- called T -mappings [34]. During the query execution (the online stage) Ontop transforms input SPARQL queries into optimized SQL queries exploiting the T - mappings and the database integrity constraints. We Fig. 3. Screenshot of Ontop Sesame Workbench now explain each of the two stages.

3. Answering SPARQL Queries 3.1. Off-line Stage: Ontology and Mapping Compilation Ontop answers end-user’s SPARQL queries by rewriting them into SQL queries and delegating their The off-line stage of Ontop processes the ontol- execution to the data sources. With this approach there ogy, mappings, and database integrity constraints. This is no need to apply rules to GBs of data to generate stage can be thought of as consisting of three phases: all the facts entailed by the ontology. The workflow ontology classification, T -mapping construction, and of Ontop can be divided into the off-line and online T -mapping optimization. In the implementation of stages and is illustrated in Fig. 4. The most critical Ontop, however, the last two phases are performed si- task during start-up (the off-line stage) is compiling multaneously. 6 Ontop

During the first phase, the ontology is loaded 3.2. Online Stage: Query Answering through OWL API and is classified using the built- in reasoner. The resulting complete hierarchy of prop- The online stage takes a SPARQL query and trans- erties and classes is stored in memory as a directed lates it into SQL by using the T -mappings. We focus acyclic graph (DAG). For example, in the ontology only on the translation of SELECT queries (ASK and in Example 2.1, both :NSCLC and :SCLC are sub- DESCRIBE queries are treated analogously). In this classes of :LungCancer, which in turn is a subclass process Ontop also optimizes the SQL query by ap- of :Neoplasm. It follows that every NSCLC and every plying the Semantic Query Optimization (SQO) tech- SCLC is a form of Neoplasm: niques [11,20]. To ease the presentation we distinguish three phases in the query answering process: :NSCLC rdfs:subClassOf :Neoplasm . :SCLC rdfs:subClassOf :Neoplasm . (a) The SPARQL query is translated into SQL using The classification algorithm is based on a variant of T -mappings. graph reachability for the constructed DAG [30] (a (b) The resulting SQL query is optimized for efficient similar algorithm was later described in [23]). execution by the database engine. During the second phase, T -mappings are con- (c) The optimized SQL query is then executed by the structed by composing the class and property hi- database engine, and the result set is translated into erarchies with the mappings [34,36]. For example, the answer to the original SPARQL query by cre- consider concept :Neoplasm in Example 2.1. Al- ating necessary RDF terms. though it has no mappings defined by the user, the two Phases (a) and (b) are handled together in the imple- inclusions derived above give rise to the following mentation of Ontop (we, however, distinguish them rules in the T -mapping: here for the sake of clarity). :db1/neoplasm/{pid} rdf:type :Neoplasm . We elaborate now on the three phases of query an- ← SELECT pid, stage FROM tbl_patient swering. WHERE type = false 3.2.1. From SPARQL to SQL :db1/neoplasm/{pid} rdf:type :Neoplasm . Ontop internally represents the SPARQL query as a ← SELECT pid, stage FROM tbl_patient WHERE type = true tree corresponding to the SPARQL algebra expression (generated by the Sesame SPARQL parser). Each node Finally, during the third phase, the T -mappings of the tree is transformed into the respective SQL ex- are optimized by using disjunction (OR) and inter- pression. To illustrate the transformation, we continue val expressions in SQL and by applying Semantic with the running example. Query Optimization (SQO) techniques (which will be described in Section 3.2.2). For instance, using Example 3.1. Consider the fragment of the query in disjunction, Ontop transforms the two rules above into Example 2.3 that retrieves all tumors of stage IIIa: the following single rule: SELECT ?tumor WHERE { :db1/neoplasm/{pid} rdf:type :Neoplasm . ?tumor rdf:type :Neoplasm ; ← SELECT pid, stage FROM tbl_patient :hasStage :stage-IIIa . } WHERE type = false OR type = true Such optimizations are known to be relatively expen- (We note that the triple pattern ?tumor rdf:type sive (for example, SQO is based on an NP-complete :Neoplasm was not needed in Example 2.3: indeed, conjunctive query containment check) but are per- Ontop can infer it from ?p :hasNeoplasm ?tumor formed only once, during the off-line stage of Ontop, because the range of :hasNeoplasm is :Neoplasm.) and therefore have no negative effect on query pro- The above query is represented by the following tree: cessing. On the other hand, the resulting T -mappings PROJECT produce all the RDF triples that can be inferred from the database and the ontology and so, during the JOIN online stage, the T -mappings are used as the basis for the translation of individual triple patterns in SPARQL T1: ?x rdf:type Neoplasm . T2: ?x :hasStage :IIIa . queries into SQL. Ontop 7

Input: SPARQL Query Q, T -mappings MT Output: SQL Expression SELECT Q1.x FROM 1: S = List of nodes in Q in a bottom-up topological order ((SELECT concat(":db1/neoplasm/", pid) AS x 2: sqlM = a map from nodes to SQL expressions FROM tbl_patient 3: for node n ∈ S do WHERE type = falseOR type = true) Q1 JOIN 4: if n is triple pattern then . Translating leaves (SELECT concat(":db1/neoplasm/", pid) AS x 5: sqlM[n] = replace-Tmap-def(n, M ) T FROM tbl_patient WHERE stage = 4) Q2 6: else . Translating non-leaf nodes ON Q1.x = Q2.x) 7: if n = JOIN(n1,n2) then 8: sqlM[n] = InnerJoin(sqlM[n1],sqlM[n2]) (a) Non-optimized generated SQL query 9: else if n = OPTIONAL(n1, n2, e) then 10: sqlM[n] = LeftJoin(sqlM[n1],sqlM[n2],e) SELECT concat(":db1/neoplasm/", Q.pid) AS x 11: else if n = UNION(n1, n2) then FROM 12: sqlM[n] = Union(sqlM[n1],sqlM[n2]) (SELECT T1.pid 13: else if n = FILTER(n1, e) then FROM tbl_patient T1 JOIN tbl_patient T2 14: sqlM[n] = Filter(sqlM[n1], e) ON T1.pid = T2.pid 15: else if n = PROJECT(n1, p) then WHERE (T1.type = falseOR T1.type = true) AND T2.stage = 4) Q 16: sqlM[n] = Project(sqlM[n1], p) 17: end if (b) SQL query after the structural optimization 18: end if 19: end for 20: return sqlM[S.last()] SELECT concat(":db1/neoplasm/", Q.pid) AS x FROM (SELECT pid Algorithm 1. Translating SPARQL into SQL FROM tbl_patient WHERE (type = falseOR type = true) AND stage = 4) Q Next we explain how to produce the SQL expres- sion from a SPARQL query using T -mappings. Algo- (c) SQL query after the self-join elimination rithm 1 is a simplified version of the process. It iter- ates over the nodes of the SPARQL algebra tree in a SELECT concat(":db1/neoplasm/", pid) AS x FROM tbl_patient bottom-up fashion; more precisely, it goes through the WHERE (type = falseOR type = true) list S in the topological sorting order. In our running AND stage = 4 example this list is [T1,T2, JOIN, PROJECT]. So, the (d) SQL query after the second structural optimization algorithm starts by replacing each leaf of the tree, that is, a triple pattern of the form (s, p, o), by the union of Fig. 5. Example of SQL translation and optimization the SQL queries defining its predicate (lines 4–5) in the T -mapping. In this step, the algorithm implicitly con- Project siders two cases: (1) when p is an object or data prop- InnerJoin erty, such as :hasStage or :hasName or (2) when p rdf:type :Patient is and o is a class, such as . Q1 Q2 Once it finishes processing the leaves, it continues to the upper levels in the tree (lines 7–17), where the The leaves, Q1 and Q2, are respectively the SQL defi- SPARQL operators (PROJECT, JOIN, OPTIONAL, nitions of the concept :Neoplasm and of the property UNION, and FILTER) are translated into the corre- :hasStage in the T -mapping rules constructed dur- sponding SQL operators (Project, InnerJoin, LeftJoin, ing the off-line stage (see Section 3.1). Observe that Union, and Filter, respectively). Once the root is trans- without the T -mapping optimizations in the off-line lated the process is finished and the resulting SQL ex- stage, the resulting SQL would contain a union in place pression is returned. of Q1, increasing the complexity of the SQL query and so, having a negative effect on the query evaluation Example 3.2. Ontop translates the SPARQL query in time. Example 3.1 into a SQL query (shown in Fig. 5a) with For the sake of simplicity we do not describe the the following structure: translation of filter expressions and OPTIONALs (an 8 Ontop optimal translation of unions and empty expressions in 3.2.3. Executing the Query over the Database the second argument is particularly challenging) and Since different database engines support slightly how to handle data types and functions in SQL expres- different SQL dialects, we have to adjust the SQL syn- sions. Instead, we refer the interested reader to [38,21]. tax accordingly. For instance, the operator for string concatenation is || in Postgres and concat in MySQL; 3.2.2. Optimizing the generated SQL queries in MySQL, one cannot cast a value to Integer, so we The generated SQL queries can already be executed cast the value to Signed instead; Postgres internally by the DB engine but they are not necessarily efficient: changes unquoted identifiers (both column and alias they often contain subqueries, redundant self-joins and names) to lowercase, while Oracle and H2 change un- joins over complex expressions (such as string con- quoted identifiers to uppercase6. catenations). Observe that joins over complex expres- Finally Ontop sends the optimized SQL query to sions, for instance, prevent the database engine from the database engine and translates the result into RDF using indexes. To improve performance, Ontop em- terms (URIs or literals) to construct the answers to the ploys a number of structural and semantic optimiza- SPARQL query. In the implementation, Ontop wraps tions. the result set obtained from the database via JDBC and creates corresponding Java objects for OWL API or Structural Optimizations. Ontop applies three main Sesame API. structural optimizations: (a) pushing the joins inside 3.2.4. Performance the unions, (b) pushing the functions as high as pos- The cost of query answering in Ontop can be split sible in the query tree, and eliminating sub-queries. into three parts: (i) the cost of generating the SQL Returning to the running example, the SQL query ob- query, (ii) the cost of execution by the RDBMS, and tained by these optimizations is shown in Fig. 5b: (b) (iii) the cost of fetching and transforming the SQL re- and (c) convert the join over the complex expressions sults into RDF terms. We have studied the performance into a join over the attributes of the relations (effec- of Ontop using several benchmarks (e.g., BSBM, Fish- tively de-IRIing the join) and subsequently remove the Mark, LUBM, NPD) and settings (e.g., different DB subqueries. engines, number of clients, dataset size) [21,39,22,36]. The obtained results suggest that the performance of Semantic Optimization. Ontop adopts techniques from Ontop depends more on the complexity of the combi- the Semantic Query Optimization (SQO) [11,20] field. nation of ontology and mappings than on the size of SQO refers to a set of techniques that perform the se- the dataset. This is in line with the well-known theoret- mantic analysis of SQL queries and transform them ical results that state that the translation from SPARQL into a more efficient form, e.g., by removing redundant to SQL is exponential in the worst case [14]. In bench- self-joins, and detecting unsatisfiable or trivially satis- marks like BSBM, FishMark, and LUBM, where the fiable conditions. SQO techniques often use database number of mappings and the number of ontological dependencies such as primary and foreign keys to re- terms are small and the dataset ranges from 25 to 200 duce the size and complexity of the query. In our run- million triples, Ontop outperforms its competitors by ning example, these optimizations eliminate the self- orders of magnitude [39,21,36]. This performance is join, which is redundant because pid is the primary the result of (i) the fast SPARQL-to-SQL translation key of tbl_patient; the result is shown in Fig. 5c. Ob- (around 4–15ms); (ii) the efficient optimization of the serve that it has a subquery that could not be elimi- SQL; and (iii) the well-known efficiency of relational nated before, but because of SQO we can apply struc- DB engines. For instance, in BSBM with 200 million tural optimization (c) again to obtain a simpler SQL triples, Ontop can run more than 400.000 queries per query, shown in Fig. 5d. hour (44k query mixes per hour). Observe that these optimizations complement and To better understand the performance of OBDA sys- interact with each other. The optimization step is crit- tems, we developed a more challenging benchmark, ical [22] and nontrivial. The translation of more com- the NPD Benchmark [22], which reveals the strengths plex queries is more involved and implies tackling the and pitfalls of OBDA. The benchmark comes with gap between the SQL and SPARQL . This simple example is meant to provide an intuition; the 6https://github.com/ontop/ontop/wiki/ interested reader is referred to [21,38]. Case-sensitivity-for-SQL-identifiers Ontop 9 thousands of axioms in the ontology and rules in map- turbines, compressors, and generators. Each device is pings, and a dataset containing up to 4 billion triples. monitored by different sensors. All dynamic (observa- The results comparing Ontop and Stardog show that tional) data from the sensors is stored in one large rela- indeed the approach is scalable but more work is tional database (PostgreSQL) using more than 150 ta- needed to optimize the generated SQL queries. bles per device. About 30 GB of new sensor and event The query set, which was obtained by interview- data is being generated every day, resulting in a total ing users of the NPD dataset [41], can be divided into of 100 TB of timestamped data. queries that are translated by Ontop into efficient SQL One of the main tasks of service engineers monitor- queries and queries that are translated into exponen- ing these devices is to promptly solve issues detected tially larger SQL queries. For instance, Query 12 from by gathering the relevant sensors data and analyzing the benchmark is translated into a union of more than it. Nowadays, the data gathering phase is the bottle- 10.000 SQL subqueries. Ontop outperforms Stardog neck of the process because it takes about 80% of the whenever the SPARQL queries are not translated into amount of time spent by engineers. The reason why the very large SQL queries. In the remaining cases, Star- gathering phase is critical partially resides in the com- dog outperforms Ontop by orders of magnitude. The plexity and quantity of the data to be analyzed. Ide- results confirm that the exponential blow-up of the ally the engineers should be able to access the data di- translation into SQL is a major source of performance rectly, by creating and combining queries in a way that loss in modern OBDA systems. If this issue is not han- is compatible with their knowledge. However, the data dled properly, it can prevent OBDA systems from be- is often organized to better serve the applications rather ing deployed in production environments. We are cur- than in a most intuitive way for the domain experts. rently working on different techniques for tackling this The OBDA approach to solving the problem con- issue. sists of topping the database with an ontology that uses the terminology of the engineers and mediates between the engineers and the data [18]. Observe that in the 4. Industrial Applications Siemens scenario this requires handling historical and streaming data. It is worth noticing that OBDA with The adoption of Ontop by the community has been temporal and streaming data is still an open branch of growing steadily since 2010: last year (2014) On- research, and Ontop alone is not fully able to cope with top got 500 new registrations, the webpage got 7000 all the requirements of this use case. However, Ontop hits, and the mailing list in excess of 120 threads. can provide the backbone including the temporal and Nowadays, Ontop is the core of the Optique Plat- streaming dimensions in the Optique platform [26]. form (developed in the Optique EU project [13]), and Statoil. Statoil is an international energy company it is actively being used in academia7 (e.g., in Se- with main activities in gas and oil extraction. It is head- mantic Mediator [5], for accessing electronic health quartered in Norway and present in over 30 countries records [31], and for querying temporal and streaming around the world. data in OBDA [26]) and being experimented with in Geologists at Statoil access several databases on a industry. daily basis, of which one of the most relevant is the In this section we describe the use cases of the two major industrial partners in the Optique Project, Exploration and Production Data Store (EPDS). EPDS namely Siemens and Statoil, and what role Ontop is is a large SQL database comprising over 1500 ta- expected to play there. bles with information about historical exploration data (e.g., layers of rocks, porosity), production logs, maps, Siemens. Siemens Energy is one of the four sectors etc. It also contains business information such as li- of Siemens AG corporation. It is in charge of gen- cense areas and companies. The schema is organized erating and delivering power from numerous sources. in such a way that direct data access by engineers Siemens Energy runs several service centers for power (and geologists in particular) often becomes challeng- plants. Each center monitors thousands of devices re- ing or even impossible. Similarly to the Siemens use lated to power generation, including gas and steam case, the main problem lies not only in the size of the schema and the data but also in the obscure structure 7https://github.com/ontop/ontop/wiki/ of this legacy database. The solution currently adopted UseCases by Statoil relies on tools that enable domain experts to 10 Ontop combine different pre-defined queries. The problem of 5.1. Triplestores these pre-defined queries is that they often are too spe- cific, or too general, or cannot be easily combined to Virtuoso Universal Server8 is a hybrid DBMS that obtain the desired results. can be used as a relational database, a , The role of Ontop (and Optique) in such scenario is or an OBDA system. It has two editions: an open- helping the engineers formulate their own queries au- source and a commercial one. From the perspective tonomously using the domain vocabulary [13]. Simi- of answering SPARQL queries, Virtuoso is mostly larly to the Siemens use case, this will be achieved by used as a triplestore. It supports SPARQL 1.1 and, in enriching EPDS with an ontology expressed in terms this mode, it offers some backward- (by default) and familiar to the engineers. forward-chaininqg capabilities for a limited subset of RDFS and OWL. When Virtuoso is used as a regu- lar RDBMS, it can be extended into an OBDA sys- 5. Related SPARQL Query Answering Systems tem by setting up mappings in its own mapping lan- guage. However, this OBDA mode has several limita- In this section, we briefly review the most popu- tions: no reasoning capability is available and only a lar SPARQL query answering systems, which can be small fragment of R2RML is supported. Virtuoso can categorized into two major types: OBDA systems and be accessed through the Sesame and Jena APIs. triplestores. Their main features are summarized in Ta- ble 1. GraphDB,9 previously known as OWLIM [4], is a Triplestores provide a flexible generic logical model commercial triplestore developed by . It fully that allows them to accept and store any set of RDF supports SPARQL 1.1. OWL reasoning support is triples. However, if the triples are generated from ex- based on the forward-chaining rule-based material- ternal sources (e.g., relational databases) then an inter- ization approach. This strategy naturally fits with the mediate ETL (Extract, Transform and Load) process OWL 2 RL profile but is incomplete for OWL 2 QL [3]. must be set up between these external sources and the GraphDB is accessible through the Sesame API. triplestore. The ETL process can be expensive, espe- Stardog10 is a commercial triplestore developed by cially when datasources are updated frequently. Clark&Parsia. It supports SPARQL 1.1 and several OBDA systems, on the other hand, are set up reasoning levels: RDFS, the three profiles (OWL 2 QL, over existing relational datasources and exploit their OWL 2 EL, OWL 2 RL), and OWL 2 DL (however, domain-specific schemas. By using ontologies and completeness in OWL 2 DL is guaranteed only for mappings, they expose the database as a virtual RDF schema reasoning). The reasoning level can be chosen graph that can be queried using SPARQL. No ETL is by the user at query time. Stardog avoids eager mate- thus required. rialization and the reasoning is partly based on query Some triplestores and OBDA systems have reason- rewriting. It can be accessed through the Sesame API. ing capabilities. The most common strategy for triple- stores is forward-chaining, which consists in extend- RDFox11 is an in-memory triplestore developed at the ing the set of RDF triples by means of inferences ac- University of Oxford. It implements a novel shared- cording to a given set of rules. Thus, the OWL 2 RL memory parallel Datalog reasoning algorithm and sup- profile of OWL 2 (and similar rule-based ontology lan- ports OWL 2 RL reasoning by materialization [25]. guages) are most suitable for triplestores. Forward- The system is a cross-platform software written in C++ chaining has some drawbacks: the inferences can be and comes with a Java wrapper supporting OWL API. costly in terms of both time and space; moreover, up- dates and deletions of triples require additional book- 5.2. OBDA systems keeping for incremental reasoning. D2RQ12 is one of the pioneering OBDA systems, de- In contrast to triplestores, the most common strat- veloped at the Free University of Berlin and DERI. egy for OBDA systems is query rewriting, and so the profile of OWL that is most suitable for this setting is OWL 2 QL. To guarantee rewritability, certain fea- 8http://virtuoso.openlinksw.com/ 9 tures, such as recursion and property chains, are not http://www.ontotext.com/products/ ontotext-graphdb/ allowed in OWL 2 QL. 10http://stardog.com/ In the remainder of this section, we review various 11http://www.cs.ox.ac.uk/isg/tools/RDFox/ implementations from these two categories. 12http://d2rq.org/ Ontop 11

System Reasoning Mapping support License Starting year

Triplestore Virtuoso RDFS∗ Native, R2RML∗ GPL 2, Commercial 1999 GraphDB OWL 2 RL - Commercial 2005 Stardog OWL 2 ∗/ SWRL∗ - Commercial 2012 RDFox OWL 2 RL / SWRL / Datalog - Academic 2013 OBDA D2RQ No D2RQ Mapping, R2RML∗ Apache 2 2004 Mastro OWL 2 QL R2RML∗ Academic 2006 Ultrawrap RDFS-Plus Native, R2RML Commercial 2012 Morph-RDB No R2RML Apache 2 2013 Ontop OWL 2 QL / SWRL∗ Ontop Mapping, R2RML Apache 2 2010

Table 1 Feature matrix of SPARQL query answering systems (∗ indicates limited support)

This query rewriting system provides some query op- 6. A Retrospective timizations but these have often been reported as in- sufficient: for instance, the generated SQL queries can Ontop has its roots in our initial work on the QuOnto contain an excessive number of joins [29]. It provides and Mastro [1,8] systems (2006), in the OBDA plugin its own mapping language, D2RQ, and supports only for Protégé [28], and in our DIG extension for Mas- a fragment of R2RML. No inference mechanism is in- tro [33]. QuOnto is a reasoner for the description logic cluded. This software (last release in 2012) is available DL-Lite with plain conjunctive query (CQ) answering. under an open-source license. Mastro extends QuOnto with support for GAV (global as view) mappings for relational databases [27]. Both Mastro13 is an OBDA system that shares common ori- systems are maintained by Sapienza University of gins with Ontop. This query rewriting system sup- Rome. Our work enabled the use of these systems ports reasoning over OWL 2 QL ontologies. Unlike through the ontology editor Protégé 3 and the DIG rea- other OBDA systems mentioned here, it supports only soner API. a restricted fragment of SPARQL that corresponds to Using these tools we interacted with third parties to develop several OBDA applications [10,8,17,37,32] unions of conjunctive queries. Mastro is available only (for a full list, see [32]). Such interactions allowed for demonstration, testing, and evaluation purposes. us to test both the performance of the state-of-the-art Ultrawrap14 is an OBDA system commercialized by query rewriting techniques and the feasibility of apply- Capsenta. It was recently extended to support infer- ing this technology in data integration and data access. ence over an extension of RDFS with inverse and tran- The insights we obtained concern technique/optimiza- sitive properties [40]. Ultrawrap uses an analogue of tion issues and API/features issues. These two paths of development characterized our work from then. We T -mappings of Ontop, which are called saturated map- now briefly elaborate on them. pings and which are used for creating regular and ma- terialized views in the relational database. Reasoning, Optimization and Performance. The main issue initially was the large number of CQs 15 Morph-RDB, formerly called ODEMapster, is an generated by the rewriting technique implemented in open-source OBDA system supporting the R2RML QuOnto (the PerfectRef algorithm [9]), which some- and Direct Mappings standards. This system imple- times, even for simple ontologies and mappings, pro- ments a number of query optimizations techniques duces in the order of hundreds of thousands of CQs. such as the self-join elimination [29]. It, however, has And although DBMSs do perform very well in gen- no OWL inference capability. eral, they have problems with automatically generated queries (and this applies to both commercial and non- commercial database engines). To deal with the issue,

13http://www.dis.uniroma1.it/~mastro/ we extended the PerfectRef algorithm by the Seman- 14http://capsenta.com tic Query Optimization (SQO) component, which re- 15https://github.com/oeg-upm/morph-rdb moves redundant CQs and eliminates redundant self- 12 Ontop joins using database dependencies (foreign and pri- and distribution, Ontop was repackaged as a Maven mary keys) [32]. project and has been available from the official Maven The work in this direction materialized in the first repository since 2013. We gradually introduced version of Ontop (2010), which at the time was called project-wide testing, starting with functional tests Quest (the name now refers only to the query answer- for the reasoning modules, query answering modules ing engine). Quest focused on RDFS and OWL 2 QL (including the DAWG tests for SPARQL 1.0), and (instead of DL-Lite) and SPARQL (instead of CQs). virtual RDF modules (including the DAWG tests for This required a new mapping language suitable for the R2RML). Now most JUnit tests (∼2000) are automat- ontology languages (RDFS and OWL). Quest is ca- ically run with Travis-CI whenever new changes are pable of working in two modes: (i) the virtual mode pushed to GitHub. supports virtual RDF graphs via mappings, and (ii) the RDF triplestore mode stores triples directly in a rela- tional database. Two further ideas were developed for 7. Conclusion improving performance: T -mappings for the virtual In this paper we presented Ontop, a mature open- mode [34,21] (c.f. Section 3.1) and the Semantic Index source OBDA system. It allows users to access rela- for the RDF triplestore mode [35]. The tree-witness tional databases through a conceptual representation of query rewriting algorithm [19] was implemented to re- the domain of interest in terms of an ontology. The sys- duce the size of rewritings and take advantage of the tem is based on solid theoretical foundations and has T -mappings and the Semantic Index. We also stud- been designed and implemented towards compliance On- ied the execution of the SQL queries produced by with relevant W3C standards. It supports all major re- top and observed [32] that the generic database-centric lational databases and implements multiple optimiza- SQO was not sufficient and other techniques were re- tion techniques to offer a good level of performance. quired for the issues specific to the context of map- Ontop has been adopted in several academic and in- pings and ontologies (e.g., join conditions can be sim- dustrial use cases. plified by de-IRIing and the resulting joins eliminated In the future, we plan to extend Ontop in the follow- as explained in Section 3.2.2). ing directions: The more recent lines of research in Ontop in- clude the formalization of SPARQL in the context of (i) In order to further improve performance, we will OBDA [38], under the OWL 2 QL/RDFS entailment investigate data-dependent optimizations. regimes [21], and SWRL with limited forms of re- (ii) We plan to support larger fragments of the cursion supported by SQL Common Table Expres- SPARQL (e.g., aggregation, negation, and path sions [43]. queries) and R2RML (e.g., named graphs) stan- dards. API, Features and Accessibility. With the first ver- (iii) For end-users, we will improve the GUI inter- sion of Ontop, we shifted our focus from the De- faces and add utilities to make Ontop even more scription Logic domain to Semantic Web technolo- user-friendly. gies, gradually increasing our support for RDF, RDFS, (iv) We plan to go beyond relational databases and OWL 2 QL, and SPARQL. To support the OWL com- support other kinds of data sources (e.g., graph munity, we included the OWL API and Protégé 4 in- databases and document-oriented databases). terfaces for Ontop. To support RDF and the Linked Acknowledgements. This paper is supported by the Data community we created the Sesame API interface EU under the large-scale integrating project (IP) Op- for Ontop, as well as an HTTP SPARQL end-point. tique (Scalable End-user Access to Big Data), grant With the publication of R2RML as the official W3C agreement n. FP7-318338. RDB2RDF mapping language, we extended Ontop to support it. Ontop was initially released under a non- References commercial use license before adopting the permissive Apache 2.0 license in 2013. In addition, the project [1] Andrea Acciarri, Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, Mattia Palmieri, and is now hosted in GitHub so that anybody can easily Riccardo Rosati. QUONTO:QUerying ONTOlogies. In Proc. download and contribute to it. On the software engi- of the 20th Nat. Conf. on Artificial Intelligence (AAAI), pages neering side, to facilitate integration, building, testing, 1670–1671, 2005. Ontop 13

[2] Alessandro Artale, Diego Calvanese, Roman Kontchakov, and Chimamiwa. Enhancing web portals with Ontology-Based Michael Zakharyaschev. The DL-Lite family and relations. J. Data Access: the case study of South Africa’s Accessibility of Artificial Intelligence Research, 36:1–69, 2009. Portal for people with disabilities. In Proc. of the 5th Int. Work- [3] Barry Bishop and Spas Bojanov. Implementing OWL 2 RL and shop on OWL: Experiences and Directions (OWLED), vol- OWL 2 QL rule-sets for OWLIM. In Proc. of the 8th Int. Work- ume 432 of CEUR Electronic Workshop Proceedings, http: shop on OWL: Experiences and Directions (OWLED), vol- //ceur-ws.org/, 2008. ume 796 of CEUR Electronic Workshop Proceedings, http: [18] Evgeny Kharlamov, Nina Solomakhina, Özgür Lütfü Özçep, //ceur-ws.org/, 2011. Dmitriy Zheleznyakov, Thomas Hubauer, Steffen Lamparter, [4] Barry Bishop, Atanas Kiryakov, Damyan Ognyanoff, Ivan Mikhail Roshchin, Ahmet Soylu, and Stuart Watson. How se- Peikov, Zdravko Tashev, and Ruslan Velkov. OWLIM: A fam- mantic technologies can enhance data access at Siemens En- ily of scalable semantic repositories. Semantic Web J., 2(1):33– ergy. In Proc. of the 13th Int. Semantic Web Conf. (ISWC), vol- 42, 2011. ume 8796 of Lecture Notes in Computer Science, pages 601– [5] Béatrice Bouchou and Cheikh Niang. Semantic mediator 619. Springer, 2014. querying. In Proc. of the 18th Int. Database Engineering & [19] S. Kikot, R. Kontchakov, and M. Zakharyaschev. Conjunctive Applications Symposium (IDEAS), pages 29–38. ACM Press, query answering with OWL 2 QL. In Proc. of the 13th Int. 2014. Conf. on the Principles of Knowledge Representation and Rea- [6] Dan Brickley and R. V. Guha. RDF vocabulary description soning (KR), pages 275–285, 2012. language 1.0: RDF Schema. W3C Recommendation, World [20] Jonathan J. King. QUIST: A system for semantic query op- Wide Web Consortium, February 2004. Available at http: timization in relational databases. In Proc. of the 7th Int. //www.w3.org/TR/rdf-schema/. Conf. on Very Large Data Bases (VLDB), pages 510–517. IEEE [7] Jeen Broekstra, Arjohn Kampman, and Frank van Harmelen. Computer Society, 1981. Sesame: A generic architecture for storing and querying RDF [21] Roman Kontchakov, Martin Rezk, Mariano Rodriguez-Muro, and RDF schema. In Proc. of the 1st Int. Semantic Web Conf. Guohui Xiao, and Michael Zakharyaschev. Answering (ISWC), volume 2342 of Lecture Notes in Computer Science, SPARQL queries over databases under OWL 2 QL entailment pages 54–68. Springer, 2002. regime. In Proc. of the 13th Int. Semantic Web Conf. (ISWC), [8] Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, volume 8796 of Lecture Notes in Computer Science, pages Maurizio Lenzerini, Antonella Poggi, Mariano Rodriguez- 552–567. Springer, 2014. Muro, Riccardo Rosati, Marco Ruzzi, and Domenico Fabio [22] Davide Lanti, Martin Rezk, Guohui Xiao, and Diego Cal- Savo. The Mastro system for ontology-based data access. Se- vanese. The NPD benchmark: Reality check for OBDA sys- mantic Web J., 2(1):43–53, 2011. tems. In Proc. of the 18th Int. Conf. on Extending Database [9] Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Technology (EDBT), 2015. Maurizio Lenzerini, and Riccardo Rosati. Tractable reasoning [23] Domenico Lembo, Valerio Santarelli, and Domenico Fabio and efficient query answering in description logics: The DL- Savo. Graph-based ontology classification in OWL 2 QL. In Lite family. J. of Automated Reasoning, 39(3):385–429, 2007. Proc. of the 10th Extended Semantic Web Conf. (ESWC), vol- [10] Diego Calvanese, C. Maria Keet, Werner Nutt, Mariano ume 7882 of Lecture Notes in Computer Science, pages 320– Rodriguez-Muro, and Giorgio Stefanoni. Web-based graphical 334. Springer, 2013. querying of databases through an ontology: the WONDER sys- [24] Boris Motik, Bernardo Cuenca Grau, Ian Horrocks, Zhe Wu, tem. In Proc. of the 25th ACM Symposium on Applied Com- Achille Fokoue, and Carsten Lutz. OWL 2 Web Ontology puting (SAC), pages 1388–1395, 2010. Language profiles (second edition). W3C Recommendation, [11] U. S. Chakravarthy, J. Grant, and J. Minker. Logic-based World Wide Web Consortium, December 2012. Available at approach to semantic query optimization. ACM Trans. on http://www.w3.org/TR/owl2-profiles/. Database Systems, 15(2):162–207, 1990. [25] Boris Motik, Yavor Nenov, Robert Piro, Ian Horrocks, and Dan [12] Souripriya Das, Seema Sundara, and Richard Cyganiak. Olteanu. Parallel materialisation of Datalog programs in cen- R2RML: RDB to RDF mapping language. W3C Recommen- tralised, main-memory RDF systems. In Proc. of the 28th AAAI dation, World Wide Web Consortium, September 2012. Avail- Conf. on Artificial Intelligence (AAAI), pages 129–137. AAAI able at http://www.w3.org/TR/r2rml/. Press, 2014. [13] Martin Giese, Peter Haase, Ernesto Jiménez-Ruiz, Davide [26] Özgür Lütfü Özçep and Ralf Möller. Ontology based data ac- Lanti, Özgür Özçep, Martin Rezk, Riccardo Rosati, Ahmet cess on temporal and streaming data. In Reasoning Web. Rea- Soylu, Guillermo Vega-Gorgojo, Arild Waaler, and Guohui soning on the Web in the Big Data Era – 10th Int. Summer Xiao. Optique – zooming in on big data access. IEEE Com- School Tutorial Lectures (RW), volume 8714 of Lecture Notes puter, Accepted for publication, 2015. in Computer Science, pages 279–312. Springer, 2014. [14] Georg Gottlob, Stanislav Kikot, Roman Kontchakov, [27] Antonella Poggi, Domenico Lembo, Diego Calvanese, Vladimir V. Podolskii, Thomas Schwentick, and Michael Za- Giuseppe De Giacomo, Maurizio Lenzerini, and Riccardo kharyaschev. The price of query rewriting in ontology-based Rosati. Linking data to ontologies. J. on Data Semantics, data access. Artificial Intelligence, 213:42–59, 2014. X:133–173, 2008. [15] Matthew Horridge and Sean Bechhofer. The OWL API: A Java [28] Antonella Poggi, Mariano Rodríguez-Muro, and Marco Ruzzi. API for OWL ontologies. Semantic Web J., 2(1):11–21, 2011. Ontology-based database access with DIG-Mastro and the [16] Joseph S. Nye Jr. The benefits of soft power. Technical re- OBDA Plugin for Protégé. In Kendall Clark and Peter F. Patel- port, Harvard University - Business School, 2004. Available at Schneider, editors, Proc. of the 4th Int. Workshop on OWL: Ex- http://hbswk.hbs.edu/archive/4290.html. periences and Directions (OWLED DC), 2008. [17] C. Maria Keet, Ronell Alberts, Aurona Gerber, and Gibson [29] Freddy Priyatna, Oscar Corcho, and Juan Sequeda. Formalisa- 14 Ontop

tion and experiences of R2RML-based SPARQL to SQL query Realizing ontology based data access: A plug-in for Protégé. In translation using Morph. In Proc. of the 23rd Int. World Wide Proc. of the ICDE Workshop on Information Integration Meth- Web Conf. (WWW), pages 479–490, 2014. ods, Architectures, and Systems (IIMAS 2008), pages 286–289. [30] S. Pugacs. Efficient query answering with semantic indexes. IEEE Computer Society Press, 2008. BSc thesis, KRDB Research Centre for Knowledge and Data, [38] Mariano Rodriguez-Muro and Martin Rezk. Efficient Free University of Bozen-Bolzano, 2011. SPARQL-to-SQL with R2RML mappings. Technical report, [31] Alireza Rahimi, Siaw-Teng Liaw, Jane Taggart, Pradeep Ray, Free University of Bozen-Bolzano, January 2014. Avail- and Hairong Yu. Validating an ontology-based algorithm to able at http://www.inf.unibz.it/~mrezk/pdf/ identify patients with type 2 diabetes mellitus in electronic -.pdf. health records. Int. J. Medical Informatics, 83(10):768–778, [39] Mariano Rodriguez-Muro, Martin Rezk, Josef Hardi, Mindau- 2014. gas Slusnys, Timea Bagosi, and Diego Calvanese. Evaluat- [32] Mariano Rodríguez-Muro. Tools and Techniques for Ontology ing SPARQL-to-SQL translation in Ontop. In Proc. of the Based Data Access in Lightweight Description Logics. PhD 2nd Int. Workshop on OWL Reasoner Evaluation (ORE), vol- thesis, KRDB Research Centre for Knowledge and Data, Free ume 1015 of CEUR Electronic Workshop Proceedings, http: University of Bozen-Bolzano, 2010. //ceur-ws.org/, pages 94–100, 2013. [33] Mariano Rodriguez-Muro and Diego Calvanese. Towards an [40] Juan F. Sequeda, Marcelo Arenas, and Daniel P. Miranker. open framework for ontology based data access with Protégé OBDA: Query rewriting or materialization? In practice, both! and DIG 1.1. In Proc. of the 5th Int. Workshop on OWL: Expe- In Proc. of the 13th Int. Semantic Web Conf. (ISWC), volume riences and Directions (OWLED), volume 432 of CEUR Elec- 8796 of Lecture Notes in Computer Science, pages 535–551. tronic Workshop Proceedings, http://ceur-ws.org/, Springer, 2014. 2008. [41] Martin G. Skjæveland and Espen H. Lian. Benefits of pub- [34] Mariano Rodríguez-Muro and Diego Calvanese. Dependen- lishing the Norwegian Petroleum Directorate’s FactPages as cies: Making ontology based data access work in practice. In Linked Open Data. In Proc. of Norsk informatikkonferanse Proc. of the 5th Alberto Mendelzon Int. Workshop on Founda- (NIK 2013). Tapir, 2013. tions of Data Management (AMW), volume 749 of CEUR Elec- [42] Manolis M. Tsangaris, George Kakaletris, Herald Kllapi, tronic Workshop Proceedings, http://ceur-ws.org/, Giorgos Papanikos, Fragkiskos Pentaris, Paul Polydoras, Eva 2011. Sitaridi, Vassilis Stoumpos, and Yannis E. Ioannidis. Dataflow [35] Mariano Rodriguez-Muro and Diego Calvanese. High perfor- processing and optimization on grid and cloud infrastructures. mance query answering over DL-Lite ontologies. In Proc. of Bull. of the IEEE Computer Society Technical Committee on the 13th Int. Conf. on the Principles of Knowledge Representa- Data Engineering, 32(1):67–74, 2009. tion and Reasoning (KR), pages 308–318, 2012. [43] Guohui Xiao, Martin Rezk, Mariano Rodriguez-Muro, and [36] Mariano Rodriguez-Muro, Roman Kontchakov, and Michael Diego Calvanese. Rules and ontology based data access. In Zakharyaschev. Ontology-based data access: Ontop of Proc. of the 8th Int. Conf. on Web Reasoning and Rule Sys- databases. In Proc. of the 12th Int. Semantic Web Conf. (ISWC), tems (RR), volume 8741 of Lecture Notes in Computer Science, volume 8218 of Lecture Notes in Computer Science, pages pages 157–172. Springer, 2014. 558–573. Springer, 2013. [37] Mariano Rodríguez-Muro, Lina Lubyte, and Diego Calvanese.