An R2RML Mapping Management API in Java

An R2RML Mapping Management API in Java Making an API Independent of its Dependencies Marius Strandhaug Master’s Thesis Spring 2014 Abstract When developing an Application Programming Interface (API), there is often a need to use other libraries to implement some of the functionality. However, using an external library will make the API dependent on it. There may be several libraries that implement the needed functionality. A problem arises if the user needs to use a different library than what the API supports. The user should be able to choose which underlying library the API will use. To achieve this, the API will have to be made in a generic way, so that it can work with any library that implements the needed functionality. This thesis discusses approaches that can make an API independent of its dependencies. We will see that a combination of several approaches are needed in order to achieve the best result. The approaches are tested on an R2RML mapping management API, written in Java. The API was made for the Optique project, led by the LogID group at the University of Oslo. The thesis will first give an introduction to Ontology Based Data Access (OBDA), with a focus on mappings. It will then discuss how to design an independent API. Towards the end, it will discuss the design and implementation of the R2RML API in detail. Acknowledgements First of all, I would like to thank my supervisor Martin Giese for all the help with this thesis. I also want to thank Arild Waaler and the members of Optique for letting me be a part of the project. Thanks to the LogID research group for many interesting seminars. Thanks to Timea Bagosi, who wrote the code for parsing R2RML mappings, as well as being of great help during the development of the API. Also, many thanks to Marco Ruzzi and Riccardo Mancini for writing unit tests for the API. Finally, I want to thank my family and friends for all the motivation and support. ii Contents 1 Introduction 1 1.1 Motivation and Approach . .1 1.2 Contributions . .2 2 Ontology Based Data Access 3 2.1 What is an Ontology? . .3 2.2 Description Logics and OWL . .4 3 Optique 7 4 Mapping Relational Databases to RDF 11 4.1 D2RQ . 14 5 R2RML 17 5.1 Background . 18 5.2 Mapping a Database to RDF with R2RML . 20 5.3 The R2RML Data Model . 21 5.3.1 Triples Map . 21 5.3.2 Logical Table . 21 5.3.3 Predicate Object Map . 22 5.3.4 Term Maps . 22 5.3.5 Referencing Object Map . 24 5.4 Advantages and Disadvantages of R2RML . 26 5.5 An example mapping . 27 6 API Design 29 6.1 API Requirements . 30 7 Design Patterns 33 7.1 Singleton . 34 7.2 Abstract Factory Pattern . 36 8 Library Independency 39 8.1 Dependency Injection . 41 8.1.1 Library Configuration . 43 8.2 Generics . 46 8.3 Class Objects . 48 8.4 C++ Templates . 50 iii 9 R2RML API Design 53 9.1 Requirements . 53 9.2 API Architecture . 56 10 Supported Libraries 63 10.1 Jena . 63 10.2 Sesame . 64 10.3 OWL API . 65 11 R2RML API Implementation 69 11.1 Tools . 69 11.1.1 Eclipse . 69 11.1.2 Subversion . 69 11.2 Implementation of the R2RML API . 70 11.3 Code examples . 71 12 Evaluation and Conclusion 75 12.1 Conclusion . 76 12.2 Future Work . 77 A R2RML API Code and Documentation 79 iv List of Figures 3.1 An overview of the Optique architecture[30] . .9 4.1 An example relational database. 13 5.1 The process of mapping an RDB to RDF. 20 5.2 An overview of the R2RML data model. 21 5.3 An example relational database. 27 7.1 An example using the Abstract Factory Pattern. 37 8.1 GUIHandler is dependent on WindowsFrame and WindowsButton. ................................... 41 8.2 GUIHandler is dependent on the abstractions Frame and Button............................... 42 8.3 Overview of dependencies using a library configuration. 44 9.1 Class diagram of the LogicalTable. 57 9.2 Class diagram of the central access point to the API. 59 9.3 Illustration of how the bridge module makes the R2RML API work with an external library. 61 9.4 The bridge module is dependent on both the core module and the library. 61 10.1 The structure of RDF statements in Jena. 64 10.2 The structure of RDF statements in OpenRDF Sesame. 65 10.3 The structure of RDF statements in OWL API. 66 v vi Listings 4.1 The direct mapping of the relational database in Figure 4.1. 13 4.2 An example D2RQ mapping that can be used on the database in Figure 4.1 on page 13. 15 5.1 A R2RML mapping that generates the graph in Listing 5.2 on page 28. 27 5.2 The resulting RDF graph given by applying the mapping in Listing 5.1 on page 27 to the database in Figure 5.3. 28 7.1 A simple (but unsafe) singleton implementation. 36 7.2 A modern singleton implementation. 36 7.3 Usage of the Abstract Factory Pattern. 38 8.1 A simple implementation of constructor injection. 42 8.2 An implementation of interface injection. 43 8.3 A simplified example of how the library configuration (LibConfig) can be used. 45 8.4 The Java List interface. 46 8.5 An example showing how the type parameters propagate to other classes. 47 8.6 Usage of an API with Java Generics. 48 8.7 Example of a set-method using class objects. 49 8.8 Usage of an API with class objects. 50 8.9 An example of a class template. 51 8.10 Extended class template. 52 11.1 The mapping used in the code examples. 71 11.2 How to create a Jena mapping with the R2RML API. 72 11.3 How to import a Jena mapping with the R2RML API. 72 11.4 How to serialize a Jena mapping with the R2RML API. 73 11.5 An example implementation of the LibConfiguration interface. 74 11.6 Using a custom RDF library with the R2RML API. 74 vii viii Chapter 1 Introduction 1.1 Motivation and Approach Being able to access Big Data in an effective manner, is a great challenge for many companies. The size and complexity of such data makes it very hard to access the relevant data. E.g. an oil company like Statoil generates several terabytes of data every day. Engineers need to use this data, but quering it requires detailed knowledge about the data sources. The engineers often have to ask an IT expert to construct queries for them. Ideally, the engineers should be able to query the data themselves. Ontology Based Data Access (OBDA) is a data access paradigm that has the potential to solve this problem. It uses an ontology as a semantic layer on top of the underlying data sources. The Optique project aims to bring a paradigm shift to data accessing using an OBDA approach. An essential part of OBDA is the mappings. Relational databases are widely used to store data, but in order to access these through an ontology, it has to be mapped to another data model called RDF. This can be done with a mapping language such as R2RML. An R2RML mapping specifies the relation between tables in the databases, and the RDF. In order to map a database to RDF, several of these mappings are needed. The mappings have to managed by an IT expert in order to keep them up to date with the needs of the end-users. However, the mappings are serialized in RDF, so manipulating them manually is not very convenient. Optique therefore needed an R2RML mapping management API in Java. This API should provide an abstraction of the R2RML data model, above its RDF serialization. During the development, we decided to make the API available to the public, in the hope of establishing a de-facto standard Java implementation of R2RML. To do this, the API has to be made in a way that makes it appealing for as many developers as possible. We therefore wanted to make the API independent of any specific external library. The first Optique version was 1 dependent of the OpenRDF Sesame API to handle RDF functionality. We wanted the API to work with any RDF library that a user would want to use. There are several potential approaches for making an API independent of an underlying library (see Chapter 8 on page 39). There may be a combination of several approaches that is needed to get the best result. The approach should not increase the complexity of the API too much. Ideally, using the independent API should not be very different from using the dependent version of the API. 1.2 Contributions This thesis will discuss how to make an API that is independent of its dependencies. The goal is to establish an approach that can be used in a variety of situations. The approach will be used on the R2RML API. These are the contributions that this thesis will make: • Design an R2RML mapping management API in Java. • Discuss possibilities for making an API independent of its dependencies. • Apply the approach to the R2RML API. The R2RML API will provide programatical access to R2RML mappings with Java. It will give an abstraction of the R2RML mappings language above the level of its RDF serialization.

Load more