Comparison of Geospatial Support in Rdf Stores: Evaluation for Icos Carbon
Total Page:16
File Type:pdf, Size:1020Kb
Master Thesis in Geographical Information Science nr 99 COMPARISON OF GEOSPATIAL SUPPORT IN RDF STORES: EVALUATION FOR ICOS CARBON PORTAL METADATA Syed Muhammad Amir Raza 2019 Department of Physical Geography and Ecosystem Science Centre for Geographical Information Systems Lund University Sölvegatan 12 S-223 62 Lund Sweden Syed Muhammad Amir Raza (2019). Comparison of geospatial support in RDF stores: Evaluation for ICOS Carbon Portal metadata. Master degree thesis, 30/ credits in Master in Geographical Information Science Department of Physical Geography and Ecosystem Science, Lund University ii COMPARISON OF GEOSPATIAL SUPPORT IN RDF STORES: EVALUATION FOR ICOS CARBON PORTAL METADATA Syed Muhammad Amir Raza Masters thesis, 30 credits in Geographical Information Sciences Supervisors Lars Harrie Dept of Physical Geography and Ecosystem Science Faculty of Science, Lund University Weiming Huang Dept of Physical Geography and Ecosystem Science Faculty of Science, Lund University iii Abstract The evolution of World Wide Web (WWW) into semantic web is happening with the aid of standards like Resource Description Framework (RDF), SPARQL and a few others from World Wide Web Consortium (W3C). Over the years, semantic data management technologies have been introduced as software platforms commonly known as RDF stores. Lately these RDF stores have been tested for processing and maintenance of large data sets complying with Linked Data principles. In order to standardize geographic capabilities in these RDF stores, Open Geospatial Consortium (OGC) adopted GeoSPARQL as an extension to SPARQL query. Our study aims to discuss the geospatial capabilities, and the conformance to GeoSPARQL standard, of the five RDF stores: Eclipse RDF4J 2.4.0, Apache Jena 3.9.0, Openlink Virtuoso 7.2.4, Stardog 6.0.1 and GraphDB 8.8.0. Along with the investigation of features, the performance evaluation of these RDF stores has also been conducted by measuring the execution times of a set of GeoSPARQL queries. The evaluation query set consists of non- topological, spatial selection as well as spatial join queries adopted from a spatial benchmark, Geographica. The geospatial component of Integrated Carbon Observation System (ICOS) Carbon Portal (CP) metadata has been used for performance evaluation in order to establish the suitability of the RDF stores for ICOS-CP requirements. Java Programs have been developed in order to interact with all the RDF stores for upload of data and execution of benchmark queries. Some result set disparities amongst the RDF stores as well as variation in performance metrics on different hardware platforms have also been highlighted in our research. Keywords: Geography, Geographical Information Systems (GIS), GeoSPARQL, Geospatial query language, RDF stores, Java Programming, RDF4J, Jena, Virtuoso, Stardog, GraphDB iv Table of Contents Abstract .................................................................................................. iv Table of Contents ................................................................................... v List of Figures ...................................................................................... viii List of Tables ....................................................................................... viii List of Abbreviations .......................................................................... viii List of Products ...................................................................................... x 1. INTRODUCTION............................................................................. 1 1.1. Background ............................................................................................... 1 1.2. Problem Statement ..................................................................................... 2 1.3. Aim ........................................................................................................... 2 1.4. Study Design ............................................................................................. 2 1.5. Disposition ................................................................................................ 4 1.6. Limitations ................................................................................................ 4 2. TECHNICAL BACKGROUND ...................................................... 5 2.1. Background of the Semantic Web .............................................................. 5 2.2. Semantic Web Technology ........................................................................ 7 2.2.1. RDF Model ............................................................................................................... 7 2.2.2. RDF Vocabulary ....................................................................................................... 8 2.2.3. RDF Schema (RDFS) Vocabulary ............................................................................ 8 2.2.4. Web Ontology Language (OWL) ............................................................................. 9 2.2.5. SPARQL ................................................................................................................... 9 2.3. Linked Data ..............................................................................................10 2.3.1. Linked Data Sets and Repositories ......................................................................... 11 2.4. Basic Geospatial Concepts ........................................................................13 2.4.1. Spatial Representation Standards ............................................................................ 13 v 2.4.2. Spatial and Topological Relationships.................................................................... 14 2.4.3. Geographic Web Services in Connection to Linked Data ...................................... 15 2.4.4. Spatial Indexing ...................................................................................................... 16 2.5. Geospatial Semantic Web and Metadata ...................................................16 2.6. The GeoSPARQL standard .......................................................................18 2.6.1. Core Component ..................................................................................................... 18 2.6.2. SPARQL Extension Functions ............................................................................... 19 2.6.3. Query Re-write Rules ............................................................................................. 19 3. PREVIOUS WORK IN EVALUATION OF RDF STORES ...... 21 4. MATERIALS AND METHODS ................................................... 23 4.1. Integrated Carbon Observation System and the Carbon Portal ....................23 4.1.1. Integrated Carbon Observation System .................................................................. 23 4.1.2. The ICOS Carbon Portal ......................................................................................... 23 4.1.3 ICOS Data ............................................................................................................... 24 4.1.4 Other Data at ICOS CP ........................................................................................... 24 4.1.5 ICOS Metadata........................................................................................................ 25 4.2. Research Data ...........................................................................................25 4.2.1. Preparation of Research Data.................................................................................. 26 4.3. Evaluation Technique ...............................................................................29 4.3.1. Selection of RDF Stores.......................................................................................... 29 4.3.2. Qualitative Study .................................................................................................... 30 4.3.3. Preparation for Quantitative Study ......................................................................... 30 4.3.4. Quantitative Study .................................................................................................. 31 4.3.5. ICOS-CP Considerations ........................................................................................ 33 4.4. Introduction to Programming with the RDF Stores ...................................33 4.4.1. Eclipse RDF4J ........................................................................................................ 34 4.4.2. Apache Jena ............................................................................................................ 35 4.4.3. Openlink Virtuoso ................................................................................................... 36 4.4.4. Stardog .................................................................................................................... 38 4.4.5. Ontotext GraphDB .................................................................................................. 39 vi 5. RESULT........................................................................................... 41 5.1. Eclipse RDF4J ..........................................................................................41 5.1.1. Geospatial Support .................................................................................................. 41 5.1.2. Benchmark Query Performance .............................................................................. 41 5.2.