High Quality Linked Data Generation from Heterogeneous Data
De generatie van kwaliteitsvolle gelinkte data uit heterogene gegevens
Anastasia Dimou
31 October 2017

Table of Contents

List of Figures
List of Tables

1 Introduction
  1.1 Semantic Web
  1.2 Linked Data
  1.3 Resource Description Framework
  1.4 Knowledge Representation
  1.5 Linked Data lifecycle
    1.5.1 Modeling
    1.5.2 Generation
    1.5.3 Validation
    1.5.4 Provenance
    1.5.5 Publication
  1.6 Problem Statement
    1.6.1 Hypotheses
    1.6.2 Research Questions
    1.6.3 Outline
    1.6.4 Outcomes
  References

2 Declaration
  2.1 Introduction
  2.2 State of the Art
    2.2.1 Mapping Languages
    2.2.2 R2RML
    2.2.3 Editors
  2.3 Limitations and requirements
    2.3.1 Limitations
    2.3.2 Requirements
    2.3.3 R2RML generalization
  2.4 RML Language
  2.5 RML Extensions
    2.5.1 Graceful Degradation
    2.5.2 Data Transformations
  2.6 RML Editor
  References

3 Execution
  3.1 Introduction
  3.2 State of the Art
    3.2.1 Structure- and format-specific tools
    3.2.2 R2RML processors
  3.3 Execution Factors
    3.3.1 Purpose
    3.3.2 Direction
    3.3.3 Materialization
    3.3.4 Location
    3.3.5 Driving force
    3.3.6 Trigger
    3.3.7 Synchronization
    3.3.8 Dynamicity
  3.4 RML Mapper
    3.4.1 Phases
    3.4.2 Architecture
    3.4.3 Modules
    3.4.4 Workflow
    3.4.5 Graphical Interfaces
  3.5 Evaluation
    3.5.1 Formal
    3.5.2 Experimental
  References

4 Quality
  4.1 Introduction
  4.2 State of the Art
  4.3 Quality Assessment
    4.3.1 Linked Data Quality Assessment
    4.3.2 Mapping Rules Quality Assessment
  4.4 Mapping Rules Refinements
  4.5 RML Validator
  4.6 Evaluation
    4.6.1 Use cases
    4.6.2 Results
  References

5 Workflow
  5.1 Introduction
  5.2 State of the Art
    5.2.1 Dataset and Service Descriptions
    5.2.2 Linked Data publishing cycle
    5.2.3 Provenance and Metadata Vocabularies
    5.2.4 Approaches for tracing PROV & metadata
    5.2.5 Discussion
  5.3 Generation Workflow
    5.3.1 Access Interfaces
    5.3.2 Data Source
    5.3.3 Binding Condition
  5.4 Refinement Workflow
  5.5 Metadata and Provenance
    5.5.1 High Level Workflow
    5.5.2 Metadata Details Levels
  5.6 RML Workbench
  References

6 Use Cases
  6.1 Open Data
    6.1.1 EWI Open Data Data Model
    6.1.2 EWI Open Data Vocabularies
    6.1.3 EWI Open Data Workflow
  6.2 iLastic
    6.2.1 iLastic Model
    6.2.2 iLastic Vocabulary
    6.2.3 iLastic Linked Data set
    6.2.4 iLastic Workflow
  6.3 COMBUST
    6.3.1 COMBUST Model
    6.3.2 COMBUST Vocabulary
    6.3.3 COMBUST Linked Data set
    6.3.4 COMBUST Workflow
  6.4 DBpedia
    6.4.1 DBpedia Generation Limitations
    6.4.2 DBpedia Workflow
    6.4.3 DBpedia Linked Data subsets
  6.5 CEUR-WS
    6.5.1 Semantic Publishing Challenge
    6.5.2 RML-based Workflow
    6.5.3 Solutions Description
    6.5.4 Solutions Comparison
    6.5.5 Discussion
  References

7 Conclusions

List of Figures

  1.1 Linked Open Data cloud in 2007
  1.2 Linked Open Data cloud in 2017
  1.3 An RDF triple consists of a subject,