emanuel berndl

EMBEDDING A MULTIMEDIA METADATA MODEL INTO A WORKFLOW-DRIVEN ENVIRONMENT USING IDIOMATIC SEMANTIC WEB TECHNOLOGIES

EMBEDDING A MULTIMEDIA METADATA MODEL INTO A WORKFLOW-DRIVEN ENVIRONMENT USING IDIOMATIC SEMANTIC WEB TECHNOLOGIES

emanuel berndl
master of science (m.sc.)

Doctoral Thesis

December 2018

University of Passau
Faculty of Computer Science and Mathematics
Chair of Distributed Information Systems

Emanuel Berndl: Embedding a Multimedia Metadata Model into a Workflow-Driven Environment Using Idiomatic Semantic Web Technologies, © December 2018

In many ways, people growing up with the Web and now the Semantic Web take the power at their fingertips for granted.
— Tim Berners-Lee, 2005

ABSTRACT

The Semantic Web has existed for about 20 years now, but neither its applicability nor its presence lives up to the standards of its original idea. The Semantic Web Technologies involved come with an initial barrier to learn and apply, which can discourage many potential users. This leads to less available data overall as well as decreased data quality.

This work solves parts of the aforementioned problem by supporting an idiomatic entry to those Semantic Web Technologies, allowing for "easier" accessibility and usability. Anno4j is a Java library that implements a form of Object-Relational Mapping for RDF data. With its application, RDF data can be created via a mapping by simply instantiating Java objects - an object-oriented programming concept the user is familiar with. Conversely, requesting persisted data is supported by a path-based querying possibility, while further features like transactional behaviour, code generation, and automated validation of input contribute to a more effective, comprehensive, and straightforward usage.

A use case is provided by the MICO Platform, a centralised software instance that connects autonomous multimedia extractors in a workflow-driven fashion. This creates a rich metadata background for the inserted multimedia files, enabling them to be used in diverse scenarios and unlocking semantics that were previously hidden. For this task it was necessary to design and implement a metadata model that is able to aggregate and merge the varying extractor results under a common "headline": the MICO Metadata Model.

The results of this work allow the use case to incorporate idiomatic Semantic Web Technologies which are then usable natively by non-Semantic Web experts. Additionally, improvements have been achieved in terms of data integration, synchronisation, integrity, and validity, as well as an overall more comprehensive and rich implementation of the multimedia extractors.

ZUSAMMENFASSUNG

The Semantic Web has existed for about 20 years now, but its applicability and presence are not at the level that its original idea aimed for. The semantic technologies in use come with an entry barrier that can deter potential users. This leads both to less available data and to poorer data quality.

This thesis solves part of this problem by making those technologies "easier" to use. Anno4j is a Java library that implements a form of object-relational mapping for RDF data. With it, RDF data can be created via a mapping by simply instantiating plain Java objects - an object-oriented concept the user is familiar with. Querying the data is simplified by path-based querying, while further features such as transactional behaviour, code generation, and automated validation contribute to a more effective, comprehensive, and straightforward usage.

The use case is provided by the MICO Platform, a central software instance that connects autonomous multimedia extractors in a workflow-driven fashion in order to analyse inserted multimedia content for its metadata background. This allows the content to be exploited in a broader spectrum of scenarios, including possibly unforeseen ones. A purpose-built RDF metadata model, the MICO Metadata Model, provides a uniform data format that brings all produced results under one common roof.

In this use case, the results of this thesis enable a simplification of the semantic technologies for users unfamiliar with the Semantic Web, improvements in terms of data integration, synchronisation, and integrity, as well as a functionally richer implementation of the multimedia extractors.

CONTENTS

i preface
1 the semantic web
1.1 The Vision and Idea of the Semantic Web
1.1.1 A Semantic Web Use Case
1.1.2 A Look at the Current State of the Semantic Web
1.1.3 Open Issues of the Semantic Web
1.1.4 The Semantic Gap - A Metadata Problem in the Domain of Multimedia
1.2 A Technical Setting for this Thesis - The MICO Project
1.3 Research Questions and Contributions
1.4 Structure of this Thesis

ii modelling metadata - classic approaches and a multimedia context
2 principles of metadata modeling and querying
2.1 The Semantic Web and Other Familiar Concepts
2.2 The "Resource Description Framework" - The Backbone of Metadata Modeling
2.2.1 Information Units in RDF and Their Structure
2.2.2 RDF Basics and Concepts for Metadata Modelling
2.2.3 RDF Vocabularies and Ontologies
2.2.4 RDF Datasets and Named Graphs
2.2.5 Expressing and Transporting RDF Data - RDF Documents and Their Serialisations
2.2.6 Advanced RDF Features - Inferencing, Reasoning, and Reification
2.2.7 Querying and Manipulating RDF Data - SPARQL, the SPARQL Protocol and RDF Query Language
3 combining metadata modeling with multimedia
3.1 Related Work - Multimedia Metadata Modeling
3.2 The Web Annotation Data Model
3.2.1 Web Annotation Structure
3.2.2 RDF Classes for Body and Target Components
3.2.3 Fragmentation of Resource IRIs and Selectors
3.2.4 Additional Information for Web Annotation Nodes
3.2.5 Complete Exemplary Web Annotation
3.3 The MICO Metadata Model - Connecting Annotations
3.3.1 Composition Module
3.3.2 Context Module


3.3.3 The Content Module / The Body of the Part Annotation
3.3.4 The Selection Module / The Target of the Part Annotations
3.3.5 Provenance Module
3.3.6 Multimedia Ontology Requirements Check

iii the application of multimedia metadata in a workflow-driven approach and idiomatic semantic web technologies
4 increasing the usability and applicability of semantic web technologies for multimedia metadata modeling and workflows - anno4j
4.1 Related Work - Object/Relational and Object/RDF Mappings
4.2 Anno4j - An Object-RDF-Mapping Library
4.3 Creation of Metadata with Anno4j
4.4 Querying of Metadata with Anno4j
4.5 Established Database Concepts of the Anno4j Library
4.5.1 Supporting Transactional Behaviour in Anno4j
4.5.2 Validate Database Input with Validated Transactions
4.5.3 Schema Annotations for Data Validity
4.6 Automated Domain Model Generation through Anno4j RDF Schema Parsing
4.6.1 Domain Model Generation Functionality
4.6.2 Generation Process Internals and Algorithms
4.6.3 Generation of a Web Component for a Metadata Model
4.7 Additional Anno4j Database Features
4.8 Anno4j Conclusion, Outlook, and Envisioned Additions
5 a workflow-driven approach for multimedia metadata application
5.1 RW - Multimedia Metadata Platforms, Metadata Lifecycles
5.2 Embedding the Multimedia Metadata Workflow - MICO
5.2.1 Accessing the Produced Results of the MICO Platform
5.2.2 MICO Extractor Description
5.2.3 Implementing Own MICO Extractors
5.2.4 MICO Orchestration Service - The MICO Broker
5.3 From a Multimedia Metadata Workflow to a Self-Sustaining Metadata Cycle
5.4 An Extension of the Workflow Environment

5.5 Multimedia Metadata Application Conclusion
iv experiments and evaluations
6 experiments and evaluations
6.1 Related Work - Overall ORdfM Evaluations
6.1.1 Quasthoff: General ORdfM Comparison
6.1.2 Quasthoff: ORdfM Implementation and Runtime Evaluations
6.2 Anno4j and Ontology Structure Experiments
6.2.1 Runtime Experiments with Generated Anno4j Domain Models
6.2.2 Runtime Experiments with Pre-Created Proxy Classes
6.2.3 Runtime Experiments with Generated Anno4j Domain Models and Multiple Varying Parameters
6.2.4 Conclusion of the Anno4j Evaluations
6.3 ViSIT - A Cultural Heritage Use Case for the ORdfM Library Anno4j
6.4 Recapitulation of Posed Research Questions
v summary and conclusion
7 résumé
7.1 Conclusion
7.2 Future Work and Outlook
vi appendix
a appendix
a.1 The MICO Project
a.1.1 Background of the MICO Project
a.1.2 Development Cycle of the MICO Broker
a.2 Further Information and Appended Data
bibliography

LIST OF FIGURES

Figure 1: The Role of Semantic Web Users that Participated in the Survey of Cardoso [41] (From Which the Numbers and the Table Are Adopted).
Figure 2: Visualisation of the Semantic Gap Problem.
Figure 3: The Evolution of a Web of Documents to a Web of Data.
Figure 4: The Evolution of the Semantic Web.
Figure 5: Interaction Between the Concepts Around the Semantic Web.
Figure 6: Two Simple Statements Presented as Graph.
Figure 7: Combined Statements Presented as Graph.
Figure 8: Exemplary Blank Node.
Figure 9: Example RDF Dataset with Two Named Graphs "cities:NewYork" and "cities:Chicago". The Exemplary Namespace "http://examplecities.com/" is Assumed for the Prefix "cities:".
Figure 10: Exemplary RDF Graph Used for Showing the Different RDF Serialisation Formats.
Figure 11: Exemplary Web Annotation Illustrating the Information That the Human Face Displayed on a Given Picture is Barack Obama.
Figure 12: Exemplary Web Annotation from Figure 11 with more Metadata Information, Especially the Addition of the Datatypes Text and Image.
Figure 13: Exemplary Embedded Textual Body Annotation (Left) and a String Body Annotation (Right). Both Represent the Same Semantic Content. Target Components Have Been Left out for Reasons of Clarity.
Figure 14: Exemplary Web Annotation Showing the Content Illustrated in Figure 11 with the Extension of a Segmentation Fragment for the Target Image.
Figure 15: Exemplary Web Annotation with the Application of Specific Resources Both at the Body and Target Component.
Figure 16: Exemplary Web Annotation with the Application of a FragmentSelector that Utilises the Same Fragment of the Media Resource as the Web Annotation in Figure 14.


Figure 17: Complete Web Annotation with the Use Case of a Face Detection Process: Barack Obama Has Been Detected on the Given Input Image.
Figure 18: Module Structure of the MICO Metadata Model, Extending the Web Annotation Structure with Body, Target, and Annotation Component.
Figure 19: Exemplary "mmm:Item" Node with Three "mmm:Part" Children.
Figure 20: Exemplary "mmm:Part" Node with Attached Body and Target Nodes.
Figure 21: Exemplary Part Annotation, Illustrating the Result of an Animal Detection Analysis.
Figure 22: Exemplary Part Annotation, Illustrating the Result of a Temporal Video Segmentation Analysis.
Figure 23: Exemplary Item and Associated Part Annotation with Respective Assets.
Figure 24: Exemplary Item Structure that Shows the Input Provenance (Blue Edges for "mmm:hasInput", Orange for "mmm:hasSource") of the MMM.
Figure 25: Access Possibilities for a Triplestore Augmented with the Anno4j Library.
Figure 26: Sequence of Steps Executed for Creating Metadata with Anno4j, Including Internal Steps of the Library.
Figure 27: Resulting RDF Graph of the Executed Anno4j Code Shown in Listing 19.
Figure 28: Showcase MMM Animal Detection Result Used as Anno4j Querying Example.
Figure 29: Illustrating RDF Graph for the First Criteria of the Query Defined in Listing 21 (Line 3), Which Supports the Semantic Feature of Searching for Results of an Animal Detection.
Figure 30: Illustrating RDF Graph for the Second Criteria of the Query Defined in Listing 21 (Line 4), Which Supports the Semantic Feature of Elephants Being Depicted.
Figure 31: Illustrating RDF Graph for the Third Criteria of the Query Defined in Listing 21 (Line 5), Specifying the Detection to Be in a Rectangle of a Width of 50 Pixels and Height of 70 Pixels.

Figure 32: Illustration of the Anno4j Generation Tool. With the Input of an RDFS or OWL Schema File, Anno4j Classes, Partial Classes, Proxy Classes, and a REST API Can be Automatically Generated.
Figure 33: Places of Installation for the Anno4j Library: Classic and RESTful.
Figure 34: Illustration of the Extended Overall Anno4j Mapping Functionality, Including the Parsing of Serialised RDF Triples and Writing of Anno4j Java POJOs to Serialised RDF.
Figure 35: Visualisation of the Semantic Web Technology Stack with Concepts Shown in Regular Font, While the Implementing Technologies Are Written in Bold Font. The Dotted Border Shows Which Concepts are Affected, Interwoven, and/or Supported by the Anno4j Library.
Figure 36: Exemplary Anno4j Recommendation Result in the MICO Use Case. The Two "mmm:Item" Instances at the Bottom (Indicated by the Coloured Tree-like Structures Which Correspond to the MMM Colours Discussed in Section 3.3) are Compared and Evaluated to Have a Similarity in Their Results to 70%.
Figure 37: Visualisation of the Multimedia Metadata Lifecycle (Adopted from [102]).
Figure 38: Visualisation of the Lifecycle (Adopted from [11][12][13][122]).
Figure 39: Visualisation of the Reasoning Cycle Applied in the Framework of [56] (From Which the Image Has Been Adopted).
Figure 40: Setup of the MICO Platform.
Figure 41: Visualisation of the Merged Lifecycles for Both Multimedia and Linked Data, Resulting in a Generalised Lifecycle for Linked Multimedia Data.
Figure 42: Visualisation of the Core MICO Metadata Lifecycle.
Figure 43: Visualisation of the MICO Metadata Lifecycle.
Figure 44: Workflow of Registering and Exchanging Schemata for the Extended Extractor Implementation at the MICO+ Platform.
Figure 45: Benchmark Results for Single Variations of Ontology Parameters for the Anno4j Evaluation.

Figure 46: Breakdown of the Runtimes of Subtasks Necessary to Create the Initial Instance of an Anno4j Class Implementing the "crm:E21_Person" RDF Class of the CIDOC CRM.
Figure 47: Benchmark Results for Problematic Parameters with Pre-Created Proxy Classes.
Figure 48: Heatmap Plot Patterns with Exemplary Heat Values, Highlighting the Color Gradient or Scattered Distribution.
Figure 49: Expected Pattern Types Created by the Recombination of Soft and Hard Parameters.
Figure 50: Pattern Allocation of Runtime Experiments with Varying Ontology Parameter Pairs.
Figure 51: Exemplary Scatter Pattern Result for the Combination of the Child Degree and Partial Methods Parameters.
Figure 52: Exemplary Edge Pattern Result for the Combination of the Children Degree and Inheritance Depth Parameters.
Figure 53: Exemplary Peak Pattern Result for the Combination of the Parent Degree and Partial Classes Parameters.
Figure 54: Highlighted Evaluation Results for the Combination with the Partial Class Parameter.
Figure 55: Highlighted Evaluation Results for the Combinations with the Property/Relationship Count.
Figure 56: Exemplary ViSIT Scenario with Two Museums "a" and "b", both Issuing an Exhibition about Person "x".
Figure 57: Exemplary ViSIT Scenario Applying the ViSIT Infrastructure to Digitally "Lend" Foreign Samples in Museum "a".
Figure 58: Illustration of the Semantic Zoom Feature Possible with Anno4j.
Figure 59: The Core Workflow of the MICO Platform.
Figure 60: Exemplary Extractor Model for a MICO AudioDemux Extractor, Leaving Out Some Side Information for the Sake of Clarity.
Figure 61: Resulting RDF Graph from the Code Enlisted in Listing 30, Creating an MMM Item with Two Part Annotations Representing a ColorLayout and Animaldetection Result.
Figure 62: Screenshot of the MICO Platform Item Overview.
Figure 63: Screenshot of the MICO Platform Item Overview, Hovering over an Item Progress Marked as "Failed".
Figure 64: Screenshot of the MICO Platform View Inspecting an Item in Detail.
Figure 65: Resulting Heatmaps of the Paired Ontology Parameter Evaluation (1).
Figure 66: Resulting Heatmaps of the Paired Ontology Parameter Evaluation (2).
Figure 67: Resulting Heatmaps of the Paired Ontology Parameter Evaluation (3).

LIST OF TABLES

Table 1: Result of the Query Defined in Listing 14 on the Dataset Shown in Figure 10.
Table 2: Result of the Query Defined in Listing 14 on the Dataset Shown in Figure 10 with the Addition of the Triples Shown in Listing 15.
Table 3: RDF Ontologies and Vocabularies with Their Respective Namespace IRI and its RDF Prefix Used in This Work.
Table 4: Available Schema Annotations in Anno4j. The Table Shows a General Name, the Respective Pendant in the OWL or RDFS Schema, and the Corresponding (Java) Schema Annotation, in Combination with a Description of the Implemented Concept.
Table 5: Properties of the Test System Used in the Benchmarks Enlisted in Section 6.2.1.
Table 6: Properties of the Test System Used in the Benchmarks Enlisted in Section 6.2.3.
Table 7: Ontology Parameter Pair Evaluation for Children Degree and Parent Degree.
Table 8: Ontology Parameter Pair Evaluation for Children Degree and Partial Classes.
Table 9: Ontology Parameter Pair Evaluation for Children Degree and Partial Methods.
Table 10: Ontology Parameter Pair Evaluation for Children Degree and Properties and Relationships.
Table 11: Ontology Parameter Pair Evaluation for Children Degree and Inheritance Depth.
Table 12: Ontology Parameter Pair Evaluation for Parent Degree and Partial Classes.
Table 13: Ontology Parameter Pair Evaluation for Parent Degree and Partial Methods.

Table 14: Ontology Parameter Pair Evaluation for Parent Degree and Properties and Relationships.
Table 15: Ontology Parameter Pair Evaluation for Partial Classes and Partial Methods.
Table 16: Ontology Parameter Pair Evaluation for Partial Classes and Properties and Relationships.
Table 17: Ontology Parameter Pair Evaluation for Partial Classes and Inheritance Depth.
Table 18: Ontology Parameter Pair Evaluation for Partial Methods and Properties and Relationships.
Table 19: Ontology Parameter Pair Evaluation for Partial Methods and Inheritance Depth.
Table 20: Ontology Parameter Pair Evaluation for Properties and Relationships and Inheritance Depth.

LISTINGS

Listing 1: Triples Contained in the Graph Shown in Figure 7 Formulated as RDF.
Listing 2: Triples Contained in the Graph Shown in Figure 7 with the Addition of Namespaces and Prefixes.
Listing 3: Definition of the "Pets and Their Owners" Ontology.
Listing 4: The Information Content Illustrated in Figure 10 Represented as RDF Triples Using the N-Triples Syntax.
Listing 5: The Information Content Illustrated in Figure 10 Represented as RDF Triples Using the N-Quads Syntax.
Listing 6: The Information Content Illustrated in Figure 10 Represented as RDF Triples Using the Turtle Syntax.
Listing 7: The Information Content Illustrated in Figure 10 Represented as RDF Triples Using the TriG Syntax.
Listing 8: The Information Content Illustrated in Figure 10 Represented as RDF Triples Using the JSON-LD Syntax.
Listing 9: The Information Content Illustrated in Figure 10 Represented as RDF Triples Using the RDFa Syntax.


Listing 10: The Information Content Illustrated in Figure 10 Represented as RDF Triples Using the RDF/XML Syntax.
Listing 11: RDF Document Used for Simple Inferencing.
Listing 12: RDF Document Containing Inferred Triples with the Base Triples of Listing 11.
Listing 13: Simple Reification Example.
Listing 14: Exemplary SPARQL Query Issued on the RDF Data Contained in the Graph Shown in Figure 10.
Listing 15: Triples to Add to the RDF Dataset of Figure 10 to Create an Extended Dataset for the Query Shown in Listing 14.
Listing 16: Instantiation of an Anno4j Object.
Listing 17: Anno4j Class for the "pao:Cat" RDF Class.
Listing 18: Anno4j Partial Class for the "pao:Cat" RDF Class.
Listing 19: Simple Application of the Cat Anno4j Class Shown in Listing 17.
Listing 20: Basic Querying Example of the Anno4j Library, Querying for All Persisted Cats (Representatives of the Cat Anno4j Class).
Listing 21: QueryService of the Anno4j Library Using LDPath Criteria. The Query is Issued to Retrieve the Part Annotation Shown in Figure 28.
Listing 22: Basic Transaction Implementation in Anno4j, Encapsulating Actions Done for Persisting and Querying Information.
Listing 23: Adapted Anno4j Class "Cat" from Listing 17, Introducing a Maximum Cardinality and Symmetric Requirement on the Relationship "pao:hasCatFriend".
Listing 24: Exemplary "ValidatedTransaction", Using the "ValidatedCat" Anno4j Class Displayed in Listing 23.
Listing 25: Showcase Error Message if the Allowed Maximum Cardinality of the "pao:hasCatFriend" Relationship for "cesar" is Exceeded.
Listing 26: The Applications of Named Graphs/Context at Anno4j Instance-, Object Creation-, QueryService-, and Transaction-Level in Anno4j.
Listing 27: Exemplary Plugin Expression "sparqlmm:leftBesides(...)" in an LDPath Criteria.
Listing 28: Exemplary Object Query Application Scenario.
Listing 29: Exemplary SWRL Rule for the PAO Ontology.

Listing 30: Exemplary Creation of an MMM Item with Two Part Annotations Representing a ColorLayout and Animaldetection Result.
Listing 31: Exemplary Use of the Anno4j Generation Tool.
Listing 32: Example RDFS Schema for the Pets and Their Owners Ontology.
Listing 33: Exemplary Output of the Code Shown in Listing 31 with the Input Schema Shown in Listing 32.

Part I

PREFACE

1 THE SEMANTIC WEB

In the life of today, many "worldly workflows" are maintained using the possibilities offered by the modern World Wide Web. Calendars with various appointments are managed across several end devices like computers, laptops, and mobile phones. Oftentimes this planning and scheduling of events is even interwoven with other people's time schedules. Among these appointments, meetings, parties, and other social events are also organised online, with many different peers participating in the organisation. In addition, activities like shopping and the discovery of information for various purposes have become a daily routine for almost everyone.

Despite this daily repetition and the efforts that have been spent on making these workflows as fluent and functional as possible, there still exist perceived boundaries in terms of inconvenient behaviour or missing automatism. The mentioned workflows still require at least a baseline of human interaction and decision making, as there is no applied "intelligence" or "knowledge" on the side of the executing computer, in contrast to what humans can do. For example, a computer (in most common cases) can help to schedule temporal matters, warning a user that two appointments occupy the same time slot, but it can rarely involve further information: imagine two consecutive lessons with a break of ten minutes in between. Regarding only the time, this seems feasible, but there may be other factors like the geographical locations of the two lessons, which would require a 30-minute ride to get from one place to the other, rendering the situation impossible to fulfil. Hence, the user still has to do the task of minding such "external" factors and assessing their impact on his given activities.

In [27], Berners-Lee, Hendler, and Lassila reinforce this by stating that Web content of today is in general designed for humans to read and interpret. A computer, on the other hand, can only parse these pages for their layout and underlying data structure, but in most cases it cannot manipulate them meaningfully or interpret their semantics. But they envision a solution, or at least an enhancement, to this situation: an interaction between human and computer that is more fluent and beneficial because it is supported by a component that can access certain data in order to assist the user in a seemingly intelligent way. In 2001, Berners-Lee, Hendler, and Lassila claimed that computers will step by step increase their capabilities to process and especially "understand" the data that they merely display today.
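To make the scheduling example above concrete, the following sketch shows the kind of check a calendar application could already perform today if machine-readable travel-time information were available alongside the appointments. It is a minimal, self-contained illustration; all class names, locations, and durations are invented for this example and are not part of any system discussed in this thesis.

```java
import java.time.Duration;
import java.time.LocalTime;

// Minimal illustration: two consecutive lessons at different locations.
// All names and values are invented for this example.
public class ScheduleCheck {

    record Appointment(String title, String location, LocalTime start, LocalTime end) {}

    // Travel time between the two locations; in a Semantic Web setting this
    // value could be looked up from a transportation data source instead of
    // being hard-coded.
    static Duration travelTime(String from, String to) {
        return Duration.ofMinutes(30);
    }

    public static void main(String[] args) {
        Appointment first = new Appointment("Lesson A", "Campus North",
                LocalTime.of(10, 0), LocalTime.of(11, 50));
        Appointment second = new Appointment("Lesson B", "Campus South",
                LocalTime.of(12, 0), LocalTime.of(13, 50));

        // The pure time gap between the appointments: 10 minutes.
        Duration gap = Duration.between(first.end(), second.start());
        // The "external" factor the user normally has to mind himself.
        Duration travel = travelTime(first.location(), second.location());

        if (gap.compareTo(travel) < 0) {
            System.out.println("Conflict: only " + gap.toMinutes()
                    + " minutes between appointments, but the ride takes "
                    + travel.toMinutes() + " minutes.");
        } else {
            System.out.println("Schedule is feasible.");
        }
    }
}
```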


Thus the idea of the Semantic Web was born, a term originally coined by Berners-Lee et al. [28].

1.1 the vision and idea of the semantic web

This section will not give a definition or deeper insights into the concept of the Semantic Web, but will rather try to convey the general vision and idea from a high-level point of view. A more thorough and technical definition is given in Section 2.1. As a familiarisation with the Semantic Web, Section 1.1.1 will illustrate a real-world example that extends the one given by Berners-Lee, Hendler, and Lassila in [27]. Afterwards, Section 1.1.2 will provide a statistical view of the acceptance of Semantic Web technologies to date and compare the current technical development with what was envisioned. Then, potential and assumed issues are listed that lead to the numbers shown before and constitute the perceived "issues" with the vision of the Semantic Web in Section 1.1.3 and Section 1.1.4.

1.1.1 A Semantic Web Use Case

At the beginning of their article, the authors of [27] picture an ideal scenario for the application of the envisioned Semantic Web. In the scenario, two people intend to solve a problem with the support of the concept. For this, they use their so-called Semantic Web agents - devices that are not only able to use the "normal" Web for their purposes, but also the information supported by the Semantic Web. The difference between the two concepts of the Web becomes clear, as the semantic agent supports the protagonists in their problem-solving process to a far higher degree than currently possible. The agent makes use of various "information peers" in order to suggest solutions that fit the defined task best. These peers can be of different kinds, but they always supply some sort of information in the form of metadata - data that further describes or characterises other data or information - that can be seen or interpreted as knowledge. Examples are homepages, personal applications like calendars or address books, sources that provide medical or transportation information, and so on. The task to solve is generally decomposed into multiple diverse requirements. These requirements may be of different natures, like descriptions, characteristics of people, temporal and spatial factors, and/or multiple people taking part. A scenario that is similar to the one in [27] is depicted in Example 1.

Example 1. Two people (Liz and Arnold) are trying to find a fitting destination and point of time for a trip to a zoo. The solution they are looking for needs to fit the following requirements:

• The zoo needs to feature two certain animals, as Liz and Arnold are especially interested in seeing giraffes and pandas.

• That specific zoo needs to be within a certain defined spatial distance and within reach of public transportation.

• Furthermore, the zoo should have at least a decent public rating.

• As both Liz's and Arnold's vacation time is short and there are already other appointments, the point of time of the trip needs to fit the current time constraints defined by their calendars.

• In order to gain more detailed information, the two want the zoo to provide media representations of its attractions - at best of the desired animals, in good quality, and available for smartphone display.

• Liz’s agent is defined to trust Arnold’s agent - and vice versa - which means that both agents incorporate and trust information that is passed by the other agent to a full extent.

With this input, the semantic agent presents them with possible solution options. As one of them is not quite happy with those, they readjust the requirements and make the request stricter. After this has been passed to the agent, an adapted solution is given - including a warning that the proposed plan, under the stricter requirements, needs some temporal adjustments to one person's previously planned time schedule. Every solution presented by the agent can be further inspected in order to see the decisions that led to it. At last, they find a solution they are both comfortable with. As the task and its solution are accepted by both peers, the respective resources are then automatically updated by the semantic agent according to the requirements of this task. In this case, both Liz's and Arnold's calendars are updated for future tasks to consider and take into account.

This small example already highlights the complexity that such a "semantic task" can attain, even with merely a few requirements. The requirements arise from different origins, which means that the supplied information is not necessarily available in a uniform syntax. Beyond this, the semantics conveyed by the information are in general not understandable right away or in a uniform way. Also, the requirements can have varying impact on the success of the task and hence on the possibilities of achieving it. Pareto-optimal solutions are possible, which requires the agent to communicate this circumstance accordingly. In addition, next to proposing solutions to the defined task, the semantic agent is also supposed to help in the process after the (possibly first) solution has been presented to its user. By fine-tuning the previously posed requirements, other adapted solutions are to be computed, even beyond the boundaries of the originally defined requirements - in Example 1, the agent proposed a solution that required the adjustment of Arnold's already planned activities.

Nevertheless, although this problem poses a broad spectrum of difficult tasks and sub-tasks that need to be achieved in order to support a problem-solving process with at least satisfying success, the Semantic Web vision is not only composed of thoughts about "how it should or could be", but also of ideas, proposals, specifications, standards, and milestones on the way towards its fulfilment.

Insights and useful ideas from already existing standardisations and concepts are applied and extended in order to align with the existing Web architecture. Among those, HTML, URL and URI identifiers, as well as the notion of resources in that domain are adopted and lifted to the Semantic Web concept. The Resource Description Framework RDF [113] provides a straightforward way of syntactically representing the informational data that is contained in various data silos. These data silos incorporate information of different domains and can interlink each other in order to further multiply the usefulness of every single information peer. In order to support convenient usage of a silo's data, descriptive schemata can be supplied alongside the data that describe the overall structure of the respective data, called an ontology (see Section 2.2.3). The two most commonly used schema modelling languages [173] are the Resource Description Framework Schema RDFS [37][72] and the more expressive Web Ontology Language OWL [76][70], which both conveniently express their schema information in RDF. This is very similar to the commonly known XML standard [111] and its respective XSD schemata [112]. Last but not least, a comprehensive querying language is given by the SPARQL Protocol and RDF Query Language SPARQL [71], which allows for rich and diverse querying capabilities in order to retrieve the information pieces from the information silos. All of these standardisations and concepts will be referred to as the Semantic Web technologies in the remainder of this work and will be covered in detail in Chapter 2.

Around these formal cornerstones, Berners-Lee [26] also defines best practices that should be followed whenever one creates Semantic Web or Linked Data. Linked Data is a concept closely related to the Semantic Web, which will also be compared in Section 2.1. These best practices define how the data should be structured and linked, as well as qualitative requirements that should be met. When these rules are followed, data silos can be accessed and their data can be used in the most standardised and efficient way.

With all these contributions, a thorough and comprehensive setting is given that supports developers and content creators in lifting their "general" Web applications to the standards of the Semantic Web in order to profit from all its benefits.
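As a small, concrete illustration of these building blocks, the following sketch expresses part of the zoo scenario from Example 1 as RDF triples and queries them with SPARQL. It uses the Apache Jena library, which is not part of the technology stack discussed later in this thesis and serves here only to make the interplay of RDF data and a SPARQL query tangible; the namespace and resource names are invented for this example.

```java
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.vocabulary.RDF;

public class ZooTriples {

    public static void main(String[] args) {
        // Invented example namespace, not an official vocabulary.
        String ns = "http://example.org/zoo#";

        Model model = ModelFactory.createDefaultModel();
        Property features = model.createProperty(ns, "features");

        // Two RDF statements: the city zoo is a zoo, and it features pandas.
        model.createResource(ns + "CityZoo")
                .addProperty(RDF.type, model.createResource(ns + "Zoo"))
                .addProperty(features, model.createResource(ns + "Panda"));

        // A SPARQL query asking for all zoos that feature pandas.
        String query = "PREFIX zoo: <" + ns + "> "
                + "SELECT ?z WHERE { ?z a zoo:Zoo ; zoo:features zoo:Panda . }";

        try (QueryExecution exec = QueryExecutionFactory.create(query, model)) {
            ResultSet results = exec.execSelect();
            while (results.hasNext()) {
                System.out.println(results.next().getResource("z"));
            }
        }
    }
}
```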

Nevertheless, the Semantic Web concept and its uptake in today's Web is not as far along as envisioned in 1998. The following section will give an overview of the Semantic Web's perceived current standing.

1.1.2 A Look at the Current State of the Semantic Web

At the time of writing of this part of the thesis¹, many Semantic Web datasets and databases are published online. One of the best known statistics about these datasets is the Linking Open Data Cloud Diagram² (which, together with its evolution, will be covered in Section 2.1). It shows various datasets that have been published in Linked Data format by contributors to the Linked Open Data community project and by other individuals or organisations. As of the last statistical count in August 2017, it featured 1,163 different datasets³ originating from various domains like Life Sciences, Geography, Media, or Government.

Next to the LOD Diagram, one of the best known datasets is DBpedia [119], a collection of data that at this moment contains 4.58 million different semantic "things", of which more than 92% (4.22 million) are classified by a consistent ontology [53]. These have been extracted and structurally conditioned from various Wikimedia⁴ projects.

Both of these data accumulations show that acceptance of the Semantic Web concept does exist. A survey conducted by Cardoso [41] in 2007, some years after the initial vision, provided a numerical breakdown of some statistics about the usage of the Semantic Web. The author did not see the vision fully accomplished at the point of time when he undertook the evaluation, but he clearly accentuated the contributions that had been made in the domain. He also stated his assumption that a full mainstream adoption of the Semantic Web was five to ten years away, but also that surveys like his are important in order to depict the road ahead and derive insights from their results. The result most interesting for this thesis is shown in Figure 1, which displays the distribution of the employment roles specified by the surveyed Semantic Web practitioners.

The numbers of Figure 1 are backed up by a more generalised evaluation, grouping the participants by whether they work in the academic or the industrial domain, or both. These numbers showed that around 66% of the participants assigned their working field to the academic domain, while only about 18% stated industrial work, and the rest characterised a mixture. This result complies with the numbers of Figure 1, conveying that the application of the Semantic Web concept is tilted towards the scientific sectors rather than industry.

¹ October 2017
² http://lod-cloud.net/ (last visited 03/12/2018)
³ An updated version of the LOD Cloud features 1,231 datasets in November 2018
⁴ https://www.wikimedia.org/ (last visited 03/12/2018)

Figure 1: The Role of Semantic Web Users that Participated in the Survey of Cardoso [41] (From Which the Numbers and the Table Are Adopted).

This indicates that the industry does not yet incorporate Semantic Web technologies to benefit from their advantages; rather, they are used for research purposes. Cardoso estimates that this tilted circumstance is a direct reflection of the market in 2007 and sees a big potential in a technology transfer. Nevertheless, in this day and age, more than ten years later, this situation still seems to persist.

A more recent and more technical evaluation is done by Assaf, Troncy, and Senart [10], who conducted an experiment that assesses the quality of the datasets contained in the LOD Diagram mentioned above. They did this experiment with their application called Roomba [9], which automatically checks the given datasets for different potentially existing metadata, incorporating not only content metadata but also metadata that gives information about the usage of the respective database. The categories they check for are classified as general, access, ownership, and provenance information. On some occasions, when Roomba detects faulty metadata, it is able to correct it automatically.

This project has been developed with the importance of metadata provisioning in mind. By this task, the authors mean the provision of models and descriptive information alongside the actual dataset itself, in particular information that shows how the data can and should be used and how it can be accessed, as well as the nature and content of its resources.

The results of Assaf, Troncy, and Senart [10] are clear, as they claim that many of the datasets need attention. Many of the datasets lack informative access information, while the resources of many others suffer from low quality, exhibiting undefined or no values at all. The problem here is that precisely these two metrics represent a deciding factor for industry and enterprises when it comes to picking up the respectively offered data.

1.1.3 Open Issues of the Semantic Web

Both of the given evaluations indicate the same tendency of the Semantic Web concept being more accepted in the scientific domain than in industry. This is strange, as research sees high potential in the application of Semantic Web technologies and Semantic data in intelligent systems [19]. Also, having fewer contributing developers and users counteracts the overall idea of the Semantic Web: having many collaborating contributors to the overall databases increases the knowledge of every participant, as there is not just more data, but for example also possible peer reviews for better data quality and better linkage between databases for more sophisticated and continuative data.

As a reason for this circumstance, several open issues of the Semantic Web technologies are assumed, which must be addressed in order to increase the number of potential users. These issues are the following:

insufficient knowledge In contrast to relational databases, which are more commonly known and applied in industry as well as taught in basic studies, the Semantic Web concept and technologies as an extension to the common Web represent a more specialised and new way of data representation and data handling. This comes with an own syntactical structure of the core data, own querying languages, and further more specialised specifications and standards. The fact alone that learning these technologies requires an investment of time could discourage potential users, as they are also potentially not aware of the benefits of the Semantic Web beforehand. An exemplary study in 2009 at the University of Potsdam shows that out of 8 selected students in both the Bachelor and Master studies of Computer Science, only three stated that they had little experience with the Semantic Web or had at least heard of the concept at all, while the other five had never come into contact with it in any way. On a scale from 0 (no knowledge whatsoever) to 10 (expert in the Semantic Web domain), the interviewees scored a mean of 0.9 on that topic [132].

complexity Next to the number of Semantic Web technologies that need to be learnt, the technologies also impose technical complexity that further complicates the process of applying them in order to eventually take part in the Semantic Web. This can for example be seen in the formalism that RDF data and their semantics are built on, which allows automated inferencing based on the given data in order to conclude new information. Another example of perceived high complexity is the earlier mentioned Web Ontology Language OWL, which is based on so-called description logics [14][85], which are a subset of first-order predicate logic [126][64].

Hence, in order to make proper use of a database that is OWL conformant, the user has to have well-grounded knowledge of these logics and formalisms.

inconsistent data It has already been shown in the research of Assaf, Troncy, and Senart [10] that many of the existing databases have shortcomings when it comes to overall data quality, expressed by undefined, missing, or incorrect values. This is aggravated by the fact that in many cases open databases tend to be filled in an uncoordinated fashion by various heterogeneous parties with varying data quality standards, which eventually creates very inconsistent data [57]. Oftentimes this data even contradicts existing schemata that would actually suggest structure [84]. This can lead to miscellaneous problems when trying to work with the respective database, making it hard to use the provided information in a technical or even automated way, especially in the industrial domain, where incorrect or false data is even more grave. Next to the evaluations shown above, Hogan et al. [84] also support the claim that the problem of inconsistent data exists in many knowledge databases. Hogan et al. crawled and gathered a dataset consisting of 12,534,481 RDF statements, which have been analysed for various highlighted errors and shortcomings like dereferenceability and accessibility, syntax errors, or data noise and inconsistency errors. Next to the quality of the metadata, another shortcoming that has been mentioned already is the poor accessibility of some existing databases in terms of bad querying possibilities or the querying endpoint not being accessible at all, which further aggravates this problem of the Semantic Web.

1.1.4 The Semantic Gap - A Metadata Problem in the Domain of Multimedia

After giving a first introduction to the Semantic Web and listing its perceived issues, this section will complete the first look at the concept by showing a representative issue that eventually arises from the lack or bad quality of supplied descriptive metadata. Conversely, these issues can be solved, or at least alleviated, by having more people take part in the Semantic Web and by giving them technical support to do so. The issue at hand is called the Semantic Gap [157][75], which is mostly encountered in the multimedia domain. A visualisation of the problem can be seen in Figure 2.

As described above, the intention of the Semantic Web is to supply descriptive data and information that is not only usable for humans, but also for computers. This definition of "usable" goes even further: with metadata of good quality, computers are envisioned to partly even be able to "understand" the data in order to fuel further intelligent processes or systems [83].

Figure 2: Visualisation of the Semantic Gap Problem.

The Semantic Gap depicts the circumstance of diverging interpretation capabilities of the same object between humans and computers. A human being can use his whole "inner knowledge base" - his past experiences and actions - which can be recombined and applied in an instant in order to follow complex chains of thought, eventually determining the information that is contained in a given object he sees - in the case of Figure 2, a panda depicted in a picture. The bits of information that a human uses in order to come to a conclusion are of a very complex and intellectually high nature and can therefore be called high-level features. As an example, a human can infer that he sees an animal by recognising a set of eyes with a muzzle relatively close to it. From that point on, he can interpret a bear-like shape and the black and white colouring to complete his interpretation.

By contrast, computers in most cases cannot perform such an interpretation. They have to rely on technical information and more basic features that are more or less dependent on the given (multimedia) file. This is in general structural and technical information - so-called low-level features. These features can oftentimes be determined by analysis processes and then be supplied alongside the given multimedia file in the form of metadata. In the case of our image, this could be simple image-related information like the height, width, or date of the picture, while simple analysis processes can use for example pixel information in order to identify the main colour or the edges and hard colour breaks of the picture. Nevertheless, with this level of information it is rarely possible to determine the "meaning" of the given multimedia object, which manifests the Semantic Gap problem.
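To illustrate the kind of low-level features a computer can actually derive from an image without any interpretation, the following minimal sketch reads a picture and computes its dimensions and average colour using the standard Java imaging classes. The file name is a placeholder; the snippet is an illustration of the concept, not part of any extractor discussed in this thesis.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class LowLevelFeatures {

    public static void main(String[] args) throws IOException {
        // Placeholder path; any local image file would do.
        BufferedImage image = ImageIO.read(new File("panda.jpg"));

        // Structural low-level features: trivially available to the machine.
        int width = image.getWidth();
        int height = image.getHeight();

        // A simple content-related low-level feature: the average colour.
        long r = 0, g = 0, b = 0;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int rgb = image.getRGB(x, y);
                r += (rgb >> 16) & 0xFF;
                g += (rgb >> 8) & 0xFF;
                b += rgb & 0xFF;
            }
        }
        long pixels = (long) width * height;
        System.out.printf("%dx%d pixels, average colour rgb(%d, %d, %d)%n",
                width, height, r / pixels, g / pixels, b / pixels);
        // None of these values tell the machine that the picture shows a panda -
        // that high-level interpretation is exactly what the Semantic Gap describes.
    }
}
```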

To conclude, this section gave a short insight into the Semantic Web. A lot of good ideas in combination with a thorough and rich vision led to the development of comprehensive languages and standardisations that allow the interaction with Semantic data. Eventually, the aim is to make data not just usable and understandable for humans but also for computers, in order to make the information directly accessible for intelligent systems. With the initial idea dating back almost two decades at the point of writing this thesis, it has also been seen from numbers and evaluations that the vision has not yet fully unfolded and that the potential of the Semantic Web is not being exploited, especially in the industrial sector. In addition, a representative problem that arises from missing or bad-quality metadata in the multimedia domain has been shown.

In order to find an explanation for this circumstance, some potential issues were identified that are expected to eventually lead to the weak acceptance of the Semantic Web concept. Lifting the concept to an all-around higher acceptance, or even an application by default in the Web world of today, offers many more starting points than those mentioned here and would exceed the limits of this thesis alone. Nevertheless, this thesis will show that it is possible to bring the Semantic Web concept closer to developers and users by solving or avoiding the issues mentioned above. It is assumed that these issues require an up-front effort from the potential user in order to apply the Semantic Web technologies, eventually discouraging him from going further. With different contributions that align, extend, apply, enhance, or abstract from the existing Semantic Web technologies, it is envisioned to overcome the assumed problems and to reach a broader and larger group of users, in the end creating not just better-quality metadata, but also more metadata altogether.

1.2 a technical setting for this thesis - the mico project

Before the contributions of this thesis can be listed in Section 1.3, it is important to define the technical environment in which they are embedded. This serves both as a foundational use case and as a best-practice showcase that evaluates the results of this thesis from a technical standpoint. A general description of the overall project is given in Section A.1.1 in the Appendix, while a more detailed description of the mentioned setting is given in Section 5.2.

The MICO project focuses on a multimedia analysis scenario, which can be closely related to the exemplary Semantic Web use case described in Section 1.1, as the incorporated peers are similar and can theoretically be exchanged based on their high-level roles. In both cases, information peers are present that work autonomously and provide information once requested. This returned information is generally available in heterogeneous formats, which need to be combined in order to be commonly understood. Dependencies can exist between different peers, and eventually further information is to be gained once all the information bits have been gathered and recombined.

Another correspondence can be found in the perceived issues that both scenarios involve. The issues detected for the overall Semantic Web concept in Section 1.1.3 can directly be encountered in and mapped to the MICO use case, as new technologies with high initial barriers need to be learnt and incorporated by various development peers, and results are contributed by various information peers with varying backgrounds or purposes. Because of this, solving these circumstances in the MICO environment has direct implications for the overall Semantic Web domain. Therefore, in order to show approaches for the Semantic Web, the contributions of this work are embedded in MICO, which serves as a technical basis as well as a best-practice example.

1.3 research questions and contributions

The contributions of this work in the domains of the Semantic Web and Multimedia aim to solve the scientific issues raised throughout Chapter 1. They can be divided into three core pillars, complemented by evaluation results. In combination with the raised research questions, they are the following:

1. How to express, create, and especially recombine the created results of various heterogeneous metadata producers?

Contribution: Multimedia Metadata Model

Metadata modelling and associated models are the backbone of every Semantic database, as they enable both the persistence and querying of the data in a structured fashion. Especially in the multimedia domain, metadata is very important for the further application of the multimedia content. For that reason, the MICO Metadata Model MMM is developed, which covers both the domain of multimedia and the Semantic Web. The MMM is a modular, extensible, and rich metadata model for multimedia data that follows both Linked Data principles and the requirements found in related work on ontologies for multimedia. On top of comprehensive possibilities of addressing both the semantic content of a multimedia analysis process and the target that the content is referencing, the MMM introduces several provenance and composition elements, eventually allowing the creation of a diverse and traceable metadata background for analysed multimedia items and their results. With both its expressiveness and extensibility, the MMM is also able to incorporate heterogeneous metadata producers that are not known to a given system or application in advance.


2. Are there ways to lower the initial barrier that is posed by Semantic Web Technologies, which manifests mainly in the fields of creation, processing, and requesting of the respective semantic data?

Contribution: An Abstraction Layer for Semantic Technologies

The technologies applied in the Semantic Web impose high initial barriers on developers that are newcomers to the Semantic Web. Oftentimes this can lead to discouragement and therefore non-participation of the respective developers, which counteracts the overall vision of the Semantic Web.

As a solution to this, the Object-RDF-Mapping library Anno4j is proposed, which supports an abstraction layer "on top" of the actual Semantic Web database. Using Anno4j, developers are not constrained to use Semantic Web technologies but can rather use concepts and processes they are familiar with - in this case, the object-oriented programming language Java. In doing so, the development of Semantic Web applications is both facilitated and accelerated. Additionally, the accessibility of Semantic Web technologies is diversified, enabling a much broader field of potential users. Especially the querying component features several different ways to request existing metadata of a database, ranging from very basic but sparsely configurable querying capabilities to specialised and more comprehensive path-based queries.
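To give an early impression of what this abstraction looks like in practice, the following sketch shows the general pattern of an Anno4j domain class and its use, in the style of the listings discussed in detail in Chapter 4 (cf. Listings 16-19). It is a simplified illustration, not a verbatim excerpt from the library: the "Pets and Their Owners" namespace as well as the property names are chosen for this example, and package names and exact signatures may differ from the actual Anno4j API.

```java
// Simplified illustration of the Anno4j mapping idea (cf. Chapter 4, Listings 16-19).
// Package and class names of the library are reproduced from memory and may
// differ slightly in the actual Anno4j release.
import com.github.anno4j.Anno4j;
import com.github.anno4j.model.impl.ResourceObject;
import org.openrdf.annotations.Iri;

public class Anno4jSketch {

    // An interface annotated with the IRI of an RDF class; its getters and
    // setters are mapped to RDF properties by the library.
    @Iri("http://example.org/pao#Cat")
    public interface Cat extends ResourceObject {

        @Iri("http://example.org/pao#hasName")
        String getName();

        @Iri("http://example.org/pao#hasName")
        void setName(String name);
    }

    public static void main(String[] args) throws Exception {
        // Creating RDF data boils down to ordinary object-oriented code.
        Anno4j anno4j = new Anno4j();
        Cat cesar = anno4j.createObject(Cat.class);
        cesar.setName("Cesar");
        // Behind the scenes, triples along the lines of
        //   _:cat rdf:type pao:Cat .   _:cat pao:hasName "Cesar" .
        // are persisted, without the developer writing RDF or SPARQL by hand.
    }
}
```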

The following research questions feature the term complexity, which relates to both ontologies of the Semantic Web and their respective pendant in the Anno4j library, domain models. These are assessed to be more complex the more distinct classes with particular relationships and properties they implement, and the deeper and broader the inheritance hierarchy between these classes is [153]. A more detailed definition will be given in the course of this work.

3. With the Anno4j library applied as use case:

3.1 Can Object-RDF-Mappings deal with complex domain models?

3.2 Is the application of complex domain models possible in an efficient way?

3.3 What can be done with Object-RDF-Mappings in order to (better) support data consistency?

3.4 Can the application of Object-RDF-Mappings be enhanced by allowing different access technologies?

Contribution: Enhancements to the Object-RDF-Mapping Concept

Furthermore, technical extensions to the Anno4j library lift the mapping concept to a higher functional level, both in terms of more convenient application and better quality of the produced metadata. The Anno4j Generation Tool allows the parsing of existing descriptive metadata schemata in order to automatically generate the necessary Java Classes for the library. This also allows users to directly work with more complex domain models, as in contrast an implementation by hand is prone to errors and requires a high amount of time and effort. An optional concurrent generation of so-called "proxies" needs some up-front processing time, but its results reduce the subsequent time cost of working with the complex domain model to an imperceptible minimum. Next to this feature, a validation component is realised that can automatically check whether data that is about to be created satisfies previously defined validity requirements. If these are not met, no data is produced and therefore no faulty data is inserted into the database, in the end increasing the database's metadata quality. In addition to this, qualitative and descriptive feedback is given back to the user in the case of validity violations. The access diversity of the library is further extended by a RESTful extension that can directly be applied to an existing set of Anno4j Classes. This allows users to conveniently interact with the Semantic database via the commonly known HTTP standard.
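As a foretaste of the validation idea (treated in detail in Section 4.5 and Listing 23), the sketch below extends the illustrative Cat interface from above with schema annotations that restrict a relationship. The annotation names mirror the concepts named there (maximum cardinality, symmetry), but they are reproduced here as an illustration only and may not match the exact identifiers of the Anno4j release.

```java
// Illustrative continuation of the Cat sketch above; annotation names are
// modelled after the schema annotations described in Section 4.5.3 and may
// differ from the actual Anno4j identifiers.
import java.util.Set;

@Iri("http://example.org/pao#Cat")
public interface ValidatedCat extends ResourceObject {

    // At most two cat friends are allowed, and the relationship is symmetric:
    // if cesar is a friend of felix, felix must also be a friend of cesar.
    @MaxCardinality(2)
    @Symmetric
    @Iri("http://example.org/pao#hasCatFriend")
    Set<ValidatedCat> getCatFriends();

    @Iri("http://example.org/pao#hasCatFriend")
    void setCatFriends(Set<ValidatedCat> friends);
}

// When such an interface is used inside a validated transaction, the library
// checks these constraints before committing; violations abort the commit and
// report which requirement was not met (cf. Listings 24 and 25).
```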

Aside from the contributions listed above, which are motivated by concretely defined research questions, the work done for this dissertation has also motivated further interesting topics that constituted meaningful results and can therefore also be seen as scientific contributions:

Further Contribution: Embedding of the Application Contributions

In a final step, all the other contributions are combined functionally in the MICO Platform. With the platform being affected by both the domains of Multimedia and the Semantic Web, the MMM and Anno4j represent solutions that increase the platform's applicability. Next to this point of view, this also serves as a technical use case for the contributions.

Beyond that, further extensions to the last officially published version of the MICO Platform are proposed that are mainly based on the latest Anno4j features (which were not incorporated in said release version). These have the potential to further increase the convenience in application and the correctness of the produced results of the MICO Platform. This construct is subsumed under the name MICO+.

As another minor contribution, the lifecycles of metadata in both the domains of Multimedia and Linked Data will be highlighted and discussed. A merged lifecycle is proposed that incorporates the requirements of both lifecycles in order to generate a detailed representation of Semantic Multimedia metadata.

requirements of both lifecycles in order to generate a detailed representation of Semantic Multimedia metadata. This allows more fine-grained and modular insights into the process of metadata creation in the field of the Semantic Web, enabling better and more comprehensive capabilities to produce metadata that forms backgrounds for multimedia items. This approach is also incorporated in the MICO+ Platform concept.

Further Contribution: Evaluation of Anno4j

Enlisting related work in the field of Object-RDF-Mapping backs up the Anno4j library from an overall technical standpoint. Therefore, evaluations and experiments have been conducted focusing on the additional features that the Anno4j library introduces to this kind of concept, in order to provide qualitative backing for the implementation. Runtime evaluations show that the incorporation of Anno4j does not impede the overall runtime of an application. These evaluations also support the answer to research question 3.2. As a side-result of the Anno4j Generation Tool evaluation, an awareness for the structure of metadata models and ontologies is gained, which allows one to determine the “structural complexity” of a given schema. The evaluation process applies different structural parameters originating from both classic RDFS or OWL schema features and the Anno4j extensions to them, which eventually dictate the structure of the given schema. With the assessment of the mentioned complexity, it can be decided whether the utilisation of a generation step is sensible or not in terms of overall application runtimes.

1.4 structure of this thesis

This thesis is composed of five parts. The preface gives an introduction to the overall topic and domain of the Semantic Web and motivates the remainder of the work. Every chapter that introduces a core contribution is opened with a related work section, which supports insights as well as delineations and starting points for the respective own contributions. Also, these chapters are closed by a short summary and outlook of their own. After an initial picture of the Semantic Web and its vision, Part ii establishes the concepts in a more detailed fashion. Therefore, Chapter 2 commences with a definition of the term “Semantic Web” and compares it to other existing, partially very similar, concepts. Then, an in-depth overview is given of the related standardisations and languages that are present and commonly used in the Semantic Web. A focus point is thereby set on the backbone of every semantic database: metadata and its modelling.

With the knowledge of metadata modelling, Chapter 3 combines this practice with the domain of multimedia. After a survey about related approaches, the Web Annotation Data Model as a basis for the own contribution is presented and explained. An extension to the WADM lifts its context to a cross-multimedia use case. The result of this constitutes the first contribution of this thesis: the MICO Metadata Model MMM. Part iii complements the theoretical background gained from prior sections with technical pendants. Therefore, Chapter 4 covers the topic of Object-RDF-Mappings, firstly in terms of related approaches, and then elaborately introduces the second contribution in the form of the own ORdfM implementation Anno4j with its various features that allow an abstracted and rich approach to Semantic Web technologies. Afterwards, Chapter 5 ties all the prior contributions together and highlights the MICO Platform as a technical basis that incorporates the MMM and Anno4j for their beneficial factors that alleviate the addressed issues of the Semantic Web. Also, the technical steps that metadata takes throughout its lifecycle are accentuated and covered. Eventually, an extended form of the MICO Platform, called MICO+, is proposed with a description of its further potential benefits. In order to validate the approaches and claims made throughout this thesis, Part iv first shows related evaluations and experiments conducted for the general concept of Object-RDF-Mappings. Secondly, own extensions to the concept are evaluated qualitatively and discussed for their real-world application. Part v summarises the thesis with a résumé, an outlook, and a future work section in Chapter 7.

Part II

MODELLING METADATA - CLASSIC APPROACHES AND A MULTIMEDIA CONTEXT

2 PRINCIPLES OF METADATA MODELING AND QUERYING

The vision of a fully integrated Semantic Web described in Chapter 1 presents a state of the Web in which computers can assist human workflows by providing aggregated information found in the Web. The data that is incorporated in these semantic processes is envisioned not only to be readable, but rather to be understandable for these computers. This enables a richer field of application for various procedures like a semantic search. An issued search query can be interpreted in a more “humanly fashion”, with the result that functionality beyond the classic possibilities can be applied. This could for example be background knowledge about some fact. It is also possible to connect data collections and/or sources to one another in order to create a combined knowledge base that increases the applicability and usefulness of the data for every peer involved. To achieve this, a more precise instrument of describing documents is essential. This description, or more precisely the parts of it, is called metadata, which is commonly defined as data about data [106] and constitutes the “fuel” of the Semantic Web technologies. This data generally contains information like title, dates, author information, and so on. Without a baseline of correct and sophisticated metadata associated with the data in the Web, as well as an adequate degree of quality of the same, a Semantic Web is not feasible. Next to the data itself, it is also important to have that data interlinked in order to achieve a coherent state that can encourage knowledge discovery. As will be seen in the upcoming discussion about the evolution of the Web, the glue that holds together the traditional document-centric Web are hypertext links between HTML pages. For the Semantic Web, the Web of Data, these links are RDF links between single pieces of data or data collections. Such an RDF link symbolises the relationship between two pieces of data. This relationship can have various types, each illustrating another semantic between the given data, for example friendship between two human entities or the affiliation of a book to its corresponding author [32]. With this importance of metadata and a linked structure in mind, this chapter will first position the idea of the Semantic Web in the landscape of other prevailing concepts in order to foster its potential advantages in Section 2.1. Afterwards, Section 2.2 explains the Resource Description Framework RDF, the de-facto standard when it comes to describing and modelling metadata in the Semantic Web, as


well as SPARQL - the SPARQL Protocol and RDF Query Language - as the de-facto standard for querying semantic data in Section 2.2.7.

2.1 the semantic web and other familiar concepts

To this point, various insights about the Semantic Web, its concepts and ideas, its advantages, and also the factors that are not yet as they were envisioned initially have been discussed. Next to this, there exist some other concepts around the world of the Semantic Web, namely the “Web of Data”, “Linked Data”, and “Open Data”. Although every concept by itself has several definitions in its own environment, there has been some discussion when it comes to the question of how the approaches “interact” with each other. Do they influence each other? Do they mutually influence or enable one another? Does one concept fully include the other, or do they make use of each other? A blog post1 from Tom Heath, picking up on a comment of Tim O’Reilly, gives an impression of the different views (and itself entailed a discussion of its own). This section is intended to pick up single definitions of all the mentioned concepts and give insights about their arrangement. At its end, a subsumption interprets the various concepts and explains how they are understood for this work.

web of data The vision that was introduced in Chapter 1 imagines a World Wide Web that can help to solve difficult semantic tasks that consist of various different components, ranging from temporal constraints like time scheduling to availability checks and so forth. To achieve this, the structure of the Web and its entities needs to support the proposition as well as the querying of this kind of information. The “classic” Web as it is known by now is not suited for these kinds of tasks, as it was designed to be a Web of Documents. The Web pages and documents contained in this kind of Web were solely intended for humans to read, use, and understand. Links between the documents allow for navigation to make them browsable, so the process of finding desired information sometimes leads to several hops over different Web pages combined with a cumbersome search through each Web page for the desired bits of information. The recombination of those bits is also imposed on the user himself. In this design, the data that encapsulates discoverable knowledge and information is merely contained in data silos that are “hidden” behind their Web presences, and consequently only accessible through mostly proprietary APIs. As mentioned before, this information is processed by the respective peer and then presented only in human-readable form. This does not enable the mechanism that is described by Tim Berners-Lee’s vision, as the acquisition of the desired information out of mainly human-readable sources is very costly, if possible at all. The Web of Documents needs to evolve into a Web of Data that allows computers to retrieve data in a more convenient fashion [154][32][81]. Figure 3 illustrates this evolution.

1 http://tomheath.com/blog/2009/03/linked-data-web-of-data-semantic-web-wtf/ (last visited 03/12/2018)


Figure 3: The Evolution of a Web of Documents to a Web of Data.

With the vision and the ideas discussed so far, the design of applications evolves towards making their data and associated information more accessible. Rather than having isolated data silos in combination with Web pages that link to each other, the data is “interwoven” into the Web resource, or it is made accessible through specially designed endpoints that are paired with their respective Web page or access point. This makes the Web usable to both humans and computers at the same time. This process has been initiated with the first concepts in 2006 [26]. More sophisticated efforts have been made since then, in order to lift the overall concept of “data” in the Web to a higher position and give it more significance in comprehensive processes, resulting in a new closely related layer interwoven with the classic document-oriented Web. Heath and Bizer [81] summarise the contributions of Bizer, Heath, and Berners-Lee [30] and Mendelsohn [118] by stating that many organisations as well as individuals are applying Linked Data principles to their published data, which means that it is not “simply” put on the Web but rather grounded in a Linked Data foundation. This eventually constitutes a global data space that is called the Web of Data. As this work will later on focus on the combination of metadata with multimedia items, it is also interesting to highlight the benefits for those items that are incorporated in the Web of Data. Schandl et al. [146] state that there are two main advantages. The first one focuses on the fact that multimedia data has an important requirement when it comes to a proper metadata allocation. This metadata is used when the multimedia items are utilised, queried, and applied on various occasions like multimedia players. For this purpose, the Web of Data might already support a wide range of available metadata that could be applied conveniently in order to increase the semantic description

of the item. The second point they make is visibility. Once inserted into the Web of Data, the multimedia item might be discovered via previously unseen (metadata) connections.

linked data As the statement above about the Web of Data shows, an essential part of making it come true is the utilisation of Linked Data. Here again, ambiguity exists about the concrete definition of the concept, or more precisely, about which data may rightfully be called Linked Data [40]. The central argument centres on the question of whether the given piece of data abides by the “Linked Data Design Issues” defined by Berners-Lee, which are namely:

1. Use URIs as names for things.

2. Use HTTP URIs so that people can look up those names.

3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).

4. Include links to other URIs, so that they can discover more things.

Since their formulation in 2006, these rules have been adopted by a growing community. In general, however, they are only a recommendation that one should follow when producing Linked Data. By not following the defined issues, the benefit and applicability of someone’s data in the context of the Web of Data is compromised to a great degree, if present at all. The Semantic Web community would even dispute whether that data can be called Linked Data altogether (side note: in the dispute about the definitions, some also make it a practice to call data which conforms to the design issues “Linked Data” with capitalised letters, while data without that claim is denoted as “linked data” [40]). In [146], Schandl et al. list the technologies that are utilised in conjunction with Linked Data. The HTTP protocol and the architecture of RESTful Web Services enable the possibilities to request, manipulate, and transfer resources over the Web, as well as supporting links to other resources. Content negotiation allows the data resource to be served conveniently alongside a web presence. The actual technical representation of the data is not specified by these two technologies, however there are two recommendations also supported in the Linked Data design issues: RDF for the resource representation and SPARQL for the querying protocol (both technologies will be covered in Section 2.2).

open data The third term that will be integrated into the Semantic Web family is Open Data. To the best of my knowledge, there is no fixed definition for this term, but in general it addresses

data that is released under an open license, so that there are no restrictions for someone to use the data freely and in any way envisioned. The concept of Open Data is very often combined with Linked Data under the name of Linked Open Data. However, as one can see, there are differences between the two terms, and it is possible for open data not to be linked, as well as for linked data not to be open [40]. As for Linked Data, Tim Berners-Lee also formulated a ruleset that one should follow when producing Open Data. It is implemented as a star system, suggesting the usage of incremental rules. The more rules your data can satisfy, the more stars will be acquired and the more “powerful” and conveniently usable your data gets [26]. The ruleset is formulated as follows:

1. (★) Make your data available on the web (whatever format) but with an open licence, to be Open Data.

2. (★★) Make your data available as machine-readable structured data (e.g. Excel instead of an image scan of a table).

3. (★★★) As (2.) plus: Use non-proprietary formats (e.g. CSV instead of Excel).

4. (★★★★) All the above plus: Use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your data.

5. (★★★★★) All the above plus: Link your data to other people’s data to provide context.

As one can see, these recommendations are somewhat related to, or overlapping with, the set defined for Linked Data. In a best-case scenario, published data should even align with both rulesets in order to make the best possible contribution to the Linked Open Data (LOD).

current status The entirety of the available datasets published as Linked Open Data and their interconnections is called the Linked Open Data Cloud. Abele, McCrae, Buitelaar, Jentzsch, and Cyganiak, the team and contributors of http://lod-cloud.net/, have published several diagrams that contain all datasets that have been/are currently registered at https://datahub.io/, to show the status of the LOD Cloud at different points in time. Figure 4 shows the evolution with illustrations of the years 2007 (May and November), 2008, 2009, 2014, and the central coloured one, which illustrates the current version of 2017. In all diagrams a node represents a dataset, while an edge represents a connection between two datasets, or more precisely, a link between data contained inside the given datasets in order to generate context between the data.

2 by Andrejs Abele, John P. McCrae, Paul Buitelaar, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/ (last visited 03/12/2018)

Through all the illustrated inquiries of datasets, the number of datasets started with 12 recorded datasets in early 2007, grew to 28 in late 2007, 45 in 2008, 95 in 2009, and 570 in 2014, and reached a total of 1139 datasets in 2017, which represents a growth of roughly 100% in linked datasets over the last three years. The colours of the nodes represent the domain that the dataset is attributed to. This reveals that the majority of Linked Data in 2017 lies in the sectors of Life Science (red), Linguistics (green), Government (yellow), Social Networking (grey), and Publications (white).

interplay Now that all concepts have been highlighted in short, Figure 5 illustrates how the interplay between them is interpreted in this work. The insights given about Linked Data and Open Data cover technologies which allow the production of data so that it can be incorporated into the Linked (Open) Data Cloud, as well as sets of best common practices that should be applied when doing so. When complying with these recommendations, the data is best fit to be incorporated into the LOD Cloud and hence it can be discovered and reused by other users in a convenient way. This data can then also be integrated into web presences, like homepages or publicly accessible databases, enabling the data to be not only a part of the classic Web of Documents, but rather lifting the Web to be a Web of Data by interweaving the concepts of both. Last but not least, if all this is combined and the result is a discovery, application, reuse, as well as a playing back of information in order to contribute to a combined and collective knowledge base, one speaks of the Semantic Web. Further delineations against other concepts such as Artificial Intelligence [114][137][65] or Entity-Relationship models [47] can be found in [29]. The main advantages of the applied concepts have been mentioned throughout this section; to sum them up, the following list outlines them in a combined fashion [30][81]:

• Data that is integrated into the Web of Data can be generic and is not bound to be of a specific domain or field of application. By using RDF as a common standardised data format, any kind of data can be incorporated. In addition to this, the contribution to the Web of Data is not constrained to a defined set of people or experts. Anyone can publish Linked (Open) Data and therefore can make a contribution to a combined knowledge base.

• This freedom could, however, also bear some disadvantages, as a completely dynamic range of possibilities to model someone’s data could lead to chaotic data heaps that cannot be understood by anyone else. Nevertheless, Linked Data supports the definition of ontologies, vocabularies, and/or schemata (see Section 2.2 for an explanation) in order to give meaning to the supported data. Because of this and the application of the next advantages, the defined data can be dereferenced, supporting further information of how to use it. As a result, the data becomes self-describing.

Figure 4: The Evolution of the Semantic Web.

Figure 5: Interaction Between the Concepts Around the Semantic Web.

• As described above, next to the links between documents, the Web of Data also interlinks the data that is contained in it. Because of this, the discovery of knowledge, further information, and data that might be interesting to a user evolves from a self-dependent search through the web to an automatic discovery, as links from one resource or a data silo can point to other instances. Semantics can also be added to these links, further enhancing the usability of the discovered linked data.

• There are also advantages from an application point of view. Rather than programming against a fixed endpoint with restrictive requirements, Linked Data relies on the standardised technologies HTTP, RDF, and SPARQL, in order to support a homogeneous way of accessing data. As a consequence, the whole network can be discovered using the same querying mechanisms.

As can be seen, essential cornerstones of the Semantic Web are the Resource Description Framework RDF as well as the corresponding de-facto standard querying language and protocol SPARQL. Because of this, Section 2.2 will give an extensive description of both specifications. At this point it is important to mention that, at the time of this work, there is an awareness of the fact that beyond the Web of Data there are other, further advanced concepts in the WWW landscape: the Web of Things [183], applying technologies of the Internet of Things [178], and even beyond there exist ideas about the Web of Thoughts or Emotional Web [18][127], whose vision it is to create a working environment between human and computer via the perceived emotions and feelings of the interacting person. Just as with the other discussed concepts, the Web of Things does not include a strict definition, but rather promises to implement ubiquitous computing and context-awareness among devices, creating ambient intelligence. This is achieved by enabling devices or objects (these can be classical devices taking part in the Web like computers, smartphones, tablets etc., but especially everyday devices that were not originally planned to do so, for example temperature scales and refrigerators) to identify, receive, communicate, or populate as well as process data not just for and by themselves, but rather in a network of such devices over the Internet in order to accomplish a useful objective. This process is executed making use of various technologies originating from both “basic” Web technologies as well as technologies of the concepts described above. In addition, long known and applied technologies like Sensor Networks [6] and Near Field Communication [174] find use in the Internet of Things to further enhance its capabilities. Furthermore, the concept broadens the described functionalities especially in terms of the number and the kinds of devices that can take part in such a network. Modern examples are the classically known “intelligent home”, providing useful information to the resident, like food stocks that are running low, and other processes like turning on the heating when the resident is close to home. It is important for this work to delimit itself from the concept of the Web of Things, which poses the next step of evolution after the Semantic Web. The extractors that will be utilised throughout the following work are not as “intelligent” and ubiquitously working as devices operating in the Internet of Things. They are rather devices that rely on being called; only their internal process works autonomously. Because of this, the remainder of the work will rather focus on the representation and allocation of passed information, as well as its processing and utilisation, which is to be positioned in the field of the Semantic Web rather than the Web of Things.

2.2 the “resource description framework” - the backbone of metadata modeling

The Resource Description Framework RDF is the de-facto standard when it comes to metadata in the Semantic Web. Additionally, RDF can also be utilised when the modelling of information needs to be described. RDF is composed of a family of W3C specifications, with the first recommendation issued in 1999, which became a full specification document in 2004 [113]. There are several advantages posed by RDF, which align with and support the advantages of the Semantic Web described in Section 2.1. Through RDF, computer-readable information can be produced that is understandable by other software or computers in a convenient fashion. This enables your own dataset not only to be linked to others, but also to be linked to by others, as they can understand and use it quite comfortably. Enriched and combined datasets contribute to the collective knowledge of every participant. Additionally, combined data allows for federated queries at one access point rather than

expensive information gathering at various locations. In the pursuit of Linked Open Data (see Section 2.1), openly accessible datasets for common use cases are designed and offered for public usage. By not just linking the data but also incorporating persons as well as provenance information, a social network can be established that further enhances data quality and confidentiality. This section will give a step by step introduction to the Resource Description Framework RDF. To achieve this, Section 2.2.1 explains the basic information unit of RDF called a statement. Afterwards, Section 2.2.2 will relate the statement to the overall concepts that constitute the metadata implemented by RDF. Therefore, Section 2.2.2 also lists and explicates the specification documents of the RDF framework that contain the relevant information. Section 2.2.3 then utilises these RDF basics in order to introduce ontology modelling with RDF. A disclaimer that explains the extrinsic structure of the RDF graph pictures of this work is also contained. Afterwards, Section 2.2.4 explains the possible contextualisation of RDF data via so-called named graphs. Section 2.2.5 lists various serialisation formats of RDF data, while Section 2.2.6 briefly introduces the advanced RDF features inferencing and reification. The overall illustration is concluded in Section 2.2.7 by an overview of the SPARQL specification, the de-facto standard language/protocol to query and manipulate RDF data.

2.2.1 Information Units in RDF and Their Structure

Using the term “structure” to describe information units and their representation in RDF is actually quite exaggerated, as it is in fact pretty straightforward in its basic components. The “smallest” entity in RDF to describe metadata is the statement, a construct to convey a fact or piece of information. The statement consists of three parts that correspond to the concept of a rudimentary sentence of spoken language: subject, predicate, and object. The intended fact is formed by the combination of predicate and object, which is then related to the subject. The following sentences can be seen as examples (yet abstracting from proper RDF syntax, which will be introduced later):

• Bob is a human.

• Cesar is a cat.

• Cesar is the pet of Bob.

• Bob’s surname is “Green”.

• Bob is 30 years old.

• Cesar is 5 years old.

The subjects in these statements are Bob and Cesar, while the objects are human, cat, Bob, (the textual string) Green, and two numerical age declarations. The predicates that complete the statements between subject and object in this case concern the assignment of a kind of worldly instance, the assignment of a surname, age definitions, and the association of a pet towards its owner. Statements are either represented in their subject-predicate-object form or, more human-friendly to read, as a graph composed of nodes and edges. Subjects and objects are the nodes, while predicates are illustrated as (directed, from subject to object) edges between those nodes. An example for the first two statements “Bob is a human” and “Cesar is a cat” can be seen in Figure 6.


Figure 6: Two Simple Statements Presented as Graph.

These two statements are not related in any way, but when more and more statements are gathered, relations between nodes and statements are developed and a network of information emerges. A graph structure is created eventually when one displays multiple statements using this representation. An example with the statements defined above can be seen in Figure 7. With the addition of the pet correlation between Bob and Cesar, the statements issued in Figure 6 are connected. The other statements - the ages and the surname - add more meaning to the described knowledge base. Several important key aspects of modelling information with RDF can already be seen at this point. The nodes of the graph can serve as both subject and object, which allows interconnection of the information holders in a thorough and extensive fashion. The targets of the predicates are of different nature: they can describe simple values like numbers and strings, some kind of instance description of a given individual, and, as mentioned before, other nodes of the graph to create connections between two individuals. This shows in summary that simple information units like the statements can be used in combination in order to create meaningful accumulations of information. Statements are built with three components - subject, predicate, and object - to convey their information, and can be drawn as a graph using nodes and edges. In order to enhance this

"Green"

surname

30 age Bob is a human

is the pet of

5 age Cesar is a cat

Figure 7: Combined Statements Presented as Graph.

formalism to convey information, certain rules and a schematic approach to produce the information units are necessary. The de-facto standard in the Semantic Web of today is the specification of the Resource Description Framework.

2.2.2 RDF Basics and Concepts for Metadata Modelling

The RDF specification provides readers with the knowledge required in order to understand and produce RDF data. Additionally, for more in-depth information, further reading is referenced at various points. The specification is divided into several documents with different purposes, namely:

• RDF Primer: Gives the reader the basic background about every feature of RDF and explains where further explanations are to be found (original draft issued in 2004 [113], updated to version 1.1 in 2014 [151]).

• RDF Concepts and Abstract Syntax: Defines the abstract syntax for all RDF elements, enabling the creation and linking of RDF-based languages and specifications. Also gives a more in-depth explanation of the important structures and elements of the RDF specification (original draft issued 2004 [99], updated to version 1.1 in 2014 [52]).

• RDF Semantics: This document describes the semantics of the RDF vocabulary, explaining how certain elements and concepts are to be interpreted. Additionally, entailment and inferencing rules are defined (original draft issued 2004 [79], updated to version 1.1 in 2014 [80]).

• Various Standardised RDF Serialisation Formats: These formats describe syntaxes that are used to represent RDF data in textual form and therefore formulate RDF graphs. Different syntaxes have varying syntactical output, but are logically and semantically the same when based on the same underlying data. The best-known among the serialisations are discussed in Section 2.2.5.

• RDF Schema Specification: The RDF Schema (RDFS) specification is an extension of the basic RDF specification and describes a vocabulary as well as the process of using RDF to describe own vocabularies. It also describes the already built-in vocabulary of RDF and RDFS (original draft issued 2004 [37], updated to version 1.1 in 2014 [72]).

• OWL 2 Web Ontology Language: This suite of specifications (with the overview document [70]) is developed by the W3C OWL Working Group3 and is the revised and extended version of the OWL Web Ontology Language [76]. Like RDFS, OWL is used in order to design and develop ontologies for semantic data that can be shared over the Web and therefore be understood by computers. To “design” an OWL ontology, the application of different core concepts is necessary. The conceptual structure is defined in the ontology, while a syntax defines the external structure of the data in order to make it parseable and understandable by computers. On top of these two, a semantics defines the meaning of the designed ontology. The OWL 2 specification supports different profiles that are all sub-languages of the core language OWL 2, which allow the user to trade some of the potential expressiveness in favour of computational and/or implementational benefits.

In the following, an explanation of the RDF concepts oriented towards the specifications listed above will be given. The focus is laid on the requirements posed by the following work, so not all aspects of the specifications will be highlighted or explained in-depth. Use-case oriented examples will support the process. The explanation will draw from a mix of all the listed specifications RDF, RDFS, and OWL. As mentioned in Section 2.2.1, the information units in the RDF context are the statements, consisting of three components: subject, predicate, and object. Accumulations of multiple such statements, so-called RDF triples, are called a graph. This graph describes information about resources, which are represented by the nodes of the graph. A resource in this case can be anything, from abstract things to physical, worldly things, like people, documents, (historical) events, numbers, textual descriptions, and so on. To achieve this, the RDF specification uses three concepts that can occur in the graph: IRIs, literals, and blank nodes.

3 https://www.w3.org/2007/OWL/wiki/OWL_Working_Group (last visited 03/12/2018)

iri IRIs (Internationalized Resource Identifier [60]) in RDF are used as identifiers for resources. While RDF itself is neutral about the actual text string of the IRI, most use cases or vocabularies give them an intent or meaning, with the main purpose of better readability for humans. Additionally, the uniqueness of an IRI should apply to a preferably large domain, given that one major quality of the Semantic Web is to concentrate as much information as possible at one single point. The application of different IRIs for the same worldly “thing” is an actual flaw of the Semantic Web, as the multiply created resources might pose some ambiguity in their representations, and the process of determining whether two IRIs address the same “thing” is costly and cumbersome. Sometimes an answer to this is not even possible, as the two representations might also differ in terms of underlying data or information. As an example, the IRI “http://dbpedia.org/page/Brown_bear” is simply a text string to describe a resource in RDF, while in an optimal scenario it will globally be used by everyone in order to uniquely address a bear species. Furthermore, from the IRI itself the user can conclude that the resource is hosted by DBpedia4 and can read off the name of the bear species itself. IRIs extend the allowed characters in regard to URIs (Uniform Resource Identifier [24]) by allowing all characters contained in the Universal Coded Character Set [90]. An IRI can always be converted to a legal URI, while the other direction may not produce the original/correct IRI. URLs (Uniform Resource Locator [25]) are a subset of the URIs and contain information about the location of a given resource. Consequently, as URLs are also a subset of the IRIs, location information could also be given to IRI resources. In general, it is conventional to support an IRI that can be dereferenced via common HTTP methods in order to display the content of the responding resource. An IRI can be found at all three positions of an RDF triple, thereby denoting the subject and/or object, as well as defining the relationship between them via the triple’s predicate.

literals Literals in RDF represent actual values, in contrast to IRIs. Examples among them are textual strings (“Barack Obama”, “Germany”), numbers (“7”, “-0,42”), and dates (“1.1.2017”). In order to be interpretable by computers, a datatype should be associated with the respective literal. The mentioned examples would then be written as RDF values as follows:

• “Barack Obama”^^xsd:string

• “Germany”^^xsd:string

• “7”^^xsd:integer

• “-0,42”^^xsd:double

• “1.1.2017”^^xsd:dateTime

4 http://wiki.dbpedia.org/ (last visited 03/12/2018)

An extensive list of all supported datatypes can be found in the Concepts and Abstract Syntax document of the RDF specification [99][52]. Textual strings can be related to the language they originate from by adding the associated language tag to the string. These language-tagged strings have their own datatype (rdf:langString) and are added to the string with an “@” character, so the string for “Germany” would be formulated as “Germany”@en in English and as “Deutschland”@de for its German translation. Literals can only be the object of an RDF triple.

blank nodes A blank node is used when you need to describe a resource without it having a unique and global identifier. However, the blank node does receive a local identifier in order to make it addressable in the local use case. Blank nodes can be used in the positions of a subject or object in RDF. Figure 8 shows an example with two blank nodes. As has been seen in earlier examples, “Cesar” is a cat, and pets in general have a daily requirement for food. For the modelling of this circumstance, blank nodes can come in handy, because the food intake can be modelled in a very generic way. Rather than specifying the exact food that is required for a given pet, general food amounts with various requirements (in this case the distinction between wet and dry food, in addition to an amount for wet food) can be created.


Figure 8: Exemplary Blank Node.

Finer specification of both food types (e.g. the brand) is not necessary, as long as the other semantic requirements are met. This is why a universally defined IRI for both nodes is not reasonable, but a local declaration is sufficient.
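As a minimal sketch, the situation of Figure 8 could be written down in the Turtle syntax (introduced in Section 2.2.5) roughly as follows; the property names dailyWetFood, dailyDryFood, and amount are purely illustrative and not taken from any standardised vocabulary:

@prefix pao: <http://petsandowners.org/> .

# Cesar is a cat; his two food requirements are modelled as blank nodes,
# written with the square-bracket syntax of Turtle.
pao:Cesar a pao:Cat ;
    pao:dailyWetFood [ pao:amount "200g" ] ;
    pao:dailyDryFood [] .

The two bracketed nodes carry no global IRI; a Turtle parser assigns them local blank node identifiers, which is exactly the behaviour described above.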

The basic components that can occur in RDF triples have now been discussed, so they can be used to convert the statements described

above into actual RDF triples with IRIs. Transforming the information content contained in the example shown in Figure 7, the following triples would be produced:

1 <http://petsandowners.org/Bob> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://petsandowners.org/Human>
2 <http://petsandowners.org/Bob> <http://petsandowners.org/surname> "Green"^^<http://www.w3.org/2001/XMLSchema#string>
3 <http://petsandowners.org/Bob> <http://petsandowners.org/hasAge> "30"^^<http://www.w3.org/2001/XMLSchema#integer>
4 <http://petsandowners.org/Cesar> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://petsandowners.org/Cat>
5 <http://petsandowners.org/Cesar> <http://petsandowners.org/hasAge> "5"^^<http://www.w3.org/2001/XMLSchema#integer>
6 <http://petsandowners.org/Cesar> <http://petsandowners.org/isPetOf> <http://petsandowners.org/Bob>

Listing 1: Triples Contained in the Graph Shown in Figure 7 Formulated as RDF.

The triples of Listing 1 have exactly the same information content as the graph shown in Figure 7. Every line represents one triple consisting of the aforementioned subject-predicate-object structure. Altogether, there are eight IRIs, three literals, and no blank nodes present. There are IRIs originating from an exemplary “pets and their owners” use case, which have identifiers assigned that are prefixed with the URI of a homepage, “http://petsandowners.org/”. Furthermore, there are already actual IRIs contained in this example, as “http://www.w3.org/1999/02/22-rdf-syntax-ns#type” (in RDF, this directed edge reflects the “is a” relationship that defined the worldly instances of the example) and the datatypes “http://www.w3.org/2001/XMLSchema#string” and “http://www.w3.org/2001/XMLSchema#integer” are standardised RDF concepts or terms.

namespaces As can be seen, the IRIs used in RDF contexts can get quite long and illegible. Because of this, namespaces in conjunction with prefixes are introduced. These serve as abbreviations that represent parts of an IRI, and therefore are easier to read in textual representations of the RDF data. Listing 2 shows the RDF data represented in Listing 1 with the utilisation of namespaces and prefixes.

1 PREFIX pao: <http://petsandowners.org/>
2 PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
3 PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
4
5 pao:Bob rdf:type pao:Human
6 pao:Bob pao:surname "Green"^^xsd:string
7 pao:Bob pao:hasAge "30"^^xsd:integer
8 pao:Cesar rdf:type pao:Cat
9 pao:Cesar pao:hasAge "5"^^xsd:integer
10 pao:Cesar pao:isPetOf pao:Bob

Listing 2: Triples Contained in the Graph Shown in Figure 7 with the Addition of Namespaces and Prefixes.

In lines 1 to 3 of Listing 2 the connections between namespace and corresponding prefix are defined. Line one, for example, specifies that the namespace “http://petsandowners.org/” is abbreviated with the prefix “pao”. This means that in the following RDF data representation, whenever a term is preceded with the fragment “pao:”, the IRI for the term contains the given namespace instead. The same applies to the other namespaces. For the following work, several different RDF vocabularies and ontologies (see below) will be utilised. For the sake of clarity, from now on, mainly the abbreviated form of the namespaces will be used. Table 3 in the Appendix lists those ontologies in combination with their respective namespace and prefix.

2.2.3 RDF Vocabularies and Ontologies

As already mentioned before, the IRIs used in RDF are generally not enforced to have a semantic associated with them. In practice however, there are collections of IRIs and resources that are used in order to model concepts with a defined semantic for a specific field of application or use case. In the literature, these collections are called an RDF vocabulary or RDF ontology. The distinction between both terms is ambiguous; often they are used as synonyms. However, the trend is to attribute more meaning to the term ontology. While vocabularies are rather a mere and simple gathering of words or terms, Berners-Lee, Hendler, Lassila, et al. [27] explain that researchers in Web-related domains and other fields like Artificial Intelligence have adopted the term ontology. Following their definition, an ontology is comprised of a document or file that formally describes relations among terms, which is typically a combination of a taxonomy, describing the terms that are contained in the ontology, and a set of inference rules that further specify those relations. Because of this, the semantics behind the described data is presented in a standardised way, allowing computers to “understand” the meanings of semantic data in the Web, as they can apply the rules and terms defined in the ontologies by following the supported links to those ontologies. The inferencing and reasoning mentioned in the citations above concern the logical conclusion of additional knowledge “hidden” in the ontologies or in data that is created obeying the rules defined in the ontology. With the application of these mechanisms, additional knowledge for given RDF data can be discovered. An example of this could be the modelling of family relations between people. When three people, conveniently named father, son, and grandson, are contained in an information base with the relations referring to fatherhood being present as RDF triples between father and son, as well as son and grandson, it is reasonable to assume that there should also be

a triple indicating the relatedness between father and grandson, even if it is not modelled as an actual triple itself. This RDF feature will not be required heavily in the course of this work and thus it will only be covered briefly in this section. Also, the definition of the terms vocabulary and ontology will follow the definition given above for the remainder of this work. Making use of the rules and concepts defined in ontologies, an information base can be created in a standardised manner, and therefore it can be understood much more easily by another peer, as the semantics of the defined IRIs are fixed. These semantics can also be retrieved by computers. In the example of Listing 1 the IRI “http://www.w3.org/1999/02/22-rdf-syntax-ns#type” is such a predefined IRI of the RDF vocabulary. When used at other locations, the standardised semantics of assigning a type relationship can be inferred. Ontologies themselves are defined as RDF triples. Core concepts in order to create basic ontologies originate from the RDF and RDFS ontologies, and are defined as follows:

• Class: Classes are used in order to create categories to which IRI nodes can be associated. The relationship between the Class and the node, which is then called an instance of the respective Class, is the type property with the IRI “http://www.w3.org/1999/02/22-rdf-syntax-ns#type” already used in the examples above. A class can be enhanced with various requirements which are then passed to its instances by adding properties. The IRI for Class nodes is “http://www.w3.org/2000/01/rdf-schema#Class”.

• Property or Relationship: A property or relationship is a relation between two given RDF nodes, creating a semantical association between them. These nodes can be two IRIs, or an IRI as subject and a literal as object. The IRI for Property or Relationship nodes is “http://www.w3.org/1999/02/22-rdf-syntax-ns#Property”.

• subClassOf: This property, associated between two Classes, creates a subclass/superclass relationship between the two associated Class nodes. The subclass inherits all of the properties defined for the superclass. This also enforces that every instance of the subclass is also an instance of the corresponding superclass. The IRI for the subClassOf-Property is “http://www.w3.org/2000/01/rdf-schema#subClassOf”.

• subPropertyOf: This is the equivalent relationship to the subClassOf relation, but for RDF properties or relationships. When

this property is associated between two Property nodes, a subproperty/superproperty relation is created. This means that if a subproperty is stated between two nodes, an association via the superproperty is implied as well. The IRI for the subPropertyOf-Property is “http://www.w3.org/2000/01/rdf-schema#subPropertyOf”.

• domain and range: These two Properties are used to define the allowed Classes that can be the source and the sink of a given Property edge. The IRI for the domain-Property is “http://www.w3.org/2000/01/rdf-schema#domain” and for the range-Property “http://www.w3.org/2000/01/rdf-schema#range”.

There are many different RDF vocabularies and ontologies with various use cases and fields of application available in the Web. In some cases, they are so well designed and cover such an important scope that the W3C standardises them. The general rule in the Semantic Web is to use existing vocabularies and ontologies wherever possible, which is especially true for the standardised ones. Whenever a public ontology is used, the barrier to understanding the produced RDF data is lowered, as all peers agree upon the chosen ontology and its semantics. Two of the most established ontologies (next to RDF, RDFS, and OWL that have already been described and used in examples so far) are the FOAF [38][68] and SKOS [17][120] ontologies. The FOAF ontology is designed to describe and link people. Example concepts are the “http://xmlns.com/foaf/0.1/Person” class and the “http://xmlns.com/foaf/0.1/surname” property associated with a person. The existence of the IRI for the surname already reveals bad practice in the examples above, e.g. in Listing 1, as “new” IRIs have been introduced for the surname and age of a person rather than using the ones defined in the FOAF ontology. The SKOS ontology supports concepts to describe, share, and link knowledge organisation systems like thesauri, taxonomies, classification schemes, and subject heading systems. As a simple example, Listing 3 shows the ontology that could be created for the “pets and their owners” use case. Prefixes and namespaces are used as shown in Listing 2.

1 PREFIX pao: <http://petsandowners.org/>
2 PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
3 PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
4
5 pao:Human rdf:type rdfs:Class
6 pao:Cat rdf:type rdfs:Class
7 pao:Animal rdf:type rdfs:Class
8
9 pao:Cat rdfs:subClassOf pao:Animal
10

11 pao:isPetOf rdf:type rdf:Property
12
13 pao:isPetOf rdfs:domain pao:Animal
14 pao:isPetOf rdfs:range pao:Human

Listing 3: Definition of the “Pets and Their Owners” Ontology.

In lines 5 to 7 of Listing 3 the RDF classes used in the specific use case are listed. All of them are assigned the type “rdfs:Class”. Next to “pao:Human” and “pao:Cat” used in the examples above, an additional class “pao:Animal” has been added to show the subclass relationship formulated in line 9, as “pao:Cat” is defined to be a subclass of “pao:Animal”. This implies, through semantic inferencing, that all properties and relationships that are defined for the “pao:Animal” class also apply to the “pao:Cat” class. Line 11 specifies that “pao:isPetOf” is an “rdf:Property”, which is then further enriched semantically in lines 13 and 14 with the addition of a domain and range. These make sure that the “pao:isPetOf” edge is always directed from a “pao:Animal” instance to a “pao:Human” instance. The reasoning process behind all these defined statements of the ontology enables a “pao:Cat” to also be a legal source of a “pao:isPetOf” relationship.
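As a small sketch of the reasoning just described, the following lines (written in the prefixed notation of Listing 2 and Listing 3) contrast two asserted triples with the one an RDFS-aware reasoner could derive from them; the derived triple does not have to be stored explicitly:

pao:Cesar rdf:type pao:Cat            # asserted (Listing 2)
pao:Cat rdfs:subClassOf pao:Animal    # asserted (Listing 3)
pao:Cesar rdf:type pao:Animal         # inferable via the subclass relationship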

disclaimer for the term “complexity” of semantic web ontologies In the remainder of this work, ontologies of the Semantic Web will often be considered in terms of their complexity. As the core concept of the ontology has been explained in this section, more information can be given about this term. Firstly, ontologies are assessed to be more complex simply by the amount of different “rdfs:Class” and “rdfs:Property” entities they implement. Secondly, a higher complexity is implied by the possible kinds of connections between these two entities. This is mainly established by subclass hierarchies, which can evolve both in terms of depth and breadth, and by the allocation of properties or relationships to given “rdfs:Class” representatives. A third facet of the complexity, which is not associated with Semantic Web metadata modelling, is later introduced by the Anno4j library in Chapter 4. This library allows RDF data to be mapped to Java objects and thereby uses a so-called domain model, which is the technical pendant to ontologies. These domain models can be enhanced by own implemented features, which further increases their assessed complexity. A more thorough and detailed complexity measure for Semantic Web ontologies, and therefore also domain models, will later be introduced in Section 6.2.

disclaimer for rdf graph pictures For the rest of this work, whenever RDF triples are visualised in the form of a graph, the rules listed in List 1 in the Appendix are used to draw the graph. They then also imply which RDF feature they illustrate.

2.2.4 RDF Datasets and Named Graphs

All previously shown examples have been representing exactly one use case, one collection of information which is represented in RDF. This is called an RDF graph. In some cases however, it is necessary or desired to describe this information in a more fine-grained fashion, or to divide the statements contained in one graph into smaller semantically distinctive pieces. Therefore, an RDF dataset contains various RDF graphs, divided into (possibly multiple) so-called named graphs and up to one default graph. A named graph is used to describe a subset of statements of an RDF dataset. To achieve this, each of those statements is associated with an additional IRI that represents the name of the graph. The default (“unnamed”) graph on the other hand contains all those statements that are not added to one of the named graphs. Figure 9 shows an example. For the “pets and their owners” use case, a possible application of subgraphs would be to divide the database into different cities, allocating the persons and animals of the database to the city they live in.


Figure 9: Example RDF Dataset with Two Named Graphs “cities:NewYork” and “cities:Chicago”. The Exemplary Namespace “http://examplecities.com/” is Assumed for the Prefix “cities:”.

In the example of Figure 9, “pao:Bob” and “pao:Cesar” are located in “cities:NewYork”, while “pao:Eve” and “pao:Garfield” live in “cities:Chicago”. Both Classes “pao:Human” and “pao:Cat” used in the example are placed in the default graph, as these are statements and concepts that cannot (and should not) be related to one named graph, while the “rdf:type” assignments between the various instances and their associated classes are placed in the respective named graphs.

The division into the different graphs for the refinement of the database was primarily aligned with SPARQL, the SPARQL Protocol and RDF Query Language [71] (Section 2.2.7 will give more details about SPARQL), which is the de-facto standard querying language for RDF data. In combination, it is possible to fine-tune queries in order to make use of the defined named graphs. In the example shown in Figure 9, a simple query for “all animals” (ignoring named graphs) would return both “pao:Cesar” as well as “pao:Garfield”. Enhancing the “all animals” query to only request data from a named graph rather than the whole dataset will only return the “pao:Animal”(s) that are contained in the respective named graph.
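As a rough sketch of such a graph-restricted query (SPARQL itself is introduced in Section 2.2.7), the following request asks for all “pao:Cat” instances, but only within the named graph for New York from Figure 9; reasoning over the subclass hierarchy towards “pao:Animal” is left aside here for simplicity:

PREFIX pao: <http://petsandowners.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX cities: <http://examplecities.com/>

# Only triples inside the named graph cities:NewYork are matched,
# so the result contains pao:Cesar but not pao:Garfield.
SELECT ?animal
WHERE {
  GRAPH cities:NewYork {
    ?animal rdf:type pao:Cat .
  }
}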

2.2.5 Expressing and Transporting RDF Data - RDF Documents and Their Serialisations

An RDF Document represents the encoding of a single RDF graph or an RDF dataset, as described in Section 2.2.2. Next to the RDF data, a concrete syntax or serialisation format is also necessary in order to enable others to parse a document. For these syntaxes, the W3C proposed several standards. The best-known are listed in this section in short. N-Triples, N-Quads, Turtle, and TriG are called the “turtle family of RDF languages” [113]. Additionally, there are serialisations oriented towards the JSON, XML, and HTML standards. To show practical implementation examples of the various syntaxes, the RDF document shown as a graph in Figure 10 will be used and presented in the different RDF syntaxes. The exemplary “pets and their owners” use case used so far is extended with interests of humans. Those interests are already present in another database (referenced by the IRI and namespace “http://dbpedia.org/”), so the respectively existing IRIs are utilised. In this example, “Bob” has an interest in the novel “Conviction”, which is represented via the IRI “http://dbpedia.org/page/Conviction_(Star_Wars_novel)”. Its “author” is referenced via the IRI “dpr:Aaron_Allston”, while a “label” and a “release date” are also issued. The prefixes “dpo:”, “dpr:”, and “dpp:” stand for the three namespaces “http://dbpedia.org/ontology/”, “http://dbpedia.org/resource/”, and lastly “http://dbpedia.org/page/” respectively.

n-triples N-Triples [43] is the simplest syntax for RDF data. It serialises one triple per line, each ended with a full stop. Every IRI is enclosed in angle brackets “<>”. The produced output is very bulky, but in its simplicity every line contains all the necessary information, and the line-by-line notation allows convenient parsing for computers. Listing 4 shows the RDF data contained in Figure 10 serialised as N-Triples.

"Conviction (Star Wars novel)"@en "Green"

rdfs:label foaf:surname

foaf:topic_ dpp:Conviction_ pao:Human rdf:type pao:Bob interest (Star_Wars_novel)

pao:isPetOf dbo:author dbo:releaseDate

pao:Cat rdf:type pao:Cesar dbr:Aaron_Aliston "2011-05-24" http://petsandowners/Bob http://dbpedia.org

Figure 10: Exemplary RDF Graph Used for Showing the Different RDF Se- rialisation Formats.

1 < http://petsandowners.org/Human> . 2 "Green". 3 . 4 "Conviction(Star Wars novel)"@en. 5 . 6 "2011-05-24"^^ . 7 . 8 .

Listing 4: The Information Content Illustrated in Figure 10 represented as RDF Triples Using the N-Triples Syntax.

n-quads The N-Triples syntax does not support named graphs; its extension N-Quads [42] does. For every triple, the IRI of the named graph it is contained in is added after the subject-predicate-object notation, hence the name “quad” instead of triple. This serialisation is also written line by line. Listing 5 shows the N-Quads serialisation for the RDF data contained in the graph shown in Figure 10, using the two named graphs with the IRIs “http://petsandowners.org/Bob” and “http://dbpedia.org/”.

1 < http://petsandowners.org/Human> . 2 "Green" . 3 . 4 "Conviction(Star Wars novel)"@en . 44 principles of metadata modeling and querying

5 . 6 "2011-05-24"^^ < http://dbpedia.org/> . 7 . 8 .

Listing 5: The Information Content Illustrated in Figure 10 represented as RDF Triples Using the N-Quads Syntax.

turtle Another extension to N-Triples is called Turtle [131]. It introduces some syntactical enhancements to improve the readability for humans. Listing 6 shows the Turtle serialisation for the RDF data contained in the graph shown in Figure 10. The improvements that can be seen are the prefixing of IRIs (with a base prefix written as just a colon) and the aggregation of relationships that refer to the same subject. Rather than repeating the subject in every triple, a subject can now be followed by several lines each containing a predicate and an object and ending with a semicolon (“;”), which means that all of these lines refer to the same subject. The last line still needs to be ended with a full stop. An example for the resource “:Bob” (referencing the full IRI “http://petsandowners.org/Bob”) can be seen in lines 10 to 13 in Listing 6.

1 @prefix : <http://petsandowners.org/> .
2 @prefix foaf: <http://xmlns.com/foaf/0.1/> .
3 @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
4 @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
5 @prefix dpo: <http://dbpedia.org/ontology/> .
6 @prefix dpr: <http://dbpedia.org/resource/> .
7 @prefix dpp: <http://dbpedia.org/page/> .
8 @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
9
10 :Bob
11   a :Human;
12   foaf:surname "Green";
13   foaf:topic_interest <http://dbpedia.org/page/Conviction_(Star_Wars_novel)> .
14
15 <http://dbpedia.org/page/Conviction_(Star_Wars_novel)>
16   rdfs:label "Conviction (Star Wars novel)"@en;
17   dpo:author dpr:Aaron_Allston;
18   dpo:releaseDate "2011-05-24"^^xsd:date .
19
20 :Cesar
21   a :Cat;
22   :isPetOf :Bob .

Listing 6: The Information Content Illustrated in Figure 10 represented as RDF Triples Using the Turtle Syntax.

trig Just as N-Triples needed an extension to support named graphs, the same holds for the Turtle syntax, which does not support named graphs by itself. TriG [44] is therefore an extension which makes it possible to express named graphs while using the Turtle syntax for describing the triples. A “GRAPH” keyword introduces the block of triples that are to be contained in the given named graph. Listing 7 shows the TriG serialisation for the RDF data contained in the graph shown in Figure 10; the named graph declarations can be seen in lines 9 and 21, which span over the triples from lines 10 to 19 and 22 to 27 respectively.

1 @prefix : <http://petsandowners.org/> .
2 @prefix foaf: <http://xmlns.com/foaf/0.1/> .
3 @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
4 @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
5 @prefix dpo: <http://dbpedia.org/ontology/> .
6 @prefix dpr: <http://dbpedia.org/resource/> .
7 @prefix dpp: <http://dbpedia.org/page/> .
8 @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
9 GRAPH <http://petsandowners.org/Bob>
10 {
11   :Bob
12     a :Human;
13     foaf:surname "Green";
14     foaf:topic_interest <http://dbpedia.org/page/Conviction_(Star_Wars_novel)> .
15
16   :Cesar
17     a :Cat;
18     :isPetOf :Bob .
19 }
20
21 GRAPH <http://dbpedia.org/>
22 {
23   <http://dbpedia.org/page/Conviction_(Star_Wars_novel)>
24     rdfs:label "Conviction (Star Wars novel)"@en;
25     dpo:author dpr:Aaron_Allston;
26     dpo:releaseDate "2011-05-24"^^xsd:date .
27 }

Listing 7: The Information Content Illustrated in Figure 10 represented as RDF Triples Using the TriG Syntax.

json-ld JSON-LD [105] is a serialisation for RDF based on JSON. In general, every RDF resource is implemented as a single JSON object with a universal identifier (the “@id” element) in order to link from one object to another. A context can be provided (and linked to by other JSON documents) in order to supply contextual information like namespaces, details about properties, and so on. As shown in Listing 8, the “@graph” element is used in order to express a whole RDF document. The listing shows the JSON-LD serialisation for the RDF data contained in the graph shown in Figure 10. Here, the context is embedded in the same JSON document under the “@context” element rather than being linked to externally.

1 {

2 "@context":{ 3 "pao":"http://petsandowners.org/", 4 "foaf":"http://xmlns.com/foaf/0.1/", 5 "rdf":"http://www.w3.org/1999/02/22-rdf-syntax-ns#", 6 "rdfs":"http://www.w3.org/2000/01/rdf-schema#", 7 "dpo":"http://dbpedia.org/ontology/", 8 "dpr":"http://dbpedia.org/resource/", 9 "dpp":"http://dbpedia.org/page/", 10 "xsd":"http://www.w3.org/2001/XMLSchema#" 11 }, 12 "@graph":[ 13 { 14 "@id":"dpr:Conviction _(Star_Wars_novel)", 15 "dpo:author":{ 16 "@id":"dpr:Aaron _Allston" 17 }, 18 "dpo:releaseDate":{ 19 "@type":"xsd:date", 20 "@value":"2011-05-24" 21 }, 22 "rdfs:label":{ 23 "@language":"en", 24 "@value":"Conviction(Star Wars novel)" 25 } 26 }, 27 { 28 "@id":"pao:Bob", 29 "@type":"pao:Human", 30 "foaf:surname":"Green", 31 "foaf:topic_interest":{ 32 "@id":"dpp:Conviction _(Star_Wars_novel)" 33 } 34 }, 35 { 36 "@id":"pao:Cesar", 37 "@type":"pao:Cat", 38 "pao:isPetOf":{ 39 "@id":"pao:Bob" 40 } 41 } 42 ] 43 }

Listing 8: The Information Content Illustrated in Figure 10 represented as RDF Triples Using the JSON-LD Syntax.

rdfa To incorporate semantic data more conveniently into HTML pages, RDFa [1][33] (Resource Description Framework in Attributes) allows users to encode the information of triples directly into their Web pages. This makes it possible to extend the already human-readable page with semantics that can be (automatically) parsed and processed by computers. For RDFa, there exist several variants, namely the core and lite versions, as well as a version oriented towards XHTML and one towards HTML5. Listing 9 shows an example containing the RDF data of the graph shown in Figure 10. For the sake of clarity, the surrounding HTML boilerplate of a normal Web page has been left out. It is also important to highlight that the order of the displayed tags is arbitrary. The important attributes contained in the “<div>” tags are “resource”, “property”, “typeof”, and “prefix”. They can also be integrated into other typical HTML tags like links and anchors.

1 <div prefix="pao: http://petsandowners.org/
2              foaf: http://xmlns.com/foaf/0.1/
3              rdfs: http://www.w3.org/2000/01/rdf-schema#
4              dpo: http://dbpedia.org/ontology/
5              dpr: http://dbpedia.org/resource/
6              dpp: http://dbpedia.org/page/
7              xsd: http://www.w3.org/2001/XMLSchema#">
8
9   <div resource="pao:Bob" typeof="pao:Human">
10     <div property="foaf:surname">Green</div>
11     <div property="foaf:topic_interest" resource="dpp:Conviction_(Star_Wars_novel)"></div>
12   </div>
13
14   <div resource="dpp:Conviction_(Star_Wars_novel)">
15     <div property="rdfs:label" lang="en">Conviction (Star Wars novel)</div>
16     <div property="dpo:author" resource="dpr:Aaron_Allston"></div>
17     <div property="dpo:releaseDate" datatype="xsd:date">2011-05-24</div>
18   </div>
19
20   <div resource="pao:Cesar" typeof="pao:Cat">
21     <div property="pao:isPetOf" resource="pao:Bob"></div>
22   </div>
23 </div>
Listing 9: The Information Content Illustrated in Figure 10 represented as RDF Triples Using the RDFa Syntax.

rdf/xml The RDF/XML syntax [150] is the “oldest” serialisation for RDF data and is, as the name suggests, based on the XML standard. The whole document begins with the classic “<?xml ... ?>” declaration, followed by an “<rdf:RDF>” tag that contains the declarations of namespaces as attributes (the “xmlns” attributes). Each RDF resource is implemented as an “<rdf:Description>” element, and its corresponding IRI is associated via the “rdf:about” attribute. Using these, references to other RDF resources can be established. Every child element of a description represents a triple that is referring to the RDF resource as its subject. Listing 10 shows the RDF data contained in the graph of Figure 10 in the RDF/XML serialisation. Note that the defined namespaces only apply to XML elements and attributes.

1 <?xml version="1.0" encoding="UTF-8"?>
2 <rdf:RDF
3     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
4     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
5     xmlns:foaf="http://xmlns.com/foaf/0.1/"
6     xmlns:dpo="http://dbpedia.org/ontology/"
7     xmlns:pao="http://petsandowners.org/">
8
9   <rdf:Description rdf:about="http://petsandowners.org/Bob">
10     <rdf:type rdf:resource="http://petsandowners.org/Human"/>
11     <foaf:surname>Green</foaf:surname>
12     <foaf:topic_interest rdf:resource="http://dbpedia.org/page/Conviction_(Star_Wars_novel)"/>
13   </rdf:Description>
14
15   <rdf:Description rdf:about="http://dbpedia.org/page/Conviction_(Star_Wars_novel)">
16     <rdfs:label xml:lang="en">Conviction (Star Wars novel)</rdfs:label>
17     <dpo:author rdf:resource="http://dbpedia.org/resource/Aaron_Allston"/>
18     <dpo:releaseDate rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2011-05-24</dpo:releaseDate>
19   </rdf:Description>
20
21   <rdf:Description rdf:about="http://petsandowners.org/Cesar">
22     <rdf:type rdf:resource="http://petsandowners.org/Cat"/>
23     <pao:isPetOf rdf:resource="http://petsandowners.org/Bob"/>
24   </rdf:Description>
25
26 </rdf:RDF>

Listing 10: The Information Content Illustrated in Figure 10 represented as RDF Triples Using the RDF/XML Syntax.

For the remainder of this work, whenever RDF data is to be illustrated, the graph structure described in the disclaimer in Section 2.2.3 will be utilised, while RDF documents and their triples will mainly be presented in the Turtle syntax when no named graphs are necessary, and in TriG otherwise.

2.2.6 Advanced RDF Features - Inferencing, Reasoning, and Reification

In Section 2.2.3, the advanced RDF feature called inferencing was mentioned. Inferencing allows systems to find “hidden” or rather implied semantics in a given RDF document, which in turn also deduces new triples. This is done by using a reasoner, which applies a defined set of rules to the given RDF data. These reasoners can have a varying range of applied processes: some are quite simple and efficient, while others are composed of a multitude of complex inferencing rules and hence take longer periods of time to compute the implied semantics. Following these rules, a reasoner can also determine whether the supported set of triples is logically consistent or whether the triples contradict each other. Most reasoners are implemented based on RDFS or OWL. This is a very important feature for RDF, as it enables ontologies to be implemented in a much more compact fashion by leaving the additional semantics (and therefore the implied triples) implicit when the ontology is designed. Nevertheless, for the remainder of this work the required inferred knowledge is rather shallow, so this topic will not be discussed in all its details. In-depth discussion and explanation can be found in [79][80].

Two of the standard inferencing rules consider the subclassing of defined “rdfs:Class” objects, as well as the type checking of the ranges and domains of “rdfs:Property” objects. Considering the triples of the RDF document shown in Listing 11, some triples could be inferred by an applied reasoner.

1 @prefix : <http://petsandowners.org/> .
2 @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
3 @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
4
5 :Cat rdfs:subClassOf :Animal .
6
7 :Animal rdfs:subClassOf :LivingThing .
8
9 :Cesar
10   a :Cat;
11   :isPetOf :Bob .
12
13 :isPetOf
14   rdfs:domain :Animal;
15   rdfs:range :Human .

Listing 11: RDF Document Used for Simple Inferencing.

The result (containing the original as well as all inferred triples) is shown in Listing 12. Via the defined subclassing of a “:Cat” being an “:Animal”, and an “:Animal” being a “:LivingThing”, the logical inference can be drawn that a “:Cat” is also a “:LivingThing”. Its triple is added in line 7 of Listing 12. With this new subclass hierarchy it is also safe to assume that “:Cesar” is not only a “:Cat”, but also an “:Animal” as well as a “:LivingThing” (with the triples being found in lines 13 and 14). With the information of Listing 11, “:Bob” has no assigned type. However, through inferencing and the application of the information of the “:isPetOf” property, one can deduce that “:Bob” has to be a “:Human” in order to satisfy the semantics of the RDF document.

1 @prefix : <http://petsandowners.org/> .
2 @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
3 @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
4
5 :Cat
6   rdfs:subClassOf :Animal;
7   rdfs:subClassOf :LivingThing .
8
9 :Animal rdfs:subClassOf :LivingThing .
10
11 :Cesar
12   a :Cat;
13   a :Animal;
14   a :LivingThing;
15   :isPetOf :Bob .
16
17 :isPetOf
18   rdfs:domain :Animal;
19   rdfs:range :Human .
20
21 :Bob a :Human .

Listing 12: RDF Document Containing Inferred Triples with the Base Triples of Listing 11.

All of the triples of an RDF document that have been dealt with until now, meaning the ones that are materialised initially as well as those that are added by a reasoner, are actual metadata that is used to describe information about a given universe. Next to this, RDF also offers the possibility to make statements about the metadata and the corresponding triples themselves - statements about statements. This is called reification. A reified triple can give further information about another triple, such as provenance information or composition details. It does however not state whether the original statement is true or false in the overall context. All actual triples are always considered to be true and “take effect” in the described universe, while reified triples are fuzzy about that circumstance and can therefore be used to model fuzzy information. To model the reified statement, an “rdf:Statement” node uses three relationships - “rdf:subject”, “rdf:predicate”, and “rdf:object” - to link to three other instances in the RDF document, and it therefore consists of the same core concepts as the information units of a normal RDF triple. The statement node can have an assigned IRI or it can be a blank node, depending on whether the statement should be applied to a local or global context. In the example of Listing 13, “:Bob” makes an assumption about the eating behaviour of his cat “:Cesar”. This statement fits better being modelled as a reified statement than as a normal statement, as the preference of a pet can be assumed by a human but never really be assured by the pet. In the example, “:Bob” “:assumes” the statement “:x”, which in turn asserts that “:Cesar” does like “:DansDryFood”.

1 @prefix : <http://petsandowners.org/> .
2 @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
3
4 :x
5   a rdf:Statement;
6   rdf:subject :Cesar;
7   rdf:predicate :likesFood;
8   rdf:object :DansDryFood .
9
10 :Bob :assumes :x .

Listing 13: Simple Reification Example.

2.2.7 Querying and Manipulating RDF Data - SPARQL, the SPARQL Protocol and RDF Query Language

Up to now, Section 2.2 has given insights into the RDF specification and the de-facto standard way of modelling, producing, and conveying information in the form of RDF data. What is missing is the procedure of requesting and querying the persisted data. The main instrument to perform this is SPARQL, the SPARQL Protocol and RDF Query Language [71]. As with RDF, the SPARQL specification [71] is divided into various documents that highlight particular key aspects of the language. This section will give an overview of SPARQL and thereby explain these aspects in short. The respective specification documents are to be consulted if deeper insights are necessary. In the remainder of this work, whenever this SPARQL explanation does not suffice to understand given examples, further detailed information will be given. In its basics, every SPARQL query consists of two parts: the “SELECT” part specifies which variables are to be incorporated in the query and how the result is to be presented, while the “WHERE” part defines graph patterns that are to be fulfilled by the query. The result of a query is a single subgraph that matches the defined pattern exactly or, if multiple subgraphs match, a sequence of solutions. More complex queries may include unions, optional query parts, filters, aggregations, path expressions, and nested queries. The “SELECT” part can be exchanged for the “ASK” or “CONSTRUCT” keywords, which are used to return a “yes or no answer” to a query or to construct new RDF graphs respectively. Listing 14 shows an example of a simple SPARQL query that could be used for the RDF data of Figure 10. The overall query requests the books that have been written by the author associated with the IRI “dpr:Aaron_Allston”, and returns each book’s title in combination with an aggregated counter of all the people who declare an interest in this book. To achieve this, in line 6 the “SELECT” part of the query defines three variables that are to be used. “?name” will stand for the title of the book, while “?fan” and “?count” are used for the aggregation of interested people. These variables are then bound in the “WHERE” part of the query. All lines from 8 to 10 are associated with a local variable “?book” that represents the node of the book in the defined graph pattern. Line 8 associates the name of the book with the relationship “rdfs:label” to the query variable “?name”, while in line 9 the same is done with the book’s author and the relationship “dpo:author”. The difference between lines 8 and 9 is that the author is bound to an instance rather than a variable as placeholder. This restricts the resulting subgraph with regard to the author with the IRI “dpr:Aaron_Allston”. In order to count the people interested in the books, the query variable “?fan” has to be connected to the book, which is done by using the inverted relationship “^foaf:topic_interest” in line 10. Line 10 could also be written as “?fan foaf:topic_interest ?book .” to not use the inverted edge; the results however would be the same. Line 11 groups the respective results to combine the entries for a single book.

1 PREFIX foaf: <http://xmlns.com/foaf/0.1/>
2 PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
3 PREFIX dpo: <http://dbpedia.org/ontology/>
4 PREFIX dpr: <http://dbpedia.org/resource/>
5
6 SELECT ?name (COUNT(?fan) AS ?count)
7 WHERE {
8   ?book rdfs:label ?name .
9   ?book dpo:author dpr:Aaron_Allston .
10   ?book ^foaf:topic_interest ?fan .
11 } GROUP BY ?name

Listing 14: Exemplary SPARQL Query Issued on the RDF Data Contained in the Graph Shown in Figure 10.

As the underlying dataset is rather small, the result of the query of Listing 14 is simple: there is only one book in the data that is written by “dpr:Aaron_Allston”, with the title “Conviction (Star Wars novel)”. It has one person interested in it: “pao:Bob”. Table 1 shows the result of the query.

?name                          ?count
Conviction (Star Wars novel)   1

Table 1: Result of the Query Defined in Listing 14 on the Dataset Shown in Figure 10.
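For completeness, the “ASK” and “CONSTRUCT” query forms mentioned above can be sketched against the same dataset. The two queries below are purely illustrative and not taken from the specification; they reuse the prefixes of Listing 14 and the (existing) FOAF property “foaf:interest” to avoid introducing made-up vocabulary.

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dpo:  <http://dbpedia.org/ontology/>
PREFIX dpr:  <http://dbpedia.org/resource/>

# ASK only reports whether at least one solution for the pattern exists,
# returning a plain "yes or no answer".
ASK {
  ?book dpo:author dpr:Aaron_Allston .
}

# CONSTRUCT returns a new RDF graph built from the matched solutions,
# here restating every declared interest with the FOAF property foaf:interest.
CONSTRUCT {
  ?fan foaf:interest ?book .
}
WHERE {
  ?fan foaf:topic_interest ?book .
}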

By adding some more triples to the database, the result can be made more interesting. If the triples shown in Listing 15 were added and the same “books and their fans” query (see Listing 14) were issued, the result would look like the one shown in Table 2. A second book with the title “Backlash (Star Wars novel)” is added, with three people having an interest in it: Alice, Charlie, and Dennis. Because of this, the result for “?count” for this book is 3.

1 @prefix : <http://petsandowners.org/> .
2 @prefix foaf: <http://xmlns.com/foaf/0.1/> .
3 @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
4 @prefix dpo: <http://dbpedia.org/ontology/> .
5 @prefix dpr: <http://dbpedia.org/resource/> .
6 @prefix dpp: <http://dbpedia.org/page/> .
7
8 :Alice
9   a :Human;
10   foaf:topic_interest <http://dbpedia.org/page/Backlash_(Star_Wars_novel)> .
11
12 :Charlie
13   a :Human;
14   foaf:topic_interest <http://dbpedia.org/page/Backlash_(Star_Wars_novel)> .
15
16 :Dennis
17   a :Human;
18   foaf:topic_interest <http://dbpedia.org/page/Backlash_(Star_Wars_novel)> .
19
20 <http://dbpedia.org/page/Backlash_(Star_Wars_novel)>
21   rdfs:label "Backlash (Star Wars novel)";
22   dpo:author dpr:Aaron_Allston .

Listing 15: Triples to Add to the RDF Dataset of Figure 10 to Create an Extended Dataset for the Query Shown in Listing 14.

?name                          ?count
Conviction (Star Wars novel)   1
Backlash (Star Wars novel)     3

Table 2: Result of the Query Defined in Listing 14 on the Dataset Shown in Figure 10 with the Addition of the Triples Shown in Listing 15.

Next to this “basic” application of SPARQL, the overview of its specification [71] lists a series of documents that describe the functionality and the possible scope of SPARQL. Besides the Query Language, which has been introduced in short in this section, there is also a dedicated update language that concerns the update, creation, and modification of graphs of an RDF datastore. Other features cover various result formats that can be applied to the results of a query, the federation of queries across different data sources, entailment regimes that allow queries to be enhanced with inference over RDF and OWL information (see Section 2.2.6), a dedicated protocol for SPARQL as well as a SPARQL service description, and a graph store HTTP protocol.
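As a brief, hedged illustration of the update language mentioned above (SPARQL 1.1 Update), the following sketch first inserts triples about a further pet owner and then removes some of them again; the resources “:Alice” and “:Luna” are assumptions in the style of the earlier examples, not part of the dataset of Figure 10.

PREFIX : <http://petsandowners.org/>

# INSERT DATA adds concrete triples to the store ...
INSERT DATA {
  :Alice a :Human .
  :Luna  a :Cat ;
         :isPetOf :Alice .
} ;

# ... while DELETE WHERE removes every triple matching the given pattern.
DELETE WHERE {
  :Luna ?p ?o .
}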

roundup - principles of metadata modeling Altogether, this section positioned the concept of the Semantic Web in a landscape of other well-known concepts like the Web of Data and Linked Open Data. None of them are feasible without metadata, and for describing, producing, and modelling metadata the de-facto standard is the Resource Description Framework RDF. Introductions to the W3C specifications for RDF, RDFS, OWL, and SPARQL, as well as their connections, have been given. Special focus has been laid on the process of metadata modelling, as it represents the backbone for ontology design. Overall, RDF and RDFS allow the user a general freedom when it comes to metadata modelling, which is a good thing, but when it comes to a defined use case or application scenario, agreed upon sets of terms and concepts are needed. Ontologies therefore are an accumulation of such terms which cover a specific domain and which are shared and accepted among users. Additionally, relationships between such terms define semantics that can be used in order to convey more information about the desired metadata. One such envisioned field of application is the annotation of multimedia data, which allows the allocation of metadata to multimedia, making it more useful for computers and other systems. The concept of the Web of Data emphasises this point, as through this metadata,

links between multimedia items can be established and data can be provided and accessed openly in a more convenient way. The Web Annotation Data Model WADM is one ontology that covers this use case. The WADM will be explained in Section 3.2. The MICO Metadata Model MMM is an extension to the WADM and broadens its context to lift the metadata process to the scope of multimedia items in order to implement whole workflows of metadata allocation. The MMM is presented in Section 3.3 and constitutes a core contribution of this work.

3 combining metadata modeling with multimedia

Chapter 2 introduced the world of metadata in the Semantic Web and gave insights on how metadata is formulated. Alongside the advantages and also some disadvantages that exist in the Semantic Web, several use cases as well as areas of application come to mind. One of the most important fields of application for metadata modelling lies in the field of multimedia. Media items are tagged with various informational features that range from simpler low-level ones like title, author, or date of creation, to more sophisticated high-level features like detection results or a segmentation of the given media file into subparts. As with metadata for documents or Web pages, it is of equal importance to model the metadata for multimedia in a thorough fashion, as a rich metadata background enables the media item to be used in a wider context. This also contributes to an improvement in the quality of applications that make use of these metadata [155][88]. Next to this, it is important that the metadata is produced in a common format rather than in proprietary implementations that cannot be used by other instances. This circumstance is even aggravated by the fact that in the Web of today, multimedia content is produced easily, while the possibilities to accurately annotate the content or to add metadata to it are rarely present, or the result is persisted in a proprietary format, making it worthless for other peers. On the other side, the same issue holds for the process of finding the respective content in the end [78]. The process of applying metadata to multimedia items is also hindered by the following factors [146]:

plenty of different datatypes In contrast to documents and Web pages, there exists a larger variety of datatypes for multimedia items. Each of those datatypes can have different unique features that must be treated differently. This is also reflected at the point of querying. In contrast to textual media that can be searched through by mere string queries, for multimedia like video and audio this process is more complex, as it relies on difficult comparison operators on the media’s features or on the processing of possibly proprietary vocabulary items that describe the metadata [7].

fragmentation of media items Oftentimes, the point of interest or the most interesting thing of a multimedia item is only present in a fraction of the whole item, for example a fragment


of a picture or a short period during an audio stream. These subparts of the media item need to be addressed in a meaningful and understandable fashion. Additionally, metadata features need to be attachable to exactly that fragment.

stacking of media items Several multimedia datatypes can be seen as “stacked” media items, meaning that one item is composed of multiple other media items, or that there are objects “inside” the item that require a metadata description of their own. For example, with the respective extraction processes, a video can be separated into a number of key shots (images) in combination with its audio track. Hence a thorough metadata acquisition cannot (and should not) only describe the video on its own; it can also be necessary to reference components of the video (which, in addition, have other datatypes than the video itself - see the first point of this list). Next to the existence of this problem, the process of metadata allocation gets more complex, as the description as well as the corresponding querying of information needs to adapt to the intertwined structure.

quality of multimedia metadata In terms of multimedia it is also important to have a good metadata quality, which heavily affects the reutilisation of the multimedia items. For example, in a use case where videos are to be queried and streamed from a database, it is important for the metadata to be exact and easily parseable, as the playback of an “incorrect” video (in relation to the defined search criteria) can be costly in various factors like bandwidth and time. To counteract this, it is an essential requirement to have a sound way to allocate the multimedia metadata in an extensive fashion in the first place. Then, querying mechanisms need to be as sophisticated as possible in order to match the available metadata.

Hausenblas et al. [78] also list some use cases for metadata acquisition for multimedia. Their examples are the sharing of video clips on a social media network like Twitter, the annotation of faces in personal photos, tracking favourite artists on a platform that records the appearance of given artists in shows and reviews of the BBC programme, and a more complex use case in which employees of a company can search for and watch broadcasted videos of meetings held in that same company. Those videos are also fragmented and tagged according to which person(s) are relevant in the specific fragment. Therefore the videos can also be searched by person names, next to the classic approaches like title, date, and so on. All of these use cases show the importance of metadata allocation to multimedia, as none of them would be feasible without metadata. Chapter 2 has given insights into the field of metadata modelling, and as with the advantages and examples mentioned in that chapter, implementing an ontology that is able to fit the defined requirements is the way to go for multimedia metadata as well. To give a deeper insight into the field of metadata modelling for multimedia and its history, this chapter will first list related work in this domain in Section 3.1. Afterwards, the Web Annotation Data Model WADM, as one of the most influential standardisations in the field of multimedia metadata for this thesis, will be explained in Section 3.2. Section 3.3 then introduces the MICO Metadata Model MMM - one of the core contributions of this work - which is an extension to the WADM. The WADM supports the baseline for the multimedia metadata modelling, while the MMM adds components in order to apply the concept of a Web Annotation to multimedia workflow-based processes. Throughout this chapter, requirements for a multimedia metadata ontology and the desired use case will be defined, which are added to the ones listed above. After the multimedia metadata models have been introduced, Section 3.3.6 will conclude this chapter by evaluating the proposed MMM against these requirements. The entirety of this chapter proposes a solution to research question 1.

3.1 related work - multimedia metadata modeling

This section will highlight related work in the fields of metadata modelling in terms of models and datatypes, the essential combination with multimedia items, as well as the process of metadata allocation to some extent. It begins with the non-Semantic Web cornerstones that have shaped the history of multimedia metadata, as metadata allocation, most notably in the fields of multimedia objects like video, audio, and images, has been an important task for a long time. As mentioned before, the better the metadata background of a given multimedia item, the better and more broadly the item can be used, queried, or further processed afterwards. It is also important at this point to highlight that the respective multimedia items and the allocated metadata are not only created and used by experts, but also by amateurs [7]. This also imposes an important requirement for the metadata modelling process, as it must be reflected in the result, which must be usable without further detailed knowledge. Afterwards, the focus will shift towards more current approaches which are positioned closer to the Semantic Web, and which are in particular thematically closer to the contributions of this work.

classic xml standards, mpeg-7 Various works explore the management of multimedia assets or metadata, partially in the context of the Semantic Web, and can therefore be considered related to our topic. One of the groups with the most significant im-

pact on this field of research is the Moving Picture Experts Group MPEG1. This group has been founded in 1988 and since then has released various specifications and projects [107] that deal with mul- timedia metadata modelling and closely related topics. The group de- scribes its area of work with the “Development of international stan- dards for compression, decompression, processing, and coded repre- sentation of moving pictures, audio, and their combination, in order to satisfy a wide variety of applications.” Amongst the outcomes of the MPEG group, the most significant be- cause most relevant for this work is the MPEG-7 standard [115][156] [46] which was issued in 2002 and focuses heavily on the represen- tation of features that describe a given multimedia item in the form of metadata (in this case, the focus of the MPEG group are videos, images, and audio). In contrast, the preceding specifications MPEG 1 through 4 rather targeted the representation of the multimedia file in combination with various compressed alternatives. Additionally, they implemented better ways of transmitting, storing, as well as re- trieving the multimedia information for those representation forms. MPEG-7 then takes the next step and increases not just the infor- mation description but also supports interpretation of those, which eventually also enabled computers to better deal with these kinds of information. This also benefits the goal of the MPEG-7: making the multimedia items more useful for eventual human consumption. This is also emphasised by the authors’ statement, that the biggest value of this information is only achieved, when it can be conveniently found, queried, and accessed as well as applied and managed. All of this is established by supporting a rich set of innovative tools that allow a complete description of the media’s content. Also, the framework includes ways for a good storage solution, high-perfor- mance content identification as well as accurate and personalised fil- tering, searching, and retrieval. Additionally, the whole implementa- tion is not aimed for one or several specific applications or use cases, but rather concentrates on allowing to generate a rich and generic metadata background that can be used universally, fuelling the inter- operability between applications. This is also emphasised by the fact that there is not a single “correct solution” for a representation of fea- tures of a given multimedia item, but quite the contrary: there exist multiple valid representations, evolving different valuable possibili- ties at the point of time when the metadata is used. These representations are composed by a variety of describable fea- tures that apply broad and diverse abstraction levels, ranging from low- to high level features that are combinable in every way. While many low-level features can be extracted (semi-) automatically, the high level features generally require a major amount of human inter- action, interpretation, and/or supervision [115]. This MPEG-7 meta-

1 http://mpeg.chiariglione.org/ (last visited 03/12/2018)

data background description is rounded off with a set of basic features that cover classic multimedia information, like video codecs, file types, playback length, etc. At this point it is important to mention that the standardisation only focuses on the description of the metadata itself - both the process of how it is produced and the process of how it is consumed are intentionally not included in the standard. This is done in order to further promote diversification of and between applications. The cornerstones of the MPEG-7 standard that constitute its multimedia metadata functionality are the following:

feature A characteristic piece of information of a multimedia item.

descriptor A descriptor is the representation of a feature, defining both its syntax and semantics.

description scheme Contains descriptors as well as other descrip- tion schemes and defines relationships amongst those.

description definition language - ddl A language that en- ables the creation of new descriptors and description schemes.

system tools There is also a toolset supplied with the MPEG-7 standard, focusing on convenience of utilising its functionality. These cover multiplexing of descriptions, delivery mechanisms, synchronisation of descriptions with content, and coded repre- sentations (textual and binary) for storage and transmission.

The standard is divided into several parts, encapsulating different fields of functionality, in order to allow users to only use and familiarise themselves with the respective parts that fit a given use case. The parts are namely MPEG-7 Systems, Description Definition Language, Visual, Audio, Multimedia Description Scheme, Reference Software, Conformance, and Extraction and Use of Descriptions. For detailed insights, the reader is referred to the respective specification documents2.

towards semantic web multimedia ontologies In [115], Martinez, Koenen, and Pereira summarise the exemplary use cases of the former requirement document and applications document [55] for the MPEG-7 standard and mention the fitting examples of multime- dia editing, digital multimedia libraries, home entertainment devices, searching multimedia content, broadcast media selection, managing (large) content archives, semiautomatic multimedia presentation and editing, and opening up archives to the public. In [89] Hunter emphasises the importance of multimedia in the Web of today and that it plays an essential part in our lives as it has

2 http://mpeg.chiariglione.org/standards (last visited 03/12/2018)

achieved a pervasive role. With this importance in mind, she also formulates the claim that it is then also necessary that not only humans can interact with the supported multimedia items, but computers need to be able to as well. This is done by establishing a computational interpretation of the metadata that is stored alongside the multimedia items. Next to this, many different communities like the geospatial, museum, medical, or educational ones have picked up on metadata acquisition and combine it with metadata and semantics of their own. In the course of this argument, Hunter also sees the MPEG-7 standardisation as one of the fundamental modules of multimedia metadata modelling and therefore establishes it as her starting point towards computer-understandability of metadata. She claims, however, that the process is hampered, as there needs to be a common understanding about the semantics themselves as well as the relationships between them. The XML standard, which is the basis for MPEG-7, has some shortcomings at this task, as it lacks the expressiveness to formulate the respective semantics. This is where the core contribution of [89] begins, as she implements an ontology that captures the expressiveness of the MPEG-7 tools based on RDF Schema. The choice of RDFS is motivated by the fact that it better facilitates the interoperability between systems, as RDF is more understandable for computers, although the possibilities of expressing the different constraints of MPEG-7 with XML were already good. However, RDF poses a much better way of expressing semantics and semantic relationships through class and property hierarchies. This enables other ontologies to be combined with the possibilities of the MPEG-7 RDF ontology to cover additional use cases and extend the expressivity even further. The resulting solution is expressed in DAML+OIL and called MetaNet. It was established by utilising a top-down process with reverse-engineering of the XML Schema information of the MPEG-7 standard. Modern techniques supported this process, e.g. the recognition of patterns for generating some of the RDFS information automatically. Still, about a decade later, the expressiveness and functionality of the MPEG-7 standard is valued highly, as researchers align their work trying to adopt its advantages. In [109], Li et al. mention and highlight the importance of the MPEG-7 standard in the field of multimedia metadata, but they also describe its weak points. They claim that the standard does a lot in terms of closing the semantic gap by supporting good possibilities to store, manage, and retrieve large numbers of multimedia data on the Web, but there still is a shortcoming when it comes to a fuller semantic description of those, resulting in fuzzy semantics. They compare this problem to a mismatch scenario, in which a worldly concept can be named with multiple wordings, while on the other hand a word can have multiple semantic meanings. Next to this, Li et al. criticise that reasoning cannot be done effectively with XML. For their solution, they imply that an RDF ontology is the best option, as it can overcome the aforementioned shortcomings.
Starting from there, the possible base schemas are listed, and OWL DL as the most expressive and rich one is chosen as their basis for an ontology that functionally covers their chosen MPEG-7 parts Multimedia Description Scheme, Audio-Visual, and Classification Schema. It is also stated that temporal as well as spatial relations between video, audio, and image data are extremely important, and that OWL DL does not cover these relational requirements. Because of this, the proposed solution extends the OWL DL ontology to also incorporate ways to express those. One of the most promising solutions is the Core Ontology for Multimedia Annotation COMM [7][167][8]. The authors comply with the arguments that have been issued in this section by mentioning that some low-level features cannot be expressed in MPEG-7, and that the implementation as XML does not comply with current Semantic Web standards and does not support referencable results. They also address the problem of fragment identification, which is similar to the aforementioned spatial and temporal issue, as well as the problem of semantic fuzziness of supported annotations, for which they also show an example of addressing the same semantic concept in three different ways in MPEG-7. To this list of problems it is added that in the wake of the Semantic Web it is especially important to support Web interoperability for given multimedia ontologies, which would enable the produced descriptions to be linked to other information sources of the Semantic Web to further increase their information content. Next to this, the generated metadata for a given multimedia file must be embeddable into a compound document in order to create complete and sound structures. What is also very important to learn from Arndt et al. in [7] is that they list core requirements that they deem essential for every Web-compliant multimedia ontology. These requirements are the following:

mpeg-7 compliance The importance of the MPEG-7 standardisation has been highlighted throughout this whole section. This accumulated experience and the advantages of the standard must be conveniently expressible in a designed multimedia ontology. On top of that, as the existing community applying MPEG-7 is extensive, there is a lot of existing metadata out there which should be importable into a new ontology.

semantic interoperability Annotations and metadata are only reusable when they can be shared and applied conveniently at different locations. Therefore, the semantics of a new ontology need to be described sufficiently, so that, next to the metadata it-

self, the intended meaning is distributable between different systems.

syntactic interoperability Annotations and metadata is only shareable amongst different systems when they are implemen- ted in a syntax that is commonly used and applied. In the Se- mantic Web, these are mainly the serialisation formats that were introduced in Section 2.2.5.

separation of concerns A clear separation of domain knowl- edge from administrative or structural knowledge fuels the un- derstandability and reusability of according annotations. Both of these domains as well as their connections need to be imple- mented in a new multimedia ontology. An example here would be the recognition of an animal (domain) vs. the allocation of the fragment that the animal was detected in (structure).

modularity Modularity in the design of an ontology can decrease execution overhead and lets new users apply the ontology in an easier fashion, as they might not need its whole functionality.

extensibility An ontology should be extensible in a way that new features or implementations can be added conveniently without having to alter the whole core structure of the metadata model. This feature is especially important in a domain that evolves quickly like the domain of multimedia, as for example new multimedia features and extraction processes are continously added.

With these requirements in mind, Arndt et al. design their multimedia ontology that promises to deal with all the shortcomings mentioned above, while also satisfying the defined requirements for multimedia ontologies. COMM covers all the descriptors and accumulated knowledge that is present in the MPEG-7 standard and, for convenience reasons, also aligns with its naming conventions, while also supporting an extensible implementation that allows the addition of new multimedia features. The ontology is implemented in OWL and is based on the Descriptive Ontology for Linguistic and Cognitive Engineering (DOLCE), while also incorporating various design patterns like Descriptions & Situations and Ontology of Information Objects [7]. Next to a sound and rich ontology for multimedia, the authors also cover another very important, and often forgotten, point that goes beyond the use case of metadata modelling by itself: the way of technically creating associated annotations as well as utilising existing ones. These points often pose difficulties, as a plain ontology does not necessarily support them. To counteract this, in [167] Vacura et al. present a Java API named COMM API that was developed to support the infrastructure for the COMM ontology. This API will be referred to in Section 5.1.

structural information for multimedia metadata As seen until now, the related work mostly covered the eventual metadata that is to be allocated to a multimedia item, the representation and interpretation of the features that are extracted. However, there is another vital piece of information that is necessary for a thorough multimedia metadata inquiry: the proper representation of the multimedia item itself in combination with the “contact points” of the metadata. This was already mentioned by Arndt et al. with their requirement of separation of concerns. Saathoff and Scherp extend this requirement further, as they see it as a major factor for interoperability between different multimedia applications and systems [138][139][62][147]. They extend the mentioned requirement by adding the following criteria alongside the introduction of concepts for Information Objects IO (the actual piece of information that should be conveyed to the viewer of a multimedia item) and Information Realisations IR (the form of representation of an IO). The separation between these two concepts is emphasised, as it is important to give both of them proper attribution in the process of metadata allocation. A thorough metadata model needs to be able to annotate both the objects and realisations in a proper, shareable, and understandable fashion, as this is important for the interoperability between systems. These annotations also need to cover the full range from low-level to high-level features. Last but not least, a meaningful decomposition of the multimedia item as well as the object and realisation levels needs to be supported by the model. This further increases the granularity of the metadata background and is essential for precise annotations, which will later on also affect querying capabilities. All of these requirements and innovations are implemented in the Multimedia Metadata Ontology M3O, which is an ontology based on RDF that is not restricted to the “classic” multimedia items such as video, audio, image, etc., but can rather also characterise more complex ones.
Additionally, by applying Semantic Web technologies, the ontology is open for every kind of information from other ontologies, allowing it to implement rich metadata annotations and semantic information content.

lessons learned In the modern Web-oriented life of today, with platforms like YouTube and others - to mention only a few - multimedia plays a central role and has a vast range of use cases and applications. To make it as useful and convenient as possible for humans, the metadata that accompanies the multimedia, and therefore the multimedia metadata modelling, becomes an important step towards the usage of multimedia. The modelling has been an important topic in the fields of computer science for a long time, but still has not stopped evolving

and enhancing further. Its main goal, however, has not changed since its beginnings: making multimedia as useful as possible for eventual human consumption. This process also incorporates a necessary step to increase the interoperability and applicability of the metadata for the computers and software in between, as computers in general play a big role in metadata allocation, and therefore the metadata must also be computer-understandable before it can become valuable to humans in the last instance. Better multimedia metadata equals better usability for computers, which equals better applicability for humans. To enable this, the related work shown in this section offers a lot of insights in terms of multimedia metadata. Many requirements and rules have been explained in order to generate “good” metadata or even a whole ontology, in order to make the most out of it for the scenario explained above. Therefore, a dynamic, modular, as well as generic implementation is essential. This is motivated by the broad range of inputs in addition to the use cases that need to be modelled in the scope of the model. The first dynamic factor that comes in is the multimedia datatype that is to be backed up by metadata. Next to the commonly known items like video, audio, images, text, etc., it has become more and more popular to have combined multimedia as well as more complex types. Secondly, every one of those multimedia types comes with a huge set of diverse multimedia features that need to be modelled in a thorough and non-fuzzy fashion. Thirdly, there is not a single valid metadata representation of a given multimedia item, but multiple ones, which might even originate from different other ontologies. A rich ontology must also be able to incorporate interoperability not only between computers, but also between ontologies. Last but not least, the research field around multimedia metadata modelling has also evolved in terms of underlying as well as applied technologies. While the still existing milestone and go-to standardisation MPEG-7 was implemented in XML, more current approaches have moved to RDF and Semantic Web technologies to apply their advantages in the domain. In terms of applied technologies, next to supporting a rich ontology, developers often also provide APIs and other applications to make their ontology and metadata more accessible.

3.2 the web annotation data model

The Web Annotation Data Model WADM supports an extensible, interoperable framework as well as an accompanying RDF ontology for the design and implementation of so-called Web Annotations. This term is defined as a piece of further description (e.g. marginalia or a highlight) for a digital resource, like a comment or tag on a single Web page or image, or a blog post about a news article. Annotations

are used to convey information about a resource or associations between resources. By being an RDF ontology, the WADM already covers a lot of the requirements of the use cases defined in Chapter 1. Further requirements for multimedia ontologies have been defined in Section 3.1, which will be evaluated in Section 3.3.6 in order to show that the WADM is well suited as the baseline for the workflow-oriented multimedia metadata ontology. The WADM has been designed by the W3C Web Annotation Working Group3 and reached its final state on 23 February 2017 [143], as the group is now closed4. The starting point for the implementation of the WADM has been the preceding model called the Open Annotation Data Model OADM [142], which is a contribution of the W3C Open Annotation Community Group5. The WADM is embedded in a suite of specifications that are all implemented in order to support the aforementioned model and framework. While the model specification focuses on the purpose and the functionality of the model, the Web Annotation Vocabulary document [144] gives detailed information about the classes, relationships, properties, and named entities that are utilised in the WADM. Last but not least, the Web Annotation Protocol WAP [141] describes a client-server based protocol that can be implemented to utilise the Web Annotations in a Web-centric fashion following REST best practices. The description of the WADM will not cover every aspect of the model and ontology in depth, but rather tries to give an overall explanation in order to understand the concept of the Web Annotations as a basis for the following extension called the MICO Metadata Model MMM in Section 3.3. Therefore, this section will stick to a more high-level explanation rather than specifying the concepts formally, which is done by the WADM specification. The MMM section will then pick up the explanations given here and, if necessary, fill the overall Web Annotation concept up with more detailed information. The WADM specification is presented with a series of examples that are implemented using a JSON representation that is mapped to the respective RDF items. In this work, the examples will continue to be illustrated as a combination of graph pictures accompanied by the respective RDF items explained in text, as it has been done in Chapter 2.

3.2.1 Web Annotation Structure

The WADM enables a modular way of implementing the two core components of a generalised description of something: the thing that

3 https://www.w3.org/annotation/ (last visited 03/12/2018)
4 This is the version of the WADM that all of the following work is based on. If there will be upcoming changes to this state, no changes will be incorporated.
5 https://www.w3.org/community/openannotation/ (last visited 03/12/2018)

is to be described and the statement that is supposed to be about or applied to that thing. Therefore, a Web Annotation consists of three core concepts that are modelled extensively through various resources in the WADM: the body component of an annotation stands for the statement, while the target component delimits which resource (or subpart of a given resource) the statement is about. The WADM states that “the body is somehow about the target”. Both of them are connected by an annotation node which can further be extended by provenance information. Figure 11 shows an example, which depicts the showcase of assigning or recognising a person’s identity on a given picture (and even more precisely, only in a defined fragment of the picture), in this case, Barack Obama. Therefore, the “annotation” node connects two other resources for the body and target, yet abstracting from a proper definition for both of them, which will be filled in later on in this section. The RDF type for annotation nodes is the class “oa:Annotation” (with the namespace “oa:” standing for “http://www.w3.org/ns/oa#”), whilst the resources for body and target are connected with the annotation node via the relationships “oa:hasBody” and “oa:hasTarget” respectively.

Figure 11: Exemplary Web Annotation Illustrating the Information That the Human Face Displayed on a Given Picture is Barack Obama.
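Expressed in the Turtle syntax of Section 2.2.5, the annotation depicted in Figure 11 could look roughly like the following sketch; the resource IRIs under the made-up namespace “ex:” are assumptions for illustration only and abstract from the fragment selection.

@prefix oa: <http://www.w3.org/ns/oa#> .
@prefix ex: <http://example.org/> .

# The annotation node connects the statement (body) with the resource
# it is about (target).
ex:anno1
    a oa:Annotation ;
    oa:hasBody   ex:body1 ;      # "That face belongs to Barack Obama!"
    oa:hasTarget ex:picture1 .   # the picture (or a fragment of it)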

In general, an annotation node is enhanced with a single body and a single target. This bears the semantics that the statement contained in the body concerns exactly the single resource that is described by the target node. There is however the possibility of having no body, or multiple bodies and/or targets. There must always be at least one target associated with an annotation. The semantics of not having a body can for example be a simple highlighting. When there are multiple instances on both sides, then every body is equally assigned to every target. An example for this could be an image showing multiple animals, with the bodies assigning a species to those animals while multiple targets refer to the fragments of the picture where the animals are present. At this point it should be mentioned that there are possibilities to include other semantics for the multiplicities. The only officially accepted and included variation is the “oa:Choice” element for the body component of an annotation. A choice contains several possible items from which only one, the best fitting, choice is to be picked. An example for a choice body is the same textual comment in different languages. The other multiplicity possibilities for both body and target (that have not been included in the WADM specification as they lacked proper implementations) are an unordered list (“oa:Composite”), an ordered list (“oa:List”), and an independent collection (“oa:Independents”). All of these multiplicity constructs are convenient to implement on the RDF side, but their semantic interpretation poses various difficulties.

3.2.2 RDF Classes for Body and Target Components

The resources associated with both the body and target component of a Web Annotation are in general External Web Resources (EWR). In the WADM, an EWR refers to a resource that may also exist outside of the local host system, so when its IRI is dereferenced, the underlying representation of the resource is presented. A target must always be an EWR, while there is the possibility of a body being directly embedded into the annotation, which generally represents simple textual annotations. Although it is not obligatory, it is highly recommended by the standard to assign RDF classes to both the target and body component of an annotation in order to associate the datatype that is addressed by the annotation. This issue is already reflected in the introduction of Chapter 3, as this datatype association affects both the quality of the produced metadata as well as the possible querying of relevant multimedia, as both are enhanced when associated datatypes can be used. The standard and classically known datatypes are supported with respective RDF classes in the WADM specification, namely image (“dctypes:StillImage”), video (“dctypes:MovingImage”), sound (“dctypes:Sound”), text (“dctypes:Text”), and dataset (“dctypes:Dataset”). All of them, as it is best practice, are adapted from the Dublin Core specification called DCMI Metadata Terms [54] with one of its namespace abbreviations being “dctypes:”. As there are various sub-types among these genres, a more detailed description can be given by supplying the format as well as the language of the given web resource via the properties “dc:format” and “dc:language” respectively (“dc:” is also a namespace of the DCMI Metadata Terms specification). With the format, for example, a given image can further

be specified to be a “jpeg” or “png”, while the language can help by illustrating that a given audio file contains spoken Spanish language. Figure 12 extends the example Web Annotation shown in Figure 11 to have more metadata information. The datatypes can be assigned to both body and target component in order to illustrate the fact that the annotation is written text and targets a given image. This informa- tion can directly be used at querying point of these annotations, so only desired information will be transferred. Important to note here is that it has been abstracted from the assignment of a fragment on the picture (which is illustrated in Figure 11) and that the respective resources for the annotation, body, and target have simple IRIs for reasons of clarity. Especially the body and the target should have de- scriptive IRIs which can be dereferenced in order to retrieve the text passage or the image respectively.

Figure 12: Exemplary Web Annotation from Figure 11 with more Metadata Information, Especially the Addition of the Datatypes Text and Image.
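To make the structure of Figure 12 concrete, the following minimal Turtle sketch serialises the same graph. The resource IRIs in the ex: namespace are placeholders, and the prefix bindings follow the usual conventions for the vocabularies named above (oa:, rdf:, dc:, dctypes:).

    @prefix oa:      <http://www.w3.org/ns/oa#> .
    @prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix dc:      <http://purl.org/dc/elements/1.1/> .
    @prefix dctypes: <http://purl.org/dc/dcmitype/> .
    @prefix ex:      <http://example.org/> .

    # Annotation with a typed textual body and a typed image target
    ex:annotation a oa:Annotation ;
        oa:hasBody   ex:body ;
        oa:hasTarget ex:target .

    ex:body a dctypes:Text ;
        rdf:value "Barack Obama" .

    ex:target a dctypes:StillImage ;
        dc:format "jpeg" .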

The use case of attaching simple text comments to other multimedia, as shown in Figure 11 and Figure 12, is one of the most common applications of annotations. For those textual comments, the WADM supports simpler possibilities to model the same circumstance, namely the Embedded Textual Body and the String Body, in order to match the frequency of text annotations. This convenience is also reflected in the processing of those annotations, as they are supported in a standardised fashion. In addition, both the embedded version and the String body version can be created together with the respective Annotation and do not require a Web Resource of their own.

The embedded version is associated with the RDF type “oa:TextualBody”, which is to be interpreted with the same features as a “classic” text annotation like in the examples described above. The embedded body can also carry the properties “rdf:value”, “dc:format”, and “dc:language”. An even simpler text annotation can be implemented by the String body version, which is a String literal that is connected to the annotation node via the property “oa:bodyValue”. This String may only be of the type “xsd:string” and may not carry a language tag. The associated Annotation is to be interpreted by the client as if it were an Embedded Textual Body Annotation with the format “text/plain”. Figure 13 shows an example of both an embedded text annotation and a String body implementation. Both annotations show the example already presented in Figure 11 and Figure 12. Note that both annotations should be interpreted the same way when parsed by a client.

Figure 13: Exemplary Embedded Textual Body Annotation (Left) and a String Body Annotation (Right). Both Represent the Same Semantic Content. Target Components Have Been Left out for Reasons of Clarity.
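Both variants of Figure 13 can be sketched in Turtle as follows; as in the figure, the target components are omitted and the ex: IRIs are placeholders.

    @prefix oa:  <http://www.w3.org/ns/oa#> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix dc:  <http://purl.org/dc/elements/1.1/> .
    @prefix ex:  <http://example.org/> .

    # Embedded Textual Body: the body is a node local to the annotation
    ex:annotation1 a oa:Annotation ;
        oa:hasBody [ a oa:TextualBody ;
                     rdf:value "Barack Obama" ;
                     dc:format "text/plain" ] .

    # String body: the comment is attached directly as a literal
    ex:annotation2 a oa:Annotation ;
        oa:bodyValue "Barack Obama" .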

3.2.3 Fragmentation of Resource IRIs and Selectors

In some annotation instances it is necessary to not address an entire EWR, but rather subparts or fragments of it. The WADM handles this issue with two possible ways of specifying those subparts: the segmentation of the resource and the WADM Selector classes. Both possibilities can be used at the body and/or target component of a Web Annotation.

The segmentation in the WADM is applied as defined by Jacobs and Walsh in [94]. Following their definition, the segment is defined by an IRI in combination with a fragment component, which describes both the extracted content as well as the way in which the segment is to be extracted and applied. There are various possible segmentations that can be applied to different multimedia types [163]. There are, however, some restrictions that arise with the application of segmentation features. It is important that the fragment matches the media type of the given resource, as segmentation fragments might overlap syntactically between different media types. Next to this, the WADM specification mentions that the combination of segmentation with other features that describe the segment in more detail is not possible. Also, supplying a segment fragment does not automatically imply that the respective client will apply the segment, as clients can simply ignore it.

One of the best known segmentation standards is the W3C Media Fragments URI 1.0 specification [165], defining a best practice to construct and interpret spatial and temporal fragments on video, image, and audio media files. In order to address only the face region of the image shown in Figure 11, a spatial fragment conforming to the Media Fragments specification could be applied. Figure 14 shows how the fragment would be applied to the resource IRI of the target node, extending the plain IRI “http://example.org/pictures/barack.jpeg” with the fragment identifier “#xywh=pixel:1340,25,1040,1320”, which corresponds to a selected rectangular region starting with its top left corner at the pixel with coordinates “(1340,25)”, having a width of 1040 pixels and a height of 1320 pixels.

Figure 14: Exemplary Web Annotation Showing the Content Illustrated in Figure 11 with the Extension of a Segmentation Fragment for the Target Image.
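In Turtle, the fragment identifier simply becomes part of the target IRI, as the following sketch illustrates (placeholder IRIs in the ex: namespace):

    @prefix oa:      <http://www.w3.org/ns/oa#> .
    @prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix dc:      <http://purl.org/dc/elements/1.1/> .
    @prefix dctypes: <http://purl.org/dc/dcmitype/> .
    @prefix ex:      <http://example.org/> .

    ex:annotation a oa:Annotation ;
        oa:hasBody   ex:body ;
        # the fragment identifier selects the face region of the picture
        oa:hasTarget <http://example.org/pictures/barack.jpeg#xywh=pixel:1340,25,1040,1320> .

    ex:body a dctypes:Text ;
        rdf:value "Barack Obama" .

    <http://example.org/pictures/barack.jpeg#xywh=pixel:1340,25,1040,1320>
        a dctypes:StillImage ;
        dc:format "jpeg" .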

Up to this point, the most essential parts of describing a Web Annotation have been mentioned. With these pieces of information, the baseline is set and a broad range of applications can be covered. However, there exist limitations to this baseline of Web Annotations. In [143] it is stated that even simple things like selecting a rectangular fragment or selecting a span of text in an HTML page are not possible, as the segmentation is limited to the possibilities that are supported by the segmentation standards. Additionally, there are some use cases that require “non-fragmental” information that also needs to be applied to a given Web Annotation.

To respond to this circumstance, especially in the case of applying segmentation to the resources that are targeted by the body and target component of a Web Annotation, the WADM implements an instance that can be used to give further details and information about how a Web Annotation is to be interpreted and how the resources of the given annotation are targeted. This is done by using a Specific Resource as an intermediate resource between the annotation node and the body or target node respectively. Further details can then be added to those Specific Resources and be interpreted at the point of querying. The Specific Resource node is connected to the annotation node via the relationships “oa:hasBody” and “oa:hasTarget” respectively, while a link to the media resource that is further described is established with the relationship “oa:hasSource” from the Specific Resource to the media resource.

Specific Resources can be implemented to be External Web Resources themselves, but classically they are implemented as RDF nodes incorporated into the Annotation, as a Specific Resource is only meaningful in combination with the given media resource - it would never be queried alone - and in order to keep complexity as low as possible. Figure 15 shows an example with the face assignment use case from above, which utilises two Specific Resource nodes at the body and target component of the Web Annotation respectively. To highlight that Specific Resources are not required to be External Web Resources with a dereferenceable IRI of their own, the media resources of the body and target are indicated to have an IRI at the domain starting with “http://example.org/”, while the two Specific Resource nodes carry a local IRI like the annotation node.

Figure 15: Exemplary Web Annotation with the Application of Specific Resources Both at the Body and Target Component.
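A possible Turtle serialisation of Figure 15 could look as follows; the ex:comment and ex:picture IRIs stand in for the (elided) media resource IRIs of the figure.

    @prefix oa:      <http://www.w3.org/ns/oa#> .
    @prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix dc:      <http://purl.org/dc/elements/1.1/> .
    @prefix dctypes: <http://purl.org/dc/dcmitype/> .
    @prefix ex:      <http://example.org/> .

    ex:annotation a oa:Annotation ;
        oa:hasBody   ex:body_sr ;
        oa:hasTarget ex:target_sr .

    # Specific Resources are local nodes pointing to the actual media resources
    ex:body_sr a oa:SpecificResource ;
        oa:hasSource ex:comment .          # placeholder IRI of the textual resource

    ex:target_sr a oa:SpecificResource ;
        oa:hasSource ex:picture .          # placeholder IRI of the image resource

    ex:comment a dctypes:Text ;
        rdf:value "Barack Obama" .

    ex:picture a dctypes:StillImage ;
        dc:format "jpeg" .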

With the implementation of a Specific Resource, a more sophisticated approach to segmentation is possible, which can overcome the shortcomings of the direct fragments applied to IRIs of the media resources mentioned above. This is done by using Selector nodes of the WADM specification, added to Specific Resource nodes as the more detailed version of the respective media resource. The Selector is dependent on the media resource it is attached to, and it describes how the segment of the media resource that is to be described is to be determined. Selectors are connected to their corresponding Specific Resource via the relationship “oa:hasSelector”. The WADM supports several different Selectors by default (of which mainly the FragmentSelector will be covered in more detail, as it is the most important one for the following work), which are the following:

fragment selector The fragmentation process described above, applied to an IRI of a given media resource, is among the best known procedures for describing segments of a resource. The same is possible through the utilisation of a FragmentSelector. The actual fragment String is supplied via the property “rdf:value”. This Selector should be enhanced by specifying the standard that the given fragment conforms to, by adding the property “dcterms:conformsTo” to the Selector.

css selector This selector supports the common ways of selecting elements of an HTML Document Object Model through the utilisation of CSS Selectors. The CSS expression is added via the “rdf:value” property to the Selector.

xpath selector Next to the CSS selection, applying XPath queries to documents that utilise the Document Object Model, like XML and HTML, also poses a comprehensive and flexible way of selecting subparts of those documents. The path is added via the “rdf:value” property to the Selector.

text quote selector This Selector selects text excerpts by supplying a direct text quote via the property “oa:exact”. The selection can be enhanced by also specifying a textual prefix and suffix that should appear directly before and after the supplied text quote respectively. Those are added via the properties “oa:prefix” and “oa:suffix” to the Selector.

text position selector In contrast to the TextQuoteSelector, the TextPositionSelector specifies its targeted segment of text by supplying the start and end position of the characters of the given text that are to be selected. This is done by adding the properties “oa:start” and “oa:end” to the Selector, which indicate the start and end position respectively. Position 0 would be in front of the first character of the text, position 1 in front of the second one, etc.

data position selector This Selector is similar to the TextPositionSelector as it uses the same properties, but rather than selecting text, it works on the bytes of a bitstream.

svg selector A Scalable Vector Graphic can be utilised in order to express various forms and shapes to select a fragment of a given media file. The SVG can either be embedded directly into the annotation via the “rdf:value” property, or implemented as an External Web Resource of its own.

range selector Although the possibilities for selecting a specific segment or fragment with the given Selectors are rich, there are still some use cases that need a finer granularity for the selection. For these, a RangeSelector can be applied that selects the range between a starting Selector and an ending Selector (not including the latter). These are added to the RangeSelector via the relationships “oa:hasStartSelector” and “oa:hasEndSelector” respectively. As an example6, the WADM mentions the selection of two adjacent cells in a table of a Web page that are selected through two XPath queries.

6 https://www.w3.org/TR/annotation-model/#range-selector (last visited 03/12/2018)
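The following Turtle sketch illustrates two of the listed Selectors. The class names oa:TextQuoteSelector, oa:XPathSelector, and oa:RangeSelector are assumed from the WADM (the list above only names the Selectors informally), the ex: IRIs are placeholders, and the concrete prefix/suffix strings and XPath expressions are purely illustrative.

    @prefix oa:  <http://www.w3.org/ns/oa#> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix ex:  <http://example.org/> .

    # A text quote with its surrounding context on an HTML page (placeholder IRI)
    ex:quote_sr a oa:SpecificResource ;
        oa:hasSource ex:page ;
        oa:hasSelector [ a oa:TextQuoteSelector ;
                         oa:exact  "Barack Obama" ;
                         oa:prefix "a picture of " ;
                         oa:suffix " taken in" ] .

    # A range spanning from one XPath selection to another, e.g. two adjacent table cells
    ex:range_sr a oa:SpecificResource ;
        oa:hasSource ex:page ;
        oa:hasSelector [ a oa:RangeSelector ;
                         oa:hasStartSelector [ a oa:XPathSelector ; rdf:value "//table/tr[1]/td[2]" ] ;
                         oa:hasEndSelector   [ a oa:XPathSelector ; rdf:value "//table/tr[1]/td[3]" ] ] .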

Figure 16 shows an exemplary Web Annotation with an implemented FragmentSelector. The use case is the same face recognition as shown above in Figure 11, so the depicted annotation carries the same semantic content. The body content of the annotation does not apply to the whole picture but rather to the rectangular selection that is defined by the fragment of the target component with its Selector.

Figure 16: Exemplary Web Annotation with the Application of a FragmentSelector that Utilises the Same Fragment of the Media Resource as the Web Annotation in Figure 14.

Next to the Selectors themselves, the WADM supports ways to combine and further specify the selection process in order to fine-tune it even more towards a given use case. Multiple Selectors, selecting the same fragment in different ways, can be assigned to a single resource in order to provide diverse alternatives to query that specific fragment. Furthermore, there is the refinement of Selectors. By adding a refining Selector to a given Selector, the resulting fragment is produced by a selection of a selection. This is especially useful when “stacked media types” are the resource that a selection is to be done on. Imagine that the exemplary picture of Figure 11 is contained inside a Web page. Addressing the same fragment could then lead from selecting a (sub-)part of that given Web page, which is then refined by selecting the picture, to eventually selecting the fragment inside the picture, with three stacked and refined Selectors. A refining Selector is added to its original Selector via the relationship “oa:refinedBy”.
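A compressed Turtle sketch of such a stacked selection could look as follows; the class name oa:CssSelector, the CSS expression, and the ex: IRIs are assumed for illustration only.

    @prefix oa:      <http://www.w3.org/ns/oa#> .
    @prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix dcterms: <http://purl.org/dc/terms/> .
    @prefix ex:      <http://example.org/> .

    # First select the picture inside the page, then refine to the face region within it
    ex:page_sr a oa:SpecificResource ;
        oa:hasSource ex:webpage ;                                  # placeholder IRI
        oa:hasSelector [ a oa:CssSelector ;
                         rdf:value "img.portrait" ;                # illustrative CSS expression
                         oa:refinedBy [ a oa:FragmentSelector ;
                                        rdf:value "xywh=1340,25,1040,1320" ;
                                        dcterms:conformsTo "http://www.w3.org/TR/media-frags/" ] ] .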

3.2.4 Additional Information for Web Annotation Nodes

In the following, additions to the “oa:Annotation” node are listed that generally answer questions about the origin, context, and purpose of the given Web Annotation. This information constitutes the main provenance information of the WADM. The features are the following:

lifecycle information This point is closely related to the Agents listed in the following bullet point, as the lifecycle information implements the assignment of who created and/or modified the Web Annotation and at what point of time that action took place. In the WADM specification, two important timestamps are specified: the creation of the annotation resource and the generation of its serialisation. Both of them can have an Agent assigned via the relationships “dcterms:creator” and “as:generator”, as well as the timestamps via the properties “dcterms:created” and “dcterms:issued” respectively. The last point of time at which a given Web Annotation has been modified is persisted via the property “dcterms:modified” at the annotation node.

agents As mentioned before, the Agent associated with a Web Annotation is the instance that is responsible for its creation or serialisation. Since the creation of Web Annotations is a process that can be carried out not only by humans but also by computers, a better description of the Agent - beyond supplying only an IRI - is necessary. The RDF classes provided for this in the WADM are “foaf:Person”, “foaf:Organization”, and “as:Application”, with the possibility of supplying a name (“foaf:name”), nickname (“foaf:nick”), email (“foaf:mbox” and “foaf:mbox_sha1sum”), as well as a homepage (“foaf:homepage”). The Agent node is connected to the annotation node via the relationships “dcterms:creator” and “as:generator”.

intended audience Next to knowing who created a Web Annotation, another piece of useful information is the intended addressee that it is created for. In general these are direct groupings of users through features like employment groups (a well fitting example from the WADM specification are teachers and students) or groupings that are defined by supplying various characteristic attributes like an age range or gender. The allocation of this information does not mean that the Web Annotation cannot be used in a certain location, but systems can rather filter the annotations before they get in contact with the “wrong” audience group. To represent the intended audience in RDF, the Audience classes and properties of schema.org7 should be applied. Its instance node is connected to the annotation node via the relationship “schema:audience”.

7 http://schema.org/Audience (last visited 03/12/2018)

motivation and purpose In the course of applying provenance information for Web Annotations, it is not only important when and by whom the annotation was created, but also with what motivation or for what purpose. This is expressed by assigning a motivation node with the relationship “oa:motivatedBy” to the annotation node. There are several purposes supported by the WADM specification, like “oa:bookmarking”, “oa:commenting”, or “oa:describing”, which are so-called named individuals. This means that rather than having a property set to the same specific value for many RDF nodes, they have a special unique IRI and exist only once in a given RDF graph, to which the nodes can point. This is a strong graph feature, as for querying purposes the named individuals allow the quick search for a given node; all associated nodes can then be found by following the named individual’s edges, rather than querying for a text String across all the properties mentioned before. Next to this, the named individuals can be typed and therefore use all those advantages that have been mentioned in Section 2.2. All of the motivations in the WADM are of the “oa:Motivation” class.

others Last but not least, there is more additional provenance information that can be added to a Web Annotation. However, these features are not as important for the following work, so they will only be mentioned here and not discussed in detail. The accessibility of content allows Web Annotations to be marked for groups of users that have diverse ranges of ability, for example impaired hearing, in order to enable better access for those groups. The section about rights information explains how licenses and rights statements can be applied to an annotation in order to clarify the legal usage of annotations that eventually contain content that needs the clarification of free use. The properties associated with other identities allow the tracking of the origin of the parts of a Web Annotation.

3.2.5 Complete Exemplary Web Annotation

In order to sum up the Web Annotation Data Model and its capabilities as well as to show a thorough example of a complete Web Annotation, Figure 17 shows such an example, incorporating all the features and parts that have been covered in this section.

In addition to the features described around and applied in Figure 12 and Figure 16, namely the content of the annotation - the statement that Barack Obama is shown on the picture, in the form of a textual body component - as well as the selection of the source image in combination with a fragment, the example also includes provenance information. It states that the Web Annotation has been created on the “20th of June, 2017” at “12 o'clock”, by a person (its resource is illustrated with the RDF node “bob”) with the nickname “Bob”. The reason for creating the annotation is set to the process of identifying (illustrated in Figure 17 with the named individual “http://www.w3.org/ns/oa#identifying”), which is attributed in the WADM to the identification of something that is depicted or described by the supported target.

"Bob" foaf:nick

foaf:Person rdf:type bob dcterms:creator

http://www.w3.org/ "2017-06-20T12:00:00Z" dcterms:created oa:motivatedBy ns/oa#identifying

oa:Annotation rdf:type annotation

oa:hasBody oa:hasTarget

http:// oa:Specific- dctypes:Text rdf:type target_sr rdf:type example.org/... Resource

oa:has oa:hasSource "Barack Obama" rdf:value Selector

oa:Fragment http:// dctypes: rdf:type selector rdf:type Selector example.org/... StillImage

"xywh=1340,25,1040,1320" rdf:value dc:format "jpeg"

"http://www.w3.org/ dcterms: TR/media-frags/" conformsTo

Figure 17: Complete Web Annotation with the Use Case of a Face Detection Process: Barack Obama Has Been Detected on the Given Input Image.

in the WADM to the identification of something that is depicted or described by the supported target.
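The complete annotation of Figure 17 can be sketched in Turtle as follows; the ex: IRIs are placeholders, the prefix bindings follow the usual conventions, and the timestamp is typed as xsd:dateTime in line with the WADM's recommendation.

    @prefix oa:      <http://www.w3.org/ns/oa#> .
    @prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix dc:      <http://purl.org/dc/elements/1.1/> .
    @prefix dcterms: <http://purl.org/dc/terms/> .
    @prefix dctypes: <http://purl.org/dc/dcmitype/> .
    @prefix foaf:    <http://xmlns.com/foaf/0.1/> .
    @prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
    @prefix ex:      <http://example.org/> .

    ex:annotation a oa:Annotation ;
        dcterms:creator ex:bob ;
        dcterms:created "2017-06-20T12:00:00Z"^^xsd:dateTime ;
        oa:motivatedBy <http://www.w3.org/ns/oa#identifying> ;
        oa:hasBody   ex:body ;
        oa:hasTarget ex:target_sr .

    ex:bob a foaf:Person ;
        foaf:nick "Bob" .

    ex:body a dctypes:Text ;
        rdf:value "Barack Obama" .

    ex:target_sr a oa:SpecificResource ;
        oa:hasSource   ex:picture ;
        oa:hasSelector ex:selector .

    ex:picture a dctypes:StillImage ;
        dc:format "jpeg" .

    ex:selector a oa:FragmentSelector ;
        rdf:value "xywh=1340,25,1040,1320" ;
        dcterms:conformsTo "http://www.w3.org/TR/media-frags/" .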

3.3 the mico metadata model - connecting annotations

In Section 1.2 and Section A.1 in the Appendix, the MICO project has been described with its vision and its goals. It has also been discussed that one milestone of the project was to implement a unified data model for all its produced results. To achieve this, the MICO Metadata Model (MMM) [149][23][20] was implemented. It allows a range of different extractors, all producing a variety of different result and output formats, to be handled both in terms of producing and storing those results. Additionally, comprehensive querying capabilities are supported by the structure of metadata modelling implemented in the MMM. Provenance features are implemented to fully support the workflow-based approach of MICO processes.

The MMM was developed as an extension of the WADM, as the WADM already provides a broad range of useful tools for expressing information about multimedia items. Web Annotations are generally used to further describe resources or associations between them, in many cases being comments or tags about web pages or textual documents [143]. The combination of these annotations forms a metadata background about the given resources. The WADM defines a structured model and ways for users to use the annotations.

Because of this, the WADM, and more precisely its Web Annotations with the possibilities of modelling the information content in combination with selections and some provenance information, was implemented as the basis of the MMM. With the requirements listed in Chapter 3 in mind, the MMM lifts the context of the Web Annotations to a context of multimedia information allocation and workflow-based processes. A single Web Annotation, attached to the respective multimedia item, then represents the result of a single extraction step. Multiple extractors will add their information bit by bit to the item, eventually forming a rich metadata background for it. The interplay of the media items and their enriched metadata background promises to broaden the semantic applicability of the items. Hidden, yet undiscovered semantics might lead to a multimedia item being used in a so far unseen use case.

On top of the Web Annotations and their structure of having a component for body, target, and the annotation itself, the MMM implements two additional core features in order to apply the idea of the Web Annotation to a multimedia use case. For reasons of convenience, the RDF classes, properties, and relationships have been divided into the different modules they apply to. The modules can be seen in Figure 18: three adapted modules originate from the WADM, namely the Content, Selection, and Context modules (standing for the body, target, and annotation components of the WADM respectively), complemented by the newly added MMM modules Composition and Provenance. The colours applied in Figure 18 are representative of the modules, and the following figures and graphics will stick to this choice of colours as well, in order to depict the affiliation of the respective RDF classes and instance nodes to a module.

Figure 18: Module Structure of the MICO Metadata Model, Extending the Web Annotation Structure with Body, Target, and Annotation Component.

The modules adapted from the WADM fulfil the exact same role as they do in the WADM itself. The Composition module incorporates the elements that compose the multimedia items. The Provenance module extends some of the provenance information already present in the WADM, and adds an important feature for a multimedia analysis workflow: the information about the extractors and their interplay.

In order to give a description of the MMM, the following sections will build up the model features step by step, explaining them with examples wherever possible. The focus of the explanations will lie on the additions to the model, rather than the adaptions made from the WADM. Adapted classes, relationships, or properties are implemented as inherited RDF elements (for example “mmm:hasBody” as a sub-relationship of “oa:hasBody”) and will be used and applied, but not discussed in further detail, as their semantics are not changed with regard to their role in the WADM, which was explained in Section 3.2. The model is described module by module, but will on some occasions already refer to things contained in later modules, if a higher degree of comprehension can be achieved that way.

3.3.1 Composition Module

With the use case of a multimedia analysis workflow that focuses on the extraction of various features out of a given multimedia file to form its semantic background, the file itself needs more attention in the process of metadata modelling. In the MMM, this is supported by the Composition module. From now on, whenever an analysed multimedia object is referred to, the term Item will be utilised. In the process, every extraction result that is added to the respective Item is referred to as a Part or Part Annotation, which symbolises the participation of the analysis result in workflow chains and in the overall metadata background that is created for the respective Item.

This is also applied in the metadata model. A multimedia file is implemented by the class “mmm:Item” with respective instance nodes. These Items form the basis for multiple possible analysis workflows and therefore create a tree-like structure of metadata (although there will also be backward and cross-oriented edges), as an “mmm:Item” will have various “mmm:Part” instances attached via the “mmm:hasPart” relationship. “mmm:Part” is a sub-class of “oa:Annotation” and hence implements the structure of a Web Annotation, which will be covered in Section 3.3.2. A super-class “mmm:Resource” has been implemented that disjointly subsumes the “mmm:Item” and “mmm:Part” classes. This is done for ontology modelling reasons that will become apparent later, as well as for technical reasons shown in Chapter 4. Figure 19 shows an example of an Item with three different Part children. Further content has been indicated by placeholders for the body and target components, and some type relationships have been left out for the Parts for reasons of clarity.

Figure 19: Exemplary “mmm:Item” Node with Three “mmm:Part” Children.
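A minimal Turtle sketch of the structure in Figure 19 could read as follows; the IRI bound to the mmm: prefix is assumed here (the MMM specification defines the actual namespace), and the body and target components are omitted as in the figure.

    @prefix mmm: <http://www.mico-project.eu/ns/mmm/2.0/schema#> .  # prefix IRI assumed
    @prefix ex:  <http://example.org/> .

    # One ingested multimedia item with three extraction results attached to it
    ex:item a mmm:Item ;
        mmm:hasPart ex:part1 , ex:part2 , ex:part3 .

    ex:part1 a mmm:Part .   # body and target components omitted here
    ex:part2 a mmm:Part .
    ex:part3 a mmm:Part .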

3.3.2 Context Module

In the WADM (see Section 3.2), the RDF class “oa:Annotation” is the central class, and its associated instance nodes connect the essential parts of a Web Annotation: the body, containing the actual information of the annotation, and the target component, which may contain a selection in order to specify the multimedia file (or its subpart) that the annotation is based on. The MMM adapts the structure and the semantics of the Web Annotation by introducing the “mmm:Part” class, which is a sub-class of “oa:Annotation”. Next to some multiplicity changes for some relationships and provenance features (see Section 3.3.5), the semantics of a Part are also extended in contrast to a Web Annotation. In general, the Part still is a kind of description, classification, or further identification of a media item, but in a MICO workflow, a Part also symbolises an intermediary or final result in the extractor workflow chains. Parts will link to one another in order to express sequential processes and allow for their traceability, and also link to the extractor instance that was responsible for their creation (see Section 3.3.5). Figure 20 shows an exemplary Part Annotation.

There are also two adapted relationships shown in Figure 20: “mmm:hasBody” and “mmm:hasTarget”, which are sub-relationships of “oa:hasBody” and “oa:hasTarget” respectively. Both fulfil the same semantics as in the WADM, but the multiplicity of the “mmm:hasBody” relationship had to be adjusted for the multimedia analysis use case, as the single body represents the information of a single analysis step. Because of this, a Part Annotation always needs to have exactly one body node associated. As in the WADM, there must always be at least one target associated in order to refer to the multimedia item or multimedia subpart that the analysis was based on, but there may be more. Multiple targets indicate that the result or statement of the Part Annotation is to be attributed to various multimedia files or their subparts.

Figure 20: Exemplary “mmm:Part” Node with Attached Body and Target Nodes.

3.3.3 The Content Module / The Body of the Part Annotation

As for a Web Annotation in the WADM, the body component of a Part Annotation in the MMM constitutes the central information holder of the whole construct, given that the results of the extractors will be implemented in the body component. This information is important when it comes to the utilisation of the metadata background that is created in a MICO analysis workflow.

As a starting point, a super-class “mmm:Body” has been implemented, from which all further body classes are inherited. The extraction results comprise a higher “semantic” level than the basic datatypes applied to body components that have been seen so far. Because of this, every extractor will implement its own body class, which is important for the workflow processes, as there will be extractions that require another preceding extraction to have taken place before, as well as for querying capabilities. This allows for more fine-grained requesting of already analysed items. During the course of the MICO project, a lot of different workflows composed of various extractors have emerged. To meet this important part of the evolution of the MMM, a dedicated specification [20] (as a subpart of the MMM) with the respective namespace “http://www.mico-project.eu/ns/mmmterms/2.0/schema#” and abbreviation “mmmterms” has been implemented.

Figure 21 and Figure 22 show two prominent representatives of the extractors that have been utilised over the course of the MICO project. Figure 21 shows the use case of an animal detection extractor, which is fairly similar to the face detection, but rather than detecting human face shapes, it tries to detect animals on a given picture. Because of this, the constructed Part Annotation is also similar to the one of a given face detection result: the detected animal is enclosed in the body component, while the fragment of the picture that shows the actual animal is implemented in the target component. In contrast to the examples that have been seen in Section 3.2, in all MICO related Part Annotations the extractors can issue their confidence about given results via the property “mmm:hasConfidence” and an associated percentage value between 0.0 and 1.0. The type of the extractor is associated with the RDF type of the body node, in this case “mmmterms:AnimalDetectionBody”. Altogether, the statement of the whole Part Annotation is that an “Elephant” was found with a confidence of “90%” on the fragment “xywh=300,150,50,70” of the respective input picture.

Figure 21: Exemplary Part Annotation, Illustrating the Result of an Animal Detection Analysis.
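The Part Annotation of Figure 21 could be sketched in Turtle as follows. The mmm: prefix IRI is assumed, the confidence property follows the text above (“mmm:hasConfidence”), and ex:picture is a placeholder for the analysed image resource.

    @prefix oa:       <http://www.w3.org/ns/oa#> .
    @prefix rdf:      <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix dcterms:  <http://purl.org/dc/terms/> .
    @prefix xsd:      <http://www.w3.org/2001/XMLSchema#> .
    @prefix ex:       <http://example.org/> .
    @prefix mmm:      <http://www.mico-project.eu/ns/mmm/2.0/schema#> .        # prefix IRI assumed
    @prefix mmmterms: <http://www.mico-project.eu/ns/mmmterms/2.0/schema#> .

    ex:part a mmm:Part ;
        mmm:hasBody   ex:body ;
        mmm:hasTarget ex:specres .

    # The detected species and the extractor's confidence
    ex:body a mmmterms:AnimalDetectionBody ;
        rdf:value "Elephant" ;
        mmm:hasConfidence "0.9"^^xsd:double .

    # The region of the picture that actually shows the animal
    ex:specres a mmm:SpecificResource ;
        mmm:hasSource ex:picture ;
        mmm:hasSelector [ a oa:FragmentSelector ;
                          rdf:value "xywh=300,150,50,70" ;
                          dcterms:conformsTo "http://www.w3.org/TR/media-frags/" ] .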

Temporal Video Segmentation (TVS) [129] is the process of dividing a video into temporal segments, which partition the video into logical units called shots. In general, those shots lead over into each other in transitions, which indicate the shot boundary, meaning the ending of one shot before the start of another. Videos consist of a sequence of consecutive frames (pictures with a very short playing time, eventually forming the video), and single frames are used to indicate a shot boundary, called the shot boundary frame. All of this information (except the transition, which is a side product of the aforementioned details) is incorporated in the MMM. Figure 22 shows a Part Annotation that indicates a complete shot in a given video resource by supplying a “mmmterms:TVSShotBody” (with a confidence value, like all MICO extractors) in combination with a temporal fragment in the target component. In the example of Figure 22, the shot would start at second “1.000” and end at second “3.155”.

Figure 22: Exemplary Part Annotation, Illustrating the Result of a Temporal Video Segmentation Analysis.
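A Turtle sketch of the shot annotation in Figure 22 could look as follows; again, the mmm: prefix IRI is assumed and ex:video is a placeholder for the analysed video resource.

    @prefix oa:       <http://www.w3.org/ns/oa#> .
    @prefix rdf:      <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix dcterms:  <http://purl.org/dc/terms/> .
    @prefix xsd:      <http://www.w3.org/2001/XMLSchema#> .
    @prefix ex:       <http://example.org/> .
    @prefix mmm:      <http://www.mico-project.eu/ns/mmm/2.0/schema#> .        # prefix IRI assumed
    @prefix mmmterms: <http://www.mico-project.eu/ns/mmmterms/2.0/schema#> .

    ex:part a mmm:Part ;
        mmm:hasBody   [ a mmmterms:TVSShotBody ;
                        mmm:hasConfidence "0.7"^^xsd:double ] ;
        mmm:hasTarget ex:specres .

    # Temporal fragment marking the shot from second 1.000 to second 3.155
    ex:specres a mmm:SpecificResource ;
        mmm:hasSource ex:video ;        # placeholder IRI of the analysed video
        mmm:hasSelector [ a oa:FragmentSelector ;
                          rdf:value "t=npt:1.000,3.155" ;
                          dcterms:conformsTo "http://www.w3.org/TR/media-frags/" ] .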

3.3.4 The Selection Module / The Target of the Part Annotations

The addressing of the multimedia data with its subparts that serve as the input for the described analysis processes forms a fundamental part of multimedia metadata modelling. In Section 3.2.1, the supported possibilities of the WADM are documented. As those are already very diverse and cover all requirements for multimedia addressing that have been encountered in the MICO project, the MMM adopts the WADM's implementation and adapts some of the multiplicities for the multimedia analysis use case.


Both the “oa:SpecificResource” class and the relationships “oa:hasSource” and “oa:hasSelector” have respective counterparts in the MMM. For every Part Annotation there must always be a “mmm:SpecificResource” associated via the relationship “mmm:hasTarget”, but there may be more. In contrast, a classic Web Annotation is not required to have a Specific Resource, as the target can also be addressed directly. An already known “oa:Selector” can be applied to the Specific Resource instance via the relationship “mmm:hasSelector”; either no Selector or a single Selector can be attached to a Specific Resource.

A more special role is attributed to the source of a Part Annotation, which is referenced by the Specific Resource node via the relationship “oa:hasSource” in the WADM, and “mmm:hasSource” respectively in the MMM. Classically, this relationship links the Specific Resource to the resource that it is a more specific representation of. For a Part Annotation, there must always be a single “mmm:hasSource” relationship present which links to a “mmm:Resource” that stands for the multimedia asset that the respective analysis process of the Part Annotation is done on. This is explained in more detail in Section 3.3.5. Examples have already been shown in Figure 21 and Figure 22.

3.3.5 Provenance Module

In a scenario like MICO, with multiple autonomous extractors analysing multimedia items to form metadata backgrounds and communicating with a central platform while doing so, provenance is an important cornerstone. It allows traceability to be preserved throughout the workflows, provides information about given multimedia files, and enables a richer description of extractors and their settings.

The first thing that is needed for an extraction workflow is the multimedia file itself with its details, for which it is important to record its physical location as well as its format. The location is accessed when an extractor wants to analyse a file, while the format is used to determine whether a given extractor can analyse the underlying multimedia file. In the MMM, this information is associated with an instance of the “mmm:Asset” RDF class. These nodes are attached to an Item or a Part via the relationship “mmm:hasAsset”. Attached to an Item, the respective Asset represents an original multimedia file that has been injected into the platform to be processed, while an Asset of a Part Annotation represents an intermediary multimedia file that has been created as the result of an extraction step. This could for example be an extracted frame picture or an audio soundtrack extracted from a video. The location is applied with the property “mmm:hasLocation”, while the format is supplied via the property “dc:format”. Figure 23 shows an example of a video with the format “video/mp4”, from which an intermediary image with the format “image/jpeg” has been extracted. Note here that the supplied formats should align with commonly known MIME types8, but it is allowed to add custom format types in order to associate “semantic types” as a result of an extraction workflow.

Figure 23: Exemplary Item and Associated Part Annotation with Respective Assets.
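In Turtle, the Item/Part/Asset structure of Figure 23 could be sketched as follows; the mmm: prefix IRI is assumed, the ex: IRIs are placeholders, and the Asset locations are taken from the figure.

    @prefix dc:  <http://purl.org/dc/elements/1.1/> .
    @prefix ex:  <http://example.org/> .
    @prefix mmm: <http://www.mico-project.eu/ns/mmm/2.0/schema#> .  # prefix IRI assumed

    # The ingested video and an intermediary image extracted from it
    ex:item a mmm:Item ;
        mmm:hasAsset ex:itemAsset ;
        mmm:hasPart  ex:part .

    ex:itemAsset a mmm:Asset ;
        mmm:hasLocation "http://example.org/assets/asset1" ;
        dc:format "video/mp4" .

    ex:part a mmm:Part ;
        mmm:hasAsset ex:partAsset .

    ex:partAsset a mmm:Asset ;
        mmm:hasLocation "http://example.org/assets/asset2" ;
        dc:format "image/jpeg" .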

Next to the basic details about the multimedia items, timestamps are a central piece of provenance information that needs to be kept track of. For those, the relationships present in the WADM are to some extent adapted at Item and Part Annotation instances. As in the WADM, the supplied timestamps should be of type “xsd:dateTime” with the UTC timezone expressed as “Z” [112]. The adapted semantics in the MMM9 are as follows:

• “oa:serializedAt”: The point of time at which a given Item has been materialised at the corresponding platform instance, or when a Part Annotation has been created by an extractor.

8 https://www.sitepoint.com/mime-types-complete-list/ (last visited 03/12/2018)
9 In one of its issued versions, the WADM specification changed the IRIs for these timestamps and relationships to “as:generator”, “dcterms:created”, and “dcterms:issued”. These changes have not been incorporated into the MMM; however, the properties and relationships used here still carry the same semantics.

• “oa:annotatedAt”: The point of time when a Part Annotation is connected with its corresponding Item instance.

• “oa:serializedBy”: The link between a given Part Annotation and its responsible extractor instance (which will be described below). This relationship can also be associated with Item instances in case there exist different ways of materialising them at the given platform instance.

The most central point about the provenance information that is represented by the MMM is the traceability of the multimedia analysis workflow processes, which in detail covers the topic of the “input” of the intermediary and final result steps. Understanding the input of every step allows the whole workflow that has been executed for metadata allocation to be retraced in the data model. However, it is important to cover the semantic background of both of the relations involved. The term “input” for the analysis steps has to be seen twofold in our context. Firstly, there is the actual metadata input that an extractor needs in order to do its own processing. In the MMM and in this work, this will be referred to as the input of a Part Annotation. The input relationship is implemented with the RDF relationship “mmm:hasInput” and is directed from a Part node to a node of the type “mmm:Resource”, so an Item or another Part. Secondly, there is the multimedia Asset that a given analysis process is done on. This is called the source of the Part Annotation, and this relationship has been explained in Section 3.3.4. Figure 24 shows an extensive example for this provenance feature, whose key points are the following.

• The workflow behind the example in Figure 24 contains the initial ingestion of a multimedia Asset with the RDF pendant “asset1” (with the IRI “http://example.org/assets/asset1”) and three consecutive extractor steps, which start their extraction once the multimedia Asset is ingested at the given platform instance and the preceding analysis step has eventually ended. The first and third extractors (responsible for creating the Part Annotations “part1” and “part3” respectively) are “metadata extractors”, meaning that they produce metadata information only, rather than an intermediary or final result that is combined with an extracted multimedia file. By contrast, extractor two is a process that does output a file, which is indicated by the relationship “mmm:hasAsset” referring to its “mmm:Asset” named “asset2”. Such extractors are called “asset extractors”. The location of the created Asset is “http://example.org/assets/asset2”. The input and source of every Part Annotation are implemented by the two relationships “mmm:hasInput” and “mmm:hasSource”.

• “part1”, as the first Part Annotation of the workflow chain, refers to the Item node “item” as its source, so its “mmm:hasSource” edge (found at the target node “target1”) links to the Item “item”, which in turn stands for the first multimedia Asset “asset1”. The extractor is some kind of process that utilises a multimedia file directly as input, indicated by the relationship “mmm:hasInput”, which is also directed towards “item”.

• The second extractor has another function, as it takes the metadata extracted by the first extractor and produces a file. The output is referred to via the “mmm:hasAsset” relationship. As the actual input for the second Part Annotation is represented by the node “part1”, the “mmm:hasInput” edge is directed from “part2” to “part1”. Its source however is directed to the Item “item”, because the extraction process illustrated by “part2” is based on the initially ingested multimedia file in combination with the metadata composed by “part1”.

• Extractor number three can then take the output of extractor two, as it is some kind of process that needs the multimedia Asset “asset2” as input. Because of this, the source of the third Part Annotation “part3” links to “part2” (and not the initial Item). As that Part Annotation with its Asset also contains the metadata input needed (if needed at all) for the creation of “part3”, the “mmm:hasInput” edge is also directed towards the Part Annotation “part2”.

Figure 24: Exemplary Item Structure that Shows the Input Provenance (Blue Edges for “mmm:hasInput”, Orange for “mmm:hasSource”) of the MMM.
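The input/source chain of Figure 24 can be summarised in Turtle as follows; the mmm: prefix IRI is assumed, and the body components of the Parts are omitted for brevity.

    @prefix ex:  <http://example.org/> .
    @prefix mmm: <http://www.mico-project.eu/ns/mmm/2.0/schema#> .  # prefix IRI assumed

    ex:item a mmm:Item ;
        mmm:hasAsset ex:asset1 ;
        mmm:hasPart  ex:part1 , ex:part2 , ex:part3 .

    ex:asset1 a mmm:Asset ;
        mmm:hasLocation "http://example.org/assets/asset1" .

    # part1: metadata extractor working directly on the ingested file
    ex:part1 a mmm:Part ;
        mmm:hasInput  ex:item ;
        mmm:hasTarget [ a mmm:SpecificResource ; mmm:hasSource ex:item ] .

    # part2: asset extractor consuming part1's metadata and producing asset2
    ex:part2 a mmm:Part ;
        mmm:hasInput  ex:part1 ;
        mmm:hasAsset  ex:asset2 ;
        mmm:hasTarget [ a mmm:SpecificResource ; mmm:hasSource ex:item ] .

    ex:asset2 a mmm:Asset ;
        mmm:hasLocation "http://example.org/assets/asset2" .

    # part3: works on the Asset produced by part2
    ex:part3 a mmm:Part ;
        mmm:hasInput  ex:part2 ;
        mmm:hasTarget [ a mmm:SpecificResource ; mmm:hasSource ex:part2 ] .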

Last but not least, the provenance module of the MMM features an extensive description of the extractors that take part in the analysis process. Figure 60 in the Appendix shows a comprehensive example of a full extractor model. As it is quite extensive, but at the same time rather self-explanatory, it will only be referred to briefly. The most important parts of the extractor provenance model are the classes “mmm:Extractor” and its associated “mmm:Mode”. In the MICO use case, extractors could be started in different modes, allowing the same general extractor to have different settings. In the example of Figure 60, the Audio Demux could be set to different sampling frequencies, which leads to varying output. As described above, the Part Annotation is connected to the instance of the extractor via the relationship “oa:serializedBy”, while the respective mode of the extractor is associated via the relationship “mmm:serializedWith” (which is an addition of the MMM that does not have a counterpart in the WADM). This is necessary because simply linking the Part Annotation to the extractor would not suffice, as the information about the applied mode is also important.

The extractor itself is described with a name, id, and version, while modes have a description and also an id. Next to this, modes are associated with more detailed information (by the implementation of RDF instances) for parameters and input/output instructions. Parameters are instantiated with a String name and a value. Input and output of the modes share the information of having a semantic type, mime type, and syntactic type. The former mainly describes the given input or output in a semantically high-level fashion and is therefore mainly intended for humans. The mime type links to a corresponding named individual implementing the commonly known mime types for multimedia files. Lastly, the syntactic type links to a more general, MICO use case specific data type, which is used for workflow provenance.

3.3.6 Multimedia Ontology Requirements Check

The related work shown in Section 3.1 described a list of requirements that an ontology for describing metadata of multimedia items should comply with. To show that the MMM is suited for the application as a multimedia metadata ontology, this section evaluates the MMM against these defined requirements. Some of the following points will also be supported practically by the Anno4j library, which will be presented in Chapter 4.

(√) mpeg-7 compliance Neither the WADM nor the MMM implement the features of the MPEG-7 standard by default; however, the model can be dynamically adapted at its body component to support every MPEG-7 feature. Some of the ontologies and approaches in the related work in Section 3.1 have already implemented RDF pendants, which could also be applied in the MMM.

√ semantic interoperability One of the core requirements of the WADM itself is to implement the Web Annotations in a fashion that makes them shareable across different applications and peers. There is also a specification next to the WADM, the Web Annotation Protocol, whose main purpose is to implement a client-server architecture to exchange Web Annotations. So by default, the MMM also supports semantic interoperability.

√ syntactic interoperability As both the WADM and MMM are implemented in RDF, aligned with Semantic Web technologies, the model supports syntactic interoperability between systems.

√ separation of concerns The structure of the Part/Web Annotations, with their partitioning between body (domain knowledge) and target (structural knowledge) components that are connected by an Annotation or Part node respectively, supports full separation of concerns. The connecting Annotation/Part node can further contain administrative knowledge.

(√) modularity In the WADM and MMM, modularity is not supported by definition as it is, for example, in the MPEG-7 standardisation. But rather than supporting various subparts, modularity is supported when it comes to the body component, which can be implemented and adapted to the specific use case. In a way, the user then only needs her or his subset of body implementations in combination with the always needed core components of a Part/Web Annotation.

√ extensibility Both the WADM and MMM are designed in an extensible fashion. The main features that should be extended are the body component as well as the Selectors, however (as it was done for the MMM as well) every RDF class, relationship, and property can be extended and adapted to various use cases.

Next to the requirements that have been defined by the authors of the related work in Section 3.1, there are also requirements that have been compiled for the data model during the course of the MICO project. These factors originate from multimedia metadata in general, as well as from workflow-based features that have to be incorporated as a result of the processes that have been implemented in the project. They have been listed in the introduction of Chapter 3. The following list explains the compliance of the MMM in regard to those requirements:

√ plenty of different datatypes The MMM does not impose any restrictions on the analysed datatypes, querying capabilities, or comparison operators. All of these features can be freely adapted to a specific use case. Pre-knowledge of produced metadata can be reduced, as the Anno4j library supports ways of applying and producing the metadata in a more convenient way (see Chapter 4).

√ fragmentation of media items The fragmentation of multimedia files and their content is fully satisfied by the implementation of Selectors on the side of the WADM and the associated standardisations that can be applied. Because of this, the metadata and therefore the statement or result of a given extractor can be addressed in a very fine-grained fashion to the exact fragment or subpart of a given multimedia file.

√ stacking of media items A “stacked media item”, consisting of several basic multimedia data, can be implemented in the MMM in different ways. The breakdown of the enclosing stacked item can be done by splitting it through the application of Part Annotations for every subpart. Doing this, the respective subparts can be addressed directly by following analyses. Next to this, it is possible to interpret the stacked item as it is and address it by supplying an own datatype for it. This does not allow the same level of granularity, but can have advantages of its own. The addressing of incorporated media files must be done via the proper choice of Selectors.

√ quality of multimedia metadata In the universe of MICO and the MMM, the metadata background that is generated for multimedia files is highly adjustable to the eventual use case. Multiple workflows can add their metadata and resulting features to one and the same multimedia item type, creating rich and diverse metadata for the analysed multimedia files. In combination with the comprehensive querying capabilities enabled through the MMM, this fuels the applicability of those multimedia files and therefore increases their quality as well.

In retrospect, this chapter has given insights into the field of multimedia metadata modelling, which has been an important topic for decades now. It is still evolving and there are many applications and use cases that make use of it today. Section 3.1 has highlighted some of the influential cornerstones of the field's history, which has also led to a set of requirements that need to be met by a modern multimedia ontology. Considering these requirements, the Web Annotation Working Group's prevailing Web Annotation Data Model stood out as a fitting candidate for a base ontology due to its satisfaction of the mentioned ontology requirements as well as its origin in Semantic Web technologies. The WADM is described in Section 3.2. Furthermore, as this work envisions a workflow-based application of multimedia metadata, an extension to the WADM, namely the MICO Metadata Model MMM, is introduced in Section 3.3, which extends the functionality of the WADM to also be useful in such applications. Last but not least, the MMM was positively evaluated against the requirements for multimedia metadata ontologies defined in Section 3.1.

Next to the knowledge about multimedia metadata ontologies, the related work has also shown that an important factor next to the ontology itself is to make the metadata technically accessible and usable in some way. In the field of computer science, this is often done via APIs or whole full-stack applications. Therefore, the following part of this work will show how the MMM can be embedded in a workflow, or eventually a cycle of work steps, that comprises the processes of producing, applying, and consuming multimedia metadata in order to create a rich metadata background for multimedia items. This workflow will first be explained and then embedded into a whole software platform in Chapter 5. Next to this, an implemented library called Anno4j will be shown in Chapter 4, which enhances the usability of Semantic Web technologies altogether and within that also applies the MMM in a convenient fashion for the user.

Part III

THE APPLICATION OF MULTIMEDIA METADATA IN A WORKFLOW-DRIVEN APPROACH AND IDIOMATIC SEMANTIC WEB TECHNOLOGIES

4 INCREASING THE USABILITY AND APPLICABILITY OF SEMANTIC WEB TECHNOLOGIES FOR MULTIMEDIA METADATA MODELING AND WORKFLOWS - ANNO4J

Up to this point, this work has covered and explained concepts and ideas of the Semantic Web world with a workflow-centric process in mind, which is motivated by the vision of the MICO Project described in Section 1.2 and Section A.1 in the Appendix. Ontologies and metadata models have been the most important point of the last sections. They compose both a syntactic as well as a semantic structure for metadata, and thus determine how already existing metadata is to be interpreted on the one side, but also how it has to be produced structurally on the other side. It has been shown up to now that especially around these factors and processes, barriers exist for the overall Semantic Web technologies [39], and especially for non-Semantic Web experts. To counteract these, this chapter will introduce technical solutions for those barriers.

Therefore, after the basics of Semantic Web technologies have been shown in Chapter 2, the concept of metadata modelling has been highlighted in particular in combination with multimedia in Chapter 3. Also, a multimedia metadata ontology, namely the MICO Metadata Model MMM, has been developed and described in Section 3.3, which serves as the modelling backbone of this thesis.

Nevertheless, it has also been brought up throughout the sections and chapters listed above that an ontology on its own can serve a lot of purposes, but that there is a much higher potential when it is combined with respective technologies and applications. This statement has to be seen twofold. Most of the implementations that apply the ontologies and models hinted at especially in Section 3.1 are generally oriented in the direction of full-stack applications. This means that there is mainly a piece of software in combination with various interfaces that support the seemingly convenient utilisation of the underlying ontology, allowing the user to produce and consume the metadata without getting involved with the procedures of basic Semantic Web technologies, for example in the form of SPARQL queries. This obscuring is in most cases necessary and to some extent helpful, as the generic user might have only limited knowledge of the Semantic Web technologies necessary. By being forced to use unfamiliar technologies, in most cases the user would not make use of the system as a consequence.


for the expert (who may have implemented the full-stack application), but also for the new potential user. Additionally, this kind of application does not allow a user to further familiarise oneself with the Semantic Web technologies. The second possibility to combine an ontology application with respective technology cannot be called an application of its own, but is rather targeted at smaller parts of functionality (and can therefore of course also be incorporated in a full-stack application). An example for this kind of technology could be a piece of software that mainly focuses on the querying and persistence of RDF data. Although these instances comprise smaller tasks, they are better suited for familiarising a non-Semantic Web expert with the technologies present on the one hand, and on the other hand they provide more space for users who are already familiar with Semantic Web technologies to create comprehensive and rich modules for Semantic applications. Eventually, the building blocks of this work will represent modular implementations that mainly focus on increasing the applicability and usability of Semantic Web technologies. This does help non-Semantic Web experts to apply said technologies much more conveniently. However, these incremental implementations will also be introduced in a full-stack application serving the MICO visionary use case, in order to demonstrate, evaluate, and validate their applicability. The resulting MICO Platform will be explained and compared against related implementations in Chapter 5. As described in the MICO vision, the target of the platform incorporates the implementation of a central piece of software to which autonomous multimedia extractors can connect and add their analysis results to ingested multimedia files. This at last forms the file's metadata background, which then can be used for further processing and exploitation of information. An obstacle existed at this point, which can be closely compared to problems of the Semantic Web described in Section 2.1: the barrier for the extractor developers to write and query their results in the form of RDF metadata, which existed next to the complex task of devising and implementing the extractors. As a solution, the Object Relational Mapping ORM-like library called Anno4j, which maps Java objects to and from RDF data, has been implemented. This chapter will continue to explain the basis and origin of ORMs in combination with related work on this topic in Section 4.1. Afterwards, a detailed description of Anno4j and its components will be given in Section 4.2, preceded by a summarisation of scenarios that can be solved more conveniently through the application of the library. These also comprise some of the problems present in the Semantic Web, indicating that Anno4j is able to resolve some of them or at least lower the barrier. Section 4.2 will also give a more detailed sectioning of the remaining part of this chapter. The entirety of this

chapter proposes a solution to research question 2, while its subsections may also refer to further research questions. Although Anno4j represents a self-contained abstraction layer that is placed on top of the database layer, different approaches and solutions of varying technical perspectives will be applied. Therefore, various informative illustrations will be used that facilitate the description of the approaches. Each of those will be embedded in the respective context and will be explained accordingly.

4.1 related work - object/relational and object/rdf mappings

The concept of Object/Relational Mappings is well known and has therefore accumulated a lot of development and research to this day. Despite its naming, which sometimes appears as object mapping rather than ORM (the latter generally being used by the Hibernate1 team), all designers of the concept agree upon its use and exertion: a piece of code or framework that can be utilised in an application to facilitate the interaction with data, mainly covering the field of persistence, which is a fundamental part in the design as well as implementation of an application. The overall description of ORMs is aligned with the explanations and insights of Bauer and King [16] and O'Neil [123], as their descriptions are detailed and in depth, and Hibernate is one of the best known ORMs of today. The employment of an ORM in the development of an application can lead to facilitation on various occasions. The main purpose is to help with persistent data (in contrast to transient data), information that is fully persisted to the database and which therefore can outlive, for example, the runtime of an application or the current workflow of a client. This process is also done automatically and transparently, as it is taken over by the ORM and interwoven into the application code, supporting robust and efficient management of the data. It also enables concurrent access to the data, thus multiple users can work on the same state of the data without interfering with one another. Besides this, the ORM takes over or lightens several "low-level tasks" for the application developer. Among those are the persistence and querying of data, the execution of queries for those associated tasks, dealing with query parameters, or searching through results. Although these tasks are "easier" implementation- and effort-wise than, for example, the code for the logic of the application, they should not be treated lightly, as writing all these interactions by hand poses a heavy time burden even for simpler applications. As the model for the application gets larger or more complex, this circumstance gets aggravated even more. The utilisation of an ORM for these tasks does also require some more lines of

1 http://hibernate.org/orm/ (last visited 03/12/2018)

code to be incorporated; however, as ORMs are generally designed in a non-intrusive manner, this effort is kept to a minimum and leaves the developer free to concentrate on the actual requirements of the application. Altogether, ORMs promise the following advantages when applied to an application:

productivity The code that has to be written for the interaction between the client application and the underlying database is reduced significantly. Rather than writing queries for every single mapping requirement that has to be met (this sometimes also includes simple interactions like getters and setters), the basic functionality of issuing a query via the programming language objects is the same for every class and has to be written only once. Some ORMs are also able to generate parts of the interaction code right away. (A brief sketch contrasting hand-written database code with an ORM-style call is given after this list.)

development effort In this criterion, a hand-written persistence layer implementation is compared with the application of an existing ORM for the same purpose. Performance-wise, the hand-written code can surely be as good as the ORM code, as oftentimes the ORM implementation has to conduct some time-consuming functionality. Nevertheless, one also has to take the development time into account, which rises quickly for hand-written code with the size of the to-be-implemented application, while most ORMs are incorporated easily. Next to this, the optimisation space is very limited for hand-written implementations, or optimisation again takes considerable time, while existing ORMs have had specialists invest time and expertise into those optimisations. In summary, using existing ORM implementations for persistence purposes takes less development time, even though the application developer has to familiarise himself with the ORM, and it is better optimised for database usage.

non-intrusiveness The objects that take part in ORM mappings are rather simple and can mostly be used directly (or with a slight adaption) in the client application as data objects or other instances as well. The ORM code that has to be implemented in respective applications is also focused on being usable in a convenient and non-intrusive fashion and can in many cases be picked up quickly.

reusability In contrast to self-written queries, especially for the interaction with the database as well as the implemented mapping objects, code associated with an ORM is highly reusable. This allows code pieces to be used alongside other software and

applications nearly right out of the box. Code does not necessarily have to be implemented redundantly and from scratch.

error-proneness and maintainability Code associated with ORMs creates room for testing capabilities. Starting with the basic ORM functionality for the communication that can be tested, it is also possible to write tests designed for specific class objects next to more thorough querying use cases, rather than cumbersome testing of difficult to maintain SQL queries. This immensely decreases the error rate of the whole system while increasing the maintainability, as errors, corresponding adaptions, or updates can be implemented into the ORM system much more easily and quickly. This also influences the reusability, as the tests are written for more general parts of the system rather than specific ones and do not have to be recreated for every feature or mapping item that is added to the system.

application design Software patterns can be included into an application design through the utilisation of an ORM. This makes the code much cleaner and more efficient, and further development gets facilitated by clear processes and implementations.

(vendor independence) Most ORM implementations are independent from a specific underlying database and rather focus their interaction on a querying protocol or language like SQL. This enables the ORMs to be used with any database that understands the respective language. This independence enables easier cross-platform implementations. Additionally, deployment processes are also facilitated, as test implementations of an application can run on lightweight databases while the productive system can work on a large scale database. With the aforementioned independence of the database, the migration process can be done seamlessly.
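To make the productivity and development effort arguments more tangible, the following sketch contrasts a hand-written JDBC interaction with the kind of one-line call an ORM typically offers. It is a minimal illustration only, made under assumptions: the Pet class, the table layout, and the find method of the hypothetical mapper are invented for this example and do not stem from a concrete framework.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class HandWrittenVersusOrm {

    // Simple data holder used by both variants (assumed for this example).
    public static class Pet {
        public long id;
        public String name;
    }

    // Hand-written variant: connection handling, SQL, parameter binding, and
    // result mapping all have to be implemented and maintained manually.
    public static Pet loadPetByHand(String jdbcUrl, long id) throws SQLException {
        try (Connection connection = DriverManager.getConnection(jdbcUrl);
             PreparedStatement statement =
                     connection.prepareStatement("SELECT id, name FROM pet WHERE id = ?")) {
            statement.setLong(1, id);
            try (ResultSet result = statement.executeQuery()) {
                if (!result.next()) {
                    return null;
                }
                Pet pet = new Pet();
                pet.id = result.getLong("id");
                pet.name = result.getString("name");
                return pet;
            }
        }
    }

    // ORM-style variant: the same lookup typically shrinks to a single call,
    // roughly along the lines of mapper.find(Pet.class, id).
}
```

The hand-written variant has to be repeated, tested, and maintained for every class and query of the domain model, which is exactly the effort the productivity and development effort arguments above refer to.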

Despite their many advantages, ORM implementations bear several known issues that arise from the applied mapping, which creates a mismatch in terms of the data representations, as two different worlds or paradigms collide. In [16], Bauer and King name and describe the following problems:

granularity Java objects have a great range of possible granularity, from simple basic datatype classes to coarse classes to more specialised and detailed information holder classes. This granularity however cannot be reflected in a relational database, as the possibilities there are more restricted. This confined scope for data modelling is then oftentimes reflected on the domain model of the application, leading to shortcomings like missing functionality that has to be implemented in another, oftentimes more costly, way.

subtypes Two of the strongest features of object-oriented programming languages are inheritance and polymorphism, which can also be extended to further functionality like polymorphic querying. These concepts are neither reflected in SQL nor in a relational database.

identity The problem of identity is encountered once the potential equality of data items has to be dealt with, which is necessary for different database functionality like querying. While Java supports two interpretations of identity (object identity, mostly checked by memory location, and secondly the equality test via implemented ".equals()" methods), neither of them matches the identity check that is done in the relational world: the test on primary keys.

associations Associations represent the linking between different information objects. The mismatch occurs with this concept as the associations in the Java world are implemented via object references which can be included directly in a class. On top of that, these references can model any multiplicity: "one-to-one", "one-to-many", and "many-to-many" (which is a little bit more difficult but manageable). In contrast, the relational database implements these associations via projection and table joins. For a many-to-many association, separate link tables have to be implemented. While the object references in the Java world are simple to implement, use, and evaluate, in the relational world several joins and costly operations have to be executed in order to apply associations between database entries.

object graph navigation Closely related to the problem of associations, the navigation through the object graph that is spanned by objects and their interconnected associations poses another problem. In Java this can be done by calling the methods that are supplied for the associations (in most cases, this is a simple getter method). For relational databases, there is no such thing as graph navigation, and costly join operations between multiple tables hamper any attempt to emulate it. Furthermore, when writing an SQL query, a user has to know in advance what he wants to query, which does not truly represent the concept of navigation as it is in the Java world. The small sketch below illustrates this contrast.
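The following fragment is a hypothetical illustration of the navigation mismatch: in Java the object graph is traversed by chaining getters, while the relational counterpart needs an explicit join. The Cat, Human, and getOwner names are assumptions made for this example.

```java
public class GraphNavigationExample {

    // Two tiny domain classes standing in for mapped entities (assumed names).
    public static class Human {
        private final String name;
        public Human(String name) { this.name = name; }
        public String getName() { return name; }
    }

    public static class Cat {
        private final Human owner;
        public Cat(Human owner) { this.owner = owner; }
        public Human getOwner() { return owner; }
    }

    public static void main(String[] args) {
        Cat cat = new Cat(new Human("Bob"));

        // Object graph navigation: follow the reference with a getter chain.
        String ownerName = cat.getOwner().getName();
        System.out.println(ownerName);

        // The relational counterpart of this single chained call would be a join,
        // roughly: SELECT h.name FROM cat c JOIN human h ON c.owner_id = h.id
        //          WHERE c.id = ?
    }
}
```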

As with the overall concept of ORMs, the awareness for these problems has existed for a long period of time, and therefore current ORMs can (to a certain extent) deal with them. Handling these demands a more complex implementation, which oftentimes involves longer-running queries and processes; however, supporting the functionality of the object-oriented languages mentioned above is worth the time investment. Hibernate itself contains some of the best solutions for these problems, and the authors stress again at that point that these problems are not to be taken lightly. From their experience, they state that about 30% of the code written for a Java application serves the main purpose of handling the tasks and problems listed above, next to the cumbersome trouble of dealing with SQL and the database connections via JDBC [136], for example, in the relational world [16]. Both of these issues necessitate a lot of time and prevent the developer from focusing on the actual application itself. From a technical point of view, ORMs in general support a mapping between objects of an object-oriented programming language (Java in the case of Hibernate) and relational data, data entries that are recorded in the data tables of a relational database. This mapping is established by adding metadata to the domain model that is applied by the respective application. Implementation-wise, the ORM code is integrated into the persistence layer of the overall application, interacting between the business layer and the database itself. The ORM is able to generate automated queries derived from the application and the calls from higher layers. Altogether, the ORM deals with the topics of storage, organisation, and retrieval of structured data, concurrency and data integrity, as well as data sharing. To do so, every ORM should support at least the following components [16] (a JPA-style sketch after this list illustrates how these components typically surface in code):

crud api For every mapping object defined in an ORM mapping, the basic CRUD functionality (standing for Create, Retrieve, Update, and Delete) must be supported. This is to be incorporated in an API that should be usable as natively as possible.

query language or api Next to the basic querying functionality of the CRUD API, an ORM should also support richer querying capabilities via a querying language or an extended API. This enables more fine-grained querying, so result sets can be thinned out at querying time and therefore only objects that are relevant to the respective query are returned by the database. This querying functionality needs to be fine-tuned to be able to address the model objects as well as their defined properties in a convenient fashion. It also should align with the automated fashion of an ORM and be able to generate queries automatically so as not to pose any further effort on the ORM user.

mapping metadata The mapping between the objects and their representation in the relational database is supported by metadata. An ORM needs to define this metadata model in combination with a set of rules on how to apply it. Once again, non-intrusiveness is rated highly, as the applier of the ORM must not

invest too big an amount of time to incorporate the ORM into his application.

optimisation functionality Up to now, it has been seen that several issues arise in the implementation of data persistence. There are however many more functionalities that are not essential for a working ORM, but that can contribute to a better interaction with the overall architecture. Bauer and King [16] mention transactional behaviour and lazy association fetching as examples. These can mostly be incorporated into the functionality of the ORM.
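The sketch below shows, purely as an illustration, how the three mandatory components typically surface in a JPA-style ORM: mapping metadata as annotations on a plain class, CRUD operations on an EntityManager, and a query language (here JPQL) for richer requests. The Pet entity and its fields are assumptions made for this example and do not describe a specific application of this thesis.

```java
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import java.util.List;

// Mapping metadata: annotations declare how the class maps to a table.
@Entity
public class Pet {

    @Id
    @GeneratedValue
    private Long id;

    private String name;

    public Long getId() { return id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    // CRUD API and query language, shown as a static helper for brevity.
    public static void demo(EntityManager entityManager) {
        // Create
        Pet pet = new Pet();
        pet.setName("Cesar");
        entityManager.persist(pet);

        // Retrieve by identifier
        Pet loaded = entityManager.find(Pet.class, pet.getId());

        // Richer querying via the query language (JPQL)
        List<Pet> namedCesar = entityManager
                .createQuery("SELECT p FROM Pet p WHERE p.name = :name", Pet.class)
                .setParameter("name", "Cesar")
                .getResultList();

        // Update and delete
        loaded.setName("Caesar");
        entityManager.remove(loaded);
    }
}
```

In a real application these calls would additionally run inside a transaction, which already hints at the optimisation functionality mentioned above.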

The above description has given insight into the world of ORMs that work with relational databases. As already mentioned, Hibernate is one of the best-known and most widely applied representatives; others are for example Doctrine2 [61], Apache Cayenne3, Apache OpenJPA4 (formerly known as Kodo), or ActiveJPA5. A similar clash of technologies exists in the Semantic Web world, which has also taken up the concept of ORMs to enable more convenient access to its databases via commonly known (object-oriented) programming languages. The database is represented by a triplestore, which is accessed by SPARQL queries rather than SQL queries. Just as with ORMs, the underlying concept provides the same advantages as described above. To prevent naming confusion and concept clashes, the name ORdfM for Object RDF Mapping is introduced, which implements the same concept as known from ORMs, with the difference of being applied to Semantic Web databases rather than their relational pendants. The aforementioned clash is seemingly less severe for a comparison between RDF data and an object-oriented programming language like Java than in the relational world. Although the granularity problem exists in this mapping as well, subtyping can be modelled straight away, as an RDF instance node can have several RDF classes/types assigned. If not done by the database immediately (as only the most specific type is generally added), a reasoner could deal with this (reasoning has been described in Section 2.2.6). The identity problem is also less severe, as an RDF database does not support primary keys, but RDF instance nodes are rather distinguished by having unique identifiers, which comes closer to the Java world than its relational pendant. In terms of association and graph navigation, the data representation in the Semantic Web poses no problems for the clash, as the object graph defined by the objects of the programming language can directly be mapped to the RDF graph. Accessing an association from

2 http://www.doctrine-project.org/ (last visited 03/12/2018) 3 https://cayenne.apache.org/ (last visited 03/12/2018) 4 http://openjpa.apache.org/ (last visited 03/12/2018) 5 https://github.com/ActiveJpa/activejpa (last visited 03/12/2018)

Java object to object is directly representable via an RDF edge for a relationship or property between RDF instance(s) or a literal. There are several prevailing representatives of ORdfM implementations out there, which are seen as related work for this thesis. RDF2GO6 [148] (also earlier known as RDFReactor [171]) defines an abstraction layer over a triple/quad store. Therefore, easy and quick implementation is envisioned, while the choice of data storage can be made later. Adapters are supported for Jena7, Sesame (now known as RDF4J8), and OWLIM9. Tinkerpop10 is an open-source graph-computing framework that lets its users model their domain as a property graph (a graph built from common nodes and edges, which both can be enhanced with key/value pairs). These graphs can then be analysed using its own graph traversal language Gremlin11. Tinkerpop systems can be consolidated easily, allowing for convenient collaboration, while every system can have its own appropriate graph technology underlying. The Callimachus project12 [15] is also an open-source project that supports a framework to construct Linked Data applications. To achieve this, the user is able to generate human-readable web pages that are combined with RDFa (see Section 2.2.5) templates that are supplied by an RDF database, resulting in a web page that is populated with RDF data. The following three implementations follow the same approach of using Java Reflection13 in order to establish the mapping between a Java POJO and the resulting RDF data, enabling the process of incorporating the ORdfM in a non-intrusive and idiomatic fashion. Empire14 is mainly centred on supporting a standard Java Persistence API (JPA) style interface for RDF databases in order to create the ORdfM layer. Empire is an open-source framework which was motivated by the need to create RDF-backed web apps. RDFBeans15 is built upon the Eclipse RDF4J API16 (former Sesame) and is a mapping framework that mainly focuses on convenient CRUD functionalities of the mapping, as well as a dynamic proxy mechanism for transparent access of RDF data. By extending the RDF4J API, RDFBeans is applicable for many different third-party database solutions. The most promising approach is given by the Alibaba (former Elmo codebase)17 library. It is aimed towards the development of

6 http://semanticweb.org/wiki/RDF2Go (last visited 03/12/2018) 7 https://jena.apache.org/ (last visited 03/12/2018) 8 http://rdf4j.org/ (last visited 03/12/2018) 9 http://semanticweb.org/wiki/OWLIM.html (last visited 03/12/2018) 10 http://tinkerpop.apache.org/ (last visited 03/12/2018) 11 http://tinkerpop.apache.org/gremlin.html (last visited 03/12/2018) 12 http://callimachusproject.org/ (last visited 03/12/2018) 13 https://docs.oracle.com/javase/8/docs/technotes/guides/reflection/index.html (last visited 03/12/2018) 14 https://github.com/mhgrove/Empire (last visited 03/12/2018) 15 https://github.com/cyberborean/rdfbeans (last visited 03/12/2018) 16 http://rdf4j.org/ (last visited 03/12/2018) 17 https://bitbucket.org/openrdf/alibaba (last visited 03/12/2018)

complex RDF storage applications by supporting a set of modules for RDF database abstractions. It introduces several features that are not present in Sesame/RDF4J, enabling users to further increase their application's applicability. By supporting object-oriented features like specialisation, which are often overlooked in other ORdfM implementations, Alibaba targets the mapping concept and its advantages that have been described above. An evaluation of the applicability and usefulness of ORdfMs has been given by Quasthoff, Sack, and Meinel [134]. They conducted an experiment in which participants (coming from the field of computer science) were asked to solve different tasks, on the one hand with the utilisation of an ORdfM and on the other hand with the Jena library, which only allows direct access to RDF triples. The results show that the participants preferred the access to RDF data supported by an ORdfM, as they felt it was easier. Next to this, solving the tasks on average took less time with the application of a mapping. It was also oftentimes mentioned that maintenance was estimated to be much lower with the application of an ORdfM. In summary, the concept of ORMs/ORdfMs has many advantages and can help the development and usefulness of applications, which is supported by the evaluation shown above. This holds true especially in the fields of persistence and the convenience of its associated processes. With the vision of the MICO Project described in Section 1.2 and Section A.1 in the Appendix, the corresponding application implemented in the project poses a perfect fit for the incorporation of an ORdfM. With its potential to implement complex mapping features combined with the low effort needed to adapt its functionality, Alibaba constitutes an excellent basis for the implementation work of Anno4j in this thesis, which will be described in the rest of this chapter. Benefits and advantageous factors will be explained throughout the Anno4j sections; the overall impact of Anno4j on the MICO Platform implementation will be described in Chapter 5.

4.2 anno4j - an object-rdf-mapping library

Anno4j [22][21] is an ORdfM library implemented in Java, which supports an idiomatic mapping between RDF triples or data and Java objects. The library is implemented as an open-source project, can be found at GitHub18, and can freely be used under an Apache License Version 2.019. Next to the GitHub page, Anno4j is also distributed as a Maven20 dependency and can be accessed for example via the

18 https://github.com/anno4j/anno4j (last visited 03/12/2018) 19 http://www.apache.org/licenses/LICENSE-2.0 (last visited 03/12/2018) 20 https://maven.apache.org/ (last visited 03/12/2018)

Maven Central Repository21, which enables convenient incorporation and usage of the library in Java projects. In order to give an overview over the technical features of the Anno4j library and to understand their interplay, a short description of the following sections is given here:

• The basic functionalities of an ORdfM implementation like the Anno4j library are given by its persistence and querying functionalities, as these are the focus of both writing and requesting RDF data from a Semantic Web database. Therefore, Section 4.3 will give an in-depth description of the persistence procedure of the library, also highlighting technical foundations. Afterwards, Section 4.4 lists the querying capabilities that allow users to apply different levels of granularity, offering easier but imprecise queries up to more detailed and therefore more precise querying.

• With the core functionality covered in the prior sections, it is important to cover associations that the ORdfM concept has with its underlying database in the processes of persisting and querying data. Various database requirements and concepts exist, with the commonly known ACID criteria as one of the present representatives, that should be fulfilled when working with a database in order to support both convenient and secured working. Therefore Section 4.5 explains how and in what way the Anno4j library supports such database requirements. Focus is thereby placed on the transactional behaviour supported by the library and innovative validation features that especially can help with data correctness and data consistency. Here again, different possibilities to do so are supported in Anno4j.

• The sections mentioned beforehand cover all necessary information in order to work with Anno4j and a Semantic Web database in regards to ACID compliance. Hence, the following sections can focus more on the overall abstraction layer concept that underlies the communication with the database when working with the library. The communication is based on the so-called domain model, Java classes that represent the mapping between the object-oriented classes and the RDF data that they are translated to, and vice versa. These domain models can potentially become large and thereby more complex, and they hence can be error-prone and time-consuming to create by hand. The Anno4j library introduces the Anno4j Generation Tool in Section 4.6, which can be used to turn RDFS or OWL schemata, which often exist next to a Semantic Web database, into such domain models in an automated way. This reduces the initial effort to work with a given ontology and Anno4j to a minimum.

21 https://search.maven.org/ (last visited 03/12/2018)

Next to this enhancement, the generation tool is also capable of generating Web- and REST-compliant resources that allow the automated composition of a RESTful application, which makes it possible to work with the created domain model also via commonly known HTTP requests and therefore supports Web interoperability. This is documented in Section 4.6.3.

• Following that, Section 4.7 introduces several convenience features that increase the usability of the overall library, namely contextualisation of persistence and querying, serialisation possibilities for both input and output, and querying plugins that allow the introduction of custom SPARQL expressions in combination with an interpretation logic in order to facilitate and fine-tune the querying capabilities of the library. Section 4.8 summarises all results and contributions of the whole chapter and lists envisioned additions to Anno4j which can further increase the applicability of the overall library.

In order to set up a given application to work with Anno4j, the connection to the database needs to be established. The Anno4j implementation can connect to every RDF triplestore that conforms to the SPARQL 1.1 specification [71]. This is done by supplying the query and update endpoint URLs of the triplestore (which are generally exposed and open in order to work with the respective application) to the Anno4j Java object. Oftentimes, both of these endpoints are combined into a single SPARQL interface URL, which can be added to the Anno4j library instead. Listing 16 shows an example of how this looks.

1 String queryUri = "http://petsandowners.org/database/query";
2 String updateUri = "http://petsandowners.org/database/update";
3 String sparqlUri = "http://petsandowners.org/database/sparql";
4
5 // Triplestore with query and update endpoint
6 SPARQLRepository queryUpdateRepository =
7         new SPARQLRepository(queryUri, updateUri);
8 Anno4j anno4j = new Anno4j(queryUpdateRepository);
9
10 // Triplestore with a combined SPARQL endpoint
11 SPARQLRepository sparqlRepository = new SPARQLRepository(sparqlUri);
12 Anno4j anno4j = new Anno4j(sparqlRepository);
13
14 // Without a supplied endpoint, a local instance with an in-memory database is created
15 Anno4j anno4j = new Anno4j();

Listing 16: Instantiation of an Anno4j Object.

With the sole addition of a SPARQLRepository object (which represents the RDF triplestore and enables the connection to it) to the Anno4j instance, objects can be created at and queried from that database instance. How this is done will be covered in the following sections. There is also the possibility of not adding a remote repository to an Anno4j object, which then creates a local in-memory database that will hold the respective data. This is shown in line 15 of Listing 16. Its data content however is lost as soon as the respective Anno4j instance is invalidated. This enables lightweight and quick implementation possibilities mainly for testing purposes, as it is not managing persistent data as described in Section 4.1. Another configuration possibility allows persisting the mapping information like class, property, and type definitions which are currently implemented in the respective Anno4j application. This information can sometimes be useful, but it also clutters the database as it constitutes many triples. Next to the general ORM/ORdfM advantages that have been listed in Section 4.1, the application of the Anno4j library in a Java application also poses the following generic advantages:

verification of input Through the approach of creating and querying RDF information via Java objects, a natural way of verification is given. Java objects can only be created as dictated by their class file, so the eventually created RDF triples are always conform to the underlying schema from a structural point of view. Further verification can be added to the Anno4j classes via Java Annotations implemented specifically for this library (see Section 4.5), which control the semantical and logical soundness of the RDF data.

reduce pre-knowledge of the user Basic Java POJOs and the basic Anno4j Classes are mainly composed of straightforward data interaction methods, in general a getter and setter for every supported attribute. Hence they are simple to understand and use, making the entry to the underlying schema much easier than learning it from an RDFS or OWL schema. Combined with the feature of verification, users need much less time to start creating (sound and correct) RDF data. Additionally, the Anno4j library and its verification implementation is able to give feedback to the user in the event of faulty input data. This further supports the learning process of the user while keeping the data status consistent as well.

standardised serialisations Anno4j can both read and write RDF data from and to various standardised serialisations (see Section 2.2.5). This broadens the applicability of the library to also support "written triples" in both directions of persisting and querying.

schema updates An update to an existing schema can be very difficult, in addition to the fact that it might also have an impact on existing data and defined queries. In contrast, updating Anno4j classes to introduce a change in the underlying schema is much

simpler, and queries can be adapted automatically as they are also based on the Java objects. A concurrent update to an existing database can be established via Java processes.

database access technologies RDF triplestores support some access functionalities via REST calls, but rely mostly on SPARQL queries to write and retrieve their respective data. This circumstance narrows the range of possible users of Open Data, which counteracts its general purpose. With the addition of Anno4j the access possibilities are broadened, as not only SPARQL queries are possible, but also querying via Anno4j and LDPath (see Section 4.4) and a schema-oriented REST API that conforms to the defined mapping (see Section 4.6). Figure 25 visualises the access possibilities. Three layers on top of the database (the basic access layer of the database itself, with two added by Anno4j) automatically transform and pipe an issued request through various technologies (from REST calls to Java POJOs (to LDPath) to eventual SPARQL queries). It is important to note that "lower" layers are not blocked by preceding ones. This allows users to choose their desired way of communicating with the database, enabling a broad spectrum of interactivity.

[Figure 25 depicts the possible interactions with the database as layers: HTTP REST calls against the Anno4j REST API and Java POJOs via Anno4j on top, the standard SPARQL interface of the database below, and the RDF triplestore at the bottom.]

Figure 25: Access Possibilities for a Triplestore Augmented with the Anno4j Library.

Next to its generic advantages, the utilisation of the library also benefits the overall application by means of its technical features. These will be documented throughout the following sections; examples will cover Java code in order to show the basic application of the features. These examples will be concise and to the point, showing the basic functionality to support the given theoretical explanations, as extensive

code examples would exceed the frame of this thesis. However, such examples, which illustrate the full potential of the library, can be found on the Anno4j GitHub page22. To give a description of the Anno4j library and its features, the following sections are structured as follows: Section 4.3 explains the core feature of persisting RDF data via Java. It therefore describes how the respective Java classes for Anno4j are created and how they can be enhanced with mapping metadata of Anno4j. Additionally, the basic functionality of this construct can be enhanced with different Java features by an Anno4j extension, increasing the ORdfM's possibilities for data creation and its functionality. Following this, Section 4.4 will document the other "side" of the mapping with the querying of existing data. To achieve this, the path-based querying language LDPath23 [145] will be introduced, which is implemented via Anno4j to issue comprehensive queries to the connected database or local memory store. After the "basic" features of the library have been elaborated, Section 4.5 covers database characteristics in the form of consistency and validity of the associated database status, which is also commonly known as the ACID criteria of databases. After the explanation, a description of how these requirements are also provided in the Anno4j library is given. Therefore, Section 4.5 explains the implemented transactional behaviour of the library and how it is applied, enabling consistency when working with Anno4j. These transactions can then be enhanced to also include validation checks that test the data to be added for model conformity. Only when that is given will the data eventually be persisted to the connected database. This is explained in Section 4.5.2. The validation checks rely on respective validation information being present in the database that is associated in a given process. The validation information can be incorporated in the data status itself or it can be conveniently supplied next to the Anno4j mapping, which is shown in Section 4.5.3. All of the aforementioned features indicated that most implementation is done by hand. This is not only time-consuming but also error-prone. Anno4j supports the so-called Anno4j Generation Tool, which is able to generate many building blocks of the Anno4j library from an RDFS or OWL schema file alone. The overall application of the tool and what it is able to generate is shown in Section 4.6 and Section 4.6.1. Next to this, Anno4j is also able to expose its defined metadata mapping as a RESTful API, allowing users to issue their interactions with the RDF database via common HTTP methods that are aligned with the associated mapping. The last technical section of this chapter mentions smaller additions to the Anno4j library in Section 4.7. It starts by explaining how the

22 https://github.com/anno4j/anno4j (last visited 03/12/2018) 23 http://marmotta.apache.org/ldpath/ (last visited 03/12/2018)

basic RDF feature of subgraphs or named graphs is included in the functionality of Anno4j. Also, the overall mapping is extended with a parsing functionality that allows users to write Anno4j objects to RDF serialisations and vice versa. Custom querying plugins allow for a richer, more comprehensive, and personally fine-tuned querying functionality of the library. Overall, one of the core claims of the library is to facilitate the persistence interactions with RDF data for users and their respective applications. There are various starting points for this facilitation, which can be summarised in usage scenarios or use cases. In the following, some of those envisioned scenarios are listed that focus on single activities or processes that are to be implemented and supported by the Anno4j library, also depicting the advantage that Anno4j enables. In many cases, these scenarios can be combined or chained after one another in order to produce an even richer use case or application scenario. Some scenarios can depend on a preferred access technology for the database, but the following sections will show that redundant and interchangeable processes are possible and the other access technologies are not prohibited, as seen in Figure 25. In general, the scenarios share the same premise that a user wants to access an RDF database. The envisioned use cases are the following:

non-semantic web expert Derived from the insights that have already been mentioned in this section and preceding sections, this is the most straightforward scenario, featuring a user that does not have any or only limited knowledge of Semantic Web technologies but adequate skills in Java. Therefore it is envisioned to support this kind of user in interacting with the respective databases in an idiomatic fashion. Anno4j allows these users to interact with the database via technologies other than SPARQL queries and to retrieve Java POJOs. This can be done with the basic Anno4j querying functionalities, which only need Java knowledge (see Section 4.4), or, if generated, via HTTP requests to the Anno4j REST API (see Section 4.6.3), which is also presumably more familiar to the general user.

unknown database content/structure Oftentimes when a database is exposed for open access, its semantical description as well as its data structure is explained only at a very high level. This can hamper potential users or force them to invest a large amount of time in order to interact with the database properly. In this scenario, Anno4j can help to facilitate this process. The database can offer its schema (either as RDFS or OWL schema), which should be exposed to the users. The user can then generate the data model with respective Anno4j classes to not just interact with the database, but also get a much quicker insight into the data structure.

contribution to the database content Next to the first two scenarios listed, another natural use case exists in which users want to contribute to the content of an already existing database. Next to the technologies that facilitate the communication with the database mentioned above, the persistence of data does also need to be supported. Through the generation of Anno4j classes via the schema supplied by the database, users are automatically "forced" into the database's data structure, enabling convenient access to the data both for persisting and querying. Next to this, if supported by the schema, the validation properties introduced through Schema Annotations can further increase the validity of the input data produced by a user.

merge heterogeneous databases As one of the more casual scenarios, the merging of heterogeneous databases is possible via the implementations done in Anno4j. Parsing the schemata of multiple databases allows synchronising the data structures more easily, as Java POJOs are supported. This process is far more convenient than synchronising the RDF schemata in combination with the existing data. For the data, new Anno4j Classes can be created, filled, and persisted through class merging, leading to a new consistent and combined data status.

workflow based use case This type of scenario is the most similar to the MICO vision described in this work, which has been introduced in Section 1.2. This scenario can be interpreted in two different ways. The "singular" scenario depicts a user who wants to collaborate in a workflow designed around an access point that incorporates the database of the scenario. For the collaboration, a mixture of the functionalities of the scenarios above is necessary, as the user needs to understand the data structure of the workflow in order to query and use the data, as well as make a contribution to the data status itself. Additionally, the user does not only want to create data that conforms to the platform's, but may also extend the data structure, as he introduces new forms of metadata to the workflow. The understanding, application, as well as the contribution to the database has been covered above. In order to extend the data structure of the database, it is envisioned that the user can extend the currently underlying schema with his or her own features. The new extended data structure can then be synchronised back to the central access point. This scenario can also be extended to a "plural" use case, like the MICO platform (see Section 5.2), in which multiple parties perform this cycle of actions and especially create a commonly agreed upon data structure.
This way, every party can conveniently interpret the results of their predecessor(s) in order to apply the contained information and

possibly generate further information. This eventually can create a rich and diverse metadata background.

4.3 creation of metadata with anno4j

The most fundamental step for an ORdfM mapping and therefore also for the Anno4j library lies in the persistence of (RDF) data. Conceptually, this means that possibly interconnected Java objects are created and subsequently turned into RDF triples that are then persisted at a respective triplestore automatically by the library. This section gives insights into what this process looks like in the Anno4j library, what has to be done by a user, and what internals of the library take part in this workflow. Anno4j is built on the existing Alibaba library that has been described in Section 4.1, which supports the persistence functionality. Before the explanation of metadata creation with Anno4j can be given, some definitions are needed. Anno4j Classes, which will be explained by text and exemplary code later, represent the mapping objects for the ORdfM in the object-oriented programming language Java. (Technically, these are implemented as Java interfaces, but the naming is more convenient this way, as will be seen later.) Just like in normal Java fashion, objects can be created with regard to a given Anno4j Class; these will be called Anno4j instance objects or just objects of class x. So-called Partial Classes can implement the interface of an Anno4j Class to extend the functionality of the respectively created objects. The final functionality of a given object, its behaviour, determines "what the object can do": what methods are supported, to what RDF information it is mapped, which RDF class(es) it has, and so on. This is defined by the class hierarchy that the Anno4j Class is in and the Partial Classes that are added to the respective classes. Every associated Anno4j and Partial Class of a strand contributes its own behaviour to the overall behaviour of a created object. For those objects, the Anno4j library implements a proxy pattern24 [67] to increase the applicability of the objects, mainly in terms of scalability. This is necessary, as the scalability of such mapping implementations can be critical [176]: proxy creation in larger class hierarchies can be costly and it puts additional time effort on implemented lazy evaluations, for example. Therefore the Anno4j library implements automated functionalities that can bypass these circumstances. This will be covered shortly in this section and in more detail in Section 4.6. An evaluation of the time savings will be shown in Chapter 6. How actual metadata can be created via the Anno4j library, and what internal processes take part, is shown in Figure 26.

24 http://www.oodesign.com/proxy-pattern.html (last visited 03/12/2018)

[Figure 26 depicts the sequence as a chain of steps: the user-driven actions Create Anno4j and Partial Classes, Create and Fill Instance Object, and Run Java Process, followed by the automated actions Gathering of Behaviours, Anno4j SPARQL Query, Filled Anno4j Object, and Send to Database.]

Figure 26: Sequence of Steps Executed for Creating Metadata with Anno4j, Including Internal Steps of the Library.

Figure 26 shows the steps that are undertaken when a developer wants to make use of the Anno4j library in one of his Java projects. This overall process is composed of different kinds of activity, which are reflected by the colours applied in the figure. The green entities symbolise actions that in general have to be undertaken by the developer, hence these are the steps that are not supported by automated functionalities of the library. In contrast, the blue entities are the ones that run fully autonomously, executed by the library with the input provided by the preceding green steps. Both of these action classes have steps that are supported by the functionalities of the Alibaba library. The steps that are illustrated with a red border at the bottom are specifically enhanced via the Anno4j library. These enhancements and the basic functionality of every workflow step are explained in detail in the following listing.

create anno4j and partial classes The essential beginning of every metadata production via Anno4j lies within the creation of Anno4j Classes, which can also be enhanced by Partial Classes. A basic Anno4j Class represents the "building plan" for RDF instances, with an RDF class assignment as well as a list of its possible relationships and properties. Hence, this encapsulates one direction of the mapping between Java objects and RDF data. Listing 17 shows an example of a simple Anno4j Class that represents the information of the RDF class named "pao:Cat" that has been introduced throughout Section 2.2, with the addition of having "cat friends" via the relationship "pao:hasCatFriend".

1 // RDF Class definition
2 @Iri(PAO.CAT)
3 public interface Cat extends Animal {
4
5     // Getter and setter for the foaf:age property
6     @Iri(FOAF.AGE)
7     int getAge();
8
9     @Iri(FOAF.AGE)
10     void setAge(int age);
11
12     // Getter and setter for the pao:hasOwner relationship
13     @Iri(PAO.HAS_OWNER)
14     Human getOwner();
15

16     @Iri(PAO.HAS_OWNER)
17     void setOwner(Human owner);
18
19     // Getter and setter for the pao:hasCatFriend relationships
20     @Iri(PAO.HAS_CAT_FRIEND)
21     Set<Cat> getCatFriends();
22
23     @Iri(PAO.HAS_CAT_FRIEND)
24     void setCatFriends(Set<Cat> friends);
25
26     // Additional behaviour that is implemented by a respective Partial Class
27     void addCatFriend(Cat friend);
28 }

Listing 17: Anno4j Class for the “pao:Cat” RDF Class.
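The listing refers to a namespace class PAO that merely collects the IRIs of the used vocabulary as String constants. A hypothetical sketch of such a class is given below; the concrete namespace IRI and constant values are assumptions made for this example.

```java
// Hypothetical namespace class for the "pao" vocabulary; the namespace IRI is
// an assumption made for this example.
public final class PAO {

    public static final String NS = "http://petsandowners.org/ontology#";

    public static final String ANIMAL = NS + "Animal";
    public static final String CAT = NS + "Cat";
    public static final String HAS_OWNER = NS + "hasOwner";
    public static final String HAS_CAT_FRIEND = NS + "hasCatFriend";

    // Constants only, no instances.
    private PAO() {
    }
}
```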

Line 3 of Listing 17 implements the classic interface header of a common Java interface. The name of the interface can be freely chosen (according to common Java best practices) and is mainly used for interaction with the Java objects of the mapping. Nevertheless, this line shows how class inheritance can be implemented via the extension of the "Animal" interface (which is another Anno4j Class implementing the "pao:Animal" RDF class). This inheritance information will also be represented in the resulting created RDF data. The RDF class association is defined in line 2, showing a Java Annotation "@Iri" that assigns the String argument of the Java Annotation to the "rdf:type" relationship of created instances, which in turn defines their RDF class. In this example, a simple namespace class "PAO" is implemented for convenience reasons, which contains all IRIs - represented in Java as Strings - that are used in the given namespace. This allows shorter terms to be used in the Java implementations, just like it was done with the namespacing in the RDF serialisations in Section 2.2.5. Then, in lines 7 and 10, 14 and 17, as well as 21 and 24, there are blocks of code that define pairs of getters and setters, each defining a different relationship or property for the Cat Anno4j Class. Depending on the datatype that is associated with the respective pair, one defines a property when basic datatypes are assigned, or a relationship for complex datatypes. Examples for this can be seen in the first and second pair in Listing 17 for the assignment of the "foaf:age" property and the "pao:hasOwner" relationship, which have a basic integer or a complex Human Anno4j Class as parameters respectively. As with the RDF type association for the overall Anno4j Class, the RDF association of relationships and properties is also supplied via the Java Annotation "@Iri". These must be defined above both the setter and getter of a respective pair and contain the IRI as a String parameter. Examples can be seen in lines 6/9, 13/16, and 20/23 of Listing 17 respectively. In contrast to the first two additions to the Cat Anno4j Class that were defined to have single values, it is also possible to add multiple values to a given property or relationship. This is done by adding

a Java Set to the respective getter/setter pair, which can also be of either a complex or primitive Java type. An example can be seen in Listing 17 with the "pao:hasCatFriend" relationship, to which multiple instances of a Cat can be added. Up to line 24, Listing 17 showed the basic shape of an Anno4j Class. However, in line 27 there is also an extension to this basic implementation apparent, which allows adding single instances to the "pao:hasCatFriend" relationship, rather than the whole set via the basic setter. This can be supported by the Anno4j library to increase the basic behaviour of produced Java objects and allow for more comprehensive implementations. Because of this, it is also possible to make use of other Java functionality and logic to further increase the usability of the mapping. The respective implementation is done in a Partial Class that is supplied in addition to the given Anno4j Class by implementing the respective interface. An example of how this looks is shown in Listing 18.

1 // Java Annotation that induces an Anno4j Partial Class
2 @Partial
3 public abstract class CatSupport extends AnimalSupport implements Cat {
4
5     // Implemented for the method defined in the Cat class
6     @Override
7     public void addCatFriend(Cat friend) {
8         Set<Cat> catSet = new HashSet<>();
9
10         Set<Cat> currents = this.getCatFriends();
11
12         if (!currents.isEmpty()) {
13             catSet.addAll(currents);
14         }
15
16         catSet.add(friend);
17         this.setCatFriends(catSet);
18     }
19 }

Listing 18: Anno4j Partial Class for the “pao:Cat” RDF Class.

Code-wise, the most important thing to add to an Anno4j Partial Class is the Java Annotation "@Partial", which is needed in the step of behaviour gathering, as the Anno4j Java process needs to "find" these annotated classes at the point in time when Anno4j objects are created. Next to this, the class declaration needs to be "abstract" and the respective interface, in this case the "Cat" class, needs to be implemented. The Partial Classes also build up a hierarchical structure, so this class needs to extend the "AnimalSupport" class, which is the Partial Class for the semantically higher positioned "Animal" Anno4j Class. In the body of the class, methods defined in the respective interface can be implemented. In the example of Listing 18, starting from line 7, the method whose signature is defined in Listing 17 is implemented, which allows the addition of single "pao:hasCatFriend" relationships rather than full sets only.

The functionality applied in Listing 17 and Listing 18 is rather simple in order to show a minimal functional example of the Anno4j structure for basic mapping objects. However, especially the Partial Classes can include richer and more comprehensive Java features if needed. Respective basic setter and getter methods defined in the interface can be called from the Partial Class in order to also produce associated triples. Other Java processes can also be merged in technically, for example to trigger a calculation that eventually sets the value of a given RDF property; a small hypothetical sketch of this is shown below. The creation of Anno4j Classes itself can be done by hand, but it has been mentioned on several occasions up to now that Anno4j also supports the possibility of generating these classes via Java code generation and the Parsing Tool of Anno4j. With the input of an RDFS or OWL schema, corresponding Anno4j Classes as well as their Partial Classes are created accordingly. This topic will be covered in Section 4.6.
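To illustrate the remark about richer Java logic in Partial Classes, the following fragment sketches, in the style of Listing 18, a method that derives a value and then delegates to a mapped setter. The method name and the birth-year logic are assumptions made purely for illustration; a matching method declaration would also have to be added to the Cat interface of Listing 17.

```java
@Partial
public abstract class CatAgeSupport extends CatSupport {

    // Derives the cat's age from a given birth year and stores it via the
    // mapped foaf:age property; the triple is produced by the generated setter.
    public void updateAgeFromBirthYear(int birthYear) {
        int currentYear = java.time.Year.now().getValue();
        this.setAge(currentYear - birthYear);
    }
}
```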

create and fill java object The next step of metadata creation via Anno4j encapsulates the process of writing Java code that uses the Anno4j Classes and Partial Classes defined in the step above. With the only difference of not creating Java objects via the commonly known "new" operator, the respective objects have to be instantiated via an Anno4j instance, which can be connected to a triplestore or run with a local in-memory database as shown in Listing 16. The objects created are used like common Java objects, which underlines the convenience of use for non-Semantic Web experts. A simple example, using amongst others the Cat Anno4j Class defined in Listing 17, can be seen in Listing 19.

1  Anno4j anno4j = new Anno4j();
2
3  Human bob = anno4j.createObject(Human.class);
4  Cat daisy = anno4j.createObject(Cat.class);
5  Cat eve = anno4j.createObject(Cat.class);
6
7  Cat cesar = anno4j.createObject(Cat.class);
8  cesar.setAge(5);
9  cesar.setOwner(bob);
10 cesar.setCatFriends(new HashSet<Cat>(){{add(daisy); add(eve);}});
11 // With the behaviour of the Partial Class, above line could be exchanged with:
12 // cesar.addCatFriend(daisy);
13 // cesar.addCatFriend(eve);

Listing 19: Simple Application of the Cat Anno4j Class Shown in Listing 17.

In the example of Listing 19, after an instance of Anno4j is instantiated in line 1, one Human entity and two Cat entities (which will serve as friends of our known Cat "cesar") are created; these remain dummy objects and are therefore not filled with further information. Line 7 creates the Cat object "cesar", which receives additional information in the form of its age (line 8), its owner (line 9), and its Cat friends (line 10). All of this is done with commonly known Java implementation. The graph of the resulting RDF triples can be seen in Figure 27.

[Figure 27 depicts the resulting graph: the node "cesar" has rdf:type pao:Cat, foaf:age "5", a pao:hasOwner edge to "bob" (of rdf:type pao:Human), and pao:hasCatFriend edges to "daisy" and "eve".]

Figure 27: Resulting RDF Graph of the Executed Anno4j Code Shown in Listing 19.

Another example in the domain of multimedia and the MMM, which has been discussed in Section 3.3, is shown in Listing 30 in the Appendix. Its resulting RDF graph and triples are shown in Figure 61 in the Appendix. An MMM Item is created and enhanced with two Part Annotations that represent a multimedia low-level feature in the form of a ColorLayout analysis (a descriptor featured in the MPEG-7 standardisation that shows color distribution [46][96]), and a high-level feature in the form of an animal detection.

run java process Just like the definition and the instantiation of Anno4j objects is done in classic Java fashion, the overall process behind it is also set in the Java world. This means that any Java process can incorporate Anno4j instances and objects and thereby create metadata in the corresponding database. Once the Java process is executed, the respective metadata is created and eventually persisted. This is the last step that a developer is involved in; the steps after this one are processed automatically.

gathering of behaviours When the Anno4j process is triggered, the respective Anno4j instance dynamically assembles the entire metadata model that is induced by the defined Anno4j Classes and Partial Classes, in order to be able to generate proxies of the mapping later. This step is required because the model definitions are not known to the Anno4j instance before build time, and the behaviours of every Anno4j object need to be gathered in order to compile an eventual mapping between the Anno4j (Java) object with its implemented functionality and the resulting RDF data. The creation of an Anno4j Class instance triggers the compilation of the overall behaviour for that representative. The set of behaviours that is gathered for every object results from the implemented hierarchy of both the Anno4j Classes and Partial Classes. In addition to a standard behaviour of general Anno4j mapping entities, each of those classes contributes one strand of behaviour to a given object's overall behaviour. All of them will eventually be combined and merged to generate the final behaviour.

For an exemplary object of the Cat Anno4j Class shown in Listing 17, the behaviours gathered would be the standard behaviour, two for the Anno4j Classes "Animal" and "Cat", and two for the respective Partial Classes "AnimalSupport" and "CatSupport" (shown in Listing 18), enabling the final object to have behaviours (methods, type associations, ...) from both concepts.

This composition process requires Java reflection25 to scan through the defined Anno4j Classes and Partial Classes. The merging of behaviours to create a proxy object, especially in large class hierarchies, is time consuming, which can result in long waiting times when working with Anno4j. This circumstance is commonly known in the field of ORMs / ORdfMs [176]. As a solution, a so-called Proxy Generation Tool has been implemented which allows to generate the eventual proxy objects in advance. These can then be supplied as a .jar file to the respective Java process, so that the library does not have to create the proxy objects itself. When working with a non-changing model, process times can thereby be reduced immensely. This saving of time has been evaluated in an experiment, applying the project use case of the ViSIT project26, which uses the Conceptual Reference Model CIDOC CRM version 6.227. This experiment, in combination with an evaluation and project description, is shown in Chapter 6.

filled anno4j object As shown above, the result of the behaviour gathering process is the generation of the internal mapping that connects the Java objects to the eventual RDF data that is to be created. When an Anno4j instance object is created, a proxy of the respective mapping is created, whose methods can then be accessed to add metadata information in the form of properties and relationships. The overall result is then a "filled" Anno4j instance object (internally a Java object with set attributes) that can be persisted to a given database in the next step.

sparql query The next superordinate step consists of the conversion of the filled Anno4j objects to RDF data. This is done by the Anno4j and Alibaba libraries by transforming the filled objects into respective SPARQL INSERT DATA28 queries that are sent to the database automatically.
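For the "cesar" object created in Listing 19, such a query would be roughly of the following shape; the resource IRIs are placeholders (Anno4j mints its own identifiers), and the exact query text produced by the libraries may differ:

PREFIX pao:  <http://petsandowners.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

INSERT DATA {
  <urn:anno4j:cesar> rdf:type pao:Cat ;
                     foaf:age "5" ;
                     pao:hasOwner <urn:anno4j:bob> ;
                     pao:hasCatFriend <urn:anno4j:daisy> , <urn:anno4j:eve> .
  <urn:anno4j:bob>   rdf:type pao:Human .
  <urn:anno4j:daisy> rdf:type pao:Cat .
  <urn:anno4j:eve>   rdf:type pao:Cat .
}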

send to database The generated SPARQL queries described above are sent to the database that has been connected to the Anno4j

25 https://docs.oracle.com/javase/8/docs/technotes/guides/reflection/index.html (last visited 03/12/2018)
26 http://www.phil.uni-passau.de/dh/projekte/visit/ (last visited 03/12/2018)
27 http://www.cidoc-crm.org/Version/version-6.2 (last visited 03/12/2018)
28 https://www.w3.org/TR/2013/REC-sparql11-update-20130321/#insertData (last visited 03/12/2018)

instance or, if Anno4j was created locally, an in-memory database. The connection to the database does not require any further actions by the developer; simply creating the respective Anno4j instance as shown in Listing 16 is enough.

4.4 querying of metadata with anno4j

Now that the first "direction" of the RDF metadata mapping possible with Anno4j has been covered in Section 4.3, this section will introduce the other half: the querying of metadata stored in a database via Anno4j. Conceptually, the design of the querying functionality follows the same intention as the persistence. The main goal is to reduce the barrier that Semantic Web technologies pose to developers who are not familiar with them, a barrier which could otherwise hamper or even deny additional contributors to a common knowledge base. Once again, the Anno4j library tries to alleviate the querying process for developers so that they can mainly work with concepts they are familiar with, in this case Java objects, rather than having to deal with the Semantic Web technologies directly. To achieve this, Anno4j supports two different ways of querying: basic and extended. In contrast to metadata creation, however, the Semantic Web technologies cannot be fully avoided for extended querying purposes, as querying at some point requires a formalism of some sort in order to express a query and its criteria and retrieve the desired metadata. The Anno4j library therefore supports a more convenient possibility for Semantic Web querying than the commonly applied SPARQL: the path-based querying language LDPath29 [145][5].

Basic Anno4j querying allows straightforward querying for the requirement of retrieving "all representatives" of a given Anno4j Class. The basic version does not require Semantic Web skills, but it lacks technical advantages, as querying criteria have to be evaluated "by hand": a list of Java objects is returned that has to be searched through with Java functionality. This also demands more time, as all representatives of a queried Anno4j Class are returned without any restrictions.

LDPath is similar to XPath [49] and the SPARQL Property Paths [152] and is particularly well-suited for querying RDF data from the LOD cloud. Compared to classic SPARQL, which defines its queries through graph patterns that are matched against the graph stored in the respective database to retrieve results, LDPath allows to construct queries by defining a path that is evaluated from a given starting point entity in the graph. If the defined path with its associated criteria can be found, the respective entity will be contained in the result set. These path-based queries are supposedly easier for humans to

29 http://marmotta.apache.org/ldpath/ (last visited 03/12/2018)

issue than SPARQL queries, especially in combination with the graph-like view that has been applied over the course of this work. Important to note here is that, although the query features of the Anno4j library are present, the respective database access is not limited to them, as explained earlier and shown in Figure 25. Classic SPARQL queries can still be issued at standard database interfaces, allowing users to freely choose the access technology and therefore operate with various use cases.

The path queries defined in LDPath combine the edges present in an RDF graph with a broad variety of tools and functionalities to form comprehensive queries. Amongst them are property selections, reverse property selections, wildcard selections, a self selector, path traversal, unions, intersections, recursive selections, tests (for language, type, path value, and path existence), and even own functions can be implemented and used.

Implementation-wise, the LDPath library is built up in a structured and modular fashion, which allows developers to include only those modules that are relevant to them. Amongst those modules are core modules in combination with backend modules, in order to make LDPath work conveniently with different databases, as well as extension modules that broaden the applicability of the querying language. With Anno4j, the required LDPath dependencies are enabled automatically, and the path criteria can be issued via the QueryService Java object of the library. This combination extends the already rich querying language with an important feature for our use cases via the Anno4j native implementation: testing for literal values at the end of paths, which can also be enhanced with different comparison operators.

The QueryService follows the Anno4j philosophy and offers access to Semantic Web technology, in this case the requesting of RDF data contained in a triplestore by using Java. Therefore, the QueryService is a Java object that can be created by an Anno4j instance. The QueryService object can be augmented with different configurations, containing prefix definitions (which can then be applied in the defined querying criteria), querying criteria formulated as LDPath expressions, as well as limit and offset preferences for the issued query. This is done by the implementation of a Fluent Interface30, which allows the chaining of the respective configuration methods, as every call to a method returns a further specified instance of the QueryService object. Once all these are added to the QueryService, the ".execute()" method can be called, which issues the query that is created with all the currently associated information of the QueryService object towards the triplestore that is associated with the respective Anno4j instance that created the QueryService in the first place. This method

30 http://java-design-patterns.com/patterns/fluentinterface/ (last visited 03/12/2018)

also takes an Anno4j Class as parameter, which defines the starting point as well as the result type for the query to issue. For example, the call ".execute(Cat.class)" evaluates its LDPath querying criteria starting from those RDF objects that have the respective RDF/Anno4j Class associated.

Next to the features listed above, the QueryService evaluates its queries in a lazy fashion in order to improve the performance of the library. This means that data is only queried at the point at which it is directly requested or necessary for the process. This especially pays off when information objects become larger and/or there are many associations in place. Instead of a "full object", an object contained in an Anno4j query response contains only direct representations of defined properties; every relationship, and therefore association to another object, is at first represented by the IRI of the related object. This lightweight representation of the object is "exchanged" with the real object once the respective information is requested. This is done automatically by the library and allows on average for quicker processing and reaction times when working with a database and Anno4j. Additionally, query optimisations of the Apache Jena31 framework are applied to every Anno4j query automatically. These incorporate optimised join orders as well as optimised filter arrangement in the respective query.

Listing 20 shows the basic utilisation of the QueryService in Anno4j. The only thing that has to be supplied is the Anno4j Class that should be queried for. The result list contains all RDF instances that have the associated RDF type (in this case "pao:Cat").

1 Anno4j anno4j = new Anno4j();
2
3 Cat cat1 = anno4j.createObject(Cat.class);
4 Cat cat2 = anno4j.createObject(Cat.class);
5
6 // Query for all cats
7 QueryService qs = anno4j.createQueryService();
8 List<Cat> allCats = qs.execute(Cat.class);

Listing 20: Basic Querying Example of the Anno4j Library, Querying for All Persisted Cats (Representatives of the Cat Anno4j Class).
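To give an impression of the extended variant before the full MMM example below, the following criteria sketch a few LDPath expressions as they could be passed to a QueryService for the PAO example. The PAO prefix constants and the owner's foaf:age value are assumptions made for illustration only, and the selector syntax should be checked against the LDPath documentation:

QueryService qs = anno4j.createQueryService();
qs.addPrefix(PAO.PREFIX, PAO.NS)                      // assumed constants, analogous to Listing 21
  // Class test on a path: cats that have at least one friend typed as pao:Cat
  .addCriteria("pao:hasCatFriend[is-a pao:Cat]")
  // Path traversal with a literal comparison at the end of the path:
  // cats whose owner's (assumed) foaf:age value ends with "2"
  .addCriteria("pao:hasOwner/foaf:age", "2", Comparison.ENDS_WITH);

List<Cat> result = qs.execute(Cat.class);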

To show an example of both the QueryService and LDPath for extended Anno4j querying, the exemplary use case of an animal detection of the MMM is applied, which was covered in Section 3.3.3. The underlying metadata that forms the basis for the querying is shown in Figure 28, which shows the Part Annotation of an extractor result indicating that an elephant has been found on the input image. An example code snippet of how to query the Part Annotation illustrated in Figure 28 can be seen in Listing 21. This has to be seen

31 https://jena.apache.org/ (last visited 03/12/2018)

[Figure 28 depicts the Part Annotation graph: the node "part" (rdf:type mmm:Part) links via mmm:hasBody to "body" (rdf:type mmmterms:AnimalDetectionBody, rdf:value "Elephant", mmmterms:hasConfidence "0.9"^^xsd:double) and via mmm:hasTarget to "specres" (rdf:type mmm:SpecificResource), which has mmm:hasSource "picture" and mmm:hasSelector "selector" (rdf:type oa:FragmentSelector, rdf:value "xywh=300,150,50,70", dcterms:conformsTo http://www.w3.org/TR/mediafrags/).]

Figure 28: Showcase MMM Animal Detection Result Used as Anno4j Querying Example.

only as one of many different ways of retrieving the Part Annotation, as many different queries can contain this respective Annotation in their result set. The listing also shows the fluent interface that has been implemented for the QueryService, meaning that the configuration parts can be chained after one another instead of adding every single one in its own code line (this can be seen in lines 2 to 6 of Listing 21).

1 QueryService qs = anno4j.createQueryService();
2 qs.addPrefix(MMMTERMS.PREFIX, MMMTERMS.NS).addPrefix(MMM.PREFIX, MMM.NS)
3   .addCriteria("mmm:hasBody[is-a mmmterms:AnimalDetectionBody]")
4   .addCriteria("mmm:hasBody/rdf:value", "Elephant")
5   .addCriteria("mmm:hasTarget/mmm:hasSelector/rdf:value",
6       "50,70", Comparison.ENDS_WITH);
7 List<PartMMM> result = qs.execute(PartMMM.class);

Listing 21: QueryService of the Anno4j Library Using LDPath Criteria. The Query is Issued to Retrieve the Part Annotation Shown in Figure 28.

In line 1 of Listing 21, the QueryService object is created, which is done via a method of a given Anno4j object. Because of this, the queries issued by the service are automatically routed to the database connected to the Anno4j instance. The next five lines add various configuration parameters to the QueryService object in fluent interface fashion. Line 2 adds two new prefixes with corresponding namespaces, as the QueryService does not know them in advance, which lets the LDPath criteria be defined with commonly known prefixed terms. Lines 3 through 5 define various LDPath criteria for the query that will be explained below. Lastly, line 7 executes the QueryService by calling the execution method with the Anno4j Class parameter "PartMMM.class", as Part Annotations are to be queried. As indicated, the return type is a Java List containing objects of the respective PartMMM Anno4j Class. In our exemplary use case, this list would contain the Part Annotation illustrated in Figure 28.

The showcase application of the QueryService shown in Listing 21 implements three different querying criteria. Semantically, the requirements that are defined are the following:

• As the interest only lies in the results of the animal detection extractor, the first criterion, seen in line 3, requires the body node to be of the class "mmmterms:AnimalDetectionBody", as this is the way extractor results are distinguished in the MMM. Therefore, the LDPath expression (starting from "PartMMM" nodes, so in this example from the node entitled "part") traverses the edge "mmm:hasBody" and then performs a class type check for the respective class given in brackets in the LDPath criterion ("[is-a mmmterms:AnimalDetectionBody]"). This path is visualised in Figure 29.


Figure 29: Illustrating RDF Graph for the First Criterion of the Query Defined in Listing 21 (Line 3), Which Supports the Semantic Feature of Searching for Results of an Animal Detection.

• The second criterion further specifies the interest that the query owner has in the results of the animal detection extractor. Out of all the animals that are potentially detected on pictures, elephants are the ones that should be selected by the query. Hence the LDPath is traversed from the starting point over the relationship "mmm:hasBody" and then the property "rdf:value" in order to select the literal that is supplied there. The "rdf:value" of the "mmmterms:AnimalDetectionBody" reflects which animal the extractor has detected, and therefore the criterion added in line 4 of Listing 21 supplies a second parameter that checks the value of the given property for equality to "Elephant", in addition to the standard LDPath parameter. This path can be seen in Figure 30. As the check of the first criterion lies "on the path" used by the second one, both could be merged into the more complex LDPath expression "mmm:hasBody[is-a mmmterms:AnimalDetectionBody]/rdf:value" with a value check for an elephant, which would express the same semantic query requirement.

• Last but not least, a querying user could also have requirements on the picture, or more specifically the subpart of the picture,


Figure 30: Illustrating RDF Graph for the Second Criterion of the Query Defined in Listing 21 (Line 4), Which Supports the Semantic Feature of Elephants Being Depicted.

that the animal detection was done on. This could for example be the requirement that the spatial fragment the elephant was found in should be "rather small". The point of interest for this criterion is the selector, and more specifically the fragment value of said selector. The QueryService in Listing 21 is therefore augmented with the LDPath expression that traverses the relationships "mmm:hasTarget" and "mmm:hasSelector" to again read out the value that is supplied for the "rdf:value" property of the given selector. In contrast to criterion two, however, it does not check the contained String for full equality, but rather checks that it ends with "50,70", so that the selection is done on a spatial fragment with a width of 50 pixels and a height of 70 pixels, while it is not important where in the picture the fragment lies. This path is visualised in Figure 31.


Figure 31: Illustrating RDF Graph for the Third Criterion of the Query Defined in Listing 21 (Line 5), Specifying the Detection to Be in a Rectangle of a Width of 50 Pixels and a Height of 70 Pixels.

The Anno4j library supports both an easy-to-use basic querying functionality via Java objects and a more comprehensive implementation via the path-based querying language LDPath. This satisfies the requirement for ORMs / ORdfMs of supporting a querying API (which was introduced in Section 4.1) that lets users fine-tune their queries for different purposes, in order to directly query only for information that meets their current focus. Path-based querying is supposedly easier to understand than pattern-based approaches like SPARQL, and therefore targets the overall claim of the Anno4j library to reduce barriers for non-Semantic Web experts.

At this point, the essential features of an abstraction layer like an ORdfM have been discussed for Anno4j, which consist of the persistence and the querying of data. With these, users are able to both write RDF data to and read it from a given database. The next important aspect that needs to be covered for such interactions are database requirements that allow convenient working with the database in the first place, ensuring criteria like parallel working or data consistency. The way these requirements are met by the Anno4j library is therefore discussed in the following sections.

4.5 established database concepts of the anno4j library

In the last sections, the basic mapping functionality of the Anno4j library has been covered. Java POJOs can be utilised to write RDF data to a database, while various querying functionality enables convenient requesting of RDF data from the database, which is in turn returned as Java POJOs.

What has not been covered yet are requirements that should always be fulfilled when working with a database of some sort. This concern has existed for a very long time, and various literature [69][73][51] emphasises the importance of implementing procedures, or the whole system architecture, in a fault-tolerant way. There are always sources that can lead to failures of the system or errors in the code and applications, even if those problems only occur in very rare situations. No matter the scenario, if a failure happens it is essential that in particular the data, and partially the actions currently performed at the database, are persisted, so that a consistent state can be ensured at any point in time and data that has been worked with cannot be lost or become irreproducible. This is all the more important the more sensitive the data gets, with the most common examples being financial or personal data.

One of the core concepts for database interaction that is mentioned in the literature above is the transaction. A transaction encapsulates multiple actions into a set that is either fully committed, so the actions take effect on the data currently persisted in the database, or aborted and rolled back, so the actions do not take effect at all. This enables the database to conform to the ACID concept [175], which is commonly known in the world of (relational)

databases. It stands for the following individual concepts that are fulfilled by the database:

atomicity The all-or-nothing philosophy described above for a given transaction. A set of actions is either fully committed, or not at all.

consistency Consistency targets the conformity of a given database state, so that the data is correct at a given point, oftentimes oriented towards a given data scheme or model. Each action executed on an underlying consistent database state always creates another, altered but consistent state. This consequently also holds true for transactions and their set of consistent single actions.

isolation This concept illustrates the fact that the actions of a started transaction are not visible to other users, and therefore hidden from other transactions, until it is fully committed and the change of the data is persisted. Many techniques concerning this concept deal with synchronisation.

durability Once a transaction is fully committed and the data is therefore altered at the database, the contained set of actions needs to be fully persistent in the database from that point in time. This means that any subsequent malfunctions or errors may not have an effect on these actions having taken place. Together with the concept of consistency, every correct and committed transaction took part in the database, and the only way of reverting this state is issuing a transaction that exactly counteracts the originally executed actions.

The Anno4j library adapts the concepts mentioned above in order to enable ACID-like behaviour for its database communication. To give further insights into how this is implemented, Section 4.5.1 explains the basic functionality of the transactional behaviour that has been introduced to the library. Afterwards, Section 4.5.2 explains so-called ValidatedTransactions, which are an extension of the classic transaction. The validated variation allows checking the data that the transaction is supposed to insert for validity against the schema information present in the current state of the data model. Lastly, in Section 4.5.3 Java Annotations are introduced that can be added to the Anno4j Classes and Partial Classes in order to produce such validation information alongside the defined metadata model in Anno4j. These additions to the Anno4j library contribute to the answer to research question 3.3.

4.5.1 Supporting Transactional Behaviour in Anno4j

The basic persistence setting for Anno4j is "auto-commit". This means that every object that is created or altered during a Java process automatically gets persisted in the respective database. This workflow has shortcomings as described above, especially in the case of failures or crashes while the process is still running and not yet finished. In this case, only a part of the desired information would be stored, leading to an inconsistent or faulty database state.

Therefore the Anno4j library supports transactional behaviour as described in Section 4.5. This allows bundling several Anno4j actions into sets of actions that are either committed when successful or rolled back if an error happens. Here again, Anno4j allows developers to introduce rich Java features like error handling into their processes while supporting consistency of the database.

Listing 22 shows a short implementation example of the Transaction in Anno4j. The Transaction object is created via an Anno4j instance and hence operates on the database that is associated with that Anno4j instance. A Java interface "TransactionCommands" is implemented in the library and defines the methods and behaviour of both the Anno4j and Transaction objects. Because of this, both objects behave the same way and offer the same functionality. Additionally, the Transaction class offers the methods ".begin()", ".commit()", and ".rollback()" for the respective essential transactional commands.

1  Anno4j anno4j = new Anno4j();
2
3  Transaction transaction = anno4j.createTransaction();
4
5  transaction.begin();
6  // Standard Anno4j code
7  // ...
8  Cat cesar = transaction.createObject(Cat.class);
9
10 QueryService qs = transaction.createQueryService();
11 // ...
12 transaction.commit(); // or eventually transaction.rollback();

Listing 22: Basic Transaction Implementation in Anno4j, Encapsulating Actions Done for Persisting and Querying Information.

Line 3 of Listing 22 creates the Transaction object via the Anno4j instance defined in line 1, and the transaction is started in line 5. Without starting the transaction, executed commands would not have the desired effect on the database. Lines 6 through 11 then apply different persistence and querying actions as described in Section 4.3 and Section 4.4, like creating an instance of the Anno4j Class "Cat" or creating a QueryService object. In contrast to earlier examples, this is done using the Transaction object instead of an Anno4j instance, which is an important detail. Last but not least, the transaction is committed to the database in line 12 (or instead rolled back with the commented code).
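Combined with Java error handling, the commit/rollback pair is typically used along the following lines; this is only a sketch of the pattern (exception signatures simplified), not code taken from the library's documentation:

Transaction transaction = anno4j.createTransaction();
transaction.begin();
try {
    Cat cesar = transaction.createObject(Cat.class);
    cesar.setAge(5);
    // ... further Anno4j actions ...
    transaction.commit();      // all actions take effect together
} catch (Exception e) {
    transaction.rollback();    // none of the actions take effect
    // application-specific error handling / logging
}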

4.5.2 Validate Database Input with Validated Transactions

The transactions implemented in Anno4j ensure the ACID criteria for database interaction via the library. This enables the database to be in a consistent and clean state at all times, which is a fundamental cornerstone of today's services and applications.

Next to consistency, another important requirement that can and always should exist for a database is the validity of the incorporated data. This validity is always motivated by some kind of rule set that is defined for the respective dataset and its context or use case, with the intention that the data conforms to the defined rules and is therefore denoted as being valid or correct. This validity of data is important, as the use of the data is compromised heavily if it does not correspond to the defined rule set. If other users want to apply the data in their own use case, they have to rely on the data being conformant, otherwise the process of applying the data gets much more cumbersome and time-consuming. The validity can be achieved or checked at different points in time of a process, and if it is not done before the data is inserted, there should always be processes that can verify the correctness of the data later on. In addition to the validation itself, it is always important to also propagate this information to the user, so they know what the invalid action would be or which instances in the database do not correspond to the defined rule set.

As has been seen so far, many databases in the Semantic Web support a vocabulary or ontology next to their actual data. This increases the applicability of the database, as users can learn and understand the structure of the data by parsing said rule set, making the data itself more valuable. Therefore, the structural validity of the data, and hence also the semantic integrity conveyed through the data model, should always be assured. As has been discussed in Section 2.2, the main tools to implement a metadata model in combination with a structure specified by an ontology or vocabulary are RDFS and OWL. Next to the mentioned information, OWL also supports more thorough validity features like transitivity and multiplicity of RDF relationships. Anno4j incorporates some of those, originating from OWL Lite, in order to introduce validity checks, as will be seen in the next section. As the model information of both RDFS and OWL is also formulated as RDF triples, this information can be persisted in the database itself, which can be helpful in scenarios where a client also needs to consider this kind of information. In the Semantic Web world it is especially important to make sure that data validity is given, as statements, as discussed in Section 2.2, are always expected to be "true".

One of the most common ways of creating both a schema and respective instance data are editors like Protégé [121]. They enable tool-supported possibilities to create and define classes with their properties and relationships, and accordingly also to create instances of those classes. This process ensures complete validity of the created data, allows persistence on the fly, and in most cases also supports information propagation to the user. In contrast to editors, the Anno4j library introduces the validity feature at the level of the mapping functionality, enabling a more technical approach to the requirement of data validity. This enables developers to always produce valid data and to incorporate the validation in the way they are used to.
Next to the "Transaction" type that has been shown in Section 4.5.1, there is an extended variant called "ValidatedTransaction" which supports all the common transaction features with the addition of a validity check. This check is done against the schema information supplied at the respective database that the validated transaction object is created on (again in relation to an Anno4j object), and it is performed once the ".commit()" method of the validated transaction is called. If the validity of the data to be added is not given, a "ValidationFailedException" is thrown and the information is propagated to the Java process and the user. Therefore, this implementation satisfies the requirements for validity checks described above. The following Section 4.5.3 will show an example as well as the validators that can currently be applied in Anno4j and how.

4.5.3 Schema Annotations for Data Validity

Section 4.5.2 has explained what data validity means for a database, and it has also indicated how it can be achieved in Anno4j through the utilisation of a "ValidatedTransaction". These transactions make use of the schema information persisted in the respective database. Up to that point it was assumed that the schema information is already present. This, however, is not always the case, and the insertion of this data is as important, and has the same necessity to be valid, as the "normal" data. Because of this, the Anno4j library supports convenient ways of defining several possible validation requirements that are oriented towards OWL Lite and contribute to the validation of the database. Java Annotations are implemented that can be added to the Anno4j Classes and which will automatically be transformed into respective triples that are added to the connected database at startup. In doing so, the Anno4j instance can create "ValidatedTransaction" objects that validate against those Annotations on the fly. These Annotations are called

Schema Annotations. Table 4 in the Appendix shows an extensive list of the Schema Annotations that are currently implemented in the Anno4j library. To show an example of the Schema Annotations in combination with a "ValidatedTransaction" object in use, first of all the utilised Anno4j Class has to be adapted. Section 4.3 explained the basic setup of an Anno4j Class, and how getter/setter pairs need to be implemented in order to formalise an RDF relationship or property for the respective class. These can now be enhanced by adding Schema Annotations defining the validity criteria. Listing 23 shows a code excerpt that extends the Anno4j Class "Cat" (seen in Listing 17) with validation criteria on the relationship "pao:hasCatFriend".

1  @Iri(PAO.CAT)
2  public interface ValidatedCat extends Animal {
3
4      // ... Left out other setter/getter pairs
5
6      @Symmetric
7      @MaxCardinality(3)
8      @Iri(PAO.HAS_CAT_FRIEND)
9      Set<ValidatedCat> getCatFriends();
10
11     @Symmetric
12     @MaxCardinality(3)
13     @Iri(PAO.HAS_CAT_FRIEND)
14     void setCatFriends(Set<ValidatedCat> friends);
15
16     void addCatFriend(ValidatedCat friend);
17 }

Listing 23: Adapted Anno4j Class "Cat" from Listing 17, Introducing a Maximum Cardinality and Symmetric Requirement on the Relationship "pao:hasCatFriend".

Lines 6 and 11 apply the Schema Annotation "@Symmetric", which semantically conveys the fact that friendship relations should always hold in both directions. This enforces that a set relationship "pao:hasCatFriend" from instance "X" to "Y" is only valid when the same edge also exists from "Y" to "X". The second validation criterion, defined in lines 7 and 12 respectively, issues the (only exemplary) requirement that cats can only have a maximum of three (symmetrically valid) "pao:hasCatFriend" relationships. The adapted Anno4j Class "ValidatedCat" can be used as described in Section 4.3 and Section 4.4. Using it in combination with a "ValidatedTransaction" checks for the defined validation criteria shown in Listing 23. As described in Section 4.5.2, at the point in time when the implemented objects are to be persisted, the validation criteria are checked and an exception with a description of what went wrong is potentially given to the user. Listing 24 shows an example, using "cesar" as the starting point and various additional information to showcase the validation check.

1  Anno4j anno4j = new Anno4j();
2
3  ValidatedTransaction transaction = anno4j.createValidatedTransaction();
4
5  transaction.begin();
6
7  ValidatedCat cesar = transaction.createObject(ValidatedCat.class);
8
9  ValidatedCat cat1 = transaction.createObject(ValidatedCat.class);
10 cesar.addCatFriend(cat1);
11 cat1.addCatFriend(cesar); // Necessary symmetric edge
12
13 ValidatedCat cat2 = transaction.createObject(ValidatedCat.class);
14 cesar.addCatFriend(cat2);
15 cat2.addCatFriend(cesar); // Necessary symmetric edge
16
17 ValidatedCat cat3 = transaction.createObject(ValidatedCat.class);
18 cesar.addCatFriend(cat3);
19 cat3.addCatFriend(cesar); // Necessary symmetric edge
20
21 // Adding another, fourth cat would invalidate the MaxCardinality on cesar's cat friends
22 // ValidatedCat cat4 = transaction.createObject(ValidatedCat.class);
23 // cesar.addCatFriend(cat4);
24 // cat4.addCatFriend(cesar);
25
26 transaction.commit();

Listing 24: Exemplary “ValidatedTransaction”, Using the “ValidatedCat” Anno4j Class Displayed in Listing 23.

The code shown in Listing 24 passes the validation check, as the appropriate additional relationships are added. In this case, the symmetric edges directed from "cat1", "cat2", and "cat3" respectively back to "cesar" are defined in lines 11, 15, and 19. Lines 21 through 24 show a commented piece of code which would invalidate the maximum cardinality requirement on the defined relationship, as the allowed number is exceeded. Listing 25 shows an exemplary error message indicating the violation of this requirement. The error message shows which instance is invalid (the RDF node with the URI "http://pao.com/cesar") and also that the maximum cardinality is exceeded with 4 representatives, while only 3 would be allowed. The "ValidatedTransaction" object is created in line 3 (with the respective Anno4j object), started in line 5, and committed in line 26, just like a classic transaction.

1 com.github.anno4j.ValidatedTransaction$ValidationFailedException: Property http://petsandowners.org/hasCatFriend of http://pao.com/cesar has only 4 values, but maximum cardinality is 3

Listing 25: Showcase Error Message if the Allowed Maximum Cardinality of the "pao:hasCatFriend" Relationship for "cesar" is Exceeded.
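In application code, such a violation would typically be caught around the commit and reported to the user; the following lines only sketch this pattern (exception handling simplified), using the exception type shown in Listing 25:

try {
    // ... create and connect ValidatedCat objects as in Listing 24 ...
    transaction.commit();   // the schema validation is performed here
} catch (ValidatedTransaction.ValidationFailedException e) {
    // Propagate the violation, e.g. the message shown in Listing 25, to the user
    System.err.println("Validation failed: " + e.getMessage());
}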

4.6 automated domain model generation through anno4j rdf schema parsing

Various features of the Anno4j library have now been discussed, on the one hand concerning the "basic" functionality of the mapping, and on the other hand also more specialised features like the ValidatedTransaction and the Schema Annotations that support consistency as well as validity for the underlying database state. Essential building blocks of these processes are the Anno4j Classes (the basic POJOs that enable the mapping), Partial Classes (extensions to the Anno4j Classes, allowing for a more fine-grained and detailed implementation of the mapping objects), and Proxy Classes, which have so far been mentioned only briefly in Section 4.3. To put it simply, a Proxy Class represents a pre-compiled "building plan" that originates from the defined metadata model. From this plan the Anno4j library is able to create proxies for the respective Anno4j Class instances quickly, as it does not have to generate the proxies' behaviour from scratch as described in Section 4.3. These Proxy Classes can then be provided for a given Java process to make use of. This only works with a static model, as a change in the model definition requires a new generation of the Proxy Classes. However, the time savings achieved increase the applicability of the library, model changes should not happen too often, and the library supports an automated generation of the Proxy Classes, as will be described in this section.

However, a problem can potentially exist when working with these building blocks: especially with complex ontologies or use cases with many different classes and entities to model, the routine of creating the mapping (implementing the classes etc.) by hand can become cumbersome and, most notably, error-prone. Although the mere class structure to implement is not difficult, this process requires a high amount of focus, as for example the sole assignment of an incorrect IRI to an Anno4j Class can lead to a faulty metadata model, recreating the mistake in the database state every time that respective class is applied.

To counteract this circumstance, a contribution of this thesis exists in the implementation of the so-called Anno4j Generation Tool for the Anno4j library. This tool is able to facilitate the process described above of creating a metadata model mapping, which is described in Section 4.6.1. Next to this, the generation tool is also the core instrument for automatically creating an object-oriented REST API for a respectively created model, see Section 4.6.3. The internals of the generation process are described in Section 4.6.2. The Generation Tool constitutes this thesis's answer to research question 3.1, while its Proxy extension answers question 3.2 and the REST API extension question 3.4.


Figure 32: Illustration of the Anno4j Generation Tool. With the Input of an RDFS or OWL Schema File, Anno4j Classes, Partial Classes, Proxy Classes, and a REST API Can be Automatically Generated.

4.6.1 Domain Model Generation Functionality

The functionality of the Anno4j Generation Tool is illustrated in Figure 32. The single input for the tool (next to some optional configuration for fine-tuning the generation process) is a normal RDFS or OWL schema file, which can contain class and property definitions of the respective language, as well as the validation criteria that have been discussed in Section 4.5.3 (additional OWL criteria can be included, but will not be considered by the generation tool). At this point it is important to note that only RDFS and OWL Lite ontologies can be processed by the Anno4j Generation Tool. The reasons for this will be discussed in Section 4.6.2. Schema files pose a promising possibility to facilitate the metadata modelling process, as ontologies or vocabularies are usually available when a Semantic Web use case is implemented. Out of such a schema file the generation tool is able to generate the following entities for the metadata model, next to the REST API extension explained in Section 4.6.3:

• A full list of Anno4j Classes representing the RDF classes that are included in the respective schema file. These are enhanced with appropriate IRIs for the class itself as well as all added relationships and properties that are defined. These are extended with possible Schema Annotations and basic methods (add/remove single/multiple instances), depending on the type of relationship or property and its multiplicity. Super- and subclass relations are reflected in a Java class hierarchy.

• For every basic Anno4j Class, a respective Partial Class is implemented, filling the base methods defined in the Anno4j Class with respective code. This enables a rich basic behaviour for every Anno4j object, and provides an interface for own extensions.

• Proxy Classes can be generated for the defined snapshot of the metadata model. Together, these can be included in the classpath of a given Java process as a ".jar" archive/file, skipping the behaviour gathering step of the metadata creation and resulting in a speed-up of the application. Results of this increase in speed are evaluated in Chapter 6.

The configuration of the generation tool allows handling two things: ambiguity and language. Especially in more complex schemata with a deeper class hierarchy, it is possible to have clashes in the naming of Java classes as well as Java methods and their labels. The naming of both classes and methods is determined by supplied "rdfs:label" properties and, if those are not present, by the last fragment of the given IRI. Therefore the tool offers two ambiguity checks that can be enabled for the generation process. This results in a slightly longer processing time, but prevents name clashes that would otherwise have to be dealt with by hand. As an example, consider the two IRIs "pao:PetOwner" and "pao:Pet_Owner". As the underscore character "_" in the second IRI is not used for generated Java class names, the tool would ignore it and both entities would be mapped to the same Anno4j Class "PetOwner". With a checked ambiguity, both Anno4j Class names are extended with a random number that does not carry any semantics, but the clash is prevented and therefore both (potentially semantically different) Anno4j Classes would be created.

In terms of language, two things are concerned in the generation process: firstly the naming of both Anno4j Classes and their methods, and secondly their description in the form of Javadocs. For both, a preferred language as well as a default language can be issued via configuration of the Generation Tool. For this, the properties "rdfs:label" and "rdfs:comment" of the respective schema are utilised, as both can be supplied repeatedly with different language codes.

An extensive example for an application of the Anno4j Generation Tool utilising the PAO ontology is shown in Appendix A. The basic application of the generation tool is shown code-wise in Listing 31. Using the exemplary input schema illustrated in Listing 32 (which is oriented towards the data model that is designed and used in Section 2.2), the output of the generation tool can be seen in Listing 33. At this point it is important to mention a small tweak to the way Anno4j database objects have been created up to now. The Java constructor also supports a boolean flag that induces the respectively created Anno4j instance to also persist the schema information that is associated with a given Java project at that time. This is especially important with a generated schema and the ValidatedTransactions, as both work on that stored schema information.

The technical process of generating a domain model out of a given ontology requires several procedures or algorithms in order to counteract issues that arise from the different concepts that exist between RDF and OWL on the one side, and Java on the other side. These issues, in combination with the solutions that have been introduced to the Anno4j library, will be covered in the following section.

4.6.2 Generation Process Internals and Algorithms

The overall process of the generation requires an RDFS or OWL Lite ontology as input. The first step consists of extracting the required information in order to generate a domain model for the Anno4j library, which is done by applying a reasoning step to the ontology (reasoning in RDF has been explained briefly in Section 2.2.6). This reasoner uses several rules to deduce said information and make it usable in further processes, creating a materialised ontology. For the Anno4j implementation, the reasoner Openllet32 is used, which is built on top of the reasoner Pellet [158].

With this materialised ontology, a procedure is conducted that normalises the present RDF or OWL classes in order to streamline the RDF class concept so that it can be used for the same purpose in a Java domain model. The above-mentioned issue between both concepts lies in the interpretation of classes and their instances. While the Semantic Web concept allows the definition of equivalences between classes or properties, which means that their instances belong to both classes or receive both classes as a type, this is not possible in Java. With this issue in mind, it also becomes apparent why only OWL Lite and not higher specifications can be utilised in order to generate Java domain models. Java only allows inheritance relations between classes or interfaces to be modelled in a way that represents a subset relationship in RDFS or OWL. This allows mapping only those OWL/RDFS specifications that apply the "rdfs:subClassOf", "owl:equivalentClass", and "owl:intersectionOf" relationships at most, which can only be assumed for OWL Lite specifications, but not higher representatives. Nevertheless, a survey by Wang, Parsia, and Hendler [172], who have analysed 1275 RDF documents for their application of ontology specifications, shows that the largest part of these documents uses OWL Lite or lower, therefore justifying the generation workflow in practice as described.

To solve the above issue, a procedure is proposed that combines equivalent classes represented by the given RDFS or OWL Lite ontology, eventually making it possible to turn them into Java classes. The result of this step is called the normalised ontology. This ontology is semantically the same as the original ontology, but meets the requirements defined above.

32 https://github.com/Galigator/openllet (last visited 03/12/2018)

Further steps performed on the normalised ontology increase the usability as well as the validity and applicability of the generated domain model. As is common sense not just in the Semantic Web but also in other programming-related topics, meaningful names should be used for the classes, properties, and relationships in the ontologies as well as in the domain models. If this information is present in the input ontology of the Anno4j Generation Tool, procedures are called that determine meaningful names for the classes combined in the normalisation step, as well as suitable identifiers for the created Anno4j Classes and their methods. Furthermore, the method signatures in the created domain model are enhanced to eventually use the most specialised types for their methods.

normalising a materialised ontology - finding equivalent classes The normalisation step implemented in the Anno4j Generation Tool applies its functionality in two steps. The issue between the RDF and Java class concepts manifests itself in two available relationships used in RDF/OWL in order to implement class hierarchies and equivalences, which need to be handled for Java conformance: the "rdfs:subClassOf" and the "owl:equivalentClass" relationships. The former in most cases would not be harmful, as it can directly be mapped to a Java inheritance. Nevertheless, cyclic implementations of the relationship are allowed in RDF, eventually creating a class equivalence between the classes that are part of the given cycle. The latter creates a class equivalence between two classes that are associated with a respectively defined relationship. Cyclic inheritance, and thereby class equivalence, is prohibited in Java and hence these occurrences need to be eliminated.

The OWL relationships can easily be found by simple SPARQL queries, but the respective RDFS correspondent proves to be more difficult to find because of its cyclic nature. Therefore the first step of the normalisation consists of finding and "removing" existing cycles. As has been seen throughout this work, RDF data and documents can be interpreted as a directed graph with nodes and directed edges. With this concept in mind, cycles between nodes can be found by addressing them as so-called Strongly Connected Components (SCCs), in which every node is reachable from every other node contained in the currently considered component. Detecting these in a graph with SPARQL queries would be inefficient, therefore the Tarjan Algorithm [162] is applied. Algorithm 1 shows its implementation33: The procedure seeks all SCCs in a given graph G = (V, E) that are reachable from a node v via depth-first search. In order to identify all present SCCs, the procedure can be repeated as long as there is a node v ∈ V whose value v.index is not yet defined.

33 Aligned to the implementation shown in https://www.programming-algorithms.net/article/44220/Tarjan’s-algorithm (last visited 03/12/2018)

Algorithm 1: The Procedure "tarjan()" to Identify Strongly Connected Components in a Given RDF Graph.

Data: Directed graph G = (V, E), considered node v ∈ V, already identified SCCs S ∈ P(V), stack K, current index i
Result: Set S ∈ P(V) of the Strongly Connected Components in G

v.index := v.lowlink := i
i := i + 1
K.push(v)

// Consider the neighbouring nodes of v:
foreach (v, w) ∈ E do
    if w.index is not set then
        // Continue recursively with w:
        tarjan(G, w)
        v.lowlink := min(v.lowlink, w.lowlink)
    else if w ∈ K then
        v.lowlink := min(v.lowlink, w.index)

if v.lowlink = v.index then
    // The nodes on the stack form an SCC:
    SCC := {}
    do
        u := K.pop()
        SCC := SCC ∪ {u}
    while v ≠ u
    S := S ∪ {SCC}

return S

Next to the graph itself and the starting node, the input for the procedure is given by a new index i to be set for a node and a stack K. With v.lowlink, the lowest index of a node reachable from v is saved whose SCC is not yet identified. As v has not been traversed yet, there is no other known edge to another node and therefore v is the only reachable and known node, so v.lowlink is initially set to the index of v. Following that, all direct neighbours w of v are considered. If such a node has not been traversed already, the search starts the procedure recursively from w. In case a node with a lower index is reachable from w, v.lowlink has to be adjusted when the depth-first search returns. If, in the other case, w has already been traversed and hence is present on the stack, this means that w is not part of another SCC, as those nodes would already have been removed from the stack. Here, v.lowlink has to be adjusted if a node with a lower index is reachable via w. The test whether a node is contained in the stack can be realised in O(1) by adding

a flag that represents its presence, for example, which is set or unset when the node is added to or removed from the stack respectively. The SCC of the node v can be returned after all neighbouring nodes have been traversed and if v then has the lowest index of all reachable nodes. As every node is reachable from every node in an SCC and v has been visited first, there cannot be any further untraversed nodes in the SCC when the depth-first search returns from v. In addition to this, all nodes of the SCC are placed "above" v in the stack, as these have been traversed after v and therefore have been placed on the stack at a later point in time. Hence all nodes can be popped from the stack until v itself is popped. The thereby identified SCC can be added to the already found set of SCCs S.

With the above-mentioned constant-time check for the presence of a given node on the stack, the procedure shown in Algorithm 1 requires only a linear runtime of O(|V| + |E|). Therefore it can be used in practice in order to determine the cycles constituted by the "rdfs:subClassOf" relationships in an RDF graph.
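A compact Java version of this procedure, operating on a plain adjacency-list graph instead of RDF triples, could look as follows; this is only a sketch of Algorithm 1 and not the code used inside the Anno4j Generation Tool (there, the nodes would be the RDF classes and the edges their "rdfs:subClassOf" relationships):

import java.util.*;

// Sketch of Tarjan's algorithm on a directed graph whose nodes are the
// integers 0..n-1 and whose edges are given as adjacency lists.
public class TarjanScc {

    private final List<List<Integer>> graph;     // adjacency lists
    private final int[] index;                   // discovery index per node, -1 = unvisited
    private final int[] lowlink;                 // lowest reachable index per node
    private final boolean[] onStack;             // constant-time stack membership test
    private final Deque<Integer> stack = new ArrayDeque<>();
    private final List<Set<Integer>> sccs = new ArrayList<>();
    private int nextIndex = 0;

    public TarjanScc(List<List<Integer>> graph) {
        this.graph = graph;
        this.index = new int[graph.size()];
        this.lowlink = new int[graph.size()];
        this.onStack = new boolean[graph.size()];
        Arrays.fill(index, -1);
    }

    // Returns all strongly connected components of the graph.
    public List<Set<Integer>> findSccs() {
        for (int v = 0; v < graph.size(); v++) {
            if (index[v] == -1) {
                tarjan(v);
            }
        }
        return sccs;
    }

    private void tarjan(int v) {
        index[v] = lowlink[v] = nextIndex++;
        stack.push(v);
        onStack[v] = true;

        for (int w : graph.get(v)) {
            if (index[w] == -1) {               // continue recursively with w
                tarjan(w);
                lowlink[v] = Math.min(lowlink[v], lowlink[w]);
            } else if (onStack[w]) {            // w belongs to the current component
                lowlink[v] = Math.min(lowlink[v], index[w]);
            }
        }

        if (lowlink[v] == index[v]) {           // v is the root of an SCC
            Set<Integer> scc = new HashSet<>();
            int u;
            do {
                u = stack.pop();
                onStack[u] = false;
                scc.add(u);
            } while (u != v);
            sccs.add(scc);
        }
    }
}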

normalising a materialised ontology - the merging of names of equivalent classes and relationships or properties   The Tarjan procedure shown above resolves the problematic cyclic occurrences of the "rdfs:subClassOf" relationship, which implement a class equivalence between the classes of the cycle. The next step therefore consists of treating the "owl:equivalentClass" relationships that directly express a class equivalence. This is done by the procedure presented in Algorithm 2, which is also used to fulfil the last step of normalising a materialised ontology that is needed for the overall generation process: the merging of class names of equivalent classes.

As input, the procedure shown in Algorithm 2 receives the RDF triples of the materialised ontology specification and the SCCs that have been identified by the Tarjan algorithm shown in Algorithm 1. At first, every class c of every SCC s is checked for existing "owl:equivalentClass" relationships towards another class c′. If such a relationship is present, the instances of c and c′ represent the same instances. Let s′ be the SCC containing c′; then both SCCs can be merged, as all classes contained in either of them are equivalent to each other. This means that s and s′ can be removed from S, while a combined SCC {s ∪ s′} is added. When this has been done for every class in all SCCs, all maximal sets of equivalent classes have been identified, both in terms of RDFS and OWL class equivalence.

Algorithm 2: The Procedure "normaliseClassEquivalence()" to Merge "owl:equivalentClass" Relationships and Address New Names for Sets of Equivalent Classes.
Data: RDF triples G of a fully materialised ontology specification, SCCs S of the class generalisation hierarchy in G
Result: RDF triples G of the respectively normalised ontology specification

  foreach SCC s ∈ S do
      foreach class name c ∈ s do
          // If an equivalence of c to another class c′ has been specified:
          if (c, owl:equivalentClass, c′) ∈ G then
              // Combine the SCCs of c and c′:
              s′ := SCC in S which contains c′
              remove s and s′ from S
              S := S ∪ {s ∪ s′}

  foreach SCC s ∈ S do
      // Choose a class name that represents all the class names of the group:
      cₛ := random class name in s
      // Exchange equivalent class names in all triples:
      foreach c ∈ s \ {cₛ} do
          exchange c with cₛ in all triples of G

The second part of the procedure chooses a class name cₛ for each set of equivalent classes s, which is selected to represent the respective set in the resulting normalised ontology specification. To this end, an iteration is done over all classes c in s, and all occurrences of the IRI of c in triples are exchanged for the IRI cₛ. Hence all statements that have been issued for a class in s also apply for the classes now represented by cₛ. At this time, the representative name is chosen randomly, but functionality could be added that devises a meaningful name out of the given class names. Care should however be taken not to increase the procedure's overall runtime.

In terms of the runtime of the procedure shown in Algorithm 2, it has been mentioned that the "owl:equivalentClass" relationships can be queried efficiently, and the same holds for all triples that contain a given IRI. Therefore this procedure can also be executed in linear runtime. At this point it should be mentioned that the algorithms presented above are also able to do the same normalisation for relationships and properties, which are implemented via the relationships "rdfs:subPropertyOf" and "owl:equivalentProperty" respectively.
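As a rough illustration of the merging and renaming described above, the following sketch operates on the SCCs of the previous step and on a map of declared "owl:equivalentClass" statements. All names and data structures are assumptions of this example and do not reflect the actual implementation of the Generation Tool.

import java.util.*;

// Minimal sketch of the class-equivalence normalisation: SCCs that are linked by
// an owl:equivalentClass statement are merged, then one representative IRI per
// group replaces all others. Names and data structures are illustrative.
public class EquivalenceNormaliser {

    /** Merges SCCs whose members are declared owl:equivalentClass to each other. */
    public static List<Set<String>> mergeEquivalentSccs(List<Set<String>> sccs,
                                                        Map<String, Set<String>> equivalentClasses) {
        List<Set<String>> groups = new ArrayList<>();
        for (Set<String> scc : sccs) {
            groups.add(new HashSet<>(scc));                   // work on copies
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            outer:
            for (int i = 0; i < groups.size(); i++) {
                for (String c : groups.get(i)) {
                    for (String other : equivalentClasses.getOrDefault(c, Collections.emptySet())) {
                        int j = indexOfGroupContaining(groups, other);
                        if (j >= 0 && j != i) {
                            groups.get(i).addAll(groups.remove(j));   // s := s ∪ s′
                            changed = true;
                            break outer;                      // restart after the merge
                        }
                    }
                }
            }
        }
        return groups;
    }

    /** Chooses one representative per group, mapping every other member onto it. */
    public static Map<String, String> chooseRepresentatives(List<Set<String>> groups) {
        Map<String, String> renaming = new HashMap<>();
        for (Set<String> group : groups) {
            String representative = group.iterator().next();  // "random" choice as in Algorithm 2
            for (String c : group) {
                renaming.put(c, representative);
            }
        }
        return renaming;   // apply this mapping to every triple of the ontology
    }

    private static int indexOfGroupContaining(List<Set<String>> groups, String clazz) {
        for (int i = 0; i < groups.size(); i++) {
            if (groups.get(i).contains(clazz)) {
                return i;
            }
        }
        return -1;
    }
}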

determination of java identifiers   With the processes and procedures documented above, it is possible to turn a materialised ontology specification into a normalised ontology specification that is, from the conceptual point of view of RDF and Java, convertible to a Java domain model. However, there are still further issues that need to be clarified in order to support a clean transformation.

As has been seen throughout this work, the Semantic Web utilises IRIs to "name" its resources. These cannot directly be used in Java, especially not as identifiers (for the interfaces of the Anno4j Classes as well as their method names), as they contain letters and symbols that are forbidden there (e.g. brackets). Therefore the overall workflow requires a step that adapts the respective IRIs that will be used in the generation process. At the same time, when choosing the names for Java interfaces, meaningful and significant names should be selected in order to convey a convenient semantic understanding to the eventual developer who makes use of the resulting domain model.

The first source for this is the "rdfs:label" property that is commonly defined in Semantic Web ontologies. This label in general contains a textual name that is mainly intended for humans to interpret and understand. These labels are therefore often composed of letters only and form words that bear semantic meaning for humans. If symbols that are prohibited in Java are contained, they are deleted from the label. In addition, the commonly used camel-case syntax is applied for further readability.

If no "rdfs:label" is defined for a given class or property, it is attempted to obtain the desired information from the IRI of the currently considered resource. In many cases, the suffixes of such IRIs are also chosen to carry semantic meaning in order to comply with Semantic Web best practises, trying to make the use of the respective resource as easy and self-explanatory as possible. Therefore, the same rules as described above - removing illegal symbols and applying camel case - are applied to the suffix of the IRI, which is then used as the identifier in the resulting generation step. As the result of the generation is a Java domain model usable with the Anno4j library, a further detail of convenience is drawn from the path of a given IRI, as these paths can be transferred into a package structure in Java. For example, the IRI "http://petsandowners.org/Cat", which describes a feline animal and which has been used in examples throughout Chapter 2, will be mapped to a Java class with the identifier "Cat" in the package structure "org.petsandowners".
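A minimal sketch of this identifier derivation, assuming plain string handling, could look as follows; the class and method names are illustrative and not the actual API of the Generation Tool.

import java.util.Locale;

// Minimal sketch of the identifier derivation described above: strip characters
// that are not allowed in Java identifiers and apply camel case. All names here
// are illustrative, not the actual Generation Tool API.
public class IdentifierUtils {

    /** Derives a Java-conformant, camel-cased identifier from a label or IRI suffix. */
    public static String toJavaIdentifier(String labelOrSuffix) {
        StringBuilder result = new StringBuilder();
        boolean upperNext = true;
        for (char c : labelOrSuffix.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                result.append(upperNext ? Character.toUpperCase(c) : c);
                upperNext = false;
            } else {
                // Illegal symbols (spaces, brackets, ...) are dropped and start a new word:
                upperNext = true;
            }
        }
        return result.toString();
    }

    /** Derives a package name from the host of an IRI, e.g. "http://petsandowners.org/Cat" -> "org.petsandowners". */
    public static String toPackageName(String iri) {
        String host = java.net.URI.create(iri).getHost();     // e.g. "petsandowners.org"
        String[] parts = host.split("\\.");
        StringBuilder pkg = new StringBuilder();
        for (int i = parts.length - 1; i >= 0; i--) {          // reverse the host components
            if (pkg.length() > 0) {
                pkg.append('.');
            }
            pkg.append(parts[i].toLowerCase(Locale.ROOT));
        }
        return pkg.toString();
    }
}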

determination of method signatures   The last few passages have described how the Anno4j Generation Tool devises meaningful names and identifiers for the generated interfaces of Anno4j Classes as well as for their methods. For the latter, however, identifiers alone are not enough, as Java methods also require return types and parameters for a full signature.

This information is also present in the normalised ontology and can therefore be used for the generation process. The "rdfs:range" and "rdfs:domain" relationships have been introduced in Chapter 2, which restrict the source and the sink of a relationship or property to be defined. This information can directly be mapped to a Java method signature. The "rdfs:domain" determines which resulting Anno4j Class will receive the currently considered method, while the "rdfs:range" fixes the respective return types and parameters. As an example, if the "pao:isPetOf" relationship from Listing 3 with domain "pao:Animal" and range "pao:Human" were used in the generation process, two basic methods would be added to the "Animal" Anno4j Class (the automatically generated Partial Classes would add more methods, which are skipped here for reasons of clarity). The signatures for these methods would be "void setOwner(Human owner);" and "Human getOwner();", each with the added "@IRI(http://petsandowners.org/isPetOf)" Java annotation to properly map the defined methods.
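To make this more tangible, the following sketch shows roughly how such generated interfaces could look for the PAO example. The annotation spelling, the omitted supertypes, and the exact shape of the generated code are assumptions here and may differ from the actual output of the Generation Tool.

// Illustrative sketch of generated Anno4j Classes for the PAO example; in the
// actual output, the interfaces would typically also extend Anno4j's ResourceObject.
package org.petsandowners;

import org.openrdf.annotations.Iri;

@Iri("http://petsandowners.org/Human")
interface Human {
}

@Iri("http://petsandowners.org/Animal")
interface Animal {

    // Derived from "pao:isPetOf" with domain "pao:Animal" and range "pao:Human":
    @Iri("http://petsandowners.org/isPetOf")
    void setOwner(Human owner);

    @Iri("http://petsandowners.org/isPetOf")
    Human getOwner();
}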

The Anno4j Generation Tool also incorporates the RDFS possibility of assigning multiple values to the above relationships in its workflow. Here, the conceptual difference between RDF and Java surfaces again, as the assignment of multiple "rdfs:range" relationships semantically conveys the intersection of all defined RDF classes. This intersection of classes, however, is not necessarily represented by a class of its own that could be used in the respectively generated Java interfaces. Therefore the generation process needs to determine a class C which has an own class name in the given ontology specification and which contains all individuals of the classes contained in the intersection. This is the case when C represents a generalisation of all those intersecting classes. Given that the RDF class "rdfs:Resource" represents a generalisation of all available classes, it can be inferred that such a class can always be found. The "rdfs:Resource" class is represented by the "ResourceObject" Anno4j Class in the library, hence the same assumption can be made from the Java point of view. Nevertheless, for a smaller error rate as well as higher usability of the generated domain model, it is desirable to find a class that is more specialised than the top concept "rdfs:Resource". The procedure shown in Algorithm 3 shows how this is done in the Anno4j library.

Algorithm 3: The Procedure "getLowestCommonSuperclass()" to Find the Lowest Common Superclass Among a Set of Classes.
Data: RDFS/OWL classes C1, C2, ..., Cn
Result: Most special common generalisation C of C1, C2, ..., Cn

  // Check if a class equivalent to the intersection of C1, C2, ..., Cn exists:
  if there exists a class C with C ≡ C1 ⊓ C2 ⊓ ... ⊓ Cn then
      return C

  // Identify the set S of common generalisations:
  foreach Ci ∈ {C1, C2, ..., Cn} do
      Si := set of all generalisations of Ci
  S := S1 ∩ S2 ∩ ... ∩ Sn

  // Identify the most special class in S:
  C := arg max_{s ∈ S} d(s, rdfs:Resource)
  return C

The procedure shown in Algorithm 3 receives a set of RDFS or OWL classes C1, C2, ..., Cn as input, which have been defined as the values of an "rdfs:range" or "rdfs:domain" relationship. At first it is tested whether a class exists in the given ontology specification that already equivalently represents the intersection of the classes C1, C2, ..., Cn. In OWL, this can be tested by considering the respective "owl:equivalentClass" relationships, which are present in the applied materialised ontology, as they are produced by the Openllet Reasoner discussed above. Therefore, the required information can be requested via SPARQL queries.

In case there is no equivalent class present, a class that is preferably as specialised as possible has to be determined which contains all the classes C1, C2, ..., Cn. To achieve this, the set of generalisations Si of every class Ci is determined. Given that the ontology is fully materialised, Si can be located efficiently and completely as the set of all "rdfs:subClassOf" relationships of Ci. Afterwards, with the intersection of all Si, the set S of all common generalisations of C1, C2, ..., Cn is found. The last processing step, C := arg max_{s ∈ S} d(s, rdfs:Resource), then determines a class C in the set S that is as specialised as possible. This is the case when C is located as "deep" as possible in the graph of generalisations and therefore has preferably the most generalisations of its own. Hence, for the determination of C, the distance d(s, rdfs:Resource) is calculated for every class s in S. This distance represents the path of "rdfs:subClassOf" relationships from s to the most common class "rdfs:Resource". Due to the preceding reasoning step, this path always exists, as "rdfs:Resource" is a generalisation of every other class and therefore this path is inferred for every class. Also, the length of this path is finite, since the generalisation hierarchy of the normalised ontology specification is free of cycles after the procedures shown in prior steps. By choosing a class with maximal distance to the "rdfs:Resource" class, a class C is determined that is as specialised as possible and contains the intersection of the classes C1, C2, ..., Cn. It is to be noted that if n = 1, the class C1 itself is returned.
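A compact sketch of this search, restricted to the generalisation-intersection part (the initial check for an OWL class equivalent to the intersection is omitted), could look as follows; the data structures and names are assumptions of this example and not the Anno4j implementation.

import java.util.*;

// Minimal sketch of the lowest-common-superclass search from Algorithm 3,
// assuming the materialised generalisation hierarchy is given as a map from a
// class IRI to the set of ALL its (direct and inferred) superclasses, including
// the class itself and rdfs:Resource. Names are illustrative only.
public class LowestCommonSuperclass {

    private static final String RDFS_RESOURCE = "http://www.w3.org/2000/01/rdf-schema#Resource";

    public static String of(Map<String, Set<String>> superclasses, List<String> classes) {
        if (classes.size() == 1) {
            return classes.get(0);                            // n = 1: the class itself
        }
        // Intersect the generalisations of all given classes:
        Set<String> common = new HashSet<>(superclasses.get(classes.get(0)));
        for (String c : classes) {
            common.retainAll(superclasses.get(c));
        }
        // Pick the common superclass with the maximal distance to rdfs:Resource,
        // i.e. the most specialised one:
        String best = RDFS_RESOURCE;
        int bestDepth = -1;
        for (String candidate : common) {
            int depth = depth(superclasses, candidate);
            if (depth > bestDepth) {
                bestDepth = depth;
                best = candidate;
            }
        }
        return best;
    }

    // Longest chain of rdfs:subClassOf steps from a class up to rdfs:Resource; the
    // hierarchy is cycle-free after the normalisation above, so this terminates.
    private static int depth(Map<String, Set<String>> superclasses, String clazz) {
        if (clazz.equals(RDFS_RESOURCE)) {
            return 0;
        }
        int max = 0;
        for (String parent : superclasses.getOrDefault(clazz, Collections.emptySet())) {
            if (!parent.equals(clazz) && !parent.equals(RDFS_RESOURCE)) {
                max = Math.max(max, depth(superclasses, parent));
            }
        }
        return max + 1;
    }
}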

performance of the generation process   The performance of the generation workflow that is applied by the Anno4j library to generate a domain model out of a given ontology specification is evaluated in a combined experiment described in Section 6.2. These evaluations apply a mocked specification that is used as input for the Anno4j Generation Tool to create a domain model, from which instances of the generated Anno4j Classes are then instantiated. The runtimes for this whole process are tracked, and it is shown that they can be kept to an imperceptible minimum. Therefore the generation can be applied in practice for running systems; however, its actual usage is designed to constitute a preparatory step before an application goes online.

roundup of the generation workflow   The procedures of this section have shown how it is possible to generate a fully usable Java domain model from an RDFS or OWL ontology specification. This constitutes the centrepiece of the Anno4j Generation Tool. It has also been shown that linear runtimes are maintained throughout the generation workflow, which allows for an efficient utilisation and incorporation of it in real-world applications. Whenever possible, focus is laid on understandability in order to make the resulting domain model as usable and self-describing as possible, which also aligns with Semantic Web best practises.

All of this shows that generation processes are an effective and useful way to transfer models from one concept to another, which alleviates the access to these models as well as to their accompanying technologies. This is done by allowing developers to use the technologies they are familiar with, eventually also reducing the development effort of applications. Because of this, the next section will show an extension to the processes described above, which lifts the Java domain model even further to be used as a REST API via commonly known HTTP access methods.

4.6.3 Generation of a Web Component for a Metadata Model

Next to the generation of the technical elements necessary for the mapping of a metadata model described above, the generation tool of the Anno4j library is also able to produce RESTful interfaces. These interfaces align with the model that is created in the first generation step and therefore enable the library to be used as a REST API. This shifts its "place of installation", which is illustrated in Figure 33.

Figure 33: Places of Installation for the Anno4j Library: Classic and RESTful.

The left side of Figure 33 shows the "classic" implementation of the Anno4j library. Its functionality is embedded into a common Java process that is run on a given computer. The respective triplestore can be hosted locally or remotely and is connected to the Anno4j instance. Communication happens automatically via SPARQL queries that are created and processed by Anno4j.

The right side shows the RESTful application of the library. The instantiation of a respective API interface can be embedded into the generation process described in Section 4.6.1, which creates the metadata model out of a supported schema file and also creates the necessary Web interfaces. Additionally, the creation of the Web interfaces can also be triggered by hand, to incorporate Anno4j Classes that may have been written manually after a preceding generation process (or created via generation without generating the Web component as well). Eventually, a Web Application Archive (a .war file) is created that contains the respective RESTful interfaces. This can, for example, be deployed next to a triplestore on a server, whose access interface URIs (query and update) have to be provided via configuration to the RESTful Anno4j component. This allows the triplestore to be accessed through the RESTful API generated by the Anno4j library via common HTTP calls, extending the accessibility of the database as shown in Figure 25.

Implementation-wise, the RESTful application of the library follows the basic Web stack of the Spring framework34, featuring a modular controller/service/repository structure. This enables a fine-tuned implementation that can be adapted conveniently for various use cases. Throughout all of the layers, a classic behaviour is implemented for every Anno4j Class, supporting basic methods to create, query (single/all), and delete (single/all) instances of given input.

In summary, a convenience feature is supported by the Anno4j Generation Tool, lowering the barrier to the mapping between RDF data and Java objects even further. A Semantic Web expert maintaining an RDF database does in most cases also provide a schema written in RDFS or OWL next to it. This schema as the sole input can be used to support access to the database via Java objects in a convenient and automated way. From another point of view, implementing a new ontology or vocabulary can be done either via Anno4j Classes or by designing a schema which is then used in the generation tool. In the end, in both ways, this schema information can also be applied to foster data validity through Schema Annotations. All of this functionality can also be encapsulated into a RESTful application of the Anno4j library, which can conveniently be generated with either a schema and/or a collection of Anno4j Classes as input.
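As an impression of the controller/service/repository structure mentioned above, the following is a purely illustrative sketch of a CRUD controller over a generated Anno4j Class, written with standard Spring Web MVC annotations; the routes, types, and the nested service abstraction are assumptions of this example and do not describe the actually generated code.

import java.util.List;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

// Illustrative sketch of a REST controller over a generated Anno4j Class ("Cat").
// CatService stands in for the service/repository layers that wrap the Anno4j
// access; all names and routes here are assumptions, not the generated code.
@RestController
@RequestMapping("/cats")
public class CatController {

    /** Service abstraction over the Anno4j mapping (assumed for this sketch). */
    public interface CatService {
        String create(String body);
        List<String> findAll();
        String find(String id);
        void delete(String id);
    }

    private final CatService service;

    public CatController(CatService service) {
        this.service = service;
    }

    @PostMapping
    public ResponseEntity<String> create(@RequestBody String cat) {
        // Persist a new Cat via Anno4j and return the IRI of the created resource:
        return ResponseEntity.ok(service.create(cat));
    }

    @GetMapping
    public List<String> findAll() {
        return service.findAll();
    }

    @GetMapping("/{id}")
    public String findOne(@PathVariable String id) {
        return service.find(id);
    }

    @DeleteMapping("/{id}")
    public void delete(@PathVariable String id) {
        service.delete(id);
    }
}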

4.7 additional anno4j database features

The preceding sections of Chapter 4 have given a detailed description of the Anno4j library. It has been covered what the library is able to do and in what fields it can be beneficial for the development of and within applications. This section will show smaller additions to some of the aforementioned functionalities, which partly originate from general RDF features that have been covered in Section 2.2 and/or increase the comprehensiveness of said features.

named graphs and triple contexts   Named Graphs are an RDF feature that allows the contextualisation of triples inside the same database by supporting a named graph IRI next to the commonly known components of a triple. This concept has been introduced in Section 2.2.4 and is accessed via the library in different situations. Next to the possibility of creating an Anno4j object with a defined default context, which will be passed to all its functionality automatically, objects created via an Anno4j instance can have the named graph assigned with a respective method parameter. A QueryService can be instantiated with regard to a given context, and transactions (both Transaction and ValidatedTransaction objects) can have their context set to the IRI of a named graph. Listing 26 shows a code excerpt with the various possibilities of supplying the context.

34 https://spring.io/ (last visited 03/12/2018)

// The IRI used for the named graph/context at different occasions
URI context = new URIImpl("http://www.petsandowners.org/graph1");

// Create an Anno4j instance with a default context
Anno4j anno4j = new Anno4j(context);

// Create a Cat object in the named graph; passing the context explicitly would
// not be necessary if the Anno4j object is already contextualised
Cat cat = anno4j.createObject(Cat.class, context);

// Create a QueryService that will query in the defined named graph only
QueryService qs = anno4j.createQueryService(context);

// Create a Transaction object and set the context
Transaction transaction = anno4j.createTransaction();
transaction.setAllContexts(context);

Listing 26: The Applications of Named Graphs/Context at Anno4j Instance-, Object Creation-, QueryService-, and Transaction-Level in Anno4j.

serialisation input and output   Another prominent RDF feature and an important part of the Semantic Web technologies is the materialisation of triples to and from the serialisation formats that have been discussed in Section 2.2.5. Triples can thereby be written in a textual form for humans to read or in a parseable format for computers to understand. Supporting a serialisation for the objects that are used in the mapping of the Anno4j library opens up additional possibilities of using the overall implementation, by allowing convenient usage via the commonly known triple representations.

This functionality extends the mapping of the Anno4j library with a further component: users can parse triples in order to convert them to Java objects (which need to be represented in the data model of the respective Java and Anno4j process), or, on the other hand, write their existing Java objects to the commonly known RDF serialisations. Figure 34 illustrates the extended mapping concept. The visualised mapping shows the richness of interoperability of the Anno4j library. The supported implementation opens a wide variety of ways for users to interact with and apply Semantic Web technologies in order to eventually create semantic data. With the different conversion mechanisms, data can be converted from one format to another, allowing users to benefit from the whole chain of data.

From an implementation point of view, the writing of triples is straightforward. Every object created via an Anno4j instance has a ".getTriples(RDFFormat format)" method, which takes a constant value of the RDFFormat Java class - representing an RDF serialisation format - and returns the respective object as a String containing the serialised triples.

Figure 34: Illustration of the Extended Overall Anno4j Mapping Functionality, Including the Parsing of Serialised RDF Triples and the Writing of Anno4j Java POJOs to Serialised RDF.

The parsing requires the creation of an "ObjectParser" object. This parser is contained in the Anno4j library and is based on the Apache Jena RDF I/O technology (RIOT)35. The ObjectParser's ".parse(...)" method has parameters for the input text String, a generic class or a Set of classes that determines the Anno4j Class type(s) to parse for, a default context, and the RDFFormat representing the serialisation format that the input is written in. The result is a list of objects that are of the defined Anno4j Class(es) type and which are contained in the triples of the parsed input. These objects are kept in a local memory store of the ObjectParser (so as not to accidentally "pollute" the user's data by default), can be used as normal Anno4j objects, and can be persisted to a respective database on demand.
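A rough usage sketch of both directions is given below. The parameter order of ".parse(...)" follows the description above but is simplified, and the setter on the Cat object is taken from the PAO examples; both may differ from the actual library.

Anno4j anno4j = new Anno4j();

// Writing: serialise an Anno4j object to a textual RDF representation
Cat cat = anno4j.createObject(Cat.class);
cat.setAge(5);
String turtle = cat.getTriples(RDFFormat.TURTLE);   // triples of the object as Turtle
System.out.println(turtle);

// Reading: parse serialised triples back into Anno4j objects
ObjectParser parser = new ObjectParser();
URI context = new URIImpl("http://www.petsandowners.org/graph1");
List<Cat> cats = parser.parse(turtle, Cat.class, context, RDFFormat.TURTLE);
// The parsed objects live in the parser's local memory store and can be
// persisted to a database on demand.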

query plugins for ldpath functions   For an even more comprehensive querying capability, plugins can be added to the Anno4j library. These extend the available LDPath querying tools by adding an additional LDPath function36 in combination with an associated querying logic. This function can then be used in the querying criteria provided for a given QueryService, allowing for much shorter but at the same time more expressive querying features and expressions. By doing so it is possible to confine requested result sets beforehand, as the querying logic is evaluated at the database, rather than returning a much larger result set which then has to be processed afterwards. In order to implement a plugin, the user has to define the textual LDPath function expression as well as the querying logic.

35 https://jena.apache.org/documentation/io/ (last visited 03/12/2018)
36 http://marmotta.apache.org/ldpath/language.html (last visited 03/12/2018)

An exemplary expression (adopted from SPARQL-MM [104]) can be seen in Listing 27.

QueryService qs = anno4j.createQueryService();

qs.addCriteria("sparqlmm:leftBesides(\"elephant\",\"lion\")");

List result = qs.execute(ItemMMM.class);

Listing 27: Exemplary Plugin Expression “sparqlmm:leftBesides(...)” in an LDPath Criteria.

The integration and application of such a plugin can lead to a criterion that is much shorter, but with the same semantics. The example shown in Listing 27 could potentially query for those "ItemMMM" objects that contain two animal detection results, one showing an elephant, the other showing a lion. In addition to this, the coordinates of the found results are required to be in a specific order, in this case with the elephant to the left of the lion.

4.8 anno4j conclusion, outlook, and envisioned additions

In summary, the Java library Anno4j offers comprehensive access to RDF databases, following the commonly known idiom of an object-relational mapping. This broadens the database access for developers to not only use classic Semantic Web technologies like SPARQL to directly query the database for retrieval or persistence purposes, but to use an idiomatic language they are more familiar with: Java. Still, "lower" and more "complicated" access interfaces are not hidden from experts but rather remain open in parallel, enabling a user base as diverse and wide-ranging as possible.

To recapitulate the library from a technical point of view and thereby show its technological benefits, Figure 35 shows the Semantic Web Technology Stack37 and the implications of Anno4j on it, or more precisely, in which layers the user is supported when applying it. The Semantic Web Stack is a collection of concepts that are utilised in the overall Semantic Web, with the addition of candidates in the form of specifications or standardisations that implement the respective concepts. Concepts that do not have a candidate are envisioned in the overall Semantic Web construct, but have not yet been supported technologically. The yellow border indicated at the bottom of Figure 35 shows which concepts are supported by the utilisation of Anno4j.

37 https://www.w3.org/2004/Talks/0412-RDF-functions/slide4-0.html (last visited 03/12/2018)

(Figure 35, stack layers from bottom to top: Identifiers: URI / Character Set: UNICODE; Syntax: XML; Data Interchange: RDF; Querying: SPARQL, Taxonomies: RDFS, Ontologies: OWL, Rules: RIF/SWRL, Cryptography; Unifying Logic; Proof; Trust; User Interface and Applications.)

Figure 35: Visualisation of the Semantic Web Technology Stack with Concepts Shown in Regular Font, While the Implementing Technologies Are Written in Bold Font. The Dotted Border Shows Which Concepts Are Affected, Interwoven, and/or Supported by the Anno4j Library.

With the descriptions of the library throughout this chapter it has been seen that Anno4j implements an abstraction for RDF data, therefore helping with the overall RDF concept as well as with its serialisation or syntax, and also with the respective identifiers and URIs (which can be freely chosen by the expert, or automatically generated for the newcomer). Taxonomical features that are supported by RDFS schemata are fully representable in Anno4j domain models, while querying is facilitated by different querying methods. Out of the OWL specification, only the concepts similar to the RDFS features are implemented, with the addition of the validation possibilities given by OWL Lite (see Section 4.5.2). Additional rules or logics are not yet implemented, but an SWRL extension is envisioned, which is described below. In the end, Anno4j supports the implementation of applications, which is indicated in the illustration as well.

Next to the various technical features supported by the library, these applications also benefit from design principles that are implemented by Anno4j. The foremost among these is the abstraction itself, allowing users to access and produce RDF data without having to deal with the Semantic Web technologies needed to do so. Next to this, the library is designed in a modular way, decoupling its single features whenever possible to support a fine-tuned utilisation for the user. On top of this, extensibility is offered both in the ways of persisting and of querying data, allowing for further user-driven extensions and implementations. The validation features of the Anno4j library add a degree of robustness to the application, as given input can be validated against existing schema information.

As has also been seen in the related approaches, development effort is reduced when applying an ORdfM implementation like Anno4j, and Chapter 6 will show that an application is also not impeded runtime-wise by the library. Next to these general design benefits, the ORdfM concept also offers general advantages, namely productivity, non-intrusiveness, reusability, and higher maintainability through lower error-proneness.

In Section 4.1 the core requirements by Bauer and King [16] for an ORM implementation were listed. These can naturally be adapted for ORdfM implementations as well. Anno4j supports all of them, namely a CRUD API, a querying language or API, mapping metadata, and optimisation functionality. In the following, the technical features of Anno4j will be recapped, thereby showing how Anno4j fulfils the basic requirements for ORMs.

The creation of metadata is encapsulated in a mapping, allowing the user to work with Java classes with very simple basic behaviour which - through the mapping - are transformed to RDF data on the fly. However, this is not restricted to this basic behaviour, as so-called Partial Classes can be supported next to the Anno4j Classes, whose methods can be implemented by hand and can therefore incorporate Java features and by that a richer functionality.

Basic querying in Anno4j is supported idiomatically via Java objects, while LDPath expressions enable a thorough and rich querying functionality that lowers the barrier of the de-facto standard SPARQL to query metadata from an RDF database. Rather than querying pattern-based, LDPath is a path-based querying language that arguably is easier to use, especially for non-Semantic Web experts. The result of a query issued via Anno4j is returned as the defined Java objects of the Anno4j mapping, which are therefore likewise convenient for developers to use. The "QueryService" interface of the Anno4j library implements a fluent interface, allowing LDPath querying criteria as well as different configurations to be added conveniently and in a modular fashion, enabling extensive querying functionalities.

Next to the mapping functionality to persist and query metadata, the Anno4j library supports data consistency- and validity-preserving features that can be used and applied optionally. This is done via transactional behaviour, which can be extended by so-called Schema Annotations. These Annotations are aligned with OWL Lite validation requirements for RDF data like cardinality and transitivity, and can be added to the defined Anno4j Classes of the mapping. When such a transaction is committed to a database, a check for the integrity of the database status is done before the changes are committed. Eventual errors or validity violations are propagated to the user with a description of the incorrect action.

Code generation features further increase the applicability of the library in various other use cases. The Anno4j Generation Tool allows the parsing of existing RDFS or OWL schemata to generate the mapping in an automated fashion, bypassing the process of creating it by hand.

This is especially useful and time-preserving for more complex ontologies, while also avoiding the error-proneness of hand-written code. Next to this, the Generation Tool can also lift the mapping functionality to a RESTful API, enabling access through common HTTP communication.

Additional features permit the whole Anno4j functionality to be applied in common RDF named graph fashion, while a parsing component allows Anno4j objects to be read from and written to textual RDF serialisations. The already comprehensive querying functionality supported by LDPath can further be extended by personal plugins with LDPath functions.

All of this technical functionality to access semantic data, next to an easy inclusion of the library via Maven, facilitates its applicability beyond just a simpler way of using a mapping. The scenarios described above enable whole workflow chains of large-scale applications to be supported by the library, enhancing and simplifying the steps not just for the creation and consumption of the metadata, but also for its exploitation, as it can be used and produced much more conveniently. With additional features like generation, the effort for developers to include metadata into their work cycle is lowered immensely.

As an application example to also evaluate these claims, Chapter 5 will describe the MICO platform, a workflow-based multimedia scenario in which Anno4j has been used to implement the metadata communication. In this scenario, efforts have also been made to provide the Anno4j functionality to other programming languages. A proof-of-concept version for C++ is implemented in the MICO platform, called Anno4cpp (see [5][22]), which constitutes a proxy to the Anno4j library using the Java Native Interface38 [110]. This also shows that all the presented features are not bound to Java only. Next to this, Anno4j is provided to the active community on GitHub39, has been included and mentioned in recent scientific activities [161][130], and is listed as one of the reference implementations for the Web Annotation Data Model on the W3C homepage of the Web Annotations Group40. Furthermore, the currently ongoing research project ViSIT poses another promising field of application, and therefore room for further development of the library (see Section 6.3). Also, the uptake of the library in a research project can be seen as a positive evaluation.

More ideas exist that further elevate the idiomatic functionality of the library and therefore lower the barrier of Semantic Web technologies even more. These ideas originate from current use cases or have been gathered during the current implementation stage, but have not (yet) been implemented as they would exceed the scope of this work.

38 https://docs.oracle.com/javase/8/docs/technotes/guides/jni/ (last visited 03/12/2018)
39 https://github.com/anno4j/anno4j (last visited 03/12/2018)
40 https://www.w3.org/annotation/wiki/Implementations (last visited 03/12/2018)

In any case, a short description of these ideas is given here in order to illustrate the extensibility and especially the potential of the approach applied in Anno4j.

object queries   The most anticipated extension is summarised under the name Object Queries. As described in Section 4.4, the current status of the Anno4j library supports querying via LDPath expressions. This is considerably lower in complexity than SPARQL queries, but still requires some knowledge of Semantic Web technologies. As the name suggests, Object Queries will be related to the kinds of objects that are already in use in the library. As these are again implemented in Java, the same claim of idiomatic utilisation of Anno4j holds for this feature as well. Listing 28 illustrates what this could look like.

// Code dealing with the Anno4j instance as well as the creation of potential
// other Anno4j objects, in particular the Cat Cesar, is left out
Human bob = anno4j.createObject(Human.class);

QueryService qs = anno4j.createQueryService();

Cat queryCat = qs.createObject(Cat.class);
queryCat.setAge(5);
queryCat.setOwner(bob);

Cat result = qs.execute(queryCat);

Listing 28: Exemplary Object Query Application Scenario.

The known QueryService interface gets extended so that it can create Anno4j objects just like an Anno4j instance, and take objects as input that follow the defined mappings used for querying and persistence up to now. The execution call then transforms the object into respective queries. In the example, a query for a "pao:Cat" object of age "5" with the owner "bob" would be issued. It still needs to be evaluated whether the expressiveness of this method can represent all querying criteria that SPARQL or LDPath offer, but if this can be done successfully, the querying barrier would be reduced to a Java-conformant level requiring only little knowledge about Semantic Web technologies.

swrl   The Semantic Web Rule Language SWRL [86][124] is a language that allows the definition of Horn-like rules that can be applied to a semantic dataset in order to trigger functionality that acts like a reasoner. Reasoning has been described in Section 2.2.6 and is used to create additional metadata from already existing metadata by applying those rules. These are made up of an antecedent (a set of requirements that need to be met in order to trigger the consequent) and a consequent (a set of actions that are executed if the antecedent is fulfilled).

Both of these components are built as a concatenation of core elements, like the existence check of a relationship between two RDF entities or an RDF type check for one entity. SWRL is based on a combination of OWL DL and OWL Lite with the Unary/Binary Datalog RuleML sublanguages of the Rule Markup Language41 [36]. It therefore constitutes an expressive language that allows the definition of rich rulesets. Syntaxes are supported for XML as well as RDF, and there is also a more human-readable notation supported by the authors. Listing 29 shows an example of a rule that could be implemented for the PAO use case. The rule infers semantically that if a person (represented by the variable "?person") owns a pet ("?pet") and then marries someone ("?spouse"), the respective spouse should also have the ownership of said pet associated.

pao:isPetOf(?pet, ?person) AND pao:isMarriedTo(?person, ?spouse)
    => pao:isPetOf(?pet, ?spouse)

Listing 29: Exemplary SWRL Rule for the PAO Ontology.

Beneficial factors of SWRL for a technical application are its expressiveness as well as the easily readable syntax of a ruleset. This allows for a convenient implementation in the Anno4j library, enabling users to define rules that can be triggered as a conjoined process. These processes can then be used to infer information about their dataset which does not necessarily need to be modelled via the Anno4j Classes. This opens up several new opportunities for metadata modelling. One envisioned application is called Semantic Zoom, which will be used in the ViSIT project (see Section 6.3). This zoom will work with two different metadata models that describe the same use case in two variants - one complicated and detailed, the other one simpler and more convenient - automatically converting information content that is produced for one model to the other model and vice versa.

recommendation   In use cases that allow users to search through a supported set of data, for example video or music platforms on which users can search for their favourite films and music, it is often desired to compare these multimedia data in order to support statements like "You liked this film, so maybe you would also like these other ones!". This is generally done with recommendation engines, extending the given use case to a broader functionality as well as delivering a better user experience overall. Especially the MICO use case showed potential for such a recommendation component, and therefore the possibility of an implementation was evaluated. Similar extraction results could be recommended; for example, two pictures with an alike animal detection result could be presented to the user for research purposes.

41 http://ruleml.org/index.html (last visited 03/12/2018)

LDPath querying criteria can be applied in the calculation of the recommendation features and can therefore target all RDF facets of MMM Items and Part Annotations. From a data model point of view, the RDF feature called reification has been introduced in Section 2.2.6, which is well suited to represent recommendation results. A beneficial factor here is that a reification statement is not interpreted as 100% true - which matches the nature of recommendations - but is rather a statement about a statement and only asserted to be true as described before. Next to this, recommendation processes are always in need of hardware resources, and they benefit from technical support so that no more effort than necessary is delegated to the user. All of this aligns well with an implementation feature for the Anno4j library, which has been realised prototypically as an extractor in combination with a mapping model in MICO. Figure 36 shows an example of what the model could look like.

Figure 36: Exemplary Anno4j Recommendation Result in the MICO Use Case. The Two "mmm:Item" Instances at the Bottom (Indicated by the Coloured Tree-like Structures Which Correspond to the MMM Colours Discussed in Section 3.3) Are Compared and Evaluated to Have a Similarity of 70% in Their Results.

Two items have been evaluated for their similarity based on the "mmm:SimilarityCalculation" with the IRI "sim1". An "rdf:value" is supplied to indicate the degree of similarity between the two Item instances at the bottom of the picture, in this case "0.7". These are connected by an "rdf:Statement" as described in Section 2.2.6. With this implementation, multiple different calculation strategies can target the same RDF instances, enabling a broad field of application. The approach for the implementation can then support federated and comprehensive querying capabilities so that the recommendation results can be used effectively and without further effort.
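As a hedged sketch of how such a reified similarity statement could be expressed with Anno4j-style interfaces, the following example uses the reification vocabulary shown in the figure; the "urn:example" IRIs are placeholders for the MMM namespace, and all interface and method names are assumptions, not part of the MMM or Anno4j.

import org.openrdf.annotations.Iri;

// Illustrative interfaces for a reified similarity result as in Figure 36.
// The urn:example IRIs stand in for the actual MMM namespace; rdf:predicate
// (mmm:similarTo) is omitted here for brevity.
public class SimilaritySketch {

    @Iri("urn:example:mmm#SimilarityCalculation")
    public interface SimilarityCalculation {
    }

    @Iri("http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement")
    public interface SimilarityStatement {

        @Iri("http://www.w3.org/1999/02/22-rdf-syntax-ns#subject")
        void setSubject(Object item);            // in a real model: the first mmm:Item

        @Iri("http://www.w3.org/1999/02/22-rdf-syntax-ns#object")
        void setObject(Object otherItem);        // in a real model: the second mmm:Item

        @Iri("http://www.w3.org/1999/02/22-rdf-syntax-ns#value")
        void setSimilarityValue(double value);   // e.g. 0.7

        @Iri("urn:example:mmm#basedOn")
        void setBasedOn(SimilarityCalculation calculation);
    }
}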

5 a workflow-driven approach for multimedia metadata application

The preceding sections of this work have given insights into basic Semantic Web Technologies, overall metadata modelling with a focus on multimedia, and the Anno4j library, an ORdfM implementation that provides an abstraction layer on top of a given RDF database in order to conveniently work with semantic data in an idiomatic fashion by using Java. The latter of these insights laid special focus on the fact that existing Semantic Web Technologies have an initial barrier to learn and apply, and that implementations like Anno4j exist in order to facilitate miscellaneous interactions with the database, enabling non-Semantic Web experts to produce and consume semantic data as well.

However, these implementations cannot only be used to facilitate single processes or enable non-experts; they can also contribute to a more comprehensive and thorough implementation of "bigger" applications that intend to take part in the Semantic Web. To evaluate and show this claim, the information provided in the previous sections will be concentrated and applied as a use case in the MICO Platform, a centralised software platform that is designed to analyse multimedia data by applying workflow-based processes via connected, autonomously working multimedia extractors. As a result, the platform produces descriptive information that constitutes a metadata background for the analysed multimedia files. The vision and idea of the overall MICO Project as well as an initial description of the MICO Platform have been given in Section 1.2.

The MICO Metadata Model (see Section 3.3) will serve as the ontology for the created metadata, while Anno4j will mainly be included in every metadata extractor to implement the metadata communication with the central MICO Platform. Thereby, metadata at the extractors can be persisted by creating and filling Anno4j Java objects (which correspond to the model mapping defined in the platform dependency, see Section 5.2) rather than using so-called "SPARQL templates", predefined SPARQL queries with placeholders at respective positions, which were the preliminary solution for creating, querying, and communicating metadata. These templates proved to be error-prone as well as time-consuming, as next to basic templates every extractor needed templates of its own. Hence Anno4j provided a much more reliable and maintainable solution to the task of metadata management. As will be seen later, Anno4j can also be applied in other features of the MICO Platform.


For the MICO scenario described above, two solutions will be presented, differing in the point of time and therefore in the status of the Anno4j library, which has an impact on the implementation possibilities. The first solution describes the technology present at the effective date of the end of the MICO Project. At that point, the MMM was in its final version described in Section 3.3, and the Anno4j library was released in version 2.3.0¹, which incorporated the functionality of the 2.0.0 version with an increment to the metadata model of the WADM. This version does not include the features for generation (see Section 4.6) and validation (see Section 4.5.2, although basic transactional behaviour is available), nor any solution to the proxy problem (see Section 4.6). However, a fully functional MICO Platform implementation exists for the scenario with the facts described above.

The second setting incorporates the Anno4j library in its latest release 2.4.0² and can therefore exploit its full potential. This scenario is not implemented, as a thorough implementation would exceed the scope of this work; however, the respective results and given evaluations support the claim from a theoretical and partly practical standpoint. This theoretical construct is subsumed under the name MICO+ and will be described in combination with its potential advantages in Section 5.4.

Regardless of which of the two settings is considered, it is shown that a part of the problems of the Semantic Web shown in Chapter 1 can be solved and improved upon. More precisely, the described problem of having a task that is supposed to be solved using the Semantic Web requires the gathering of information from various different information peers. Therefore, subproblems can be defined that firstly incorporate the understanding of the various information bits and pieces, as the peers do not necessarily communicate "speaking the same language", and secondly the recombination of all this information in order to receive a combined result. While the part about understanding remains a difficult task to accomplish, the MICO use case in combination with Anno4j and the MMM shows that the communication between various information peers is possible and that their results or inputs can be combined to be commonly accepted and interpretable. Additionally, by supporting a workflow-oriented information concept of the platform that is enabled by a metadata model that can incorporate the respective information, it is shown that the gathered information fragments are not only used statically, but can evolve and fuel other yet undiscovered semantics.

To give a deeper insight and explanation about this, this chapter is structured as follows.

1 https://github.com/anno4j/anno4j/releases/tag/v2.3.0 (last visited 03/12/2018)
2 https://github.com/anno4j/anno4j/releases/tag/v2.4 (last visited 03/12/2018)

Section 5.1 will enlist related work in the fields of multimedia platform implementations as well as workflow-centric metadata processes, in order to compare existing approaches with the work done in this thesis. Afterwards, Section 5.2 describes the MICO Platform more in-depth in the first setting described above, especially highlighting the components and details that are necessary for this work to show its contribution. An adapted metadata workflow and lifecycle concept for handling metadata information is described in Section 5.3. Existing shortcomings of the first described platform setting will then be evaluated in Section 5.4, and corresponding improvements will be presented. These solutions will be concluded in a technical description, and hence a theoretical construct, of an extended MICO Platform implementation, called MICO+.

5.1 related work - multimedia metadata platforms and metadata lifecycles

As the previous chapters have shown, metadata in general has a great impact on the applicability of data itself, especially in the field of multimedia, as the descriptive information allows the multimedia content to be used more effectively, for example in terms of the best display or payout as well as with respect to its actual content. Therefore, supporting a thorough metadata background allows a given multimedia item to be used in a wider area of application, as it gives useful information about how it was produced, how it is structured and built up technically, and especially what it is about or what its content is.

It has been shown up to now how metadata can be modelled and described in the Semantic Web, an implementation of a specific multimedia metadata model, the MMM, has been presented, and technical implementations to create, access, query, and apply this metadata have been proposed. Following this, it is useful to highlight the different high-level stages that metadata goes through in its entire lifetime in order to potentially improve some of these steps and thereby further increase the quality of the metadata. These steps are subsumed under the concept of a metadata lifecycle by various authors. They also mention the importance of technical support in order to access many of these steps, as they are often interwoven into functionality and therefore hidden and inaccessible for users, which counteracts the idea of making space for improvement. Hence, after giving some insights about the lifecycles of metadata, this section will describe platform implementations and their impact on and/or adaptation of the metadata lifecycle.

In [102], Kosch et al. describe the lifecycle of metadata that is directly connected to multimedia. They do this by separating the cycle into three different spaces: the content space contains the steps that directly concern the multimedia file itself, while the metadata space includes everything relating to metadata that is connected to the given multimedia file of the content space.

The third space illustrates the various kinds of users that interact in different potential ways with the multimedia file and/or with the respectively connected metadata. This is called the user space.

Figure 37: Visualisation of the Multimedia Metadata Lifecycle (Adopted from [102]).

Their content space encompasses the actual creation or production of the multimedia file, which is followed by the Post Production Processing step, enriching the created file with metadata either by automated analysis processes or by human interaction. Eventually, the multimedia file in combination with its descriptive metadata is distributed in the Delivery step, enabling users to make use of the multimedia item and its metadata in the Consumption step.

The metadata space is coarsely divided into two subparts, metadata production and metadata consumption, dealing with the creation and the utilisation of metadata associated with the multimedia workflow of the content space respectively. The production of metadata takes place at first during the production step of the multimedia file, when common technical information like the creator, the creation date, and the file format is recorded. Next to this, the post processing of the multimedia file oftentimes incorporates various basic analysis processes (as has been seen in the MICO use case) that generally add low-level features, for example a colour histogram or similar features from the MPEG-7 standardisation. There is also the possibility for human interaction, adding more high-level features to the metadata background that cannot be produced by the analysis processes automatically. These quite often contain information like transcoding hints, a reference to original files (if the described file has been compressed in some way), a media profile, and possible content adaptation information for further processing.

Different user groups or roles are involved in the whole lifecycle of multimedia metadata, fulfilling and contributing to different steps of the cycle. Content providers are the essential starters of the lifecycle, as they create or instantiate the multimedia file that is to be processed. After this initialisation, the processing users are the ones who enrich the given media file with describing metadata in the post production processing step.

Lastly, the end users induce the delivery of the media files in order to consume them. This also encapsulates the metadata consumption step of the metadata space, as the end users filter and browse through content, find appropriate versions of media files (depending on their QoS criteria), and therefore trigger the payout of the media files via the internet by using and applying the assigned metadata.

The lifecycle of multimedia metadata in their vision ranges from the creation of the initial media file, over the metadata allocation, to the delivery and consumption by end users. However, as the authors also hint, the lifecycle for a given file is not necessarily "over" at this point, as the participating roles of interacting users can overlap and therefore trigger another iteration of the cycle. In a post consumption step, end users can potentially contribute things like feedback, viewing preferences, and device characteristics, or simply alter and adjust existing metadata in order to further improve its quality. These contributions can then either be seen as a new multimedia file in itself (if the respective changes are substantial enough to interpret it like that), restarting the cycle at the production/creation step, or the metadata allocation can be seen as post processing and therefore trigger the cycle from the second overall step. This eventually creates a reiterating circular workflow of different process steps that can be iterated several times in order to create an ever-improving metadata background for the initially injected multimedia file as well as for its potentially created altered versions via compression or similar processes.

As mentioned in the introduction of this section, the allocation of metadata has a close connection to technical implementations, as these can enable better access to the steps of the metadata lifecycle, or even provide the access possibility in the first place. Kosch et al. also support this claim in [102], as they, next to the steps of the cycle itself, introduce a project in combination with its technical foundation in order to embed the metadata lifecycle. The CODAC project3 bases its metadata content on the well-known MPEG-7 standardisation (also explained in Section 3.1), which is stored in a Multimedia Database Management System [103]. This metadata database is complemented by a Multimedia Data Cartridge4 [59] for multimedia files in combination with metadata, and a dedicated streaming server to play out the given multimedia files of their databases.

The authors of various other literature [11][12][13][122] highlight another point of view on metadata, also in the interpretation of a lifecycle: the Lifecycle of Linked Data. Their lifecycle is designed in a circular fashion, as there are eight steps that can reiterate and re-trigger each corresponding successor and even each other in a non-strictly linear fashion.

3 http://www-itec.uni-klu.ac.at/~harald/codac/ (last visited 03/12/2018)
4 https://docs.oracle.com/cd/B28359_01/appdev.111/b28425/introduction.htm (last visited 03/12/2018)

called the LOD2 Stack. The eight steps implementing the lifecycle, which is also depicted in Figure 38, are the following:

extraction Data can be present in various different formats and/or persisted in databases. In order to incorporate it into the Linked Data lifecycle, the data has to be at least mapped to RDF data, or, at best, converted.

storage and querying With enough RDF data, the next step consists of storing the data, making it access-, manage-, and queryable.

authoring This step encapsulates the possibilities for users to both create new data in conjunction with the existing RDF data, as well as to correct the already existing data.

(inter-) linking Following the general idea of the Semantic Web of having increased knowledge benefits for every participating peer of interconnected RDF databases, this step induces users to link the respective data of preceding steps to other knowledge bases.

classification and enrichment At this point, a lot of RDF data has been created and gathered, and Ngomo et al. [122] raise the claim that (starting) at this step, the quality of the produced data should be revised, in order to especially increase the querying capabilities and the usefulness of the data altogether. In this step the focus therefore lies on the classification, structure, and schema information of the data. The main process mentioned to achieve this relies on high-level structures that are allocated to the gathered RDF dataset.

quality analysis By comparison with the quality of content of the Web of Documents, the authors argue that a quality assessment should also be done for Linked Data. Therefore, strategies or calculations are envisioned to do so.

evolution and repair If problems, inconsistencies, or faulty data are detected, this step needs to offer possibilities to correct those.

search, browsing, and exploitation Eventually, the result of the lifecycle is to support an ideally qualitatively rich, well structured, and extensive knowledge base of RDF data that will potentially be used by a broad field of users. Hence, searching, browsing, and exploitation possibilities must be supported in a convenient and comprehensive fashion.

The LOD2 Stack (a result of the LOD2 project5) is an integrated distribution of aligned tools and by that implements the Linked Data lifecycle.

5 The original project homepage is offline; a good summary of the project is found here: http://aksw.org/Projects/LOD2.html (last visited 03/12/2018)


Figure 38: Visualisation of the Linked Data Lifecycle (Adopted from [11][12][13][122]).

It enables choosing, for every step of the lifecycle, between the incorporation of tools implemented in the same project and third-party tools, in order to enable comprehensive access to every step and therefore to the comprised RDF knowledge and data. Its main components are implemented as open-source software to support a wide field of application as well as development and deployment possibilities. Next to this fact, the whole implementation is kept versatile, encapsulating most functionality via defined interfaces in order to support an easy plug-in of third-party implementations.

The whole framework therefore represents a combined accumulation of different software, working together to fulfil the described lifecycle. As both the third-party implementations and the own implementations are fine-tuned towards their specific use case or field of application, the adopted implementations in particular are not altered, in order not to compromise their usability. In fact, the LOD2 Stack offers its own access interfaces and allows forwarding to the respective third-party interfaces of the respective lifecycle steps.

With the different possibilities posed by the recombination of different software pieces for every step of the lifecycle, every implementation of the LOD2 Stack is different. Hence, the LOD2 project supports a stack configurer, allowing users to tailor the

characteristic composition of the stack according to their own requirements. This is made possible by a modular architecture of the LOD2 Stack, based on three core pillars:

• For a standardised, well-known, as well as versatile deployment and packaging process, the LOD2 implementation is based on the Debian packaging system. This enables easy assembly and installation of the various components. Dependencies can also be managed via the Debian system.

• The access to the underlying RDF database is supported via central standardised SPARQL endpoints that follow standardised vocabularies. The same approach is applied for the base access between the different components. All components of the LOD2 Stack are supposed to query and write back their data to the corresponding central triplestore, which allows for high interoperability between the components. This however emphasises the fact that every component, or more specifically the data that is written by the given component, should also be aligned strongly to standardised and commonly understood metadata standards and ontologies.

• As described above, the LOD2 Stack does not influence the original interfaces of the possibly very heterogeneous adopted components, but beyond this, the Stack implements REST-enabled Web interfaces as far as possible in order to support commonly known access possibilities to the functionality of the underlying Stack implementation.

In summary, the LOD2 Stack provides a versatile and modular framework that is able to create Linked Data applications. By complying with the structure of the defined Linked Data Lifecycle, each of the eight steps can transparently be implemented either by own implementations or by third-party software, allowing the construction of transparent applications that are fine-tuned to their respective use cases and needs.

Altogether, the two described employments of lifecycles show that it is beneficial for the whole technical application to have concise definitions of the interlinked steps and of the purpose of every step. Therefore, every sub-process can be designed directly fit for its purpose, highlighting its advantages to enable easier and more comprehensive access for the given user.

While Kosch et al. [102] highlight their own designed implementation next to the lifecycle for multimedia metadata, Ngomo et al. [122] support a framework that can be self-designed for different purposes. Both approaches, however, show the importance of and the connection

between the theoretical lifecycle and the underlying technical implementation that realises the lifecycle. Having the lifecycle of metadata in mind when designing an application is therefore important.

Because of this technical importance in the course of metadata, the remainder of this section will highlight approaches and solutions to this aspect. As has been seen before, many different steps are necessary for a thorough lifecycle of metadata, and some of them even need user interaction in order to function. Amongst those, the most impactful steps of the lifecycle are especially the creation, extraction, and possibly the application of the metadata, and these will hence be the focus. Different approaches in terms of semi- and fully automated platforms, oftentimes with Web features, cover the landscape of today's metadata and annotation scenery.

Kavasidis et al. [97] propose a semi-automated Web-based collaborative platform that is designed to aid users in building large-scale and diverse ground truth datasets that are supposed to be utilised for object detection, segmentation, tracking, and classification. Their vision is a resulting collection of annotated data that can be utilised as training data to increase the performance of classifiers in different fields of application. With a use case of underwater fish videos, they provide proof of their concept in a very difficult domain, as animal monitoring is already difficult in general, and potentially harder in underwater domains [160]. The tool allows users to upload video files, which will then be segmented automatically into video frames. The user is then prompted to utilise pencil and rectangle tools to easily place annotations on a given frame. Additionally, by presenting several preceding and/or successive frames, annotations on different frames can be copied and therefore combined and interlinked in order to create associations and easily relate these annotations. Sharing mechanisms between users allow for even better ground truth creation, as annotations will be under concurrent supervision and therefore quality control is ensured.

With CAMO6, Hu et al. [88] cover the problem that missing metadata for multimedia hampers its usability severely. They point out that there are various datasets describing multimedia available, especially in the wake of the LOD. However, these oftentimes focus only on single multimedia types, although a combination and interlinking would be very valuable, elevating the usefulness of the multimedia metadata and consequently of the multimedia item itself. Next to this fact, they also claim that descriptions of a multimedia item are present in different data sources, but frequently these are heterogeneous in terms of metadata modelling as well as information coverage. Thirdly, as they focus their metadata descriptions on the RDF standard, useful "legacy data" stored in relational databases is challenging to interlink with their existing RDF knowledge base. CAMO

6 http://ws.nju.edu.cn/camo/ (last visited 03/12/2018)

solves these problems by applying a well-known ontology, the DBpedia ontology7, as their base ontology, to which other information is to be mapped in terms of a mediation model. Additionally, with methods like ontology matching and instance linkage they can create aggregated knowledge bases with combined information about different media types as well as combined descriptions emerging from various (originally heterogeneous) data sources. For the legacy data, Hu et al. implemented a procedure that is able to connect the relational data to their domain model. Lastly, they implemented an Android app that is able to query the CAMO database in order to make use of the created multimedia metadata.

The most similar solution to the MICO Platform approach is given by De Meester et al. [56], who, like the approach described in this work, address the issue that comprehensive retrieval of multimedia has become a necessity in the Web world of today. They also exemplify this by the rising number of easy possibilities to create and share multimedia content, mainly emphasising this point with the convenience of Web access and the growing number and quality standards of portable devices. Efficient retrieval goes hand in hand with the quality of the attached multimedia metadata, which is the focus of the authors. Creating this metadata by hand is unfeasible, therefore De Meester et al. propose a domain- and problem-agnostic framework that is able to apply automated analysis processes through currently available multimedia analysis methods that are incorporated as Web services. Once a service is registered at the central platform, it provides descriptive metadata about its analysis process, which can later on be used to apply the respective service in an analysis chain. This is automatically determined by the platform by making use of such descriptive metadata. The metadata is persisted as RDF and mainly contains an easy representation of the input and output that is supported by a given analysis service. This semantic description of the services can then be used to meet the semantic description of a request to the platform.

The whole analysis implementation is based on a reasoning cycle (inspired by [128]) consisting of three steps, which are visualised in Figure 39. The reasoning cycle consists of three phases, observation, hypothesis, and prediction, between which the respective processes of abduction, deduction, and induction transpose. Observations can be seen as the results of analysis services on a given task, therefore posing the initial analysis step as well as intermediate ones. With every iteration of the cycle, potential new observations are added and combined with the results discovered up to now. With every variation of the observations, hypotheses can be applied to the current state of the observations. These are in general combined with knowledge about the analysis domain in order to find inconsistencies in the observations.

7 http://wiki.dbpedia.org/services-resources/ontology (last visited 03/12/2018)


Figure 39: Visualisation of the Reasoning Cycle Applied in the Framework of [56] (From Which the Image Has Been Adopted).

Then in the next step, predictions are created which reflect how the current set of observations can be improved, based on the given hypothesis of the iteration of the cycle. These predictions can then trigger respective analysis processes that are able to process the current input, which will in turn create new observations that are combined with the previously created ones, hence closing the current run of the cycle. It is important to note that a given analysis service will never analyse the same part of a given multimedia item twice, so at some point no further predictions with associated observations can be made, therefore ending the reasoning cycle for the respective input. The overall result is then the combined observation result.

In [56], De Meester et al. also introduce a proof-of-concept as well as an evaluation showing that their approach is feasible and delivers the promised results. In this proof-of-concept, the authors implemented several components of their approach in two different use cases: a face detection use case as well as an optical character recognition use case. Their results show that their framework can produce equally good if not better results than state-of-the-art methods and approaches, in addition to the fact that their framework can also incorporate these methods in order to further benefit from their input. For these components, further information about the semantic description of the analysis services and how these are constituted is given in [169][170]. More information about the problem-solving platform is given in [168], which describes the component that is responsible for the allocation of tasks to the registered Web services. In so-called execution plans, the consecutive services are called, and the respective results are gathered and combined. Furthermore, the implementation is able to find alternative routes if errors occur, in order to produce equivalent results, if possible. Technically, the platform itself implements a blackboard architectural pattern [50], which is composed of several components. The blackboard serves as a central component that collects and holds received information. Services are the working instances of the whole platform. A supervisor in combination

with a service composer triggers services according to the requested processes and finds alternative problem solutions in case of errors.

lessons learned and summary - metadata lifecycles and their implications on the underlying technology The pervasive focus of the related work presented in this section is metadata, descriptive information pieces about a given file or data, with the emphasis lying on the multimedia domain. In this domain, metadata bears an even more important role, as multimedia files are mainly usable to a good extent when descriptive metadata is present, which for example gives further information about the file's content, best playback modes, processing information, and so on. This metadata is expressed in different levels of detail and complexity; however, only a combination of several different levels leads to a thorough metadata background.

While the creation of such high-level features is complex in its own right, the recombination of the various produced metadata features poses an additional obstacle that needs to be addressed. Therefore, several related approaches have been mentioned that mainly focus on software platforms and frameworks targeting this problem. With this kind of software support, these approaches manage to create solutions to the metadata issue mentioned above that deliver results in a robust and especially more efficient, valid, and accurate fashion. In addition, the platforms and frameworks also make both the production of results and the eventual usage of those results considerably more convenient.

For the different steps that interact with the metadata, like production, consumption, or exploitation, lifecycle interpretations exist that implement these steps in a more fine-grained fashion. Aligning one's application to these delimited steps supports better transparency and offers better fine-tuning of the sub-components as well. Therefore, a modularly structured implementation is possible, supporting a best possible fit to the respective use case.

Altogether, this Section 5.1 shows related work in the fields of multimedia applications and metadata lifecycles and therefore emphasises their respective impacts and eventually their importance. To integrate this into the course of this work, the following Section 5.2 will describe the MICO Platform from a technical point of view, which has been briefly introduced in Section 1.2. The MICO Platform constitutes an application that aligns with the related approaches; therefore, after its infrastructure and core components have been highlighted, Section 5.3 weaves in the insights from this section and explains how the MICO Platform fulfils the presented requirements.

5.2 embedding the multimedia metadata workflow into a technical environment - the mico platform

With the scientific background and the MICO Project explained in Section 1.2 and Section A.1 in the Appendix to form the theoretical foundation for this work, this section will now give more insights into the technical implementation of the whole MICO infrastructure. This is necessary to understand both the implications of the results and contributions shown in the preceding sections as well as the beneficial outcomes of the overall platform. Therefore, after an overview of the general platform, specific components of the MICO infrastructure will be described in detail with focus on the topics covered in this work, namely metadata handling in the form of creation, querying, communication, and the application of the MMM, as well as the idiomatic application of Semantic Web technologies. Figure 40 shows an overview of the MICO infrastructure:


Figure 40: Setup of the MICO Platform.

The components of the MICO Platform, in addition to their respective interactions with other components, are the following:

messaging The central piece of communication for the MICO infrastructure, managing all communication calls between components that cannot be done directly (as for example calls to the database) and which can also be queued. These are the messages that are sent between the core persistence API, the service orchestration unit, and the autonomously working extractors. Various different intertwined queues exist on the side of the MICO Platform, and every extractor has a queue of its own. The queuing system is implemented with RabbitMQ8. The queueing concept enables a fluent working environment, which also alleviates some timing constraints in the overall

8 https://www.rabbitmq.com/ (last visited 03/12/2018)

infrastructure. Additionally, the concept allows for further technical features. For example, for the same extractor purpose or task, multiple instances of the same extractor type can be operated in parallel, as they all listen to the same queue. Every instance can then pull single tasks from that respective queue in a round-robin-like fashion (a minimal consumer sketch illustrating this behaviour follows after this component overview).

extractors The core "workers" of the MICO infrastructure, extractors receive ingested media files of the MICO Platform in order to produce results in the form of metadata or possibly new (sub-) media files. This information is then propagated back to the platform for potential further analysis processes. In general, an extractor implements one specific task and therefore defines a respective input and output format. Extractors can run on computers other than the one hosting the platform and therefore work autonomously by only being connected to the platform via the messaging component. Further information about and configuration of extractors will be given later in Section 5.2.2.

service orchestration This unit manages the possible pipelines of extraction processes and therefore creates the workflow-driven analysis concept of the MICO Platform. When an extractor registers at the platform, providing its input and output formats, the orchestration unit can determine which ingested media files can be analysed by which extractor and/or extractor chains. This information is synchronised with the persistence API and respective calls to the autonomous extractors are issued. By doing this orchestration, different paths through the extractor processes can emerge for input multimedia files, and possibly unknown paths - the beneficial wiring and combination of extractors in a way that the developers had originally not planned - can originate through their recombination. The implementation of the orchestration unit is based on Apache Camel9.

persistence api This API supports the functionality for everything related to persistent data and is therefore the main component connected to both the RDF and the file storage or database, next to the recommendation engine, which has access to the RDF database, too. Hence this API implements methods that enable the creation, querying, updating, and accessing of any RDF or multimedia data in the MICO Platform. The API is therefore called mainly by the custom endpoints and the recommendation engine that will be described below, and indirectly by the extractors via the messaging component.

9 http://camel.apache.org/ (last visited 03/12/2018)

rdf database Thanks to a modular implementation, any triplestore that implements the SPARQL 1.1 language can be applied here. The standard MICO Platform implementation comes with a supported Apache Marmotta10 instance installed.

distributed file storage For the persistence of multimedia items that are to be analysed by the MICO Platform as well as of intermediate file results, an HDFS11 implementation is supported in order to provide efficient access to these items.

custom endpoints Beyond the perspective described in this work, the platform also encourages "external" developers to take part in an existing installation of the platform in order to benefit from its produced results, as it supports various endpoints following classic REST and HTTP concepts. Some default APIs and Web services like the Model and Recommendation API or the SPARQL-MM Webservice are supported and therefore interact with the basic endpoints, and the modular structure of the MICO Platform allows for the addition of any custom endpoint extension.

model api and sparql-mm webservice These access interfaces support the basic functionality of the standard ways to work with the MICO Platform results. These possibilities will be explained further in Section 5.2.1.

recommendation and recommendation api A prototype recommendation engine based on Apache PredictionIO12 [101] has been implemented for the MICO Project, supporting the functionality as well as an API to access the results. Recommendations are mainly used in order to request "similar" results, given one result, so for example a video that closely matches another video based on the analysed metadata backgrounds of both. Although prototypic ideas and implementations have been incorporated for the MMM and Anno4j (see Section 4.8), the recommendation feature of the MICO Platform will not be described further, as it is not in the focus of this work.
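To make the queue-based task distribution described for the messaging component more tangible, the following is a minimal, hedged sketch of how multiple instances of one extractor type could consume tasks from a shared RabbitMQ queue using the RabbitMQ Java client. The host, the queue name, and the message payload are invented for illustration and do not reflect the actual MICO configuration.

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;
import java.nio.charset.StandardCharsets;

// Hedged sketch: each running instance of this class represents one extractor
// instance. All instances listen on the same queue, so RabbitMQ dispatches the
// queued analysis tasks among them in a round-robin manner.
public class ExtractorQueueWorker {

    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumption: the broker runs on the platform host

        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();

        // Hypothetical queue name; in MICO, every extractor type holds a queue of its own.
        String queue = "extractor.animaldetection";
        channel.queueDeclare(queue, true, false, false, null);

        // Hand out only one unacknowledged task per instance at a time, so busy
        // instances do not accumulate work while idle ones wait.
        channel.basicQos(1);

        DeliverCallback onTask = (consumerTag, delivery) -> {
            // Assumption: the message body carries the URI of the Item to analyse.
            String itemUri = new String(delivery.getBody(), StandardCharsets.UTF_8);
            System.out.println("Analysing item " + itemUri);
            // ... run the actual extraction and report the results back to the platform ...
            channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
        };
        channel.basicConsume(queue, false, onTask, consumerTag -> { });
    }
}

Starting a second process with the same class would make the two instances alternate on the tasks pulled from the shared queue, which corresponds to the parallelisation behaviour of the messaging component described above.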

The MICO Platform is distributed as a free-to-use virtual machine image13 in the Open Virtualisation Format (OVF) [93]. This image comes with a baseline of freely available extractors, namely "Audio Demux", "Speech-to-Text", "Kaldi2RDF", "Kaldi2TXT", "MediaInfo", "MediaTags2RDF", "Animal Detection" in two different variations (the first one using Deformable Part Models (DPM) [66], complemented by

10 http://marmotta.apache.org/ (last visited 03/12/2018)
11 https://hadoop.apache.org/ (last visited 03/12/2018)
12 https://predictionio.apache.org/ (last visited 03/12/2018)
13 https://www.mico-project.eu/pages/documentation/#document-7 (last visited 03/12/2018)

a second option using Deep Neural Networks and the YOLO framework [135]), "Face Detection", "ObjectDetection2RDF", "Diarization", "OpenNLP NER" with language models for English, German, Italian, and Spanish, and "OpenNLP Text Classifier". All the extractors can be enabled and disabled as required in the platform, and there are various predefined extractor pipelines that can also be activated. These can be loaded via the platform's basic interface by choosing the respective pipelines and instructing the platform to load them. An overview in the MICO Orchestration Service shows a graph of the enabled extractors, also indicating the possible extractor chains that are constituted by matchings of the extractors' inputs and outputs. Both the platform and the free-to-use extractors are implemented as open-source software under the Apache License Version 2.014 and can therefore be applied and extended freely. Software releases have been published frequently during the MICO Project; the final version of the platform is 3.0.4 (an upgrade from the downloadable version 3.0.1). A rich and detailed description of how to obtain the MICO Platform, how to install and initialise it, how to use it, and a range of internal details can be found on the MICO Platform Overview homepage15.

The following sections will highlight some of the details of the MICO Platform that are necessary as input for the workflow explanation of Section 5.3 and especially for the concept of the MICO+ description in Section 5.4. Therefore, Section 5.2.1 will explain the different methods that the MICO Platform supports to both ingest and afterwards access data. Then, Section 5.2.2 and Section 5.2.3 give insights into the applied extractor conceptualisation and how to implement one's own extractors, which is concluded with the description of the MICO Orchestration Service approach in Section 5.2.4, the central component that manages the synchronisation of multiple extractors to eventually create chains of extractors.

5.2.1 Accessing the Produced Results of the MICO Platform

At the very start of every extraction workflow or metadata creation stands the injection of a multimedia item into the MICO Platform, on which the overall analysis will be based. These items are represented as the "mmm:Item" instances that have been discussed in Section 3.3 and therefore represent the "root" of the whole respective metadata construct. To achieve this, the MICO Platform comes with a simple interface that supports the upload of any file. Next to the file, the user is prompted to define the file's format or file type. This is especially important, as this file type will be used by the MICO Orchestration Service in order to determine which extractors or extractor chains are able to process the injected media file. At this point, it is important to

14 http://www.apache.org/licenses/LICENSE-2.0 (last visited 03/12/2018)
15 https://www.mico-project.eu/pages/documentation/ (last visited 03/12/2018)

note that not only commonly known file types can be used here, but also one's own "self-made" semantic meta file types, which can represent data constructs of a specific use case and are then eventually processed by the extractors that have been implemented for that given use case.

Once injected into the MICO Platform, each item will be represented by a respective URI and will be listed in the item overview tab. There, all the items are listed and have one of the states "In Progress", when the item is still being processed by an extractor, "Finished", when the analysis process has finished successfully, or "Failed", in case an error occurred during processing. Figure 62 in the Appendix shows such a listing at the bottom, while also depicting a "Service Dependency Graph", which corresponds to the currently enabled extractors and their transitions between one another. Items whose analysis processes have failed provide further information when hovering over the "Failed" indicator. An example can be seen in Figure 63 in the Appendix.

Through this view, the basic inspection of an item is possible. A detailed view of the item can be accessed, which shows further information about the item itself like injection time, creator, as well as its type(s) and associated file. In addition, every "mmm:Part" that has been added by an extractor as an intermediary or final result is listed here. These also show technical details like the ones mentioned above for the intermediary steps. They also display the respective "source" of their analysis process, which has been explained semantically in Section 3.3.5. An example of how this looks in the MICO Platform is shown in Figure 64 in the Appendix.

From the item overview, as well as from the detailed item view, it is possible to access the item's metadata by pressing the buttons for "Inspect" or "Metadata", or by clicking on a URI of either the item or a part, which automatically forwards the user to the triplestore's metadata view at the respective metadata access point. As already mentioned, the default triplestore for the MICO Platform is Apache Marmotta with its rich and diverse access and querying potentials. The basic metadata view allows the user to navigate through the RDF graph by clicking and following the respective URI links between the RDF nodes, leading to a listing of the details of the currently viewed node.

Next to the basic view, the MICO Platform supports three of Apache Marmotta's querying interfaces to enable more comprehensive querying of the produced results. Those are the following:

sparql query interface An interface based on the SPARQL 1.1 standard, using the standalone open-source project Squebi16. Next to full SPARQL 1.1 query and update functionality, this interface also supports bookmarking of result pages, auto-

16 https://github.com/tkurz/squebi (last visited 03/12/2018)

creation of URI prefixes, and autocompletion for well-known ontologies.

ldpath query interface LDPath has already been introduced in Section 4.4. This interface provides a comprehensive possibility to issue queries of that specification against the dataset that is currently present in the given MICO Platform instance.

sgvizler data visualisation A visualisation interface is implemented that allows visualising queried SPARQL results in various visual representations like charts, maps, or tables. To achieve this, the Marmotta framework incorporates the SGVizler framework17, which is also conveniently supported by the basic MICO Platform installation.

Altogether, the MICO Platform supports comprehensive and rich features to analyse, inspect, and make use of multimedia items in combination with their metadata background. This is already supported by the default setting of the platform, which can then be enhanced further with own functionality and extractors, leading to an even wider application scenario. Important for this adaptation is the implementation of own extractors, for which a description is given in the following section.

5.2.2 MICO Extractor Description

The basic functionality of a MICO extractor encapsulates a certain task, in general a multimedia analysis task, that is executed autonomously for the MICO Platform. Apart from the registration process, the extractor mainly listens to task jobs that are pushed to the extractor queue from the MICO Platform via the communication component described above. Results are communicated back in a similar fashion. To perform this, the extractor registers at a MICO Platform by providing its input and output format, next to some configuration details. These have already been discussed in Section 3.3.5 and are illustrated in Figure 60 in the Appendix. The extractor itself is not aware of being incorporated into a workflow chain of the MICO Platform, as this is handled solely on the side of the platform by the Orchestration Service.

Next to communication and configuration, each extractor supports some kind of processing functionality, depending on the use case of the extractor. An extractor is triggered by being messaged by the platform. Therefore, every extractor holds its own messaging queue. The respective message contains information about the item that the extractor is to analyse. Results in the form of metadata information and potentially intermediary multimedia files are synchronised

17 https://github.com/mgskjaeveland/sgvizler (last visited 03/12/2018)

with the platform during the analysis process. Once this process is finished, the extractor sends a finished signal back to the platform, indicating that further analysis processes can be triggered or that the item has completed its entire potential overall analysis.

As mentioned earlier, the functionality of creating and querying metadata as well as its communication among the extractors evolved through three variants. The most basic and rudimentary implementation was supported by SPARQL Templates, predefined SPARQL queries containing wildcards which could be filled accordingly. The templates were used for both querying and persisting at extractor level and were "hosted" and supported via the platform dependency. However, this kind of approach had some shortcomings. The solution was primarily error-prone, as every single template had to be written by hand, and especially the URIs used in the Semantic Web are sometimes intricate and drawn-out, therefore further impeding the definition of the templates. In addition, every different use case or extractor combination needed its own set of templates to query specifics out of the data storage. This setup required extended knowledge about Semantic Web Technologies on the side of the extractor developers or increased support by the metadata team, both increasing the required manpower altogether.

One of the most relevant contributions of this work consists in an improvement of this shortcoming of the SPARQL templates, which is done by the Anno4j library, described extensively throughout Chapter 4. Using Anno4j, the effort of writing RDF data can for the developers be reduced to filling and connecting simple Java objects that are supported by the library. The data model to perform this was covered in Section 3.3. In terms of querying, Anno4j enables two approaches (see Section 4.4): the basic querying can be done by merely specifying the Anno4j class to query for, filtering through all returned objects by hand and/or by code, which lets the extractor developers reduce the need for Semantic Web knowledge to a minimum. With a little more effort, the advanced querying of Anno4j via LDPath can also be used to fine-tune the respective query to only request the desired result objects.

The functionality described above is based on the Anno4j features of its version 2.0. For an extended application in the MICO universe, Section 5.4 will show how the functionality can conceptually be improved when incorporating the extensions supported by the 2.4 version of the Anno4j library.
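To illustrate the contrast between the SPARQL templates and the Anno4j-based variant described above, the following is a minimal, hedged sketch of the Anno4j 2.0 usage pattern. The body type ex:AnimalDetectionBody, its namespace, and the LDPath expression are invented examples; the Anno4j entry points shown follow the usage described in Chapter 4, but the exact package and method signatures may differ between library versions.

import com.github.anno4j.Anno4j;
import com.github.anno4j.model.Annotation;
import com.github.anno4j.querying.QueryService;
import java.util.List;

// Hedged sketch of creating and querying RDF data with Anno4j 2.x.
public class Anno4jUsageSketch {

    public static void main(String[] args) throws Exception {
        Anno4j anno4j = new Anno4j();

        // Creation: instantiating a mapped Java object is enough to persist the
        // corresponding RDF triples; properties are filled via the generated setters.
        Annotation annotation = anno4j.createObject(Annotation.class);
        // ... set body, target, and provenance information on the annotation ...

        // Basic querying: fetch all instances of a mapped class and filter in code.
        List<Annotation> all = anno4j.findAll(Annotation.class);

        // Advanced querying: an LDPath criterion narrows the result set in the store.
        QueryService qs = anno4j.createQueryService();
        qs.addPrefix("ex", "http://example.org/mmm-extension#");
        qs.addCriteria("oa:hasBody[is-a ex:AnimalDetectionBody]");
        List<Annotation> animalResults = qs.execute();

        System.out.println(all.size() + " annotations overall, "
                + animalResults.size() + " with animal detection bodies");
    }
}

In contrast to a hand-written SPARQL template, the long URIs and the query structure stay hidden behind the mapped Java classes, which is exactly the reduction of required Semantic Web knowledge argued for above.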

5.2.3 Implementing Own MICO Extractors

To implement one's own extractor, the MICO Project provides interfaces and guidelines on how and what to do on the MICO Extractor Bitbucket homepage18.

Both Java and C++ are supported as programming languages, following slightly different procedures for creating an own extractor. As Java is mainly relevant for this work, the focus lies on the corresponding explanations.

In order to support a convenient solution for integrating own extractors, a Maven parent can be incorporated into the Java project that is to implement one's own MICO extractor. Doing so, all respective dependencies and build plugins will be managed automatically. Via the platform dependency that is included in the Maven parent19, the necessary objects for the metadata modelling, creation, and querying are also present, and most importantly the interface for implementing an extractor. The "eu.mico.platform.event.api.AnalysisServiceBase" interface (with two further implementations "AnalysisService" and "AnalysisServiceAnno4j", the latter with an automatically incorporated Anno4j transaction for RDF communication) supports five configuration Strings and a "call"-method that need to be implemented for a fully functioning MICO extractor (a minimal skeleton mirroring this contract is sketched at the end of this section). The former define the extractor ID, the extractor mode ID, its version, as well as textual representations of the input and output format. The call-method is triggered whenever the respective extractor receives an event from the MICO Platform in order to analyse the included Item (or more specifically the Item that is referenced by a provided Item URI).

The MMM supports a modular structure that is used to facilitate the persistence and querying of produced metadata during a MICO workflow. The necessary components from Item to Part with its body and target implementations are always well defined and applied, no matter how the analysis process looks in detail. This way, developers can always follow the same paths in the RDF graph in order to retrieve the desired information. Various convenience methods of the Anno4j library increase this unobstructed functioning even further. For example, rather than having the user deal with supplying correctly formatted Date Strings, a helper method implemented in the respective Anno4j Partial Class can have single parameters for every necessary Date component (year, month, day, etc.) and create the correct Date String from those.

However, there exists a minor shortcoming in terms of metadata exchange when it comes to introducing "new" metadata into a workflow system, as potential follow-up extractors need to have knowledge about the respectively created metadata and its structure. This can be broken down to the pre-knowledge of the particular Anno4j Classes, whose implementations need to be forwarded to subsequently applied other extractors. With the setup of the MICO Platform described above incorporating the Anno4j 2.0 version, the exchange is

18 https://bitbucket.org/mico-project/extractors (last visited 03/12/2018)
19 https://maven.apache.org/guides/introduction/introduction-to-the-pom.html (last visited 03/12/2018)

only possible by adding the new classes to the MICO Platform dependency model, which would require the developer to alter two places of code, in addition to the required publishing of a new MICO Platform version. Another way would be to provide the extended MMM metadata model of the own use case in an own dependency or repository, which would not reflect the extendable workflow philosophy of the platform. This in particular can be dealt with much more conveniently with the implementation of Anno4j 2.4, which is depicted in Section 5.4.

Altogether, a MICO extractor has a defined lifecycle when interacting with a platform instance and its orchestration service, which will be described in the next section. As described by Aichroth et al. in [4], these lifecycle steps form a finite set, starting with the extractor preparation, which constitutes the packaging of the developed extractor and providing it with registration information so that the extractor "knows" the address of the MICO Platform to register with. After that, the extractor can be deployed in the extractor deployment step, utilising the registration information to connect to a given platform. If supported, testing data is then forwarded to the platform to be used in following testing steps. If the deployment and registration are successful, the extractor can be included in workflow creation steps, in order to be applied in analysis processes of the platform. These can either be for testing purposes or already be real workflows. Once included in at least one of these workflows on the platform side, the platform issues an extractor process startup call in order to "activate" the respective extractor, which will then be used in the workflow execution step when items are injected into the platform and require processing.
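To make the described extractor contract more concrete, the following is a minimal, hedged skeleton of a Java extractor. It mirrors the five configuration Strings and the call-method described above, but uses an invented interface with hypothetical method names instead of the verbatim eu.mico.platform.event.api signatures, which are not reproduced here.

// Hedged stand-in for the real AnalysisService(Base) contract; method names are invented.
interface AnalysisServiceSketch {
    String getExtractorId();      // extractor ID
    String getExtractorModeId();  // extractor mode ID
    String getExtractorVersion(); // extractor version
    String getRequires();         // textual representation of the accepted input format
    String getProvides();         // textual representation of the produced output format
    void call(String itemUri) throws Exception; // triggered for every analysis event
}

// Example implementation: a hypothetical face detection extractor.
public class FaceDetectionExtractorSketch implements AnalysisServiceSketch {

    @Override public String getExtractorId()      { return "mico-extractor-facedetection"; }
    @Override public String getExtractorModeId()  { return "face-detection-default"; }
    @Override public String getExtractorVersion() { return "1.0.0"; }
    @Override public String getRequires()         { return "image/png"; }
    @Override public String getProvides()         { return "mico/facedetection"; }

    @Override
    public void call(String itemUri) throws Exception {
        // 1. Resolve the Item referenced by the URI and fetch its asset (the image).
        // 2. Run the actual face detection on the binary content.
        // 3. Persist the detected regions as a new Part of the Item, e.g. via Anno4j.
        // 4. Signal completion so that the broker can trigger follow-up extractors.
        System.out.println("Analysing item " + itemUri);
    }
}

The five configuration values are what the orchestration service uses to match extractors into chains, while the body of the call-method is where the actual, use-case-specific analysis takes place.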

5.2.4 MICO Orchestration Service - The MICO Broker

The orchestration service, called the MICO Broker, constitutes the core unit that enables the workflow-centric functioning of the MICO Platform. Autonomously and independently working extractor implementations register at the respective platform instance and are aligned one after another in order to form pipelines of loosely connected processing workflows. Doing this, a transparent analysis is supported, as not just the final results but also intermediary steps, consisting of both pure metadata and potential binary files, are added to the whole metadata background that is produced. With this kind of fine-grained analysis, many different links to applications and use cases can emerge and therefore make the initially ingested media file usable in a much broader field of application.

A more detailed description of the development cycle of the MICO Broker, which also highlights the incorporation of the multimedia analysis extractors, is given in Section A.1.2.

5.3 from a multimedia metadata workflow to a self-sustaining metadata cycle

With the detailed description of the MICO Platform in mind, a more in-depth view of the metadata that is incorporated is given in this section, highlighting the single steps that eventually constitute the lifecycle of metadata. These steps range from the initial creation of both the multimedia file and metadata, to further metadata allocation, to the point where the metadata is applied in order to profit from its descriptive information about the data it describes.

In Section 5.1, two different metadata lifecycles have been introduced: the Lifecycle of Multimedia Metadata introduced by Kosch et al. [102], describing the metadata that is associated with the multimedia domain, and the Lifecycle of Linked Data by Auer and Ngomo et al. [11][12][13][122], describing a more general view on the metadata that is associated with Linked Data overall. Both of them give detailed insights about the origins of metadata in their respective domain, and thereby create a transparent documentation of the incorporated steps, which generally deal with the creation, consumption, and/or exploitation of the metadata. By supporting this documentation, it can also be seen which of those steps are in the biggest focus and which are generally "optional", while having the greatest impact when it comes to the usefulness of the resulting metadata. Next to this, it is also highlighted that the technical implementation of applications that try to align with these lifecycles can orient themselves along these steps, allowing for higher modularity and finer-grained implementations.

In the MICO universe, actually both of the lifecycle principles described above have an impact on the MICO metadata, as the MICO Platform in the first place analyses multimedia data, which is published as RDF and hence Linked Data. Therefore, it is reasonable to devise a combination of both lifecycles, in order to properly attribute both of them and fine-tune the corresponding MICO implementation to the intersection of both. Figure 41 shows a merged combination of the lifecycles for multimedia and Linked Data. At this point it should be mentioned that the merging was done with a less complex multimedia lifecycle application in mind, as there surely exist applications that optionally incorporate a subset of the steps of authoring, quality analysis, and/or evolution and repair.

The baseline of the combined approach is given by the Lifecycle of Linked Data, seen in Figure 38. Therefore the base steps of extraction, storage and querying, authoring, interlinking, classification, quality analysis, evolution and repair, as well as search, browsing, and exploitation are incorporated, as these steps always have a role when it comes to linked metadata. Highlighted with the top five circles from top to middle of the figure (with the less complex view of multimedia applications described above) are the steps that are generally incorporated in multimedia workflows or lifecycles of their respective metadata, adapted to the step descriptions of the Multimedia Metadata Lifecycle.


Figure 41: Visualisation of the Merged Lifecycles for Both Multimedia and Linked Data, Resulting in a Generalised Lifecycle for Linked Multimedia Data.

This is possible, as the generalised steps of creation, querying, storage, data processing, and search or utilisation of the metadata are applied in both lifecycles.

Viewing Figure 41, the most important aspects from which the multimedia metadata can benefit are the missing interlinking and classification steps. The latter is somewhat present in common multimedia metadata formats like MPEG-7 (which was introduced in Section 3.1), which introduce structural criteria with their XSD schemata, for example. However, these features are by far not as rich as the possibilities proposed by RDF metadata modelling, like class and relationship hierarchies, which have been discussed in depth in Section 2.2. These features can, for instance, improve querying capabilities by a great deal and therefore influence and benefit all subsequent steps and following iterations of the whole lifecycle.

The feature of (inter-) linking metadata is a point that was not considered in former multimedia applications, as the linking of metadata gained its rise especially in the Linked Data world. Linking to descriptive concepts that are present in other, potentially more specialised

databases, linking results to unique RDF instances, or simply linking various results among each other are only examples from which the whole application can benefit.

Relating this back to MICO, the project's metadata workflow was devised around three (more rudimentary) core steps: creation, consumption, and subsequent exploitation of the queried information. What was initially seen as a linear set of steps actually constitutes a lifecycle of its own, as in most cases the implemented extractor-oriented workflow of the MICO Platform can reiterate these steps throughout a whole process chain. With the exploited information of an extractor result, a subsequent extractor can add its own insights and results, thereby providing queryable and exploitable information for another extractor, and so on. This is visualised in Figure 42. In between, querying mechanisms are supported in order to fine-tune queries. As has been seen, the different querying mechanisms supported by the Anno4j library enable a broad variety of access possibilities to the database.
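The following is a minimal, hedged sketch of one such iteration of the core cycle from the perspective of a follow-up extractor: it consumes the results of its predecessor via a query, exploits them, and creates new results that in turn fuel the next iteration. The body type ex:SceneBody, its namespace, and the LDPath expression are invented examples; the Anno4j calls follow the usage sketched in Section 5.2.2 and may differ in detail from the actual API.

import com.github.anno4j.Anno4j;
import com.github.anno4j.model.Annotation;
import com.github.anno4j.querying.QueryService;
import java.util.List;

// Hedged sketch of one iteration of the core MICO metadata lifecycle.
public class LifecycleIterationSketch {

    public void iterate(Anno4j anno4j) throws Exception {
        // Consumption: query all annotations whose body marks a detected scene.
        QueryService qs = anno4j.createQueryService();
        qs.addPrefix("ex", "http://example.org/mmm-extension#");
        qs.addCriteria("oa:hasBody[is-a ex:SceneBody]");
        List<Annotation> sceneResults = qs.execute();

        // Exploitation and creation: for every scene result, a new result is
        // persisted, which a subsequent extractor can in turn query and exploit.
        for (Annotation scene : sceneResults) {
            Annotation keyFrameResult = anno4j.createObject(Annotation.class);
            // ... derive the key frame for the scene and attach body, target, and
            // provenance information to keyFrameResult via the mapped setters ...
        }
    }
}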


Figure 42: Visualisation of the Core MICO Metadata Lifecycle.

As an example, imagine a use case of detecting animals not just in pictures, as was done before, but rather in whole videos. After a video is tagged as containing animals, it is supposed to be processed further in the envisioned overall workflow. Assuming that the task of detecting an animal in a video is not possible with a single extractor, as it is a very complicated task [3][5], it will be separated into several steps.

Following this exemplary workflow, the first task would be carried out by a Temporal Video Segmentation (TVS) extractor, which divides the video into meaningful subparts, meaning that the video is separated into several successive scenes (which could be detected by fade-ins and fade-outs, scenery changes, etc.). TVS was already mentioned and explained in a little more detail in Section 3.3.3. The scene splitting information can now be queried and exploited by the follow-up extractor, which finds the most representative single frames of a given scene (which is oftentimes already done by the TVS extractor as well, but for reasons of clarity it is split into two steps here). The identification of a key frame for a whole scene is useful, as the further analysis can be done on this single frame rather than on multiple or all frames of the scene, which not only yields results much faster, but in general provides the most useful information right away in the single frame. The frame detection can also incorporate a direct extraction of the frame as an image on which the following processes can be based.

When representative frames of the video are extracted, the "original" task of the described use case can begin: the potential detection of an animal. Therefore the animal detection extractor can, again querying and exploiting the metadata created by its predecessor, run its analysis on the images extracted from the overall video. Its results of detecting an animal on a frame can then be semantically applied to the whole scene, or, if better or more thorough results are desired, trigger further analysis processes on frames that are contained in the remaining subparts of the scene. Once again, with this potential information of a detected animal, the follow-up extractor can exploit this metadata to trigger its further processes of the given workflow.

Overall, in the described exemplary use case the core MICO metadata lifecycle is re-iterated several times. This allows the concise querying and exploitation of exactly those information pieces that are needed by a given extractor instance, and also shows that a recurring exploitation with enrichment of the metadata is not only feasible but also desirable, as the result is modular and fine-grained metadata that can be utilised in various use cases while also fuelling successive analysis processes.

For the MICO Platform, the merged lifecycle for linked multimedia data shown in Figure 41 and the core MICO Metadata Lifecycle shown in Figure 42 are interwoven, constituting the MICO Metadata Lifecycle. The result is visualised in Figure 43. The general idea of the core MICO Metadata Lifecycle is already contained in the steps that have been seen in the Linked Data Lifecycle, namely extraction, storage/querying, and search/browsing/exploitation, which can directly be mapped to creation, consumption, and exploitation. These will represent a mandatory basis for every iteration of the overall lifecycle.
The other steps in the bottom half of the lifecycle image in Figure 43 are potentially optional, but should always be included when possible or when the necessity for better metadata quality exists. Additionally, with the MICO Platform architecture and processing workflow, the optional lifecycle steps should also be implemented following the steps of the core MICO Metadata Lifecycle, interweaving small iterations of the core lifecycle into every step of the overall lifecycle. Therefore, these steps can all be implemented as an autonomously working


Figure 43: Visualisation of the MICO Metadata Lifecycle.

dedicated extractor. The classification step has been greyed out, as the MMM already enforces a classified structure onto the results that are produced by every extractor.

The result is a transparent, modular, quality-variable, and efficiently working (by applying parallel processing) setup for metadata allocation. The realisation of the lifecycle steps as extractors, which in turn apply the steps of the core lifecycle, allows multiple access points to the modular implementation of workflows. Every part of metadata that is added to the database is traceable through provenance features and therefore allows fine-grained querying of results. As every extractor works autonomously, a given workflow process chain does not run one step after another, but rather in parallel, allowing different steps of the original workflow to be analysed nearly at the same time. As described in the MICO Platform description, multiple instances of the same extractor can be added to increase the productivity even further.

There is also variability in terms of the level of metadata quality, which can be achieved by concurrent optional reiterations of the overall metadata lifecycle combined with quality analysis. Many multimedia analysis processes can run in different configurations, generally allowing a trade-off between the accuracy of the results and shorter calculation times. Inaccurate results or calculations can be obtained quicker, while more precise results require more training or longer calculation times. This can be utilised in the overall lifecycle in such a way that in the first instance, possibly inaccurate calculations are performed that can be compared against a threshold value in the quality analysis step. If the result already satisfies the given threshold, no further calculations are necessary or the respective following processing can be triggered. In contrast, either by default or because the threshold is not met, another workflow chain with the same general use case can be triggered, which applies more thorough analysis processes in order to achieve more accurate results that in turn take more time, as described above. This combines the benefits of both described analysis configurations, delivering quick inaccurate results while also calculating the better ones that can optionally swap out the inputs of later iterations.

From a technical point of view, the foundation that is created by the lifecycle approach described in this section is perfectly met by the Anno4j library described throughout Chapter 4. Ranging from the extractor implementations (see Section 5.2.2) to the incremental steps of the core MICO Metadata Lifecycle, Anno4j supports developers as well as users with various functionality to make more convenient use of the respective step's intention. As shown in Section 4.2, the application barrier for both creating and requesting metadata from the underlying database is lowered by offering technologies the user is more familiar with. Also, by supporting different access technologies, users have more possibilities to communicate with the respective technology. Because of this, potential users and developers can more easily contribute functionality, for example in the form of extractors, analysis processes, or workflows, that can be incorporated into the overall MICO workflow. Also, by lowering the barrier to persist and retrieve data from the respective MICO database, the exploitation gets facilitated.
Therefore, further fuelling of the overall workflow by contributing new information is initiated more conveniently. These insights are based on the “basic” implementation of the Anno4j library and the MICO Platform and already show what useful implications their combination can have, while also incorporating the theoretical foundation laid out throughout this work. The following Section 5.4 explains the extended version of the MICO Platform with enhanced functionality that is better suited for the overall use case, achieved by applying the latest version of the Anno4j library and some corresponding technical adaptations of the platform.
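
To make the lowered application barrier mentioned above concrete, the following minimal sketch shows the two access directions an extractor developer works with: persisting metadata by instantiating a mapped Java object, and requesting it via path-based querying. The interface and the namespace are illustrative stand-ins for generated MMM classes, and the Anno4j calls follow the patterns of the Anno4j 2.x API as far as known; exact signatures and package paths may differ between versions.

import java.util.List;
import org.openrdf.annotations.Iri;
import com.github.anno4j.Anno4j;
import com.github.anno4j.model.impl.ResourceObject;
import com.github.anno4j.querying.QueryService;

// Illustrative mapped interface standing in for a generated MMM class.
@Iri("http://example.org/mmm#Item")
interface ExampleItem extends ResourceObject {
    @Iri("http://example.org/mmm#hasSource")
    void setSource(String source);
    @Iri("http://example.org/mmm#hasSource")
    String getSource();
}

class MetadataAccessSketch {
    public static void main(String[] args) throws Exception {
        Anno4j anno4j = new Anno4j();                      // in-memory triple store by default

        // Creation: instantiating a mapped object persists the corresponding RDF triples.
        ExampleItem item = anno4j.createObject(ExampleItem.class);
        item.setSource("http://example.org/video/42");

        // Consumption: path-based querying instead of hand-written SPARQL.
        QueryService qs = anno4j.createQueryService();
        qs.addPrefix("ex", "http://example.org/mmm#")
          .addCriteria("ex:hasSource", "http://example.org/video/42");
        List<ExampleItem> items = qs.execute(ExampleItem.class);
        System.out.println(items.size());
    }
}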

5.4 extension of the workflow environment of the mico platform by technical features of anno4j 2.4

Chapter 5 has, up to now, described the MICO Platform, given concepts and principles about metadata allocation in terms of lifecycles, as well as the combination of both. Additionally, related work has been surveyed in order to learn further details about these topics and
give limitations and additions for further clarification. By describing the functionality of the MICO Platform backed up by its technical in- formation, all of the cornerstones that have been described through- out this work have come together to form a multimedia metadata platform that allows the distributed analysis of multimedia content by applying autonomously working extractors in a workflow-centric fashion. By utilising frameworks or libraries, like the Anno4j library in this case, it was shown that the barrier for the development and therefore the contribution to such Semantic Web-oriented applica- tions can be reduced significantly, allowing non-experts to take part more conveniently, while supporting richer development possibilities for developers that are already familiar with Semantic Web technolo- gies. This overall creates an abundant and potent basis for Semantic Web applications. All of the descriptions of this Chapter 5 have been based on the core implementation of the Anno4j library with version 2.0. However, the library’s incremented version 2.4 introduces new features as well as adapted and enhanced existing features that allow for a much more thorough and technically richer implementation of the overall MICO Platform (throughout Chapter 4 both versions of the library and their differences are described). To show the possibilities, this section will highlight the optional extensions to the MICO Platform with the extended functionality of the Anno4j library. In the further work, this extended implementation of the platform is called MICO+. It is important to note that this can only be done on a conceptual basis. However, the applied features of the Anno4j library are implemented, published, and for critical features also evaluated, supporting a practical background for the claims stated above. The enhanced features that will be described here give an extended functionality for extractors that mainly focuses on a better way to ex- change the metadata model amongst each other, the introduced vali- dation features of Anno4j in order to preserve data consistency for the whole platform instance, as well as the incorporation of the RESTful Anno4j implementation, which eventually does widen the applicabil- ity of the overall platform implementation. This again does promote the various access mechanisms that then can be utilised in order to communicate with the platform and therefore contribute to its work- flow functionality. Doing this, non-experts as well as experts are able to work with the platform more conveniently, creating a broader field of possible users and developers.

extended schema functionality for extractors A drawback of the extractor connection presented in Section 5.2 is the exchange of the actual metadata schema. The core structural information about the metadata of the MMM (so the “mmm:Item”, “mmm:Part”,
their relationships etc.) are incorporated in the MICO Platform Maven dependency, which every extractor has to incorporate in order to be able to produce and understand the basic structure of the MICO meta- data. This is convenient and the way how it should commonly be done. The disadvantage however comes into play when the metadata is considered that is extractor-specific and therefore not known to others in advance. The schema for the core extractors that are automatically incorpo- rated with the basic installation of the MICO Platform is included in a sub-schema of the MMM, the MMMTerms vocabulary, which is incorporated in the MICO Platform dependency, so these extractors can be used out of the box. Incorporating an entirely new extractor however does pose some obstacles. Every newly incorporated extractor generally introduces some new metadata schema. With the “old” platform implementation, this infor- mation at the first point is only available and known to the project of the extractor itself. The respective developer then has to make sure that the respective descriptive schema is accessible to further extrac- tor developers, in order to enable access and understandability for the prospectively created metadata of the first new extractor. It is also possible to incorporate the structure of the new metadata at the de- pendency for the MICO Platform, as it is open source and overall contribution “from the outside” of the MICO universe is desired. However, both of these possibilities require the developer to touch at least two different “places” of code. This is partly inconvenient by itself as the synchronisation can pose difficulties. Especially when multiple new extractor instances are incorporated to one platform, the synchronisation among several different places in order to have a common and decided upon metadata schema becomes unsolvable. To counteract this, the Anno4j Generation Tool (see Section 4.6) al- lows a promising solution. The sole input of the tool is an RDFS or OWL schema, a file that is easily share- and extendable, while the functionality of the Generation Tool is by default incorporated in the Anno4j Maven dependency, a dependency that is automatically re- quired for every extractor implementation. Envisioned is a Git20-like [45] distribution, which is managed at the respective MICO+ Platform instance. The workflow is visualised in Figure 44. On the left side of Figure 44 a productive and running MICO+ Plat- form instance is depicted, having several already registered extractors (“extractor 1” through “extractor n”) with various different own schemata, which is also depicted in a combined manner. From the middle to the right side of the picture an extractor (“extractor x”) is shown, which is supposed to be a new extractor that is to be incor- porated into the overall platform workflow. Therefore, the extractor sends an already existing register message to the platform instance,

20 https://git-scm.com/book/en/v2 (last visited 03/12/2018)


Figure 44: Workflow of Registering and Exchanging Schemata for the Ex- tended Extractor Implementation at the MICO+ Platform.

combined with a request for the currently present schema of the plat- form. A desirably successful registration message and the respective RDFS schema information is replied to the extractor, which can input this information into the Anno4j Generation Tool, to fully generate all necessary Anno4j Classes inside the Java project of the extractor. This allows the developer to fully utilise the basic MMM model, as well as all the other supported extractor metadata. Therefore, the extractor developer can directly “understand” the metadata that is produced by other extractors, allowing him to in- sert his new extractor in between an existing workflow process chain, or tie it at the end. Furthermore, it is of course possible to add new Anno4j Classes and therefore contribute to the overall metadata mo- del. These changes and new additions are firstly implemented as Anno4j Classes, which will be transformed to respective RDFS meta- data, extending the currently existing metadata model. These changes are then automatically propagated back to the platform instance, al- lowing them to be merged with the overall metadata model. These changes then are also forwarded to the other already existing extrac- tors. This enables all extractors to always know every result that is produced in the given MICO+ Platform universe. This workflow does also not interfere with the existing extractors, as they do not necessar- ily have to react to updated metadata models. After the initial syn- chronisation has been done by the newly added extractor, it can enter the state of commonly working with the platform and therefore re- ceive tasks by the platform, query and write metadata to its database and so on. The scenario described above poses most of the processing work in terms of the metadata model onto the client, as every extractor instance must run its own Anno4j Generation Tool in order to create the necessary Anno4j Classes. Runtime evaluations for this process are shown in Chapter 6. This is straightforward, as the client itself 5.4 an extension of the workflow environment 183 does have the clear view of how long this process is taking, he does not have to wait for the platform to respond. There is however another possibility: transferring the code gener- ation to the MICO+ Platform instance. To perform this, the Anno4j Generation Tool would run at the platform, triggered with every up- date to the overall metadata schema that can be done by a newly incorporated extractor. The output are Java classes, which can be se- rialised and then also be transferred to the respective extractors. This does relieve the extractor instances of the process of generating the Anno4j Classes, but it does also pose bigger message bodies to the HTTP messages that are sent between platform and extractor. Also, with bigger and always growing schemata (e.g. the CIDOC CRM used in the Visit Project, see Section 6.3), this possibility might be unfeasi- ble. This creates a convenient working environment for extractor devel- opers, who are then only required to implement and adapt their own extractor project rather than two code places as described above. No extractor code synchronisation is necessary to retrieve and contribute to the metadata model. validation of produced extractor results Another im- portant aspect of the overall MICO Platform that can be extended by Anno4j features is the validation of extractor results. With the im- plementation documented in Section 5.2, there is no validation at all. 
As a consequence, the input to the knowledge base of a platform instance is not controlled. Clients and their extractors are therefore not obliged to produce their extractor results in the way that the MMM proposes. This counteracts the intended benefit of RDF schemata, namely a common understanding of how metadata is supposed to be structured in order to produce collaboratively interpretable results. The validation feature of the Anno4j library can be used on the client and extractor side and be interwoven there into the code that creates metadata. This process directly precedes the step of sending the created results to the MICO Platform, for example allowing exceptions to be thrown if the validation criteria are not fulfilled, or even allowing these exceptions to be handled in the respective extractor program code, opening up further possibilities. In addition, the validation features of Anno4j would also allow fine-tuning of the overall platform metadata schema, more precisely the “mmm:Body” class, in order to enforce the structure of the metadata created by the extractors. For example, a cardinality requirement addressed at the body components could keep the metadata model flat at this point of the graph, enabling a structure that is easier to understand. This would induce an automated quality check whenever an extractor attempts to create metadata.
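
As a sketch of how such a cardinality requirement could look on the extractor side, consider the following constrained body interface. The @Iri mapping follows the Anno4j conventions described in Chapter 4, while the cardinality annotations are shown as comments because their exact names, packages, and the point at which violations are reported (object creation vs. transaction commit) are assumptions about the Anno4j validation feature rather than confirmed API details.

import org.openrdf.annotations.Iri;
import com.github.anno4j.model.impl.ResourceObject;

// Sketch of a body class with a hypothetical cardinality constraint: exactly one
// value per body, keeping the graph flat at this point as described above.
@Iri("http://example.org/mmm#ExampleBody")
public interface ExampleBody extends ResourceObject {

    // Hypothetical schema annotations (names illustrative):
    // @MinCardinality(1) @MaxCardinality(1)
    @Iri("http://example.org/mmm#hasValue")
    String getValue();

    @Iri("http://example.org/mmm#hasValue")
    void setValue(String value);
}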

restful anno4j to extend platform database accessibility As described throughout Chapter 4, the Anno4j library allows for a RESTful application, opening a REST API that is accessible via commonly known HTTP messages. This automatically extends the accessibility of the database, allowing extractor developers to communicate their metadata not only via the basic way of the “classic” Anno4j implementation, but also via commonly known HTTP calls sent by their extractors. Similarly to the overall Anno4j concept, this enables a diversification of access technologies, broadening the range of potential users and developers that contribute to the MICO Platform. In addition, by being able to write and query metadata via HTTP calls, the implementation of extractors is generally not restricted to the Java programming language (or C++ with the Anno4jCPP extension), but can be done in any language that supports HTTP calls. Using this possibility does however compromise the richness of the implementation, as with Java several interfaces and code snippets are already supported, which alleviates the developer’s process of implementing an extractor. In order to generate the REST API, the sole input necessary is the RDFS schema file of the current metadata model, that is, the MMM with the added extractor-specific vocabulary. Interwoven with the idea of schema synchronisation described above, the extractor schema information can be exchanged with the platform instance acting as the central process piece. After receiving the registration of a new extractor, the respective API functionality can be generated and then added to the overall existing REST API.
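
The following sketch illustrates what such an HTTP-based extractor interaction could look like from a language-agnostic client, here written in plain Java for consistency. The endpoint path and the JSON-LD payload structure are hypothetical; the generated REST API defines the actual resource paths and accepted media types.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch of persisting an extractor result via a RESTful Anno4j endpoint.
public class RestExtractorSketch {
    public static void main(String[] args) throws Exception {
        String payload = "{ \"@type\": \"mmm:Part\", \"mmm:hasBody\": { \"@type\": \"mmm:Body\" } }";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://mico-platform.example.org/anno4j/parts")) // hypothetical endpoint
                .header("Content-Type", "application/ld+json")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
    }
}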

5.5 multimedia metadata application conclusion

In retrospective, Chapter 5 describes lifecycles of metadata, gives de- tails on platforms that mainly deal with multimedia data, as well as the combination of both. The lifecycles allow to raise the awareness about which process steps are applied in the life of metadata. There- fore, applications can utilise this knowledge in order to fine-tune their utilisation of respective functionality in the way that the given use case requires. The cyclic concept dictates that with the iteration of a lifecycle, metadata is created, refined, requested, applied, and in the end, it potentially can re-trigger the whole lifecycle by sprouting new eventually more specialised and detailed metadata. Hence, the lifecy- cle does not necessarily “end” at some point, always increasing the generated metadata background for the initial input. This information was mainly derived from lifecycles of the Linked Data domain. But there was also a lifecycle out of the multimedia do- main, depicting important steps that are undertaken for multimedia metadata creation. The Lifecycle of Multimedia Metadata introduces important processing steps, while also implementing various meta- 5.5 multimedia metadata application conclusion 185 data spaces that are traversed by the eventually produced metadata. Next to this, different user groups are identified that take part in the metadata lifecycle. After the lifecycles, the related work of Chapter 5 shows similar work related to multimedia platform applications in order to com- pare it against the MICO Platform afterwards. A multimedia platform instance resembles a central running software application, whose gen- eral task is to manage multimedia analysis processes that overall cre- ate metadata of ingested multimedia files. These files can then be utilised in combination with the analysis results and therefore be more useful, as the metadata supports descriptive features about the given media file. There exist various approaches for this problem, with different strategies, incorporation of external extractors, commu- nication protocols, and so on. With the insights attained from the related work, Section 5.2 gives a detailed description of the MICO Platform, a multimedia analysis platform. This platform represents the accumulation of the contribu- tions described throughout this work. Therefore, next to a general description of the overall platform infrastructure, extractor implemen- tation, communication, and orchestration, this chapter highlights the mutual benefits of this work and the MICO Platform. Especially the factors of the incorporation of the Anno4j library are shown. Then, the MICO Platform documentation is combined with the insights gained from the related lifecycle descriptions, which were combined to develop the Lifecycle for Linked Multimedia Metadata. The MICO Platform generates metadata that is based on Linked Data principles, and therefore creates an interlinked and traceable meta- data background for input multimedia data. The basic workflow for the platform consisted of three steps - creation, consumption, and eventual exploitation - which could be interwoven with the overall metadata lifecycle, enabling comprehensive and transparent interac- tions at various occasions. This concept once again could be backed up by the Anno4j library implementation. This chapter is concluded with an outlook for the MICO Platform, whose implementation was ended when the MICO Project ended in 2016. 
In contrast, development of the Anno4j library did not stop, and the library therefore now supports richer and more comprehensive features that can improve on the state of the MICO Platform implementation described in Section 5.2. These extended features are described at the end of the chapter on a theoretical basis. Overall, this chapter shows that metadata is a very important cornerstone in the domain of multimedia, without which a thorough usage of multimedia files would not be possible, especially in a world that relies on the convenient production and consumption of multimedia content. Throughout the whole chapter and
related work, it is also apparent that the process steps learnt from the lifecycles are in need of technical support to make these steps transparent and accessible on the one hand, but also rich in features and functionality on the other hand. The results of this work achieve that: the MMM provides a modular and extensive metadata model for the multimedia domain, the Anno4j library implements a thorough means to interact with produced metadata at various points of metadata lifecycles, while the MICO Platform is a perfect fit for the aforementioned features and eventually creates a platform for multimedia analysis. In the end, the connection of all these concepts and implementations produces a multimedia analysis platform that allows for convenient collaboration, eventually producing a metadata background in the form of linked multimedia metadata that allows the input multimedia files to be utilised in a broader field of application and potentially yet unseen scenarios.

Part IV

EXPERIMENTS AND EVALUATIONS

EXPERIMENTS AND EVALUATIONS 6

This chapter picks up selected contributions of this work and substantiates them with evaluations. It is structured as follows. Section 6.1 explicates some of the experiments and results obtained by Quasthoff [132], who examined and researched ORdfM libraries and frameworks from a more general perspective. Afterwards, the term complexity in relation to Semantic Web ontologies and their counterpart of domain models in Anno4j is picked up. Several ontology parameters are listed and a complexity measure is defined in Section 6.2. Then, three evaluations are explained, which apply these parameters in different fashions. The evaluations also employ the Anno4j Generation Tool to create the respective domain models, from which instances are instantiated and the runtimes are tracked. The first evaluation, shown in Section 6.2.1, targets each ontology parameter in isolation, fixing the other five to a mean value devised from an analysis of openly accessible LOD ontologies. This allows a first estimation of the impact of the parameters on overall runtimes. Section 6.2.2 describes the second evaluation, which is similar to the first one, with the difference of incorporating pre-generated Proxy Classes (which have been introduced in Section 4.6). It is evaluated whether the promised benefit of these Proxy Classes, namely reduced creation times of domain model objects, holds. The last evaluation, shown in Section 6.2.3, examines the recombination of the mentioned ontology parameters in order to determine their dependence on each other, as there might be an increased impact on overall runtimes. Another way of evaluating the usefulness of an implemented piece of software is showing its real-world application and adoption. The ViSIT Project (already referred to throughout this work) is a research project at the University of Passau in the field of cultural heritage, working with Semantic Web metadata and therefore applying the Anno4j library. Section 6.3 gives a short overview of the project and highlights the beneficial factors introduced by the utilisation of Anno4j, again accentuating its usefulness in such applications. Additionally, an envisioned feature called “Semantic Zoom” is discussed, which can potentially enrich the ViSIT use case by introducing semantic abstraction layers to the underlying semantic data. This eventually increases the data handling potential of the overall use case on the one hand, and diversifies and facilitates data access on the other.


6.1 related work - overall ordfm evaluations

This section cannot be seen in the same way as the other related work sections of this thesis, as it will not depict related approaches to a given topic in order to establish delimitations or list insights that had an impact on this work. Instead, it will show evaluations conducted by related work that refer to the overall concept of ORMs / ORdfMs. The concepts are already well-known and established, therefore a conceptual evaluation is not necessary. However, this section will first pick up some of the insights that have already been discussed throughout this work, then add some related work evaluations, and finally discuss these results and their impact on the presented contributions with respect to their usefulness and applicability.

general observations: ordfm implementation effort and overall runtimes Some quantitative statements concerning the overall concept of ORdfM implementations have already been presented throughout this work, especially in Chapter 4. There, the prevailing general observations targeted two topics: the overall implementation effort of an ORdfM vs. implementing the respective features by hand, and the runtime drawbacks that potentially arise with the application of an ORdfM. The advantages of ORdfMs have been described on many occasions now, but it should be remembered that the actual incorporation of an ORdfM into an application environment takes a considerable amount of development time and effort, especially when compared to the implementation of the respective features (for example querying and persisting a given data model) by hand. This temporal effort can potentially be reduced, as there are many existing ORdfM frameworks and libraries available; however, the familiarisation with them can still require additional time. A quantitative parameter that can represent the deciding factor for this decision is the size of the data model that is supposed to be utilised. Small models can be implemented and handled by hand, as only a manageable amount of queries needs to be established to cover the necessary functionality. Medium- to large-sized models, however, already suggest the incorporation of an ORdfM. Although the incorporation and familiarisation require some extra effort, the application of an ORdfM comes with a series of benefits. An increase in productivity, reusability, maintainability, and design liberty, as well as a decrease in error-proneness, are only some of them. The concept therefore allows for a more thorough design and implementation of an application, which is improved not only qualitatively in the implementation phase itself, but also as a sustained yield at a later stage and state of the application.

6.1.1 Quasthoff: General ORdfM Comparison

The first evaluation conducted by Quasthoff [132] in 2011 compares different (at that time prevailing) ORdfM implementations. These are namely RDF2Java1 (implemented originally for a management system of weak workflows [63]), RDFReactor [171] (or RDF2Go, which has been covered in Section 4.1), SuRF2 (a Python-based ORdfM), and Elmo/Alibaba. The evaluation was conducted as a series of telephone interviews with several respondents, based on the setting of Paré [125], answering two main questions:

• Why does the respondent apply the ORdfM concept?

• Why did the respondent implement an own solution, rather than using an existing implementation of the concept?

Quasthoff posed some further technical and more detailed questions about the respective implementations of the respondents, which will not be covered here in their entirety; rather, single answers considered important for this work will be picked up. Quasthoff also took some prerequisites into account. The questionnaire was only conducted with the main developers of the ORdfMs, as they are the main users of their respective implementation and are aware of the requirements that went into its development. Additionally, the evaluation reveals that ORdfMs are rarely picked up by the community and are therefore mostly used in the developers’ own projects, which is why a user study would not be useful. The evaluation also mentions that all of the respondents think that the low reusability of an ORdfM implementation is unavoidable. The results of the first evaluation can be subsumed under the following categories:

meta-programming Statically typed languages like Java are com- pared with dynamically typed ones like Python. Advantages and dis- advantages are enlisted and contrasted for both possibilities, but the overall insight gained is that there are only slight differences and therefore the chosen programming language does not have an impact on the implementation of an ORdfM. Hence, the user can choose the programming language freely, only depending on his preferences.

complexity and tasks of an ordfm A second result targets the complexity of ORdfM implementations, as well as the intended tasks that they are supposed to fulfil or support. The general opinion was that basic RDF libraries like Jena (and, mentioned most often,

1 http://rdf2java.opendfki.de/ (last visited 03/12/2018)
2 https://pythonhosted.org/SuRF/ (last visited 03/12/2018)

Elmo/Alibaba itself) are unnecessarily complicated to use. The respondents also agreed that it is not possible to have a single ORdfM implementation which can serve every use case in a sufficient fashion, as every use case poses varying requirements. As a side note the respondents also mentioned that ORdfM implementations should stick to their main task: the translation between RDF data and representative objects. Everything that exceeds this task should be ignored and solved by other components. Examples mentioned are the connection to distributed databases, licensing and privacy, and inferencing/reasoning.

motivation to use the ordfm concept The motivations and reasons to implement an ORdfM differ and range from pure concept implementations to being part of larger projects. However, all of the respondents stated that they would always use an ORdfM implementation whenever they are in need of RDF data handling, regardless of the overall use case or the underlying data model. Some of them even used insights gained from one concept implementation in order to develop an improved version. Also, the general observation prevailed that the documentation of an ORdfM implementation should be made public, and that collaborations between different peers are desired in order to make the best out of the concept’s applications.

6.1.2 Quasthoff: ORdfM Implementation and Runtime Evaluations

In contrast to the evaluations in Section 6.1.1, which focused on existing ORdfM implementations and their developers’ opinions, this section describes the second set of evaluations done by Quasthoff [132], which target the potential increase in productivity when utilising an ORdfM library or framework. This is done by an experimental evaluation with software developers comparing an ORdfM implementation to an approach without ORdfM, backed up by an evaluation of the runtimes of applying an ORdfM in an application.

experimental evaluation with software developers In this evaluation, Quasthoff, Sack, and Meinel [134] evaluate the increase in productivity that is potentially gained by using an ORdfM in a Semantic Web application. Several IT students of the Hasso-Plattner-Institut at the University of Potsdam were asked to solve two tasks in mixed setup combinations, providing their opinions and experiences afterwards in a questionnaire. The tasks consisted in requesting different information persisted in an RDF graph, which is semantically composed of Documents, written by Persons that have friendship relations amongst each other. The first task demanded to retrieve the authors of given Documents, the second task demanded friendship relations of the second degree between Persons, given another Person. The participants were using OTMj [133] (an ORdfM implementation developed by Quasthoff and Meinel themselves, aligned to the results the authors gained from their evaluations, which are partly described in Section 6.1.1) and Jena as a more “basic” approach to access RDF data. The group settings alternated in terms of which task had to be done using which technology and in which order, constituting four different group settings. With this setup, and especially with the tasks being done by students rather than experts, the authors claim that the results cannot be projected onto any generic Semantic Web application or project, but the results still provide thorough insights on the productivity of ORdfM implementations and on emerging issues, with respective solutions, that can arise with a Semantic Web application. The metrics that have been evaluated in the experiment were rated by the participants on a scale ranging from “very easy” to “very challenging”. They had to assess the difficulty of the task, the maintainability of the code they produced, as well as the difficulty of using alternative technology (like XML via HTTP or relational databases) for the same task. Next to these features, the participants were also asked to provide free feedback concerning their perceived biggest difficulties in regards to the issued tasks. In addition to these qualitative metrics provided by the participants themselves, it was tracked how much time, how many programming attempts, and how many lines of code were required by each participant. For further technical insight and understanding, these lines of code were also categorised in terms of the high-level functionality that they fulfilled. These are structural constructs, initialisation, data access, and business logic. The results show that the participants assess the tasks as much easier when using an ORdfM implementation rather than Jena. In addition, they also estimate their written ORdfM application to be much easier to maintain, and they claim that using Semantic Web technologies is easier than alternative technology. These estimations were achieved regardless of the task. In regards of time requirement and the amount of tries to retrieve the correct information for task one, the participants on average required significantly less effort for both metrics when using the ORdfM implementation. For task two, the amount of tries surprisingly went up, while the average time to fulfil the task went down.
The increased number of tries was potentially caused by the task being more demanding in terms of recombining the results, but the reduced time for the solution still indicates a real benefit of the ORdfM utilisation. The lines of code also show a distinct positive tendency towards the ORdfM implementation. While the lines for structural constructs and business logic are roughly the same for the solutions
with and without ORdfM, the required lines of code for the remaining functionality are reduced significantly for the ORdfM solutions. Next to this fact, the utilisation of an ORdfM allowed the participants a much better separation of the respective code categories, positively influencing the code’s overall maintainability. The results of the general feedback about experienced difficulties revealed insightful details. A lot of the participants stated that Jena and its functionality by itself is hard to understand and apply. Several incorporated classes and some implemented features were deemed confusing and, especially for a non-Semantic Web expert, hard to understand and therefore to use. In conjunction with this issue, the participants also added that without the supplied code examples (which were provided by Quasthoff, Sack, and Meinel, not by the framework) finding a solution for the given Jena tasks would have taken even more time. As a second point it was noted that the interviewees oftentimes had problems understanding the respective RDF vocabularies.
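
The contrast the participants experienced can be sketched for task one (retrieving the authors of a given Document). The Jena part below uses the actual Jena API; the property DCTerms.creator is only an assumption, as the experiment’s vocabulary is not specified here. The ORdfM part uses a generic mapped interface standing in for OTMj, whose concrete API is not reproduced.

import java.util.List;
import org.apache.jena.rdf.model.*;
import org.apache.jena.vocabulary.DCTerms;

public class TaskOneSketch {

    // Plain Jena: navigate statements by hand.
    static void authorsWithJena(Model model, Resource document) {
        StmtIterator it = model.listStatements(document, DCTerms.creator, (RDFNode) null);
        while (it.hasNext()) {
            System.out.println(it.next().getObject());
        }
    }

    // ORdfM style: the mapping exposes the relation as an ordinary getter.
    interface Person { }
    interface Document {
        List<Person> getAuthors();
    }

    static void authorsWithOrdfm(Document document) {
        document.getAuthors().forEach(System.out::println);
    }

    public static void main(String[] args) {
        authorsWithJena(ModelFactory.createDefaultModel(),
                ResourceFactory.createResource("http://example.org/doc1"));
    }
}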

runtimes of ordfm implementations While the above eval- uations and experiments targeted convenience factors as well as de- velopment efficiency in the first place, it is also important to evaluate the resource efficiency of related ORdfM implementations. Having an inefficient implementation resource-wise could render any applica- tion unusable, no matter how convenient or well-developed in terms of other features it is. Quasthoff [132] indicates that evaluations about the requesting of RDF data is explored by Hartig [77], so his own evaluations will be targeted at RDF data that is held in the main memory. This exper- iment is partially based on the Berlin SPARQL Benchmark BSBM by Bizer and Schultz [31], as the SPARQL requests used in the bench- mark are much more complex than general ORdfM interactions. There- fore, Quasthoff instead runs through the whole graph by traversing every edge supported by the test data, utilising 30 different RDF graphs that contain between 40000 and 970000 triples which describe a fictional domain about persons, products, traders, offers, and rat- ings. Again, different test setups are used in order to generate best pos- sible insights, applying the general Jena implementation against the own developed OTMj implementation. These are: accessing the edges without ORdfM support with object-oriented techniques, access with ORdfM with its interactions translated into object-oriented techniques, access via pure SPARQL, and access with ORdfM support whose queries are translated into SPARQL queries. Quasthoff reinforces his experiments with quantitative numbers, amongst which the results are to be recapitulated here. The major result, being the most impactful for this thesis, is that the utilisation 6.1 related work - overall ordfm evaluations 195 of an ORdfM implementation in a Semantic Web application does only have a slight impact on the runtime and it is therefore beneficial to add the abstraction layer for a diversified access to the RDF data. It is also explained that this outcome holds true whether SPARQL is used or not. Nevertheless, it is shown that the incorporation of SPARQL requires a considerably low amount of extra time in the SPARQL tests. Overall, this does still not invalidate the assumption made above, however for a maximised implementation on in-memory data it is reasonable to bypass SPARQL when possible. When work- ing with decentralised data, which does generally not allow for the approach that is applied for the in-memory data, the SPARQL time constraint does carry less weight, and therefore the ORdfM applica- tion does still hold true. As a side note, the author mentions that this result does align also with the statement of the interviews of the ORdfM developers (which have been recapped in Section 6.1.1), who issue that there cannot be a single best ORdfM implementation: de- pending on the given use case you can fine-tune your at best very slim concept implementation. The same case applies for the various runtime possibilities. lessons learned - implications of overall ordfm evalu- ations for the anno4j library While the remainder of this work up to Chapter 6 explained the concept of the Semantic Web, metadata models and multimedia, ORdfM libraries and frameworks with the specific representative Anno4j, and the junction of all this in the MICO Platform, a central software framework to analyse multime- dia, the concept of ORdfMs and their advantages have been applied whereas some prerequisites have been left out. 
Mainly from a development and runtime perspective, ORMs in the world of relational data have been publicly accepted and evaluated. Quasthoff’s results, which have been explicated in this section, confirm the same assumption for the ORdfM concept. The insights to take away for the Anno4j library as an ORdfM implementation are the following. On the one side there are implications that can be expected and adopted for the library; on the other side, the results of the related work in this section confirm some of the claims made throughout this work. The Anno4j library does align with the opinion of the consulted ORdfM expert developers about the main task which a representative implementation is supposed to fulfil: deal with the RDF data communication while refraining from solving other associated problems, in order to keep the implementation as concise and convenient as possible. Anno4j therefore focuses on accessibility and thus on different possibilities to both produce and consume RDF data, which is supported by the Anno4j Generation Tool in order to facilitate access to an underlying domain model. This point is es-
pecially highlighted by the expert developers to be an issue of great importance and hence poses a thorough contribution by this thesis. Next to this, supportive features like the validation allow for further broadened and extended application of the library. Another more general argument is given by the developers’ opin- ion that they would always apply the abstraction layer supported by an ORdfM implementation whenever they want to develop a Se- mantic Web application. This shows that, although research about this topic seems to stagnate, ORdfM implementations still have im- pact and therefore are an essential cornerstone in the Semantic Web. Making the Semantic Web and its technologies a more present and represented entity in the overall Web of today is only possible with approaches like the ORdfM concept. In terms of runtimes with ORdfM applications, Quasthoff’s evalua- tions have shown that the utilisation of the abstraction layer does not have an impact and therefore their application is always beneficial. This does hold true whether SPARQL queries are applied in between or not. His evaluations have been based on his own ORdfM imple- mentation OTMj and the underlying Apache Jena. Although Anno4j is based on Alibaba and therefore Sesame/Rdf4j, comparisons of the respective triple stores and functionalities have shown that Sesame does perform at least as good as Jena, if not better [48][166]. There- fore Anno4j does not impose any runtime issues, backed up by the real-world application of the library in the MICO Project (see Sec- tion 5.2) and the ViSIT Project (see Section 6.3), which showed no runtime problems as well, and the evaluations seen in Section 6.2. Overall, the experiments in this field of research indicate that the overall application of the ORdfM concept is not just practicable, but also beneficial in terms of development efforts. Additionally, the of- fered possibilities allow for more sophisticated and especially accus- tomed or familiar access to RDF data, while not imposing any (or only slight) drawbacks in terms of runtime and hence can be applied without any concerns. These assumptions support the Anno4j library and therefore the contribution of this work from an overall concep- tual background. What is left to show is that the features proposed in this thesis do not contradict the above experimental results, which is done in the following section.

6.2 anno4j and ontology structure experiments

This section will show evaluations of the overall creation runtimes of Anno4j Class instances, as well as benchmarks for the main novel feature introduced by the Anno4j library - the Generation Tool. These evaluations will determine the impact that the utilisation of the library can have on the runtime of an application using its functionality. If the overall runtime were increased by too large a margin, it would not be possible to use the library or the respective feature in a real-world application. Prior to the evaluations, ontology parameters will be introduced, which can be altered in order to implement the configurations for the experiments. In addition, these parameters will define a complexity measure for the input ontologies and their counterpart, the domain models. The experiments done here will evaluate the overall runtime of Anno4j in combination with its generation feature in Section 6.2.1 with a simple test setting, which is complemented by an evaluation using pre-created proxies in Section 6.2.2. Afterwards, Section 6.2.3 describes the application of a more thorough test setting, which varies multiple ontology parameters in order to achieve more profound results and insights. The Anno4j generation feature was originally introduced in Section 4.6.

6.2.1 Runtime Experiments with Generated Anno4j Domain Models

Throughout this work it has been mentioned at different occasions that complex ontologies and respective domain models can induce longer processing times for applications utilising the ORdfM concept, like the Anno4j library. This does become noticeable at the point of time when metadata is to be created via Anno4j. To be more specific, with an existing domain model, an instance of one of its representa- tive Anno4j Classes is to be created. It has been determined that the runtime to perform this is influenced by the complexity of that given Anno4j Class. In order to achieve insights about the runtime efficiency of the Anno4j library, this section will show benchmarks that evalu- ate the creation runtime of different configurations of Anno4j Classes with underlying domain models. As it was also mentioned next to the runtime issues, a solution for increasing runtimes has been imple- mented for the Anno4j library as well. By being able to assess the com- plexity of a given ontology, decisions can be made if an initial proxy generation step is useful or not, as the generation imposes a relatively fixed pre-timing interval, but at the same time reduces the runtime of creating objects, which will be shown in Section 6.2.2. However, be- fore the evaluations can be explained, it is important to gain insights about the notion of “complexity” for ontologies and domain models. related work - software and ontology complexity In the fields of computer science, estimating the complexity of various measures is a commonly known process, which poses advantages and insights about further utilisations of said measure. One of the major domains is software development and software structure. Different approaches are to be found in related work, which have their common factor based on software complexity. Khoshgoftaar and Munson [98] relate their complexity measure to the possibility of aris- 198 experiments and evaluations

ing errors in the evaluated software. They try to use a statistical tool in order to use regression analysis to identify the possibility of modules creating errors, once the whole application is used live. Their analysis is done before the test phase, allowing to do adjustments and reduce error rates at early phases of the overall development cycle. Another similar approach is given by Kafura and Reddy [95], who determine the relationship between software complexity and the ef- fects of maintenance activity. It has been proven, that maintenance costs outweigh the actual costs of development [35], and therefore the results of the authors can lead to eventual cost savings. This is even aggravated by the fact, that with increasing size of the overall application, associated maintenance activities increases even more. In order to determine the complexity of software parts, the authors ap- ply various known metrics, with which further adjustments can be made to the software, or arising and expected maintenance activities can be estimated, adapted, and focused, which eventually leads to a possible reduction of said activities and costs. For choosing their technique for the evaluation, Kafura and Reddy ponder between objective and subjective possibilities. While the for- mer tries to correlate the complexity metric of a component to the times when an actual maintenance task needs to be performed for that component, subjective techniques relate the estimated complex- ity to the opinions and judgements of experts, who are familiar with the reviewed system. The authors’ choice is a subjective technique, motivated by the fact that they wanted to see if their complexity mea- surement can support some sort of guide to avoid poorly performed maintenance, applicable by a maintainer or maintenance manager. Their complexity measure is targeted at source code and is consti- tuted by seven different metrics, which they divide into code metrics and structure metrics. Their code metrics are given by:

• McCabe’s Cyclomatic Complexity Number [116]

• Halstead’s Effort Metric E [74]

• Lines of Code

Amongst the structure metrics are:

• Henry and Kafura’s Information Flow Metric [82]

• McClure’s Control Flow Metric [117]

• Woodfield’s Syntactic Interconnection Measure [180]

• Yau and Collofello’s Logical Stability Metric [182]

These metrics were chosen because of their difference between the respectively measured factors. Next to their study and its results, Ka- fura and Reddy also support an analysis tool that is able to calculate 6.2 anno4j and ontology structure experiments 199 above metrics for a given Fortran source program. Their analysis is done on a single mid-sized system, of which several consecutive ver- sions are considered. They calculated the metric values for each ver- sion by itself, and then tracked the changes in complexity and main- tenance activities over the various system versions, with and under the guidance of two expert developers. As their conclusion, the authors see their issued claim about the usefulness of determining the complexity in regards of potential main- tenance confirmed. Furthermore, they mention that changes in soft- ware code to reduce a given metric does definitely have impact on eventual maintenance activities. Also, they have identified that out- liers in terms of the metrics are rare, but they need definite attention, as they can have grave impact on eventual maintenance. With these approaches from general software design and the impor- tance of ontologies in the Semantic Web, which has been explained and mentioned throughout this work, recent research has also adapted the process of quality of complexity measurement for ontologies. Sim- ilar to the software pendant, measuring these factors for ontologies does also promise beneficial input for the design, the improvement, eventually arising maintenance activities, as well as the choosing of right ontologies for a given use case [181]. One related approach is given by Yao, Orme, and Etzkorn [181], who focus their ontology complexity measurement on cohesion met- rics. With again the counterpart of software cohesion in mind, their approach relates to the desired circumstance of having a good sepa- ration of responsibilities, independence of components, and control of complexity between object-oriented classes. The authors adapt this definition and see ontology cohesion as a way to symbolise the de- gree of relatedness between OWL classes, which are conceptually as well as semantically related by the defined properties between them. They also claim that if the entities of an ontology are thereby strongly related, then the given ontology has a high cohesion, which is mo- tivated by the idea that ontologies should be related in terms of a domain. The parameters which the authors consider in order to de- termine ontology cohesion are the following: number of root classes The number of RDF classes without an ingoing “rdfs:subClassOf” relationship. number of leaf classes The number of RDF classes with no out- going “rdfs:subClassOf” relationships. average depth of inheritance leaf nodes A path is defined by an inheritance chain going from a root node to a leaf node, while the length of the path is the amount of classes on the path. The total number of paths of an ontology is the total number of distinct paths from each root node to each leaf node. The value 200 experiments and evaluations

of this metric is the sum of depths of all paths divided by the total number of paths.
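
Written as a formula, directly following the definition above, with P the set of distinct root-to-leaf inheritance paths and depth(p) the number of classes on a path p, the metric reads:

\text{average depth of inheritance} = \frac{\sum_{p \in P} \mathrm{depth}(p)}{|P|}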

In their empirical evaluation, Yao, Orme, and Etzkorn gathered a set of ontologies as well as a group of eighteen evaluators from both the fields of software development and knowledge-based systems, with three to five years of experience, in order to let them evaluate these ontologies in terms of cohesion. With a subsequent step of statistical analysis, the authors indicate that their cohesion complexity measure holds for the analysis of ontologies. Furthermore, the calculated values of their metrics for the ontologies match the opinions of the invited experts. Zhang, Li, and Tan [184] also see and value the relatedness of software complexity and ontology complexity, in that they state that software design provides experience about the relation between software complexity and software quality. Both of these metrics symbolise the difficulty for humans to apply and make use of the given software, which manifests in the activities of coding, debugging, testing, and modifying. This comparison is also drawn for ontologies of the Semantic Web, as ontology complexity and quality are also seen as closely related to one another. Therefore the same claim is made: the more complex an ontology is, the harder it is for humans to use, which is also illustrated in the processes of developing, using, and modifying the ontology. To assess the complexity or quality of a Semantic Web ontology, Zhang, Li, and Tan constitute a list of eight metrics, which originate from two levels: ontology-level metrics and class-level metrics, which either focus on the entire ontology or on a single class of the ontology. The ontology-level metrics are the following:

size of vocabulary Amount of classes and relationships/proper- ties defined in the ontology. The more of these two factors, the more complex is the ontology.

edge node ratio The ratio between edges and nodes of the ontology. More edges in relation to nodes make the ontology more complex.

tree impurity This metric measures how far the inheritance struc- ture of an ontology deviates from a tree-structure. If an ontology is less tree-like, it is estimated to be more complex.

entropy of graph The entropy of the distribution of node degrees, that is, of the probability of a node having i edges. A lower entropy symbolises that the ontology has more structural patterns and is therefore less complex.
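
Read as the entropy of the node-degree distribution (an assumption about the exact form intended by Zhang, Li, and Tan), with p(i) denoting the fraction of nodes that have exactly i edges, the metric takes the usual Shannon form:

H = -\sum_{i} p(i) \log_2 p(i)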

In contrast, the class-level metrics are seen to have effects on other entities of an ontology, when and if they are altered, and therefore contribute on this level to the ontology complexity. They are the following:

number of children The number of direct subclasses of a node. A higher number of children makes the ontology broader and thereby more complex.

depth of inheritance The length of the inheritance path of subclass relationships from a root class to the currently considered class. The longer the inheritance paths are, the deeper and therefore more complex is the ontology.

class in-degree The number of edges that point towards the currently regarded class node. The higher this metric is, the more classes depend on the given class and the more complex is the ontology.

class out-degree The number of edges that point away from the currently regarded class node. The higher this metric is, the more classes the given class depends on and the more complex is the ontology.

As an analytical evaluation, Zhang, Li, and Tan list the properties defined by Weyuker [177] and evaluate their metrics against them. The results indicate that the metrics satisfy all but two of the nine defined properties, which makes them generally usable in practice. They also show an empirical evaluation, in which the authors gathered several ontologies with a data collection tool that was also able to calculate the values of the defined metrics in order to have a basis for discussion. With these results the authors claim that their metrics are able to differentiate between the various complexities of a set of analysed ontologies and are therefore usable to assess the complexity of a Semantic Web ontology. domain model complexity and an own complexity measurement With the insights about both software and ontology complexity, it is reasonable to apply similar approaches for the evaluations explained in this work. The domain models of the Anno4j library constitute software code, but the software metrics shown above do not seem applicable or useful, as domain models are built out of rather simple Java POJOs, which would not create meaningful values for metrics like control or information flow. Nevertheless, the impact and underlying concept does carry over to Semantic Web ontology complexity and therefore also to domain model complexity. The results and metrics defined by the related approaches to ontology complexity are very interesting and impactful for the research done in this thesis. It is however important to note that the related
works’ perspective on ontology complexity is different compared to what is needed for the domain model evaluations of this work: while the related approaches try to estimate an ontology’s complexity, and thereby its further implications, with different parameters or metrics after an ontology has been created, the evaluations of this thesis require parameters that can be configured in order to create an ontology with a representative domain model and its Anno4j Classes in the first place. These parameters are thereafter evaluated regarding their impact on the creation times of domain model representatives. Nevertheless, for the complexity of the Anno4j Classes and thereby the domain model, six different parameters were identified that have a potential impact on the runtime of creating a respective instance; they are partially influenced by the insights gained from the related work listed above. The benchmark will therefore relate back to these parameters and give insights and explanations about their interpretation. Of the six parameters, four originate from common metadata modelling concepts that closely relate to the metrics of Zhang, Li, and Tan [184] (which in turn also satisfy the claims made about the parameters of Weyuker [177]), while two are additions that have their source in the Anno4j library. The six parameters are the following (an illustrating sketch is given after the list):

direct subclasses or children degree The number of classes that are direct subclasses of the given class and therefore inherit its behaviour.

direct superclasses or parent degree The number of classes of which the given class itself is a subclass, which in turn defines the inherited behaviour gathered by the given class.

inheritance depth The number of classes that form a direct inheritance chain leading to the given class.

amount of relationships and properties The number of relationships and properties that are defined for the given class.

defined partial classes The number of Partial Classes that are implemented to extend the functionality of the given class.

defined methods in partial classes The number of methods defined in all supported Partial Classes for the given class.
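To make these parameters more tangible, the following minimal sketch shows how they surface in an Anno4j domain model. The ex: vocabulary, the Person/Employee interfaces, and the support class are invented for illustration, and the package paths of the Anno4j and Alibaba types are given as assumptions that may deviate from the actual release.

import org.openrdf.annotations.Iri;
import com.github.anno4j.annotations.Partial;
import com.github.anno4j.model.impl.ResourceObject;
import com.github.anno4j.model.impl.ResourceObjectSupport;

// Parent Degree / Inheritance Depth: Employee has one direct superclass (Person);
// every further "extends" step lengthens the inheritance chain.
@Iri("http://example.org/ex#Person")
interface Person extends ResourceObject {
    // Relationships and properties: each @Iri-annotated getter/setter pair maps one RDF property.
    @Iri("http://example.org/ex#name") String getName();
    @Iri("http://example.org/ex#name") void setName(String name);
}

// Children Degree: every interface extending Person adds one direct subclass to it.
@Iri("http://example.org/ex#Employee")
interface Employee extends Person {
    @Iri("http://example.org/ex#worksFor") String getEmployer();
    @Iri("http://example.org/ex#worksFor") void setEmployer(String employer);
}

// Partial Classes and Partial Methods: behaviour implemented next to the mapped interface.
@Partial
abstract class EmployeeSupport extends ResourceObjectSupport implements Employee {
    public String businessCard() {
        return getName() + " (" + getEmployer() + ")";
    }
}

Creating an instance, e.g. via anno4j.createObject(Employee.class) on an initialised Anno4j object, is exactly the step whose first execution the following benchmark measures.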

It is important to note that the benchmark will only consider the first creation of an instance of a given Anno4j Class, as the library will "remember" the proxy behaviour that it created with the initial creation. As the actual semantic content of the analysed ontologies in the benchmark is irrelevant, mocked ontologies were generated and varied in their respective parameter configuration. To obtain the domain model that can be utilised for the evaluation in Anno4j, its Generation Tool is used to automatically generate the different domain models.

benchmark setup An initial parameter configuration was set, whose values were inferred from average values of vocabularies and ontologies that are currently present in the LOD. This approach created an initial setting for the Anno4j Class to be instantiated. An inheritance chain of depth two is assigned, while all of the Anno4j Classes had four different defined properties/relationships and one implementing Partial Class, which also implemented the four methods of the Anno4j Class as Partial Methods.

In the first round of experiments, each parameter was varied on its own, while the other five parameters were fixed to the initial configuration. From this parameter configuration an RDFS schema was generated, which in turn was used with the Anno4j Generation Tool to generate the domain model. This model is then compiled with an OpenJDK Java compiler and packaged as a Java Archive. The JAR file together with the benchmark code is loaded into a Java Virtual Machine (JVM) in order to initiate the benchmark.

To allow optimisations via the JIT compiler, 25 warmup iterations are performed. Afterwards, the next 50 iterations are used to measure the runtime of creating the respective instance of the focused Anno4j Class. After every iteration, a new Anno4j object is created to avoid existing proxies. When the cycle of 50 measured iterations is finished, the current JVM is terminated and a new JVM is started for the same parameter configuration. This is done ten times overall for every configuration in order to randomise the influence of memory layouts, caching, and garbage collection. All of this results in 500 measurements for every parameter configuration. The configuration of the test system that is used for the basic benchmarks is listed in Table 5 in the Appendix. A simplified sketch of this measurement harness is given below (after Figure 45).

benchmark result discussion and interpretation Figure 45 visualises the results for the benchmark described above in six sub-figures. Each sub-figure illustrates the runtime evaluation when a given parameter is varied while the other five parameters are fixed to the values of the initial configuration. The results will be discussed for each single parameter. Additionally, their impact will be assessed and weighed against insights that have been gained by analysing 633 Linked Open Data vocabularies.

The first benchmark, shown in Figure 45a, visualises the impact of having multiple subclasses associated with the Anno4j Class to be produced. Varying between one and ten associated "rdfs:subClassOf" relationships, it can be seen that this parameter does not have a negative influence on the runtime, as the required time to create an instance of the Anno4j Class stays constant. It can therefore be inferred that the number of associated subclasses is irrelevant for the creation runtime.

(a) Runtimes in Terms of Direct Subclasses. (b) Runtimes in Terms of Direct Superclasses.

(c) Runtimes in Terms of Inheritance Depth. (d) Runtimes in Terms of Relationships and Properties.

(e) Runtimes in Terms of Partial Classes. (f) Runtimes in Terms of Partial Methods.

Figure 45: Benchmark Results for Single Variations of Ontology Parameters for the Anno4j Evaluation.
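The following is a minimal, hypothetical sketch of the measurement loop described in the benchmark setup above. The Anno4j constructor and the createObject call reflect the library's public API as far as known, but the placeholder interface GeneratedClass merely stands in for a class produced by the Anno4j Generation Tool, and exact signatures may differ.

import com.github.anno4j.Anno4j;
import com.github.anno4j.model.impl.ResourceObject;
import org.openrdf.annotations.Iri;

// Placeholder for one of the generated Anno4j Classes of the mocked domain model.
@Iri("http://example.org/mock#GeneratedClass")
interface GeneratedClass extends ResourceObject { }

public class CreationBenchmark {

    static final int WARMUP = 25;
    static final int MEASURED = 50;

    public static void main(String[] args) throws Exception {
        long[] runtimes = new long[MEASURED];
        for (int i = 0; i < WARMUP + MEASURED; i++) {
            Anno4j anno4j = new Anno4j();              // fresh object, so no cached proxies are reused
            long start = System.nanoTime();
            anno4j.createObject(GeneratedClass.class); // the timed, initial creation
            long elapsed = System.nanoTime() - start;
            if (i >= WARMUP) {
                runtimes[i - WARMUP] = elapsed;        // only the 50 measured iterations are kept
            }
        }
        // The full benchmark additionally restarts the JVM ten times per configuration,
        // yielding 10 x 50 = 500 measurements that are aggregated afterwards.
    }
}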

In contrast, it can be seen in Figure 45b that the opposite factor, in the form of direct superclasses, does have a negative impact on creation runtimes, as a polynomial growth can be assumed. In turn this raises the assumption that this circumstance imposes difficulties relating to the applicability of Anno4j to general ontologies, since branching and diverse inheritance hierarchies are commonly found. Nevertheless, the investigation of the 633 LOD vocabularies in terms of direct superclasses showed that 94.5% of the contained classes had three or fewer associated superclasses, while 76% had exactly one superclass. Therefore this negative influence can generally be ignored in practical applications.

The same investigation also shows that inheritance hierarchies can go deep. About one third of the considered classes of the 633 LOD vocabularies were positioned as the lowest class in an inheritance chain of length five. This circumstance, in combination with the benchmark shown in Figure 45c, which indicates polynomial growth, marks this parameter as one that has to be treated carefully. As an example, the numbers show that creating an instance of an Anno4j Class with five superclass predecessors takes seven times as long as one of a Class without superclasses.

A similar behaviour can be seen in Figure 45d, which illustrates the runtimes with increasing numbers of defined relationships and properties for a given Anno4j Class. Here, too, polynomial growth of the runtimes can be assumed. This circumstance is even aggravated by the fact that inheritance hierarchies propagate their relationships and properties to their subclasses as well, so not only the ones directly implemented by the Anno4j Class itself have an impact here. The evaluation of the 633 LOD datasets showed that one fourth of the found classes support at least four different properties or relationships. Therefore, this parameter requires careful attention in more complex ontologies.

The last parameters that have been considered by the benchmark originate from the Anno4j library, namely the possible additions of Partial Classes and their respective methods, which can be implemented next to a given Anno4j Class. The diagrams shown in Figure 45e and Figure 45f indicate that both parameters do not have an impact on the runtimes for instantiating an instance of the considered Anno4j Class.

With the results of this section, it is possible to gain domain knowledge about the parameters and the impact that they introduce to the instantiation of different Anno4j Classes. The benchmarks show that it can be problematic to work with the Anno4j library in conjunction with complex ontologies, especially if the ontology defines a deep and broad class hierarchy with many different relationships and/or properties assigned. Nevertheless, the Anno4j library is able to address this circumstance by pre-generating the Proxy Classes for a given domain model, inducing a relatively fixed start-up time interval but at the same time reducing the initial creation of Anno4j Class instances to an imperceptible minimum.

6.2.2 Runtime Experiments with Pre-Created Proxy Classes

In order to give deeper insights into the problem of runtimes for the initial creation of Anno4j Class instances described in Section 6.2.1, this section will highlight the reason behind the potentially problematic runtimes. Afterwards, the Anno4j solution to this is described and a benchmark is conducted that shows the optimised efficiency when using the respective feature.

In order to narrow down the source of the issue, the tool VisualVM3 is used to examine the factors that are involved in the process of creating an Anno4j Class instance. As an example, Figure 46 shows the breakdown of the runtimes of processes that are involved in the initial creation of an instance of the "crm:E21_Person" class of the CIDOC CRM ontology (for further information about the CIDOC CRM see Section 6.3).

Figure 46: Breakdown of the Runtimes of Subtasks Necessary to Create the Initial Instance of an Anno4j Class Implementing the "crm:E21_Person" RDF Class of the CIDOC CRM.

Overall, the instantiation of the person object took 113.9 seconds. Of this time interval, 101.61 seconds were required by the "MethodInfo.rebuildStackMapIf6" method, which is part of the software library Javassist4. This library is used by the underlying Alibaba and is applied for the generation of bytecode for the Proxy Class. It generates so-called Stack Map Frames, which are stored in the Stack Map Table (SMT). The SMT is a data structure that was introduced with Java 6 and has been mandatory since Java 7. It is used by the JVM in order to implement quicker type checking. For every bytecode instruction that performs a jump or is the target of a jump, a Stack Map Frame is inserted into the SMT at compile time. This frame specifies the types of the local variables and of the operand stack at the point of the jump instruction. When a class is loaded, the bytecode instructions near Stack Map Frames are read and their effects on the variable types and the stack are inferred. If an instruction is read whose associated Stack Map Frame conflicts with the inferred types, the verification fails and the class is not loaded. This prevents the JVM from being compromised by the given code.

As Anno4j presumes Java 7, an SMT has to be created via Javassist in order to ensure that the Proxy Classes that are generated via Alibaba can be loaded by the JVM. For this, the types of the local variables and of the operand stack have to be identified, and the respective Stack Map Frames need to be created for the respective locations in the created bytecode. Javassist's type inference requires checks to determine whether a primitive datatype or a null reference is present, or whether a field has already been initialised. The profiling of the object creation via VisualVM shows that numerous recursive calls are necessary for this in complex inheritance hierarchies, which in the end leads to the observed long runtimes.

3 https://visualvm.github.io/ (last visited 03/12/2018)
4 http://jboss-javassist.github.io/javassist/ (last visited 03/12/2018)
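To make the role of the Stack Map Table more concrete, the following minimal Javassist sketch generates a small class at runtime; the class name, the greet method, and its body are purely illustrative and not part of Anno4j or Alibaba. When the class is compiled to bytecode and loaded, Javassist has to (re)build the Stack Map Frames for the branch introduced by the ternary expression - the same kind of work that "MethodInfo.rebuildStackMapIf6" performs for the far more complex Anno4j Proxy Classes. Depending on the Javassist and JDK versions, toClass() may require additional arguments.

import javassist.ClassPool;
import javassist.CtClass;
import javassist.CtMethod;
import javassist.CtNewMethod;

public class ProxySketch {
    public static void main(String[] args) throws Exception {
        ClassPool pool = ClassPool.getDefault();
        // Create a new (proxy-like) class at runtime.
        CtClass proxy = pool.makeClass("example.PersonProxy");
        // The method contains a branch; for Java 7+ class files, Stack Map Frames
        // must be emitted so that the bytecode verifier accepts the class.
        CtMethod greet = CtNewMethod.make(
                "public String greet(boolean formal) { return formal ? \"Good day\" : \"Hi\"; }",
                proxy);
        proxy.addMethod(greet);
        // toClass() compiles the CtClass to bytecode and loads it; the Stack Map Table
        // is built as part of this step before the JVM verifies the class.
        Class<?> cls = proxy.toClass();
        Object instance = cls.getDeclaredConstructor().newInstance();
        System.out.println(cls.getMethod("greet", boolean.class).invoke(instance, true));
    }
}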

As mentioned earlier, only the initial creation of an object instance is problematic in terms of runtimes, as Alibaba and therefore the Anno4j library is able to "remember" the bytecode that is created for the instantiated class. Hence, after the first creation, further creations of the same Anno4j Class do not require the same time expenditure, but are created in very little time. Nevertheless, although this does help with a running system, it can still be problematic if the domain model changes often or the system and its underlying JVM has to be restarted frequently, as the initial creation of Anno4j Class instances has to be redone with each slight change of the domain model.

To counteract this circumstance, the created bytecode was examined, and it turned out that for an unchanged domain model it stays the same for a given class. The created Proxy Class implements all methods of the given Anno4j Class and all its superclasses, and their names are inferred from the names of the respective Anno4j Classes and their methods. Using this, the Anno4j library provides a command line based tool that can pre-create the Proxy Classes for a given domain model. The output are two JAR files, one of them containing the compiled Anno4j Classes of the supported domain model, the other containing the bytecode that is necessary for the generation of the Proxy Classes. Adding these two JARs to a Java project that intends to use the underlying domain model allows the creation process to use the contained Proxy Classes and therefore skip the time-consuming effort of creating the respective bytecode on its own. This command line tool itself needs some time to create its output. However, this process can be done once by the developer before deploying the application, thereby relieving the actual user of the costly runtimes for creating initial instances of the Anno4j Classes of the domain model. In combination with the insights gained from the parameter evaluations in Section 6.2.1, it can be estimated whether the pre-generation step is useful for a given RDFS or OWL schema.

evaluation of runtimes with pre-created proxy classes In order to show and evaluate the effectiveness and the applicability of pre-creating the Proxy Classes for a given domain model, a benchmark was conducted with the same settings and measurements as the experiments shown in Section 6.2.1, with the only alteration of adding the JAR files created by the Anno4j command line tool that pre-creates the Proxy Classes. The evaluations will only be done with the "problematic" ontology parameters of superclasses, inheritance depth, and number of properties and relationships.

Nevertheless, the runtime savings apply to the parameters of subclasses, Partial Classes, and Partial Methods as well, but their constant runtime behaviour evaluated above is not considered impactful for applications. The results of the benchmark are shown in Figure 47. For comparison with the actual evaluation numbers, the dotted lines indicate the runtimes without pre-created Proxy Classes.

(a) Runtimes in Terms of Direct Superclasses with Pre-Created Proxy Classes. (b) Runtimes in Terms of Inheritance Depth with Pre-Created Proxy Classes.

(c) Runtimes in Terms of the Number of Relationships and Properties with Pre-Created Proxy Classes.

Figure 47: Benchmark Results for Problematic Parameters with Pre-Created Proxy Classes.

The results depicted in Figure 47 show that with pre-created Proxy Classes in place, the originally polynomially growing runtimes are reduced to constant behaviour. For the parameters of superclasses shown in Figure 47a, inheritance depth in Figure 47b, and the number of relationships and properties in Figure 47c, it can be seen that the runtimes are reduced to an imperceptible minimum. Turning the results into numerical statements, the runtimes for the superclasses had a constant average value of 56 µs with a standard deviation of 26 µs. For the inheritance depth, an average value of 81 µs with a standard deviation of 55 µs is calculated, while the properties/relationships benchmark showed an average of 80 µs with a standard deviation of 51 µs.

Altogether, this benchmark shows that the application of the Anno4j command line tool to pre-generate Proxy Classes for a given domain model allows convenient and efficient work even with large and complex ontologies or vocabularies. A one-time process with an RDFS or OWL schema as input creates two JAR files that contain the domain model and the associated bytecode necessary for the initial creation of an instance of an Anno4j Class. Therefore Anno4j can apply a complex domain model while still keeping runtimes low, so that the usability of the respective applications is not impeded.

6.2.3 Runtime Experiments with Generated Anno4j Domain Models and Multiple Varying Parameters

Prior evaluations allowed gaining some domain knowledge regarding the defined ontology parameters, which is not only interesting for evaluation purposes; these insights can and should also be applied when designing ontologies that are supposed to be used in real-world applications. It has been seen that direct superclasses (or Parent Degree), the inheritance depth, as well as the number of defined relationships and properties of a given Anno4j Class are impactful in terms of runtimes when creating an instance of this class. These runtimes increase in polynomial fashion the higher the respective parameter is set. Defined Partial Classes with their methods and defined direct subclasses (or Children Degree) are considered as non-harmful for runtimes.

Nevertheless, although this domain knowledge helps to raise awareness for the parameter definition of ontologies, it is not thoroughly sufficient, as only single parameters are considered and altered. When used in combination (which is the general case), parameters could depend on each other, possibly having unexpected and grave impact on application runtimes. To also achieve awareness and insights for this kind of situation, this section presents an evaluation that considers multiple varying ontology parameters.

evaluation setup and approach This evaluation considers the same ontology parameters that have been mentioned before, with the difference of varying multiple parameters in contrast to the evaluations described in Section 6.2.1 and Section 6.2.2. No pre-created Proxy Classes are in place, in order to measure the time required for the initial creation of Anno4j Classes. A grid search-like approach (a concept mainly found in the fields of Statistics [164][108] and Machine Learning [87][159]) is applied that is restricted to checking pairs of parameters. It quickly turned out that a full grid search with parameter settings above the configurations found in general ontologies of the Semantic Web requires heavy workloads and therefore instantly qualifies for the pre-generation of proxies, as evaluated in Section 6.2.2.

However, the evaluation of parameter pairs delivers meaningful insights that further extend the ontology parameter knowledge gained from the above evaluations.

Just like for the evaluations of single parameters, the same kind of workflow has been applied in order to generate numerical results. The parameter setting is used to generate a Semantic Web ontology, which is then used in the Anno4j Generation Tool to generate an Anno4j domain model. From that model a considered class is picked and a respective instance is created. The runtime required for its instantiation is measured in the evaluation. The technical benchmark setup in terms of warmup runs, measured runs, etc. is exactly the same as described in Section 6.2.1. The most important difference is the hardware setting that is used. As this evaluation requires a lot more resources, it is conducted on a server, whose technical details are listed in Table 6 in the Appendix.

With two varying parameters at a time, the results of this evaluation are displayed in the form of heatmaps [179]. Heatmaps are two-dimensional matrices whose axes convey the two respective parameter settings, while the values of the matrix itself, as a third "axis", represent the calculated runtime values for the respective setting. These runtime results are visually enhanced by the "heat" value of the map in order to better illustrate the gradient of the values, ranging from blackish colours for low values to whitish colours for high values.

Combining the plotting of results as heatmaps with the domain knowledge that has been gained for single parameters - from now on called soft and hard parameters, depending on their constant or polynomial impact on application runtimes - it is expected that the heatmap patterns will converge towards certain "pattern classes" in terms of their visual representation. The patterns that are identified are the following, which are also illustrated in Figure 48:

scatter pattern A scatter pattern (seen in Figure 48a) indicates that the two currently considered parameters are not dependent on each other. This is most likely seen when two soft parameters are utilised in the evaluation, so the calculated runtime values are roughly the same for every setting of the parameters. As a result, and because the whole colour range of the heatmap is utilised, the created heatmap receives its scattered visual pattern of heat values.

peak pattern The "opposite" of the scatter pattern is the peak pattern (seen in Figure 48b), which conveys a high dependence between the two considered parameters. The peak pattern orients its highest values towards the top right corner, which conveys that the combination of the parameters, and therefore a higher setting for both of them, has grave impact on application runtimes. This observation can occur with any combination of soft or hard parameters, as the combined increase in runtimes is much higher in comparison to the influence of a single parameter by itself. It can be recognised by the two dark edges of the heatmap that are positioned directly next to the axes and a single whitish peak value in the top right corner.

edge pattern The last type of heatmap pattern does not necessarily indicate the existence or non-existence of a dependence between the two considered parameters, but symbolises that one parameter has more influence than the other in the given parameter combination. The increase of the one parameter does impact the overall combined runtimes, while the increase of the other has considerably low to no impact at all. This kind of pattern can be directed towards two directions, given by the two parameters, and therefore an increasing gradient of heat values can be seen towards the top or the right edge of the heatmap.

(a) Exemplary Scatter Pattern. (b) Exemplary Peak Pattern.

(c) Exemplary Edge Pattern, Directed Towards the Parameter of the y Axis. (d) Exemplary Edge Pattern, Directed Towards the Parameter of the x Axis.

Figure 48: Heatmap Plot Patterns with Exemplary Heat Values, Highlighting the Color Gradient or Scattered Distribution.

Based on the domain knowledge of the single parameter evaluation and the defined pattern types, an expectation can be formulated of what kind of pattern the combination of soft and hard parameters will create. These expectations can be seen in Figure 49.

(Matrix of soft and hard parameter combinations; legend: Scatter Pattern, Edge Pattern (towards the left/top parameter), Peak Pattern.)

Figure 49: Expected Pattern Types Created by the Recombination of Soft and Hard Parameters.

The combination of two soft parameters, as described above, most likely leads to a scatter pattern in the heatmap, as neither parameter on its own increases the overall runtimes regardless of its setting. When a hard parameter is combined with a soft parameter, edge patterns are expected that are oriented towards the hard parameter, as a polynomial increase in runtimes is introduced by the hard parameter on its own. The soft parameter is in general considered to not be impactful in this combination. The last possible combination, two hard parameters, is expected to result in a peak pattern, as two already impactful parameters are combined, which eventually leads to a multiplied increase of the overall runtimes. This expectation is summarised in the small sketch below.
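Expressed programmatically, the expectation is just a small decision over whether each parameter of the pair is hard; the following sketch (not part of the evaluation code) encodes it:

// Sketch encoding the expected heatmap pattern for a pair of parameters,
// based solely on whether each parameter is "hard" (polynomial single impact).
enum Pattern { SCATTER, EDGE, PEAK }

final class PatternExpectation {
    static Pattern expected(boolean firstIsHard, boolean secondIsHard) {
        if (firstIsHard && secondIsHard) {
            return Pattern.PEAK;    // two hard parameters multiply their impact
        }
        if (firstIsHard || secondIsHard) {
            return Pattern.EDGE;    // oriented towards the hard parameter
        }
        return Pattern.SCATTER;     // two soft parameters, no dependence
    }
}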

evaluation of runtimes with varying parameter pairs In the following, the results of the evaluation of ontology parameter pairs will be listed, described, and interpreted. First, a general overview and some expected results will be shown; unexpected results and difficulties met while conducting the evaluation will be explained afterwards.

The concrete evaluation results can be seen in the heatmaps illustrated in Figure 65, Figure 66, and Figure 67 in the Appendix, while the calculated numbers (runtimes with mean and median, standard deviation, standard error, and 95% confidence interval) of the evaluations are listed in the tables ranging from Table 7 to Table 20 in the Appendix.

Figure 50 shows the combination of all given parameters applying the measured results and assigns them the pattern types that have been described and defined above and in Figure 49. In order to further explain the generated plots, three representative results that exhibit the various pattern types are described here.

(Pattern allocation matrix over the parameters cd (Children Degree), pd (Parent Degree), pc (Partial Classes), pm (Partial Methods), prop (Properties/Relationships), and d (Inheritance Depth); legend: Scatter Pattern, Edge Pattern (towards the left/top parameter), Peak Pattern, Problematic Evaluation.)

Figure 50: Pattern Allocation of Runtime Experiments with Varying Ontology Parameter Pairs.

Among those, Figure 51 shows the combination of the Children Degree and Partial Methods ontology parameters. The plot illustrates a scatter pattern, indicating that the parameters are not dependent on each other and therefore do not affect overall runtimes when used with higher values and in combination. The creation runtimes take values in a small range from around 22.5 milliseconds to 28.5 milliseconds and are scattered throughout the whole parameter setting range. The explanation for this circumstance could be the fact that the parameters do not necessarily influence each other, as the Partial Methods defined for the given class are not necessarily passed on to the subclasses, and that, being soft parameters, they especially do not interfere with the creation runtimes of a given class.

(Heatmap: Children Degree (y axis, 1-7) versus # Partial Methods (x axis, 1-7); cell values give the creation runtime in ms, ranging from about 22.5 to 28.5 ms.)

Figure 51: Exemplary Scatter Pattern Result for the Combination of the Child Degree and Partial Methods Parameters.

In contrast, Figure 52 shows a representative edge pattern that is derived from the combination of the Children Degree and Inheritance Depth parameters and is therefore a combination of a hard and a soft parameter.

The pattern is oriented towards the Inheritance Depth parameter, which illustrates the fact that the parameters do not depend on each other: the increase of the Children Degree parameter does not have any impact on creation runtimes, while the Inheritance Depth parameter shows its previously known behaviour of having an increasing impact with an increasing value setting. Creation runtimes start at around 26 milliseconds for the lower settings of the Inheritance Depth parameter and increase to around 60 milliseconds for the highest setting.

(Heatmap: Children Degree (y axis, 1-7) versus Inheritance Depth (x axis, 1-7); cell values give the creation runtime in ms, ranging from roughly 25 to 63 ms.)

Figure 52: Exemplary Edge Pattern Result for the Combination of the Children Degree and Inheritance Depth Parameters.

The last category of patterns is represented by the peak pattern, which for example is created by the combination of the Parent Degree and Partial Classes parameters. Their respective evaluation plot is shown in Figure 53, illustrating the characteristic shape with the highest heat value located in the top right corner of the heatmap. This result is unexpected, and will be discussed further in the next paragraph, as this combination features a soft and a hard parameter. The creation runtimes have their lowest values around 30 milliseconds and increase drastically up to a maximum of 6 seconds for higher parameter configurations. This can be explained by the fact that defined superclasses as well as defined Partial Classes for a given class require the Anno4j library to merge several behaviours of different natures into one resulting behaviour, which imposes multiple time-consuming merge processes.

unexpected ontology parameter pair evaluation results Several unexpected results have been discovered in the evaluation of paired parameters. The first peculiarity is illustrated in Figure 54 and concerns the Partial Classes parameter. The respective pattern entries of interest are highlighted in the matrix.

(Heatmap: Parent Degree (y axis, 1-7) versus # Partial Classes (x axis, 1-7); cell values give the creation runtime in ms, ranging from about 27 ms up to roughly 6000 ms.)

Figure 53: Exemplary Peak Pattern Result for the Combination of the Parent Degree and Partial Classes Parameters.

(Pattern allocation matrix as in Figure 50, with the combinations involving the Partial Classes (pc) parameter highlighted.)

Figure 54: Highlighted Evaluation Results for the Combination with the Partial Class Parameter.

Prior evaluations resulted in the assumption that this parameter can be considered a soft parameter and therefore uncritical in terms of application runtimes. In combination with the other five parameters, however, it turned out that this is only true when regarding the parameter "in isolation". When combined with other soft parameters, earlier assumptions lead to the expectation that scatter patterns would generally be the result of the combination. In combination with a hard parameter, an edge pattern should emerge that is tilted towards that respective hard parameter. Figure 54 however shows that this is not the case. In combination with soft parameters, the Partial Classes parameter takes on the "dominant" role of the pair and therefore creates an edge pattern towards its axis. When combined with hard parameters, a peak pattern is created. Both of these observations lead to the conclusion that the Partial Classes parameter cannot be seen as a soft parameter, because in real-world ontologies it is generally not applied alone.

This result is especially interesting for the contributions of this thesis, as the considered parameter is a concept that is introduced by the Anno4j library and therefore needs proper attention and utilisation.

The second unexpected observation applies to the Property and Relationship count for a given class. The preceding single evaluations and the domain knowledge gained from them showed that this parameter has an increasing impact on creation runtimes the more properties and relationships are implemented, marking this parameter as a hard parameter. The pair evaluations however show the opposite, which is illustrated in Figure 55 and the respectively highlighted result patterns of the heatmaps.

(Pattern allocation matrix as in Figure 50, with the combinations involving the Property/Relationship count (prop) highlighted.)

Figure 55: Highlighted Evaluation Results for the Combinations with the Property/Relationship Count.

In contrast to the Partial Classes parameter, which turned from a soft into a hard parameter, the opposite holds true for the Property and Relationship count. When combined with soft parameters such as the Children Degree, the Partial Classes, or the Partial Methods, the currently considered parameter induces a scatter pattern, which symbolises non-dependence between the combined parameters. This is unexpected, as a hard parameter should always impose at least an edge pattern towards its values. In the combination with the other hard parameters, namely Parent Degree and Inheritance Depth, the Property and Relationship count acts as the "passive" part rather than introducing the expected impact on runtime values, resulting in an edge pattern towards the combination partner. All of this leads to the observation that the Property and Relationship count only acts as a hard parameter on its own, but does not induce any increases in combined creation runtimes when applied in combination with other parameters.

The last observation of the ontology parameter pair evaluation concerns the combination of the Parent Degree and Inheritance Depth parameters. No evaluation results could be computed for this combination with higher parameter values, as computation times took several days for a single run of the evaluation, of which every setting requires 75 iterations (25 warmup and 50 benchmark iterations) in order to achieve meaningful results. The evaluation was cancelled for this configuration after the twelve active threads had each run for over 180 hours and were still not finished. This unexpected and problematic behaviour starts at a parameter configuration of 4 and 4. Because of this, neither a result table nor a heatmap could be generated for this combination.

The reason for this behaviour has not yet been fully discovered, but from a technical point of view, a high number of superclasses in combination with a long inheritance chain yields a complex ontology - which is both deep and broad - and the Anno4j library needs to merge many different and possibly overlapping behaviours into one place. It is also possible that better merging algorithms need to be implemented in order to enhance this process. This opens an interesting point for future work.

Nevertheless, this circumstance does not render the Anno4j library unusable with such ontologies, as the pre-generation of proxies (as shown in Section 4.6) opens the possibility to pre-create the domain model for the given complex ontology and then instantiate respective instances in almost no time. Furthermore, as described in Section 6.2.1, commonly found ontologies rarely exceed a setting of four or more superclasses per class combined with an inheritance depth above four.

6.2.4 Conclusion of the Anno4j Evaluations

In summary, several evaluations have been conducted that target the creation of Anno4j Class instances which fulfil certain ontology criteria in terms of the defined parameters. This is combined with the Anno4j Generation Tool, which is used to generate the respective Anno4j domain model out of a mocked ontology that implements the respectively considered ontology parameters.

These evaluations have shown that there are parameters that need to be observed when an ontology is designed that is supposed to be used with Anno4j in the first place. The gained insights however also apply to general ontologies and suggest a complexity that can be assigned to the ontology. Furthermore, attention should be paid when using certain parameters in combination and in excessive configurations. The conducted experiments allow gaining insights and awareness for these kinds of problematic combinations, as they might have an impact on application runtimes which potentially surpasses commonly accepted waiting times.

With the evaluated runtimes an estimation can be made about the complexity of a given ontology, therefore determining whether a preparatory step that pre-generates the necessary proxies for the Anno4j library is reasonable.

The respective generation requires some time upfront before an application can go live, but it reduces instantiation runtimes to a minimum and hence allows an uncomplicated implementation of said application.

6.3 visit - a cultural heritage use case for the ordfm library anno4j

The last few sections have shown various ways of evaluating the work and the contributions presented in this thesis. This was initiated by an overview of related work and approaches that analysed the ORdfM concept from a more general point of view, while the subsequent computations evaluated novel features implemented by the Anno4j library. Another potential way of evaluating the work done in this thesis is showing the application, and therefore the impact, that the contributions have on follow-up projects and other implementations in the field of the Semantic Web. It has already been stated that the Anno4j library has been mentioned in related work by Suhrbier et al. [161] and Prabhune et al. [130] as well as by the W3C Web Annotations Group, who list Anno4j as a reference implementation5.

In terms of scientific impact, next to these mentions, another research project at the University of Passau called ViSIT - Virtuelle Verbund-Systeme und Informationstechnologien für die touristische Erschließung von kulturellem Erbe6 (roughly translated as "Virtual Network-Systems and Information Technologies for the Touristic Exploitation of Cultural Heritage") poses a perfect use case for the Anno4j library and has therefore picked up its application.

The basic premise of the ViSIT Project is to virtually connect multiple museums to share their cultural content in the form of exhibition samples. In general, these museums are expected to tell a story in an exhibition about roughly the same topic. As they usually have different samples, however, each of them can only back up its respective exhibition with the samples it has on-site; the rest of the story must be covered by text only, which is of course possible, but not desirable, as the samples produce a much more colourful and impactful representation of the historical facts. Figure 56 illustrates this situation. The encapsulating grey area symbolises the whole historical content there is to tell about a given Person "x". Each puzzle piece in the figure depicts an exhibit sample that has been found, which backs up certain facts about the history of that given person. Two museums, "a" painted on the left and "b" on the right, are in possession of some of the samples, namely "1" through "6".

5 https://www.w3.org/annotation/wiki/Implementations (last visited 03/12/2018)
6 http://www.phil.uni-passau.de/en/dh/projekte/visit/ (last visited 03/12/2018)

Exhibitions held in one of these two respective museums can now only display those samples they have on-site, allowing them to cover only a bit less than half of the history that can potentially be told about "x" (incorporating the fact that there are two further samples, which are in neither of the houses).

(Figure content: "Historical Content Space about Person x", containing Museum a's Samples and Museum b's Samples as puzzle pieces 1 to 6, with pieces 7 and 8 outside both museums.)

Figure 56: Exemplary ViSIT Scenario with Two Museums “a” and “b”, both Issuing an Exhibition about Person “x”.

This circumstance is targeted by the ViSIT Project. Various digital representations of the museums' samples are produced, which can be shared between the participating museums. These representations can then be applied in different multimedia installations in order to fill up the "bare" textual representations not with real samples themselves, but rather with digital replacements. Representations and illustrations like 3D models, RTI scans, or interactive maps and videos can be incorporated into modern applications like operable computers, tablet stations, or virtual and augmented reality, in order to fill in the gaps of a given museum's inventory while also creating a technically modern and interactive experience for the visitor. Figure 57 illustrates the scenario of two museums telling roughly the same historic narration about the same topic, person "x", with the utilisation of the ViSIT application. While Figure 56 showed two museums that were not able to synchronise exhibit samples on the fly in order to allow both houses to tell the narration, ViSIT facilitates exactly this. By creating the above-mentioned digital representations of the samples, a museum can request these digital samples and incorporate them into its narration. This does not require a house to lend out some of its samples, as the digital samples can fill up missing pieces of the narration. In Figure 57, museum "a" on the left side complements its narration of its own exhibits "1", "2", and "6" with digital representations of the samples "3" through "5", attained by utilising the ViSIT infrastructure.

(Figure content: "Historical Content Space about Person x" as in Figure 56, with museum "a" additionally holding digital representations of samples 3 to 5.)

Figure 57: Exemplary ViSIT Scenario Applying the ViSIT Infrastructure to Digitally “Lend” Foreign Samples in Museum “a”.

With the contributions of this thesis in mind, it quickly becomes obvious how the resulting implementations can benefit the use case of the ViSIT Project. In almost the same manner as for multimedia data, descriptive metadata plays an important role in the overall scenario described above. In order to correctly communicate and then synchronise the available samples on demand, they require definitive metadata that allows searching for and querying them.

One of the best-known and prevailing ontologies for the domain of cultural heritage is the CIDOC Conceptual Reference Model (CRM) [58], which has already been referred to throughout this work. The CIDOC CRM is classified as a complex ontology, originating from the fact that it covers the whole domain of cultural heritage without being specialised to smaller topic domains, as well as from its sheer number of available RDF classes and their respective hierarchy, which are complemented by even more unique relationships and properties. The data model has received substantial development effort and has by now gathered 10 years of experience. The CIDOC CRM is maintained by the CIDOC Documentation Standards Working Group7 and the CIDOC CRM SIG8, which are both working groups of CIDOC9. It was accepted as a working draft by ISO/TC46/SC410 in 2000, has been an official standard since 2006 [91], and was revised in 2014 [92]. Its purpose is to serve as an ontology to which other (if not all) possibly heterogeneous and specialised cultural heritage information can be mapped in order to eventually represent a commonly understandable knowledge base of the whole domain knowledge, therefore aligning with the general idea of the Semantic Web and Linked Data described throughout this work.

7 http://network.icom.museum/cidoc/working-groups/overview/ (last visited 03/12/2018)
8 http://network.icom.museum/cidoc/working-groups/crm-special-interest-group/ (last visited 03/12/2018)
9 http://network.icom.museum/cidoc/ (last visited 03/12/2018)
10 https://www.iso.org/committee/48798.html (last visited 03/12/2018)

With its current major version 6.211 released in May 2015 (and the latest minor version 6.2.312 adapted from it), the model encapsulates 89 classes and 149 unique relationships and properties (which in most cases also have a direct inverse relationship that semantically conveys the opposite of the given relationship), aligned in an RDF class hierarchy with multiple inwards and outwards branching "rdfs:subClassOf" relationships.

As indicated before, although the model is very dynamic and multi-functional and therefore allows covering any topic in the cultural heritage domain, issues arise quickly when it comes to actually implementing the model. Even with smaller use cases, the resulting RDF graph grows quickly, as the combination of many different nodes and relationships is necessary. Additionally, when considering RDF classes that are positioned deeper in the hierarchy, developers are confronted with a plethora of various relationships to implement, aggravated by the fact that some semantic relationships between two RDF instances are not directly modelled via a single relationship, but rather via a path over possibly several nodes. This poses the problem of semantically finding the fitting RDF constructs to convey one's use case, paired with the problem of actually serialising the eventual RDF triples.

Anno4j supports this kind of scenario in various ways. The first, and probably most impactful, benefit is provided by the generation capabilities of the library. As stated above, the CIDOC CRM is composed of many RDF classes and relationships, making it hard to write instance data for the respective model. With many different naming conventions, also incorporating numbers to enumerate the entities in ascending order, errors are bound to happen when the RDF data creation is done by hand. Therefore, technical support through the Anno4j functionality is desirable.

The Anno4j Generation Tool takes the RDFS schema of the CIDOC CRM (or a representative OWL version, the Erlangen CRM/OWL13) and generates Anno4j Java Classes that represent the CIDOC CRM domain model. With that, developers and users are able to create triples that conform to the CIDOC CRM model by using simple Java Classes, rather than writing the triples by hand. The direct benefits of this method are more convenient handling, automated structural integrity of the produced data as the Java Classes enforce a structure, as well as a more convenient creation of the domain knowledge, as the Java Classes present only those relationships that are directly supported by the given RDF class or inherited from its superclasses. This way, the user does not have to determine the allowed relationships and properties on their own. A brief illustrative sketch of this approach is given after the footnotes below.

11 http://www.cidoc-crm.org/Version/version-6.2 (last visited 03/12/2018)
12 http://www.cidoc-crm.org/Version/version-6.2.3 (last visited 03/12/2018)
13 http://erlangen-crm.org/ (last visited 03/12/2018)
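As an illustration (not taken from the ViSIT code base), the following sketch shows how a generated domain model interface for "crm:E21_Person" might be used. The interface shapes, the chosen namespace URI, and the method names are assumptions made for the sake of the example and may deviate from the interfaces the Anno4j Generation Tool actually emits; the property "crm:P1_is_identified_by" and the class "crm:E41_Appellation" are, however, part of the CIDOC CRM.

import org.openrdf.annotations.Iri;
import com.github.anno4j.Anno4j;
import com.github.anno4j.model.impl.ResourceObject;

// Hypothetical shape of interfaces as the Generation Tool might emit them for the CIDOC CRM.
@Iri("http://www.cidoc-crm.org/cidoc-crm/E41_Appellation")
interface E41_Appellation extends ResourceObject { }

@Iri("http://www.cidoc-crm.org/cidoc-crm/E21_Person")
interface E21_Person extends ResourceObject {
    @Iri("http://www.cidoc-crm.org/cidoc-crm/P1_is_identified_by")
    void setIdentifiedBy(E41_Appellation appellation);
    @Iri("http://www.cidoc-crm.org/cidoc-crm/P1_is_identified_by")
    E41_Appellation getIdentifiedBy();
}

public class CidocExample {
    public static void main(String[] args) throws Exception {
        Anno4j anno4j = new Anno4j();
        // Creating the objects persists corresponding RDF triples in the connected store.
        E41_Appellation name = anno4j.createObject(E41_Appellation.class);
        E21_Person person = anno4j.createObject(E21_Person.class);
        person.setIdentifiedBy(name);   // -> person crm:P1_is_identified_by name
    }
}

In a real application, the Anno4j object would be configured to point at the target triple store of the ViSIT infrastructure rather than a default repository.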

A second benefit of applying Anno4j to create cultural heritage data lies in the validation of the created data. As indicated above, a first step of structural validation happens as the Anno4j Classes dictate a direct structure for the created triples. This however only affects validity in terms of correct RDF class instances and their relationships and properties amongst each other, while further validation criteria, like the multiplicities or inverse relationships seen in Section 4.5.2 and Section 4.5.3, are not considered. As described throughout this work, data validity is especially important in the Semantic Web and Multimedia domains, and the domain of cultural heritage is no exception. Stating cultural and historical facts is especially dependent on correct data; things like dates need to be as correct as possible. From a technical point of view, the circumstance of the CIDOC CRM and data validity is even aggravated by the fact that one cannot assume the same general user as was done throughout this work: the "non-Semantic Web expert" is estimated to have little to no knowledge about Semantic Web technologies, but is expected to have programming knowledge of object-oriented concepts, Java at best, or HTTP and REST.

Nevertheless, Anno4j alleviates this circumstance. Its validation features can be included to check for the adopted validation requirements of the CIDOC CRM in order to enforce their fulfilment before data is persisted. Otherwise the process of data creation is prohibited and the user can be prompted to correct the respectively incorrect data. The technical knowledge barrier in this case cannot be eliminated entirely, but the supported technological access methods allow for more convenient ways of constructing input interfaces that encapsulate the Anno4j functionality in order to eventually also create valid CIDOC CRM data.

These facts show that the utilisation of the Anno4j library can help in developing applications in the domain of cultural heritage. In addition, the arguments listed here support the claims about the contributions made throughout this work. The following section will give an outlook on a feature called "Semantic Zoom", which allows converting the complex CIDOC CRM metadata into a "fuzzier" but simpler domain model using the SWRL features of the Anno4j library shortly described in Section 4.8.

a "semantic zoom" feature to introduce an easier model abstraction layer for the cidoc crm With the CIDOC CRM being one of the most common metadata models in the domain of cultural heritage, it is reasonable for a project like ViSIT to mainly stick with that respective metadata structure. However, as indicated in this section, the CIDOC CRM offers very thorough and fine-grained possibilities to dynamically model a respective use case, but at the same time it is a very complex model and sometimes complicated to apply. This holds true especially when the use case itself is very shallow. To counteract this, an idea is envisioned that allows abstracting from the complex syntax of the CIDOC CRM while at the same time conveying roughly the same semantics with the loss of only some information.

The Semantic Zoom is a feature that can be implemented using the SWRL features of Anno4j (see Section 4.8). With this feature it is supposed to be possible to have two models, the CIDOC CRM itself and a much easier, condensed, semantically similar model, which can be converted into one another on the fly. This is done by supporting SWRL rules that transform CIDOC CRM triples or even whole paths into the easier model and vice versa. Figure 58 shows an illustrative example.

(Figure content: the condensed model on top and the CIDOC CRM at the bottom, connected by the Anno4j SWRL conversion; Semantic Zoom Out and Semantic Zoom In transitions lead between the two layers.)

Figure 58: Illustration of the Semantic Zoom Feature Possible with Anno4j.

There are two different metadata domains visualised in Figure 58: the bottom half depicts instance nodes that originate from the CIDOC CRM, while the top half represents two instance nodes of a generic model that implements the same semantics as the CIDOC CRM. With the envisioned Semantic Zoom mechanic of applying defined SWRL rules via Anno4j, the automated conversion translates instance data of the one domain model into the other, and vice versa, depending on which data is created initially. This is depicted by the transitions between the two domain layers. A Semantic Zoom Out happens when one examines data in the form of the top layer, while a Semantic Zoom In happens when more detailed data in the form of the CIDOC CRM is to be regarded.

The benefits of this implementation are as follows. Firstly, the SWRL rules implemented in the Anno4j library can be run automatically. This enables the database to hold two graphs that are semantically equal but expressed in two different models.

Therefore, the underlying semantics can be accessed from both sides. CIDOC CRM experts can still write the CIDOC triples they are familiar with, while non-CIDOC CRM experts can write and utilise the easier model and still obtain (roughly) the same semantic information. For some applications it might also be easier to apply the simpler model, for example if the overall semantic relationships between the core information carriers are to be presented on a web page or in similar forms. Secondly, with data being created in the two-fold way described above, the resulting data always keeps the claim of being directly conformant to the CIDOC CRM. For an application like this one and in the world of the Semantic Web, this is always a positive claim, as aligning results with a well-known ontology or vocabulary is useful and invites others to use the respective data much more easily and conveniently. An example of what such a conversion rule could look like is sketched below.

However, as this envisioned idea still represents a conceptual construct, there are still shortcomings or factors that need further planning and effort in order to produce a thorough feature. The SWRL feature of the Anno4j library has been described in the outlook section of Chapter 4, and a first working version has been implemented in the most current version of Anno4j. However, it has not yet been tested in a full stack use case. Furthermore, this feature suffers from the same drawback as the generation of the Anno4j Classes out of schemata when it comes to actually defining the SWRL rules, especially in the case of a complex metadata model. Just as with implementing Anno4j Classes by hand, the definition of the SWRL rules by hand can become cumbersome and error-prone. A solution to streamline this process should be devised in order to follow the overall claim of the Anno4j library and therefore support a more convenient approach.

A slight "inconvenience" exists in terms of the possible loss of information via the implemented conversion feature. When mapping the CIDOC CRM to a less complex data model, it is clear that not all information can be transformed, as no simplification would be possible otherwise. Nevertheless, as described above, this is not only a drawback but also a beneficial factor, as the easier domain model can serve other purposes much more conveniently and non-CIDOC CRM experts can use the data much more easily. Additionally, the CIDOC CRM data is never lost.
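To make the envisioned conversion more concrete, a hypothetical rule in human-readable SWRL syntax could collapse a CIDOC CRM path into a single property of an invented condensed vocabulary ("cond:"); the rule and the condensed namespace are purely illustrative and not part of ViSIT or Anno4j, while the CIDOC CRM terms used are real:

crm:E21_Person(?p) ^ crm:P98i_was_born(?p, ?b) ^ crm:P7_took_place_at(?b, ?place) -> cond:birthPlace(?p, ?place)

A corresponding rule in the opposite direction (a Semantic Zoom In) would expand cond:birthPlace back into the E67_Birth node and its two CIDOC CRM properties, accepting that intermediate details attached to the birth event are not recoverable from the condensed form.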

conclusion of the evaluation chapter In summary, this chapter evaluates and confirms the applicability of the technical contribution of this work, namely the Anno4j library. It starts with a related work section that lists related approaches which evaluate the overall ORdfM concept from a general point of view. These results show that the concept is deemed very useful in Semantic Web applications and that no technical shortcomings arise from the incorporation of the respective abstraction layer, but rather many advantageous factors. All of these implications can be adopted for the Anno4j library, and therefore its core functionality is verified qualitatively.

This overall evaluation is then followed by own evaluations and benchmarks in order to extend the findings of the related approaches. The benchmark applies combined functionality of the Anno4j library, as it takes a (mocked) metadata model schema as input for the Anno4j Generation Tool in order to generate a domain model and then persist respective information. Different parameters of the schema were identified that on the one hand constitute the complexity of the model, and on the other hand have different impacts on the creation runtimes of the persistence process, which in turn is evaluated by the benchmark. With a baseline setting for the parameters inferred as average values from publicly available LOD datasets, in the first evaluations single parameters were varied in order to determine their impact. Among these six parameters, three were identified as non-impactful, as their variation did not show any impact on the runtime of initial creations. In contrast, the other three, namely superclasses, inheritance depth, and the number of properties and relationships, showed polynomial growth in terms of runtimes for increasing values and are therefore considered potentially problematic. Next to the numbers indicating the runtimes, as a side effect, an assessment of the ontology structure has been gained that allows evaluating the complexity of a given RDFS or OWL ontology in order to estimate the required runtimes. This is very useful, as another contribution of this work lies in a command line-based tool of the Anno4j library that is able to pre-generate the Proxy Classes that are required by the library in order to create instances of Anno4j Classes.

With the approach of the first benchmark, the "classic" procedure of the library is to create those Proxy Classes, given an ontology, with the initial creation of a given Anno4j Class. The command line tool is able to pre-generate these Proxy Classes and supply them to a given Java process. In doing so, the time-intensive process of creating the Proxy Classes is skipped, reducing the overall runtime to an imperceptible minimum. This process has been evaluated in a second benchmark, showing that the initially problematic parameter impacts can be reduced to a mean value of only some microseconds.

A follow-up evaluation uses the domain knowledge gained from the first evaluation and computes paired parameter configurations, using the same ontology parameters as above. Most of the combinations showed the expected results and therefore, by being dependent or non-dependent on each other, have a multiplicative or no added impact on application and instance creation runtimes. There were however

some unexpected results for several parameter combinations, which showed that the knowledge gained about the parameters in isolation has to be refined and adapted when the parameters are used in conjunction with each other. The relationship and property count was estimated to be a so-called hard parameter, but turned out to be non-dependent on every other parameter and therefore applies its runtime increases only in isolation. In contrast, the Partial Classes parameter, which was assumed to be a soft parameter, has a heavy impact on the combined runtimes with other parameters and therefore needs particular attention, especially as it is an Anno4j feature newly added to the ORdfM concept. Lastly, the combination of high inheritance depth with many super- or parent classes turned out to be non-computable, which in turn raises interesting topics to be researched in future work. This knowledge about the ontology parameters allows good estimations of whether a pre-generation step of the Anno4j library is reasonable, as it can reduce application and creation runtimes to a minimum, regardless of the ontology parameter setting.

Overall, after the concept and the different possibilities of the Anno4j library have been highlighted throughout this work, this chapter confirmed the applicability and the robustness of the library. This claim is backed up by the application of the library in the real-world project MICO, which served as further technical embedding and supports a combination point with the MICO Metadata Model. To extend the real-world application, this chapter also described the ViSIT project as another use case for the concept and implementation of Anno4j.

6.4 recapitulation of posed research questions

After all contributions and the work done for this thesis have been documented throughout the prior chapters and sections, this section recapitulates the research questions that were formulated at the very beginning. Thereby, an assessment is given of how far each research question has been answered and its anticipated goal has been reached.

1. How to express, create, and especially recombine the created results of various heterogeneous metadata producers?

The envisioned Semantic Web use case, which was introduced in Section 1.1.1, demanded the recombination of various information pieces originating from different participating information peers. As these peers are in general neither known nor interconnected, a common data model is not applied, which means that the respective information pieces need costly and sometimes complicated ad hoc interpretation. To support this scenario, a metadata model is needed that supports the expression, creation, and especially recombination of various metadata inputs.

A baseline for this has been found in the W3C Web Annotation Data Model WADM. This ontology proved to be well suited to represent so-called Web Annotations, information units that further describe another item. The use case from above was then further enhanced by a MICO Multimedia use case, which can be seen as a specialisation of the general Semantic Web use case. The Web Annotations of the WADM could conveniently be transferred to the Multimedia scenario; however, this interpretation posed further requirements to the metadata model, as provenance, better recombination possibilities, data quality and validity, as well as traceability are important to satisfy the overall demand of the scenario and consequently also the research question.

A solution to this is proposed with the MICO Metadata Model MMM, which is a modular, extensible, and expressive metadata ontology for the Multimedia domain and which is used and applied in the MICO use case. As this use case is a specialisation, the model can be converted to an overall Semantic Web case without further effort. The MMM was evaluated with requirements checks from two different perspectives: some were inferred from related work and target overall Web-conformance, while own defined requirements were inferred from the two use case scenarios mentioned above.

For most of the requirements, the MMM is well suited and supports a good solution to the issues, and it therefore poses an answer to the research question. New and potentially unknown metadata producers can easily be integrated into the MMM, as it supports well-defined interfaces that can be implemented in the model and then be shared with other participating peers. Nevertheless, some of the requirements raise further issues that are beyond the possibilities of a data model. The modelling of metadata treated in this research question is seen as one part of the overall Semantic Web Technologies, and the following research questions focus on enhancing these technologies and thereby proposing solutions to said issues.

2. Are there ways to lower the initial barrier that is posed by Semantic Web Technologies, which manifests mainly in the fields of creation, processing, and requesting of the respective semantic data?

The barrier mentioned in this question is expressed in three pillars: creation, processing, and requesting of metadata. By enhancing a single one of the pillars, the overall barrier gets reduced; it is therefore reasonable to consider all three of them individually. The answer to this question in this work is seen in the implemented Object-RDF-Mapping Anno4j, which allows users to access the data present in the Semantic Web by using Java rather than the actual Semantic Web Technologies.

In terms of the creation of RDF data, the process can fully be mapped to using only Java technologies. With only a little knowledge about Semantic Web internals like URIs and how they interplay in an RDF graph, a domain model can conveniently be instantiated by implementing basic Java classes, the so-called Anno4j Classes, which resemble building plans that are provided with getter and setter pairs representing the relationships and properties that can be associated with the currently considered RDF Class. By plain Java instantiation of those Anno4j Classes, with which the user is assumed to be familiar, the respectively created information is translated to RDF triples on-the-fly and persisted in the connected database.

The processing of and working with the Semantic Web data is also facilitated and improved. Both the results of creating and of querying a Semantic Web database are present to the user as common Java objects, therefore allowing any Java features to be applied to the data, for example code instructions or even testing capabilities. This opens a wide range of accessibility to the Semantic Web data without imposing an up-front initial barrier on the user. As mentioned above for the creation of the data, its modification is also persisted on-the-fly by the Anno4j library.

Lastly, querying capabilities are also enhanced by using the Anno4j library. The main querying tool for the Semantic Web is the SPARQL querying language, which is similar to SQL from the relational world. This poses another, sometimes complex, technology that users have to learn when they want to take part in the Semantic Web. To counteract this, Anno4j allows for basic querying, which directly matches RDF Classes, enabling the user to apply Java features to the returned set of objects. Also supported is more fine-tuned path-based querying, which is assumed to be easier to understand than the pattern-based SPARQL queries. Additionally, the path queries can be created modularly, adding single query requirements step by step which are eventually recombined, rather than creating one large query at once. This range of querying capabilities gives users the choice between different approaches, eventually reducing the effort that has to be spent in order to retrieve desired information from a Semantic Web database.

In summary, these three pillars show that it is possible to reduce or lower the initial barrier that is posed by Semantic Web Technologies with an implementation like Anno4j. By altering the underlying technologies of the processes that users have to apply, the access is eventually made easier for non-Semantic Web experts. Nevertheless, the barrier can only be lowered, but most likely not removed completely. This work, however, developed further ideas and approaches that might improve this situation further. These are mentioned in Section 4.8 and later in Section 7.2.
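To make these pillars more tangible, the following minimal sketch combines the creation and the querying side. The class Book, its property IRIs, and the literal values are purely illustrative and not part of the MMM; the criteria syntax and the exact signatures of the QueryService are simplified and may differ slightly between Anno4j versions.

// Illustrative Anno4j Class: a "building plan" with getter/setter pairs.
@Iri("http://example.org/ns/Book")
public interface Book extends ResourceObject {

    @Iri("http://example.org/ns/title")
    String getTitle();

    @Iri("http://example.org/ns/title")
    void setTitle(String title);
}

// Creation: instantiating the Anno4j Class persists the corresponding RDF triples.
Anno4j anno4j = new Anno4j();
Book book = anno4j.createObject(Book.class);
book.setTitle("The Semantic Web");

// Path-based querying: criteria are added step by step and recombined into one query.
QueryService queryService = anno4j.createQueryService();
queryService.addPrefix("ex", "http://example.org/ns/");
queryService.addCriteria("ex:title", "The Semantic Web");
List<Book> books = queryService.execute(Book.class);

The returned objects can then be post-processed with ordinary Java means, for example unit tests or further method calls, without a single SPARQL query having to be written by hand.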

3. With the Anno4j library applied as use case:

3.1 Can Object-RDF-Mappings deal with complex domain models?

As has been pointed out by related work [132], a common agreement exists that ORdfM implementations have issues when it comes to more complex domain models or ontologies, as the general representative neither supports automatic incorporation of the ontology nor the generation of the domain model from it. The only way is to create the domain model by hand, which is error-prone as well as time-consuming. With the Anno4j Generation Tool, this thesis supports a technical solution to this circumstance. An RDFS or OWL schema can be inserted, triggering a process that automatically generates the domain model, which can then be applied in the Anno4j library for further utilisation of the ontology via the library's features.

3.2 Is the application of complex domain models possible in an efficient way?

First evaluations of the Anno4j Generation Tool with complex ontologies showed problematic performance, as merely the creation of a single instance of an Anno4j Class took several minutes of processing time. A subsequent analysis of the involved processes showed that the merging of different behaviours for a single Anno4j Class takes close to 90% of the time of the overall process. As a solution, the Generation Tool has been extended to be able to produce so-called Proxy Classes, which contain the originally created behaviour. These can be supplied to the overall Anno4j project up-front, which reduces subsequent instantiation processes to an imperceptible minimum. With only a fixed, one-time effort that can be spent before an application is deployed, this feature enables Anno4j to efficiently deal with complex domain models. The evaluations conducted throughout Section 6.2 as well as the ViSIT project as a real-world use case, which utilises the CIDOC CRM ontology, give quantitative proof of this claim.

3.3 What can be done with Object-RDF-Mappings in order to (better) support data consistency?

Throughout this work it has been mentioned that issues exist with the data contained in Semantic Web databases, and that these need definite attention to make the respective database usable in the way that was envisioned by the initial Semantic Web idea. These issues mostly manifest in terms of data quality, as many databases contain faulty or

partly missing data and oftentimes lack descriptive information about its intended usage.

This circumstance is alleviated in several respects by the utilisation of the Anno4j library. Simply applying the structure of Java code and its classes for the corresponding domain model already induces a basic form of consistency, as the data that is to be created can only follow the structure that is dictated by these Java and Anno4j Classes. For example, assigned IRIs as well as RDF Class allocations are fixed, and relationships or properties can only be assigned to other specifically defined node types or restricted basic datatypes respectively.

Next to this, several validity requirements can be introduced in RDFS and especially OWL ontologies. These are also defined as RDF triples and are persisted in the same database. However, oftentimes these rules are not followed, as the input data is not necessarily tested against these requirements prior to its injection into every database. The Anno4j library can remedy this, as the process of persisting data can be preceded by a validation check, which examines whether the data to be persisted conforms to the defined validity requirements that are present in the database. If these are not met, no data is persisted at all and the user is given feedback about the reason. The definition of the requirements can also be done conveniently in Anno4j by adding so-called Schema Annotations, which are added directly to the Anno4j Classes. These enhancements to the ORdfM concept can therefore increase data quality and thereby increase the overall data consistency.

3.4 Can the application of Object-RDF-Mappings be enhanced by allowing different access technologies?

While the introduction of more access possibilities to the database comes with the same disadvantages as seen with complex domain models (error-proneness as well as a time-consuming process), it can open the database to users who are not familiar with Semantic Web Technologies via other standards and protocols. For Anno4j, the expansion of access technologies has been done using the HTTP protocol, aligning to a RESTful API structure oriented towards the Spring framework. Its structured approach can be combined with the Anno4j Classes, therefore not imposing many changes on the basic data entities. The necessary remaining additions to the domain model can either be written by hand if they need to be

fine-tuned, while a basic HTTP behaviour can be automatically generated via the Anno4j Generation Tool, which has been improved with such a functionality. Altogether, this HTTP functionality allows Anno4j and the ORdfM concept to be lifted to another level of accessibility, as a generated RESTful API can be used in other scenarios than the "basic" application of the mapping. Additional access possibilities are thereby opened for users, as they can then choose between classic SPARQL, standard Anno4j access, or HTTP requests to interact with a Semantic Web database.
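Relating back to question 3.3, the following sketch indicates how such Schema Annotations could be attached to an Anno4j Class. The Person class and its property IRIs are invented for illustration, and the cardinality annotations shown here are meant as an assumed, simplified excerpt of the schema annotation feature rather than a complete enumeration of it.

// Illustrative only: a Person must carry at least one name and at most one
// birth date; input violating these requirements is rejected during the
// validation step before any triple is persisted.
@Iri("http://example.org/ns/Person")
public interface Person extends ResourceObject {

    @MinCardinality(1)
    @Iri("http://example.org/ns/name")
    Set<CharSequence> getNames();

    @Iri("http://example.org/ns/name")
    void setNames(Set<CharSequence> names);

    @MaxCardinality(1)
    @Iri("http://example.org/ns/birthDate")
    Set<CharSequence> getBirthDates();

    @Iri("http://example.org/ns/birthDate")
    void setBirthDates(Set<CharSequence> birthDates);
}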

Part V

SUMMARY AND CONCLUSION

RÉSUMÉ 7

7.1 conclusion

The setting for this thesis is given by the Semantic Web, an "extension" to the universal Web with the focus of lifting a "Web of Documents" to a Web that elevates its contained data and information in a way that computers are able to "understand" it and hence have increased capabilities of working with it. The classic Web of Documents was mainly designed for humans to interact with, so the information is represented in a way for humans to process visually, while computers can only use this information to a small degree, if at all. To counteract this circumstance, the vision of the Semantic Web coined by Tim Berners-Lee in 1998 provides guidelines, best practices, and concepts that should be met and applied by developers and applications that want to take part in the Semantic Web and its corresponding Web of Data. Since its first days, the Web of Data has been growing constantly and exhibits exponential growth, by now constituting 1139 distinct datasets and knowledge bases that are openly accessible.

It has been seen that the uptake of this idea happens mainly in the scientific sector, while the economic world or non-scientific institutions rarely apply Semantic Web technologies in order to participate in the Web of Data, and therefore they cannot profit from its benefits. However, particularly this low uptake in other sectors contradicts the original idea of the Semantic Web. With every contribution to the overall knowledge base, every participating peer could benefit. From simple benefits like having more data available, to more complex features like peer reviews of inserted data or the interlinking of the given data, which allows for the discovery of further related data, the beneficial factors are numerous.

As a motivational standpoint for this work, several blocking factors have been assumed that impede the concept of the Semantic Web or rather the uptake of the Semantic Web technologies. Oftentimes, insufficient knowledge about those respective technologies hinders developers from lifting their applications to be conform to the Semantic Web, as they are in many cases familiar with other concepts and do not want to invest the effort to familiarise themselves with the respective languages that would be required. Secondly, the Semantic Web technologies have formal underpinnings and therefore incorporate formal semantics as well as logics that need to be applied in order to make full use of given Semantic Web data. Lastly, existing knowledge bases


are hard to access technologically or manifest inconsistent data, eventually hampering or even preventing actual use of the contained data. In the end, individual factors or a combination of them might lead to the same result: developers that are discouraged from using the Semantic Web.

The bits and pieces of information that represent the basic building blocks in the Web of Data are called metadata, descriptive information about data that characterises it in a semantical and high-level way. A very prominent domain of metadata is the multimedia domain, which is also very dominant in the Web of today, and therefore one can see a potentially close relation between multimedia and the Semantic Web. The main items of this domain are constituted by videos, audio, pictures, and text. For those, metadata plays an important role, as the describing information is of essential importance when it comes to making the multimedia item available and searchable online. This work showed an approach to tackle the issues of the Semantic Web described above in a multimedia-oriented use case.

Therefore, Chapter 2 introduced the concept of the Semantic Web as a whole by defining its vision and requirements, also delimiting it from other concepts that are similar or closely related. Then, the basics of Semantic Web technologies have been explained, which are constituted by the Resource Description Framework RDF and its de-facto standard querying language SPARQL, the SPARQL Protocol and RDF Query Language. These have been accompanied by the definition of RDF ontologies and vocabularies, accumulations of defining schema descriptions that impose a structural format on RDF metadata, allowing for the specification of a certain domain. Aligning to these schemata when producing metadata allows direct understanding and interpretation on the one hand. On the other hand, querying gets facilitated, as queries can be aligned structurally towards the defined schemata in order to request desired information. The Resource Description Framework Schema RDFS as well as the more detailed Web Ontology Language OWL are the commonly known representatives of ontology modelling languages.

With the use case of describing multimedia metadata in mind, and after an enlistment of related work in the fields of both multimedia metadata and efforts to lift metadata description to the Semantic Web, which yielded fundamental insights, Chapter 3 describes the contribution comprised of the design and implementation of a comprehensive metadata model for the multimedia domain. Therefore, the W3C's Web Annotation Data Model WADM with its Web Annotations is implemented as a baseline, constituting a modular structure for the information pieces of the describing metadata. Applying this structure, a syntactical distinction is made between the actual semantical content of a piece of information and its target, which represents the multimedia item or subpart of it that is actually referenced by the Web Annotation. This allows a fine-grained modelling of metadata that enables rich and comprehensive querying capabilities. This annotation structure has been adapted and extended by the MICO Metadata Model MMM, lifting the concept of the Web Annotation to a multimedia use case. This is done by aligning the described distinction of content and target to multimedia requirements and by adding modules for composition and provenance information between multiple annotation results.
This eventually allows the recombination of various metadata sources for one given multimedia input, which are also connected by a provenance chain that enables traceability between finer-grained metadata processes. In the end, a thorough, modularly structured, and rich metadata background can be produced, allowing for comprehensive querying capabilities and thus enabling an enhanced applicability and usability of the initial multimedia item. The concept of metadata modelling as well as comprehensive metadata models like the MICO Metadata Model are necessary in order to give domain-oriented guidelines to users for both creating and consuming metadata, eventually increasing both the metadata's quality as well as its applicability.

As indicated above, in the current state of the Semantic Web, many different datasets originating from various domains exist and can freely be used. However, as also described in Chapter 1 and in the introduction of this section, metadata models on their own do not seem sufficient to attract many users, especially in the economic sector, regardless of their potential. In this thesis, it has been assumed that the technical barrier for adopting the Semantic Web technologies is too high and therefore potential developers are discouraged. As a solution to these problems, a contribution of this thesis exists in the implementation and description of the Anno4j library. It follows the ORdfM concept, which is similar to the ORM concept of the relational world, with the difference that it deals with RDF data rather than relational data. By using the library it is possible to convert RDF data into Java objects, a concept that the assumed developer is familiar with, and vice versa. Therefore, the interactions and communication processes needed to deal with the RDF data can be abstracted away, enabling the user to interact with metadata while only needing little prior knowledge of the Semantic Web technologies. These interactions are further diversified by the library by allowing different querying mechanisms which differ in the access technology as well as in their "difficulty" to use. Additional features like the code generation and the validation features further increase the overall usability of the library, decrease development efforts, and enable database requirements like the well-known ACID criteria on the level of the abstraction layer.

To bring all the above together and at the same time show a practical application of the concepts and contributions, a third contribution

exists in the incorporation of the MMM and the Anno4j library in the MICO Platform. This platform is a multimedia analysis framework that anticipates the registration of so-called (multimedia) extractors, autonomously working processes that can register themselves at the platform and which are then put together in workflow chains in order to analyse input multimedia step by step, eventually creating metadata results. With the propagation of produced results back to the platform, a combined metadata background is generated for the analysed multimedia items. All of this aligns with the multimedia metadata domain described above, so the MMM poses the metadata structure, while the Anno4j library is utilised as an abstraction layer in order to facilitate metadata communication, production, and consumption for the extractor developers. These developers are assumed to be users who are not familiar with the Semantic Web technologies and who cannot invest effort into learning these technologies either, as they already have a complicated task at hand. In addition to the technical implementation, a metadata workflow, called the MICO Metadata Lifecycle, was designed and introduced to the Platform implementation. This lifecycle, and metadata lifecycles in general, allow a more detailed view on the core steps that metadata undergoes from production over adjustment up to the point of metadata consumption, enabling an even better approach for the overall metadata background of the initial multimedia item.

Last but not least, Chapter 6 presents various experiments together with evaluations in order to provide technical backing for the claims made for the ORdfM concept and hence Anno4j throughout the work. This is started by a related work section that highlights evaluations of external research, which verifies the applicability of the ORdfM concept from a general standpoint. Runtime evaluations show that the incorporation of the abstraction layer does not induce unbearable timing constraints, while other results confirm a definitively beneficial influence on development effectiveness when using the ORdfM concept. As the Anno4j library can therefore be seen as backed up from an overall standpoint, these related work experiments are then complemented by own evaluations which target the new features that are introduced to the Anno4j library and which are not present in related approaches. Conclusions are drawn in terms of which structural features of a given RDF/OWL schema are runtime-critical for the Anno4j Generation Tool, in order to be able to decide whether a generation step is reasonable or not. Initial runtime issues were detected when working with the tool and complex ontologies, but at the same time a solution is introduced to the library that allows convenient work even with these complex ontologies. Therefore, with only an initial and temporally fixed effort for the generation of domain models, the Anno4j library can be used with any size of schemata without any runtime difficulties. With the ViSIT Project, another real-world use case is presented that poses an application in another domain, cultural heritage, that benefits from the application of the Anno4j library.

The contributions in this work show an approach to bring Semantic Web technologies closer to people who are not familiar with them, eventually increasing the reach of the Semantic Web.
With this, a large community could not just utilise but also contribute to a common knowledge base, which directly reflects the original vision of the Semantic Web. The assumed issues that build up the initial barrier can be addressed with the proposed approaches. Insufficient knowledge is compensated, as developers interact with familiar concepts rather than Semantic Web technologies. This argument also holds true for the formal underpinnings, from which our approach can abstract. Last but not least, inconsistent data is prevented by the validation features of Anno4j, while the technological access is diversified via different access technologies from which the user can choose. Eventually, this approach was embedded into the multimedia domain, implementing an own metadata model to represent jointly produced metadata backgrounds for input multimedia items at the MICO Platform. Every contribution was contrasted with and at the same time learned from related work and approaches. In addition, some of the chapters also listed possible extensions that form an outlook and further improvements that exceed the limitations of this work. Together with some further ideas, these are the following.

7.2 future work and outlook

The MMM already features a modular and extensible structure to incorporate new kinds of metadata, although a direct extension for existing metadata standards, like the MPEG-7 structure, would be very useful in order to allow users an out-of-the-box utilisation of the model. Especially with the combination of the MMM and its application in the Anno4j library, respective implementations could enhance the applicability of already existing metadata at the point in time when the metadata is consumed in order to play out corresponding multimedia items. At the moment, the user is prompted to apply their own strategies to consume the metadata. With a more (semi-)automated procedure to utilise the information of the MMM, the consumption step could be lifted to even better applicability for multimedia files.

Envisioned additions to the Anno4j library have been described in more detail in Section 4.8. These included the so-called Object Queries, a way to query RDF data via Anno4j by issuing Java objects rather than queries, the SWRL Extension, a way to decouple domain modelling from general development in Anno4j, and a recommendation feature with corresponding MMM implementation, allowing one to find

similar entities in an RDF graph, which would be especially useful in a domain like the MICO Project.

Another feature for Anno4j that originated from the envisioned MICO+ idea and the extended schema functionality is the reversed direction of the Generation Tool: generating the RDFS/OWL schema given an implemented domain model in Anno4j. This would close the circle between RDF data and Java objects from an abstract standpoint and would further increase the convenience of implementing new applications and especially improve the possibility of updating already existing applications. Next to this feature, an improved MICO Platform, called the MICO+ Platform, would greatly benefit from the additional features of Anno4j in version 2.4, namely the schema generation, the validation features, as well as the RESTful implementation of Anno4j.

The experiments and evaluations presented throughout Section 6.2 have shown that further effort is needed in terms of some of the identified ontology parameters. Especially the combination of inheritance depth and the associated superclasses of a given Anno4j Class turned out to be problematic to compute when both are set to a value of four or higher. Although this is rarely met in common ontologies, it opens interesting questions as to why this combination leads to problematic computation times in the library. Furthermore, the newly introduced feature of Partial Classes, and therefore also the respective ontology parameter, proved to be non-harmful in isolation, but becomes more costly when combined with other ontology parameters. This also opens possibilities for further future work, as it is a novel feature to the ORdfM concept introduced by Anno4j.

Part VI

APPENDIX

APPENDIX A

a.1 the mico project

The following sections provide information about the overall MICO project in Section A.1.1 and about one of its core components, the MICO Broker, in Section A.1.2.

a.1.1 Background of the MICO Project

Most of the contributions of this thesis are established in the Media in Context (MICO) Project [2], an FP7 research and innovation project that lasted from 2013 to 2016. Its main objective was to support federated analysis capabilities for multimedia objects, eventually creating metadata that greatly increases the objects' usability. Figure 59 visualises the core workflow.

Figure 59: The Core Workflow of the MICO Platform. An input video is passed to the MICO Platform, analysed by a Picture Analyser, an Audio Analyser, and a Text Analyser, and the output is a combined metadata background for the video.

The MICO Platform constitutes the centrepiece of the whole implementation of the project and is developed as standalone software. Autonomously working multimedia extractors can register at the platform instance and are called by the platform with respective input to analyse. As the context is given by multimedia, the input is mainly constituted by various multimedia types like video, audio, images, and text, but other high-level input can also be processed as long as the extractors are designed to work with it. As a result, the MICO Platform merges all the generated metadata that is delivered by the single extractors into a combined metadata

1 https://www.salzburgresearch.at/projekt/mico/ (last visited 03/12/2018)
2 http://mico-project.eu/ (last visited 03/12/2018)
3 https://ec.europa.eu/research/fp7/index_en.cfm (last visited 03/12/2018)


background for the multimedia file. Extractors can also be combined and work in so-called pipelines, propagating intermediary results between extractors in order to benefit from results that have been produced by prior extractors. With the combined metadata it is envisioned to enhance the applicability of the initial multimedia file in a thorough and rich fashion, eventually even discovering hidden semantics, applications, and usage scenarios that had remained undiscovered.

The intention of the example use case shown in Figure 59 is to determine whether a given video file contains the concept of a dog, and it therefore features a video eventually showing a dog as input for the MICO Platform. Three different (very generic and high-level) extractors analyse different parts of the video. The Picture Analyser processes single images or frames to determine whether a dog-like shape occurs during the playtime, the Audio Analyser processes the audio stream of the video file in order to potentially detect barking noises, while the Text Analyser can filter through textual information like the title, potential comments, or descriptions in order to find the dog concept or something similar in the textual representations. Every extractor then provides a confidence about its result. While every result by itself might only give a vague estimation about the video containing a dog, the combined result of all three extractors contains considerably more information (a small illustrative calculation follows after the work package overview below). Also, all produced results are represented uniformly and combined in the same place, allowing for more comprehensive interpretation and application.

The project altogether featured several technical work packages that were interconnected and built on one another. These packages dealt with the following topics:

multimedia extractors and orchestration This work package dealt with the development and implementation of the core workers in the MICO workflow: the (multimedia) extractors. The various multimedia formats required different implementation efforts, and there were also new additions to the extractor domain in the form of animal detectors. Additionally, the difficult task of orchestrating these autonomous extractors in pipelines was also part of this work package.

cross-multimedia metadata modeling All the implemented extractors produced a plethora of varying metadata outputs. The main goal of this work package was the development of a dynamic multimedia metadata model that is able to comprise all the extractors' final and intermediary results in a modular and structured fashion to enable rich and comprehensive querying capabilities. Next to this, provenance information and the recombination of produced results were envisioned. After the metadata model was devised, it also turned out that

efforts were necessary to support both the steps of creating and consuming said metadata model.

cross-media querying With the extractors and all their diverse results combined in the metadata, this work package aimed at the implementation of comprehensive querying mechanisms and possibilities to maximise the value and information that can be gained from the produced metadata backgrounds in order to increase the applicability of the input multimedia data.

cross-media recommendation Last but not least, with all the technical foundations of the former work packages, this work package devised recommendation strategies that apply the information gained from prior steps in order to compare and eventually recommend the analysed multimedia files to one another. In addition, the recommendation is not restricted to single media types, but can also produce cross-media recommendations. Doing so, the analysis of a picture can lead to the recommendation of a video with the same semantic content.
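Returning to the dog detection example of Figure 59, the following small calculation illustrates why the combined result of the three extractors is more informative than each individual one. The concrete confidence values and the noisy-OR combination under an independence assumption are chosen purely for illustration; they do not represent the combination strategy of the MICO Platform itself.

// Illustrative confidences of the three extractors for the concept "dog".
double pictureConfidence = 0.6; // Picture Analyser: dog-like shape in some frames
double audioConfidence   = 0.5; // Audio Analyser: barking noises detected
double textConfidence    = 0.4; // Text Analyser: "dog" mentioned in title or comments

// Noisy-OR combination (assumes the extractors judge independently).
double combined = 1.0 - (1.0 - pictureConfidence)
                      * (1.0 - audioConfidence)
                      * (1.0 - textConfidence);

System.out.println(combined); // 0.88, higher than any single confidence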

From the descriptions of the work packages it can be seen that the contributions of this thesis mainly target the Cross-Multimedia Metadata Modeling work package. However, the described results also have an impact on the other work packages, which is apparent throughout this thesis.

a.1.2 Development Cycle of the MICO Broker

As with other technical components like Anno4j and the MMM, the MICO Broker [4] started in a functionally rudimentary version, but has been extended and improved over the course of the MICO Project. The explanations of the extractors and their interplay with the MICO Platform were based on the simpler and earlier implementation, called "Broker v1", as it is sufficient and more to the point for the content of this work. However, for a well-rounded explanation of the MICO Platform, "Broker v2" and partly "Broker v3" will also be recapped here.

Although already robust and stable, the initial implementation of the broker had several shortcomings that required the implementation of the more thorough following version. This was induced especially by the fact that the "manual" creation of workflows or pipelines, in the sense described above in which simple Strings for input and output specification were matched, was not sufficient for the workflow-oriented character of the MICO Platform. In addition, using this kind of matching, oftentimes unintended or useless pipelines were created, or other counterproductive shapes like loops could occur. This circumstance needs to be backed up by richer extractor

information in order to support more thorough pipeline matching. Next to this, things like missing test potential, the support of extractor modes (for a more differentiated application of the same extractor type, already discussed in Section 3.3.5), and better workflow functionality in terms of error handling, workflow status, and progress tracking or logging were necessary for a fully usable implementation of the MICO Broker.

The "simple" extractor definition was extended so that, next to still incorporating a textual representation of its mime types for input and output files, it carries a syntactic type and a semantic type for input and output. The former gives a more detailed description of the file type that is either produced or required, and refers to an RDF class of the MMMterms ontology [20], which contains predefined result metadata constructs of the MICO use case for metadata input and output, and to a Dublin Core type [34] for binary data. The semantic type description gives more subjective semantical information about the extractor in combination with its use case scenario and is generally assigned by hand. This is mostly done in tag form and can be updated frequently, being especially useful for workflow administrators. While the mime and syntactic type should always be supplied at deployment of the extractor, the semantic type can be addressed later on. All of this configuration information is also persisted in the triplestore of the respective MICO Platform instance, making it accessible for every component of the whole implementation if necessary.

To replace the manual matching of extractors in pipelines or workflows, a semi-automatic implementation of the orchestration unit supports a more thorough way of pipelining extractors. Various components implemented in v2 support this. The "registration service" accepts registering extractors and collects I/O configuration data that is delivered by the respective extractors, which can then be applied by the "workflow planner" unit. The result is the semi-automatic structure which suggests matching extractors depending on their information. The user can then fine-tune these recommended pipelines or use them as they are. The "item injector" then analyses injected media files and assigns them to appropriate selected and loaded pipelines. Finally, the "workflow executor" supports correct forwarding of intermediate and final results to the analysis processes of the remaining extractors.
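As a rough illustration of this extended extractor definition, the following sketch condenses the registration information of an audio-demultiplexing extractor (compare the extractor model in Figure 60 further below) into a plain Java value object. The class and its field names are hypothetical; in the actual broker this information is expressed as RDF via the MMM and MMMterms vocabularies and persisted in the platform's triplestore.

// Hypothetical value object; the MICO Broker itself models this information in RDF.
public class ExtractorIoSpec {

    final String mimeType;           // e.g. "video/mp4" for input, "audio/wav" for output
    final String syntacticType;      // RDF class, e.g. an MMMterms or Dublin Core type
    final List<String> semanticTags; // hand-assigned tags, updatable at any time

    public ExtractorIoSpec(String mimeType, String syntacticType, List<String> semanticTags) {
        this.mimeType = mimeType;
        this.syntacticType = syntacticType;
        this.semanticTags = semanticTags;
    }
}

// Registration data of an audio-demultiplexing extractor:
ExtractorIoSpec input = new ExtractorIoSpec("video/mp4", "mico:Video",
        Arrays.asList("video containing an audio track"));
ExtractorIoSpec output = new ExtractorIoSpec("audio/wav", "mico:Audio",
        Arrays.asList("demuxed audio track of the input video"));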

a.2 further information and appended data

The following list enumerates the rules that are applied when RDF data is drawn as a graph in this work. They indicate which RDF feature is illustrated at which place in the graph.

1. rdf:Class objects will be illustrated as rectangles with solid lines. Their respective IRIs will be printed inside using namespace prefixing.

2. Instance nodes as well as blank nodes are drawn as ellipses with solid lines. For the normal instances, IRIs are printed inside with namespace prefixing. If necessary for the respective example, blank nodes will receive a local IRI, otherwise the ellipses will be blank.

3. Literals are illustrated as rounded rectangles with solid lines. Their values are printed inside; in most cases their datatype will be omitted.

4. rdf:Property objects are drawn as solid arrows from subject to object. Their IRIs are displayed on top of the arrow with the application of namespace prefixing.

5. Subgraphs or named graphs (as described after this disclaimer) are displayed as rounded rectangles with dashed lines that are spanned around those nodes and edges that are included in the given named graph. The IRI of the subgraph is written somewhere inside of this rectangle, also using namespace prefixing.

6. Whenever a node or edge is drawn with dashed lines rather than normal solid lines, this indicates that further graph elements have been omitted. This is done for reasons of higher understandability and straightforwardness.

7. Colours may be used at different occasions in the graph, but they do not have any semantic meaning. They are used to increase the readability of the given graph, for example by grouping several elements of the same colour or by distinguishing elements from one another.

List 1: Set of Applied Rules for RDF Graph Illustrations.

Table 3: RDF Ontologies and Vocabularies with Their Respective Namespace IRI and RDF Prefix Used in This Work

Specification Name                        Prefix     Namespace
Web Annotation Data Model [143]           oa         http://www.w3.org/ns/oa#
MICO Metadata Model [23]                  mmm        http://www.mico-project.eu/ns/mmm/2.0/schema#
MICO Metadata Model Terms [20]            mmmterms   http://www.mico-project.eu/ns/mmmterms/2.0/schema#
Representing Content in RDF [100]         cnt        http://www.w3.org/2011/content#
Dublin Core Elements [34]                 dc         http://purl.org/dc/elements/1.1/
Dublin Core Terms                         dcterms    http://purl.org/dc/terms/
Dublin Core Types                         dctypes    http://purl.org/dc/dcmitype/
Friend-of-a-Friend Vocabulary [38]        foaf       http://xmlns.com/foaf/0.1/
Provenance Ontology [140]                 prov       http://www.w3.org/ns/prov#
Resource Description Framework [151]      rdf        http://www.w3.org/1999/02/22-rdf-syntax-ns#
RDF Schema [72]                           rdfs       http://www.w3.org/2000/01/rdf-schema#
RDF Review Vocabulary                     rev        http://purl.org/stuff/rev# (http://vocab.org/review/terms.html)

Figure 60: Exemplary Extractor Model for a MICO Audio-Demux Extractor, Leaving Out Some Side Information for the Sake of Clarity.

Anno4j anno4j = new Anno4j();

ItemMMM item = anno4j.createObject(ItemMMM.class);

AssetMMM asset = anno4j.createObject(AssetMMM.class);
asset.setFormat("image/jpg");
asset.setLocation("http://example.org/assets/pandaPicture.jpg");
item.setAsset(asset);

// Create Colorlayout Part
PartMMM colorlayoutPart = anno4j.createObject(PartMMM.class);
colorlayoutPart.addInput(item);

ColorLayoutBody colorLayoutBody = anno4j.createObject(ColorLayoutBody.class);
colorLayoutBody.setYDCCoeff("38");
colorLayoutBody.setCbDCCoeff("22");
colorLayoutBody.setCrDCCoeff("34");
colorLayoutBody.setCbACCoeff("15 24");
colorLayoutBody.setCrACCoeff("189");
colorLayoutBody.setYACCoeff("11 28 13 189");

SpecificResource colorlayoutTarget = anno4j.createObject(SpecificResource.class);
colorlayoutTarget.setSource(item);

colorlayoutPart.addBody(colorLayoutBody);
colorlayoutPart.addTarget(colorlayoutTarget);

// Create Animaldetection Part
PartMMM animaldetectionPart = anno4j.createObject(PartMMM.class);
animaldetectionPart.addInput(item);

AnimalDetectionBody animaldetectionBody = anno4j.createObject(AnimalDetectionBody.class);
animaldetectionBody.setValue("Panda");
animaldetectionBody.setConfidence(0.9);

SpecificResourceMMM animaldetectionTarget = anno4j.createObject(SpecificResourceMMM.class);
FragmentSelector selector = anno4j.createObject(FragmentSelector.class);
selector.setSpatialFragment(300, 150, 50, 70);
selector.setConformsTo(SelectorFactory.getMediaFragmentsSpecification(anno4j));

animaldetectionTarget.setSelector(selector);
animaldetectionTarget.setSource(item);

animaldetectionPart.addBody(animaldetectionBody);
animaldetectionPart.addTarget(animaldetectionTarget);

Listing 30: Exemplary Creation of an MMM Item with Two Part Annotations Representing a ColorLayout and Animaldetection Result.

Figure 61: Resulting RDF Graph from the Code Enlisted in Listing 30, Creating an MMM Item with Two Part Annotations Representing a ColorLayout and Animaldetection Result.

// Necessary configuration
OntGenerationConfig config = new OntGenerationConfig();
// Language preference for identifiers and javadoc
config.setIdentifierLanguagePreference("en", OntGenerationConfig.UNTYPED_LITERAL);
config.setJavaDocLanguagePreference("en", "de", OntGenerationConfig.UNTYPED_LITERAL);

// Ambiguity checks turned off
config.setRDFSLabelAmbiguityChecking(false);
config.setFileOrFragmentAmbiguityChecking(false);

// Java base package where class files should be created
config.setBasePackage("com.someproject.pao.model");

JavaFileGenerator generator = new OWLJavaFileGenerator();
// Path to the respective schema file
generator.addRDF("/some/arbitrary/filepath/generationPAO2.rdfs.xml", "RDF/XML");
// Trigger the process by supplying the config and the path to the destination directory
generator.generateJavaFiles(config, new File("/some/arbitrary/filepath/src/java/com/someproject/pao/model"));

Listing 31: Exemplary Use of the Anno4j Generation Tool.

<?xml version="1.0" encoding="UTF-8"?>
<!-- Only the class and property labels of this schema are preserved in the source;
     the surrounding RDF/XML markup shown here is an approximate reconstruction
     that matches the generated classes in the following listing. -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">

  <rdfs:Class rdf:about="http://petsandowners.org/Human">
    <rdfs:label xml:lang="en">Human</rdfs:label>
  </rdfs:Class>

  <rdfs:Class rdf:about="http://petsandowners.org/Animal">
    <rdfs:label xml:lang="en">Animal</rdfs:label>
  </rdfs:Class>

  <rdfs:Class rdf:about="http://petsandowners.org/Cat">
    <rdfs:label xml:lang="en">Cat</rdfs:label>
    <rdfs:subClassOf rdf:resource="http://petsandowners.org/Animal"/>
  </rdfs:Class>

  <rdf:Property rdf:about="http://petsandowners.org/has_Owner">
    <rdfs:label xml:lang="en">has owner</rdfs:label>
    <rdfs:domain rdf:resource="http://petsandowners.org/Cat"/>
    <rdfs:range rdf:resource="http://petsandowners.org/Human"/>
  </rdf:Property>

  <rdf:Property rdf:about="http://petsandowners.org/has_Cat_Friend">
    <rdfs:label xml:lang="en">has cat friend</rdfs:label>
    <rdfs:domain rdf:resource="http://petsandowners.org/Cat"/>
    <rdfs:range rdf:resource="http://petsandowners.org/Cat"/>
  </rdf:Property>

  <rdf:Property rdf:about="http://xmlns.com/foaf/0.1/age">
    <rdfs:label xml:lang="en">age</rdfs:label>
    <rdfs:domain rdf:resource="http://petsandowners.org/Cat"/>
    <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
  </rdf:Property>

  <rdf:Property rdf:about="http://xmlns.com/foaf/0.1/name">
    <rdfs:label xml:lang="en">name</rdfs:label>
    <rdfs:domain rdf:resource="http://petsandowners.org/Cat"/>
    <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
  </rdf:Property>

</rdf:RDF>

Listing 32: Example RDFS Schema for the Pets and Their Owners Ontology.

// Note: the generic type parameters of the Set-based signatures are reconstructed
// approximations inferred from the varargs, adder, and remover signatures.

/**
 * Generated class for http://petsandowners.org/Human */
@Iri("http://petsandowners.org/Human")
public interface Human extends ResourceObject {
}

/**
 * Support class for {@link Human} */
@Partial
public abstract class HumanSupport extends SchemaSanitizingObjectSupport implements Human {
}

/**
 * Generated class for http://petsandowners.org/Animal */
@Iri("http://petsandowners.org/Animal")
public interface Animal extends ResourceObject {
}

/**
 * Support class for {@link Animal} */
@Partial
public abstract class AnimalSupport extends SchemaSanitizingObjectSupport implements Animal {
}

/**
 * Generated class for http://petsandowners.org/Cat */
@Iri("http://petsandowners.org/Cat")
public interface Cat extends Animal {

    @Iri("http://xmlns.com/foaf/0.1/name")
    Set<CharSequence> getNames();

    @Iri("http://petsandowners.org/has_Cat_Friend")
    Set<Cat> getHasCatFriends();

    @Iri("http://petsandowners.org/has_Owner")
    Set<Human> getHasOwners();

    @Iri("http://xmlns.com/foaf/0.1/age")
    Set<CharSequence> getAges();

    /** @param values The elements to set.
     *  @throws IllegalArgumentException If the element(s) are not in the value space.
     *          The value space is defined as: 1. The value is not null. */
    void setHasOwners(Human... values);

    /** @param values The elements to set.
     *  @throws IllegalArgumentException If the element(s) are not in the value space.
     *          The value space is defined as: 1. The value is not null. */
    void setAges(CharSequence... values);

    /** @param values The elements to set.
     *  @throws IllegalArgumentException If the element(s) are not in the value space.
     *          The value space is defined as: 1. The value is not null. */
    void setHasCatFriends(Cat... values);

    /** @param values The elements to set.
     *  @throws IllegalArgumentException If the element(s) are not in the value space.
     *          The value space is defined as: 1. The value is not null. */
    void setNames(CharSequence... values);

    /** @param values The elements to set.
     *  @throws IllegalArgumentException If the element(s) are not in the value space.
     *          The value space is defined as: 1. The value is not null. */
    @Iri("http://xmlns.com/foaf/0.1/name")
    void setNames(Set<CharSequence> values);

    /** @param values The elements to set.
     *  @throws IllegalArgumentException If the element(s) are not in the value space.
     *          The value space is defined as: 1. The value is not null. */
    @Iri("http://xmlns.com/foaf/0.1/age")
    void setAges(Set<CharSequence> values);

    /** @param values The elements to set.
     *  @throws IllegalArgumentException If the element(s) are not in the value space.
     *          The value space is defined as: 1. The value is not null. */
    @Iri("http://petsandowners.org/has_Cat_Friend")
    void setHasCatFriends(Set<Cat> values);

    /** @param values The elements to set.
     *  @throws IllegalArgumentException If the element(s) are not in the value space.
     *          The value space is defined as: 1. The value is not null. */
    @Iri("http://petsandowners.org/has_Owner")
    void setHasOwners(Set<Human> values);

    /** @param value The element to be added.
     *  @throws IllegalArgumentException If the element(s) are not in the value space.
     *          The value space is defined as: 1. The value is not null. */
    void addHasCatFriend(Cat value);

    /** @param value The element to be added.
     *  @throws IllegalArgumentException If the element(s) are not in the value space.
     *          The value space is defined as: 1. The value is not null. */
    void addHasOwner(Human value);

    /** @param value The element to be added.
     *  @throws IllegalArgumentException If the element(s) are not in the value space.
     *          The value space is defined as: 1. The value is not null. */
    void addAge(CharSequence value);

    /** @param value The element to be added.
     *  @throws IllegalArgumentException If the element(s) are not in the value space.
     *          The value space is defined as: 1. The value is not null. */
    void addName(CharSequence value);

    /** @param values The elements to be added.
     *  @throws IllegalArgumentException If the element(s) are not in the value space.
     *          The value space is defined as: 1. The value is not null. */
    void addAllHasOwners(Set<Human> values);

    /** @param values The elements to be added.
     *  @throws IllegalArgumentException If the element(s) are not in the value space.
     *          The value space is defined as: 1. The value is not null. */
    void addAllHasCatFriends(Set<Cat> values);

    /** @param values The elements to be added.
     *  @throws IllegalArgumentException If the element(s) are not in the value space.
     *          The value space is defined as: 1. The value is not null. */
    void addAllNames(Set<CharSequence> values);

    /** @param values The elements to be added.
     *  @throws IllegalArgumentException If the element(s) are not in the value space.
     *          The value space is defined as: 1. The value is not null. */
    void addAllAges(Set<CharSequence> values);

    /** @param value The element to be removed.
     *  @return Returns true if the element was present. Returns false otherwise. */
    boolean removeHasCatFriend(Cat value);

    /** @param value The element to be removed.
     *  @return Returns true if the element was present. Returns false otherwise. */
    boolean removeAge(CharSequence value);

    /** @param value The element to be removed.
     *  @return Returns true if the element was present. Returns false otherwise. */
    boolean removeHasOwner(Human value);

    /** @param value The element to be removed.
     *  @return Returns true if the element was present. Returns false otherwise. */
    boolean removeName(CharSequence value);

    /** @param values The elements to be removed.
     *  @return Returns true if any value was removed. */
    boolean removeAllAges(Set<CharSequence> values);

    /** @param values The elements to be removed.
     *  @return Returns true if any value was removed. */
    boolean removeAllNames(Set<CharSequence> values);

    /** @param values The elements to be removed.
     *  @return Returns true if any value was removed. */
    boolean removeAllHasOwners(Set<Human> values);

    /** @param values The elements to be removed.
     *  @return Returns true if any value was removed. */
    boolean removeAllHasCatFriends(Set<Cat> values);
}

/**
 * Support class for {@link Cat} */
@Partial
public abstract class CatSupport extends SchemaSanitizingObjectSupport implements Cat {

    /** @param values The elements to set.
     *  @throws IllegalArgumentException If the element(s) are not in the value space.
     *          The value space is defined as: 1. The value is not null. */
    @Override
    public void setHasCatFriends(Cat... values) {
        for (Cat current : values) {
            if (current == null) {
                throw new IllegalArgumentException("Value must not be null");
            }
        }
        Set<Cat> _newValues = new HashSet<>();
        _newValues.addAll(Arrays.asList(values));
        setHasCatFriends(_newValues);
        sanitizeSchema("http://petsandowners.org/has_Cat_Friend");
    }

    /** @param values The elements to set.
     *  @throws IllegalArgumentException If the element(s) are not in the value space.
     *          The value space is defined as: 1. The value is not null. */
    @Override
    public void setHasOwners(Human... values) {
        for (Human current : values) {
            if (current == null) {
                throw new IllegalArgumentException("Value must not be null");
            }
        }
        Set<Human> _newValues = new HashSet<>();
        _newValues.addAll(Arrays.asList(values));
        setHasOwners(_newValues);
        sanitizeSchema("http://petsandowners.org/has_Owner");
    }

    /** @param values The elements to set.
     *  @throws IllegalArgumentException If the element(s) are not in the value space.
     *          The value space is defined as: 1. The value is not null. */
    @Override
    public void setNames(CharSequence... values) {
        for (CharSequence current : values) {
            if (current == null) {
                throw new IllegalArgumentException("Value must not be null");
            }
        }
        Set<CharSequence> _newValues = new HashSet<>();
        _newValues.addAll(Arrays.asList(values));
        setNames(_newValues);
        sanitizeSchema("http://xmlns.com/foaf/0.1/name");
    }

    /** @param values The elements to set.
     *  @throws IllegalArgumentException If the element(s) are not in the value space.
     *          The value space is defined as: 1. The value is not null. */
    @Override
    public void setAges(CharSequence... values) {
        for (CharSequence current : values) {
            if (current == null) {
                throw new IllegalArgumentException("Value must not be null");
            }
        }
        Set<CharSequence> _newValues = new HashSet<>();
        _newValues.addAll(Arrays.asList(values));
        setAges(_newValues);
        sanitizeSchema("http://xmlns.com/foaf/0.1/age");
    }

    /** @param value The element to be added.
     *  @throws IllegalArgumentException If the element(s) are not in the value space.
     *          The value space is defined as: 1. The value is not null. */
    @Override
    public void addAge(CharSequence value) {
        if (value == null) {
            throw new IllegalArgumentException("Value must not be null");
        }
        Set<CharSequence> _acc = new HashSet<>();
        _acc.addAll(getAges());
        _acc.add(value);
        setAges(_acc);
        sanitizeSchema("http://xmlns.com/foaf/0.1/age");
    }

    /** @param value The element to be added.
     *  @throws IllegalArgumentException If the element(s) are not in the value space.
     *          The value space is defined as: 1. The value is not null. */
    @Override
    public void addHasCatFriend(Cat value) {
        if (value == null) {
            throw new IllegalArgumentException("Value must not be null");
        }
        Set<Cat> _acc = new HashSet<>();
        _acc.addAll(getHasCatFriends());
        _acc.add(value);
        setHasCatFriends(_acc);
        sanitizeSchema("http://petsandowners.org/has_Cat_Friend");
    }

    /** @param value The element to be added.
     *  @throws IllegalArgumentException If the element(s) are not in the value space.
     *          The value space is defined as: 1. The value is not null. */
    @Override
    public void addHasOwner(Human value) {
        if (value == null) {
            throw new IllegalArgumentException("Value must not be null");
        }
        Set<Human> _acc = new HashSet<>();
        _acc.addAll(getHasOwners());
        _acc.add(value);
        setHasOwners(_acc);
        sanitizeSchema("http://petsandowners.org/has_Owner");
    }

    /** @param value The element to be added.
     *  @throws IllegalArgumentException If the element(s) are not in the value space.
     *          The value space is defined as: 1. The value is not null.
  2. 408 * 409 *
*/ 410 @Override 411 public void addName(CharSequence value){ 412 if(value == null){ 413 throw new IllegalArgumentException("Value must not be null"); 414 } 415 Set _acc= new HashSet(); 416 _acc.addAll(getNames()); 417 _acc.add(value); 418 setNames(_acc); 419 sanitizeSchema("http://xmlns.com/foaf/0.1/name"); 420 } 421 422 /** 423 * 424 * @param values The elements to be added. 425 * @throws IllegalArgumentException If the element(s) are not in the value space. 426 * The value space is defined as:
    427 *
  1. The value is not null.
  2. 428 * 429 *
*/ 430 @Override 431 public void addAllHasCatFriends(Set values){ 432 for(Cat current: values){ 433 if(current == null){ 434 throw new IllegalArgumentException("Value must not be null"); 435 } 436 } 437 Set _acc= new HashSet(); 438 _acc.addAll(getHasCatFriends()); 439 _acc.addAll(values); 440 setHasCatFriends(_acc); 441 sanitizeSchema("http://petsandowners.org/has_Cat_Friend"); 442 } 443 444 /** 445 * 446 * @param values The elements to be added. 447 * @throws IllegalArgumentException If the element(s) are not in the value space. 448 * The value space is defined as:
    449 *
  1. The value is not null.
  2.

    450 * 451 *

*/ 452 @Override 453 public void addAllAges(Set values){ 454 for(CharSequence current: values){ 455 if(current == null){ 456 throw new IllegalArgumentException("Value must not be null"); 457 } 458 } 459 Set _acc= new HashSet(); 460 _acc.addAll(getAges()); 461 _acc.addAll(values); 462 setAges(_acc); 463 sanitizeSchema("http://xmlns.com/foaf/0.1/age"); 464 } 465 466 /** 467 * 468 * @param values The elements to be added. 469 * @throws IllegalArgumentException If the element(s) are not in the value space. 470 * The value space is defined as:
    471 *
  1. The value is not null.
  2. 472 * 473 *
*/ 474 @Override 475 public void addAllHasOwners(Set values){ 476 for(Human current: values){ 477 if(current == null){ 478 throw new IllegalArgumentException("Value must not be null"); 479 } 480 } 481 Set _acc= new HashSet(); 482 _acc.addAll(getHasOwners()); 483 _acc.addAll(values); 484 setHasOwners(_acc); 485 sanitizeSchema("http://petsandowners.org/has_Owner"); 486 } 487 488 /** 489 * 490 * @param values The elements to be added. 491 * @throws IllegalArgumentException If the element(s) are not in the value space. 492 * The value space is defined as:
    493 *
  1. The value is not null.
  2. 494 * 495 *
*/ 496 @Override 497 public void addAllNames(Set values){ 498 for(CharSequence current: values){ 499 if(current == null){ 500 throw new IllegalArgumentException("Value must not be null"); 501 } 502 } 503 Set _acc= new HashSet(); 504 _acc.addAll(getNames()); 505 _acc.addAll(values); 506 setNames(_acc); 507 sanitizeSchema("http://xmlns.com/foaf/0.1/name"); 508 } 509 510 /** 511 * 512 * @param value The element to be removed. 513 * @return Returns true if the element was present. Returns false otherwise. */ 514 @Override 515 public boolean removeHasCatFriend(Cat value){ 516 Set _oldValues= getHasCatFriends(); 517 if(_oldValues.contains(value)) { 518 removeValue("http://petsandowners.org/has_Cat_Friend", value); 519 sanitizeSchema("http://petsandowners.org/has_Cat_Friend"); 520 // Refresh values: 521 if(getHasCatFriends() instanceof PropertySet){ 522 ((PropertySet) getHasCatFriends()).refresh(); 523 } 524 return true; 525 } 526 return false; 527 } 528 529 /** 530 * 531 * @param value The element to be removed. 532 * @return Returns true if the element was present. Returns false otherwise. */ 533 @Override 534 public boolean removeName(CharSequence value){ 535 Set _oldValues= getNames(); 536 if(_oldValues.contains(value)) { 537 removeValue("http://xmlns.com/foaf/0.1/name", value); 538 sanitizeSchema("http://xmlns.com/foaf/0.1/name"); 539 // Refresh values: 260 appendix

540 if(getNames() instanceof PropertySet){ 541 ((PropertySet) getNames()).refresh(); 542 } 543 return true; 544 } 545 return false; 546 } 547 548 /** 549 * 550 * @param value The element to be removed. 551 * @return Returns true if the element was present. Returns false otherwise. */ 552 @Override 553 public boolean removeHasOwner(Human value){ 554 Set _oldValues= getHasOwners(); 555 if(_oldValues.contains(value)) { 556 removeValue("http://petsandowners.org/has_Owner", value); 557 sanitizeSchema("http://petsandowners.org/has_Owner"); 558 // Refresh values: 559 if(getHasOwners() instanceof PropertySet){ 560 ((PropertySet) getHasOwners()).refresh(); 561 } 562 return true; 563 } 564 return false; 565 } 566 567 /** 568 * 569 * @param value The element to be removed. 570 * @return Returns true if the element was present. Returns false otherwise. */ 571 @Override 572 public boolean removeAge(CharSequence value){ 573 Set _oldValues= getAges(); 574 if(_oldValues.contains(value)) { 575 removeValue("http://xmlns.com/foaf/0.1/age", value); 576 sanitizeSchema("http://xmlns.com/foaf/0.1/age"); 577 // Refresh values: 578 if(getAges() instanceof PropertySet){ 579 ((PropertySet) getAges()).refresh(); 580 } 581 return true; 582 } 583 return false; 584 } 585 586 /** 587 * 588 * @param values The elements to be removed. 589 * @return Returns true if any value was removed. */ 590 @Override 591 public boolean removeAllHasOwners(Set values){ 592 boolean changed= false; 593 Set _oldValues= getHasOwners(); 594 Set _containedValues= new HashSet(); 595 for(Human current: values){ 596 if(_oldValues.contains(current)) { 597 changed |= true; 598 _containedValues.add(current); 599 } 600 } 601 if(!_containedValues.isEmpty()) { 602 for(Human _current: _containedValues){ 603 removeValue("http://petsandowners.org/has_Owner", _current); 604 } 605 } 606 sanitizeSchema("http://petsandowners.org/has_Owner"); 607 // Refresh values: 608 if(getHasOwners() instanceof PropertySet){ 609 ((PropertySet) getHasOwners()).refresh(); 610 } 611 return changed; 612 } 613 614 /** 615 * 616 * @param values The elements to be removed. 617 * @return Returns true if any value was removed. */ 618 @Override 619 public boolean removeAllAges(Set values){ 620 boolean changed= false; 621 Set _oldValues= getAges(); 622 Set _containedValues= new HashSet(); 623 for(CharSequence current: values){ 624 if(_oldValues.contains(current)) { 625 changed |= true; 626 _containedValues.add(current); 627 } 628 } 629 if(!_containedValues.isEmpty()) { A.2 further informations and appended data 261

630 for(CharSequence _current: _containedValues){ 631 removeValue("http://xmlns.com/foaf/0.1/age", _current); 632 } 633 } 634 sanitizeSchema("http://xmlns.com/foaf/0.1/age"); 635 // Refresh values: 636 if(getAges() instanceof PropertySet){ 637 ((PropertySet) getAges()).refresh(); 638 } 639 return changed; 640 } 641 642 /** 643 * 644 * @param values The elements to be removed. 645 * @return Returns true if any value was removed. */ 646 @Override 647 public boolean removeAllNames(Set values){ 648 boolean changed= false; 649 Set _oldValues= getNames(); 650 Set _containedValues= new HashSet(); 651 for(CharSequence current: values){ 652 if(_oldValues.contains(current)) { 653 changed |= true; 654 _containedValues.add(current); 655 } 656 } 657 if(!_containedValues.isEmpty()) { 658 for(CharSequence _current: _containedValues){ 659 removeValue("http://xmlns.com/foaf/0.1/name", _current); 660 } 661 } 662 sanitizeSchema("http://xmlns.com/foaf/0.1/name"); 663 // Refresh values: 664 if(getNames() instanceof PropertySet){ 665 ((PropertySet) getNames()).refresh(); 666 } 667 return changed; 668 } 669 670 /** 671 * 672 * @param values The elements to be removed. 673 * @return Returns true if any value was removed. */ 674 @Override 675 public boolean removeAllHasCatFriends(Set values){ 676 boolean changed= false; 677 Set _oldValues= getHasCatFriends(); 678 Set _containedValues= new HashSet(); 679 for(Cat current: values){ 680 if(_oldValues.contains(current)) { 681 changed |= true; 682 _containedValues.add(current); 683 } 684 } 685 if(!_containedValues.isEmpty()) { 686 for(Cat _current: _containedValues){ 687 removeValue("http://petsandowners.org/has_Cat_Friend", _current); 688 } 689 } 690 sanitizeSchema("http://petsandowners.org/has_Cat_Friend"); 691 // Refresh values: 692 if(getHasCatFriends() instanceof PropertySet){ 693 ((PropertySet) getHasCatFriends()).refresh(); 694 } 695 return changed; 696 } 697 }

Listing 33: Exemplary Output of the Code Shown in Listing 31 with the Input Schema Shown in Listing 32.

Figure 62: Screenshot of the MICO Platform Item Overview.

Figure 63: Screenshot of the MICO Platform Item Overview, Hovering the Progress of an Item Marked as “Failed”.

Figure 64: Screenshot of the MICO Platform View Inspecting an Item in Detail.

Figure 65: Resulting Heatmaps of the Paired Ontology Parameter Evaluation (1). Cell values and colour encode the Creation Runtime (ms). (a) Heatmap for the Children Degree and Parent Degree Parameters. (b) Heatmap for the Children Degree and Partial Classes Parameters. (c) Heatmap for the Children Degree and Partial Methods Parameters. (d) Heatmap for the Children Degree and Properties/Relationships Parameters. (e) Heatmap for the Children Degree and Inheritance Depth Parameters.

General Name | OWL / RDFS Schema | Annotation

Symmetry | owl:SymmetricProperty | @Symmetric
A relationship with this Schema Annotation directed from RDF entity “x” to “y” is only valid if the same relationship is also present directed from “y” to “x”.

Transitivity | owl:TransitiveProperty | @Transitive
This Schema Annotation only validates to true for two relationships from RDF entity “x” to “y” and “y” to “z” respectively, if there is also an edge from “x” to “z”.

Functional Property | owl:FunctionalProperty | @Functional
A relationship/property marked with this Schema Annotation can only be successfully validated if for every source “x” there is exactly one sink “y”.

Inverse Functional Property | owl:InverseFunctionalProperty | @InverseFunctional
The counterpart to “owl:FunctionalProperty”: it is only valid if every sink “y” has exactly one source “x” for the relationship or property.

Bijective Property | - | @Bijective
This Schema Annotation combines “owl:FunctionalProperty” and “owl:InverseFunctionalProperty”.

Inverse Property | owl:inverseOf | @InverseOf
For a relationship “r” an inverse relationship “s” (with another IRI) can be defined, inducing validity only if for every “r” from “x” to “y” there also exists an edge of “s” from “y” to “x”.

Subproperty | rdfs:subPropertyOf | @SubPropertyOf
Marks a relationship or property to inherit all the values of the defined superproperty.

(Min/Max) Cardinality | owl:cardinality, owl:minCardinality, owl:maxCardinality | @Cardinality, @MinCardinality, @MaxCardinality
A cardinality restriction can be defined for a relationship or property, being only validated when the multiplicity of the respective edge is inside the defined boundaries.

All Values From | owl:allValuesFrom | @AllValuesFrom
Restricts a given relationship or property to a defined set of classes and types for its sink: if there is an edge of the respective relationship or property from “x” to “y”, then “y” must always conform to the defined set. Corresponds to the universal (for all) quantifier of predicate logic.

Some Values From | owl:someValuesFrom | @SomeValuesFrom
Restricts a given relationship or property to a defined set of classes or types for its sink: if there are several representatives of the respective relationship or property, at least one of their sinks must correspond to the defined set. Corresponds to the existential quantifier of predicate logic.

Table 4: Available Schema Annotations in Anno4j. The Table Shows a General Name, the Respective Counterpart in the OWL or RDFS Schema, and the Corresponding (Java) Schema Annotation, in Combination with a Description of the Implemented Concept.
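A minimal sketch of how these Schema Annotations can be attached to a mapped resource type, modelled after the Cat example of Listings 31-33. It combines them with Anno4j's @Iri property mapping; the import paths and the exact annotation placement are assumptions and may deviate from the actual Anno4j release.

// Hypothetical sketch, not taken from the thesis: schema annotations from
// Table 4 on a mapped resource interface. Import paths are assumptions.
import org.openrdf.annotations.Iri;
import com.github.anno4j.annotations.MinCardinality;
import com.github.anno4j.annotations.Symmetric;
import com.github.anno4j.model.impl.ResourceObject;

import java.util.Set;

@Iri("http://petsandowners.org/Cat")
public interface AnnotatedCat extends ResourceObject {

    // foaf:name must occur at least once for an instance to validate.
    @MinCardinality(1)
    @Iri("http://xmlns.com/foaf/0.1/name")
    Set<CharSequence> getNames();

    @Iri("http://xmlns.com/foaf/0.1/name")
    void setNames(Set<CharSequence> names);

    // Cat friendship is symmetric: an edge from x to y requires one from y to x.
    @Symmetric
    @Iri("http://petsandowners.org/has_Cat_Friend")
    Set<AnnotatedCat> getHasCatFriends();

    @Symmetric
    @Iri("http://petsandowners.org/has_Cat_Friend")
    void setHasCatFriends(Set<AnnotatedCat> friends);
}

The generated support class in Listing 33 shows the runtime counterpart of such annotations: every mutating method ends with a sanitizeSchema(...) call for the affected property IRI.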

CPU: Intel® Core™ i5-2400 (3.10 GHz)
RAM: 8 GB DDR3
OS: Gentoo 2.3 (Linux Kernel 4.12.12)
Java Version: 1.7
JVM: Oracle® Java™ SE Runtime Environment (build 1.8.0_144-b01)
Anno4j Version: 2.4

Table 5: Properties of the Test System Used in the Benchmarks Listed in Section 6.2.1.

CPU: AMD Opteron™ Processor 6220, with 16 cores and 2 threads per core (3 GHz)
RAM: 264 GB
OS: Ubuntu 18.04
Java Version: 1.8
JVM: Oracle® Java™ SE Runtime Environment (build 1.8.0_171-b11)
Anno4j Version: 2.4

Table 6: Properties of the Test System Used in the Benchmarks Listed in Section 6.2.3.

Figure 66: Resulting Heatmaps of the Paired Ontology Parameter Evaluation (2). Cell values and colour encode the Creation Runtime (ms). (a) Heatmap for the Parent Degree and Partial Class Parameters. (b) Heatmap for the Parent Degree and Partial Methods Parameters. (c) Heatmap for the Parent Degree and Properties/Relationships Parameters. (d) Heatmap for the Partial Methods and Properties/Relationships Parameters. (e) Heatmap for the Partial Methods and Inheritance Depth Parameters.

Figure 67: Resulting Heatmaps of the Paired Ontology Parameter Evaluation (3). Cell values and colour encode the Creation Runtime (ms). (a) Heatmap for the Partial Classes and Partial Methods Parameters. (b) Heatmap for the Partial Classes and Properties/Relationships Parameters. (c) Heatmap for the Partial Classes and Inheritance Depth Parameters. (d) Heatmap for the Properties/Relationships and Inheritance Depth Parameters.

cd pd ms (mean) ms (median) ms (std. deviation) Std. Error 95 7 5 53.417 34.606 95.752 9.575 18.767 2 7 85.209 45.738 203.242 20.324 39.835 6 2 29.124 22.613 15.297 1.53 2.998 5 6 58.202 38.509 97.244 9.724 19.06 7 7 58.975 45.292 60.864 6.086 11.929 2 5 51.554 33.492 112.663 11.266 22.082 4 3 32.476 28.749 16.328 1.633 3.2 3 2 27.456 23.839 13.902 1.39 2.725 5 4 34.665 30.66 13.973 1.397 2.739 6 6 45.993 38.256 21.52 2.152 4.218 3 4 32.943 29.301 13.323 1.332 2.611 5 2 40.922 23.967 121.896 12.19 23.892 2 3 30.154 25.551 15.784 1.578 3.094 4 5 55.137 30.997 132.185 13.218 25.908 6 4 34.754 32.24 14.141 1.414 2.772 3 6 52.793 33.738 109.001 10.9 21.364 7 3 30.259 24.583 17.169 1.717 3.365 4 7 55.816 41.76 66.597 6.66 13.053 6 7 51.743 42.549 47.602 4.76 9.33 3 5 52.701 32.484 99.16 9.916 19.435 5 3 33.766 25.66 17.087 1.709 3.349 2 2 29.173 21.285 19.234 1.923 3.77 4 4 34.779 30.574 14.646 1.465 2.871 6 5 44.71 33.337 63.371 6.337 12.421 3 7 56.106 41.111 66.811 6.681 13.095 7 2 28.496 23.861 16.267 1.627 3.188 4 6 55.65 35.613 104.633 10.463 20.508 7 4 33.928 31.159 13.902 1.39 2.725 2 6 64.367 36.859 157.026 15.703 30.777 6 3 29.814 25.591 12.628 1.263 2.475 5 7 62.333 42.5 102.958 10.296 20.18 7 6 60.4 37.419 123.918 12.392 24.288 2 4 34.657 31.539 13.929 1.393 2.73 4 2 26.914 22.776 12.105 1.211 2.373 3 3 32.887 27.968 15.903 1.59 3.117 5 5 59.064 33.052 141.094 14.109 27.654 4 1 27.387 19.699 18.335 1.833 3.594 7 1 25.472 19.678 15.931 1.593 3.122 2 1 25.017 19.888 13.056 1.306 2.559 5 1 27.23 21.333 14.808 1.481 2.902 3 1 24.774 20.309 13.953 1.395 2.735 6 1 25.754 21.662 13.037 1.304 2.555 1 3 30.155 26.558 14.643 1.464 2.87 1 7 50.863 40.608 58.155 5.816 11.398 1 5 51.489 32.733 95.36 9.536 18.69 1 6 58.575 39.758 102.669 10.267 20.123 1 4 37.489 30.448 19.729 1.973 3.867 1 2 29.267 23.728 14.493 1.449 2.841 1 1 26.693 21.201 16.266 1.627 3.188

Table 7: Ontology Parameter Pair Evaluation for Children Degree and Parent Degree.
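Assuming n = 100 timed runs per parameter combination (an assumption; the tables do not state the sample size), the “Std. Error” and “95” columns of Tables 7 to 20 follow from the standard deviation sigma as

\[
\mathrm{SE} = \frac{\sigma}{\sqrt{n}}, \qquad \mathrm{CI}_{95} \approx 1.96 \cdot \mathrm{SE}.
\]

For the first row of Table 7, sigma = 95.752 gives SE = 95.752 / sqrt(100) = 9.575 and 1.96 * 9.575 ≈ 18.767, which matches the tabulated values.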

cd pc ms (mean) ms (median) ms (std. deviation) Std. Error 95 2 7 89.452 67.054 102.405 10.241 20.071 7 5 55.331 43.075 79.794 7.979 15.64 7 4 56.032 35.819 104.171 10.417 20.417 2 6 72.425 57.072 79.485 7.948 15.579 4 3 50.248 30.201 102.847 10.285 20.158 2 5 59.665 44.159 83.076 8.308 16.283 7 7 98.423 72.551 139.962 13.996 27.433 7 6 82.624 55.213 171.015 17.101 33.519 2 4 55.862 37.827 102.166 10.217 20.025 4 2 31.256 24.235 18.886 1.889 3.702 4 4 63.9 39.317 138.335 13.834 27.114 2 2 30.524 24.562 17.681 1.768 3.465 2 3 45.795 28.964 84.664 8.466 16.594 4 5 65.109 45.554 115.067 11.507 22.553 4 6 83.018 55.081 164.894 16.489 32.319 7 2 30.118 25.62 14.779 1.478 2.897 7 3 60.865 29.615 165.964 16.596 32.529 4 7 89.144 71.279 92.574 9.257 18.144 3 5 60.157 45.19 74.045 7.405 14.513 6 7 96.348 70.77 139.919 13.992 27.424 5 3 44.143 29.745 70.955 7.095 13.907 5 2 27.347 23.395 10.992 1.099 2.154 6 6 80.661 56.212 147.621 14.762 28.934 3 4 58.174 38.422 127.925 12.793 25.073 3 7 100.184 69.179 189.878 18.988 37.216 6 5 61.349 44.712 91.645 9.164 17.962 6 4 55.925 36.234 98.755 9.875 19.356 3 6 77.287 57.333 108.298 10.83 21.226 6 2 28.393 24.227 13.162 1.316 2.58 5 6 77.604 53.185 135.189 13.519 26.497 5 7 97.938 73.698 121.577 12.158 23.829 6 3 48.192 30.44 93.502 9.35 18.326 3 2 31.884 26.804 16.164 1.616 3.168 5 4 58.085 38.235 101.508 10.151 19.896 5 5 63.673 44.719 117.812 11.781 23.091 3 3 40.551 30.14 62.77 6.277 12.303 4 1 27.387 19.699 18.335 1.833 3.594 7 1 25.472 19.678 15.931 1.593 3.122 2 1 25.017 19.888 13.056 1.306 2.559 5 1 27.23 21.333 14.808 1.481 2.902 3 1 24.774 20.309 13.953 1.395 2.735 6 1 25.754 21.662 13.037 1.304 2.555 1 3 51.959 29.475 133.372 13.337 26.141 1 2 30.122 24.189 14.702 1.47 2.882 1 6 75.045 54.983 110.475 11.047 21.653 1 7 114.146 70.607 212.429 21.243 41.636 1 4 63.127 37.829 138.158 13.816 27.079 1 5 66.557 48.654 93.846 9.385 18.394 1 1 26.693 21.201 16.266 1.627 3.188

Table 8: Ontology Parameter Pair Evaluation for Children Degree and Partial Classes.

cd pm ms (mean) ms (median) ms (std. deviation) Std. Error 95 7 7 24.825 19.824 14.776 1.478 2.896 5 5 24.375 20.265 11.446 1.145 2.243 3 4 26.247 21.057 14.568 1.457 2.855 6 3 23.183 18.998 10.855 1.086 2.128 2 7 22.903 18.689 12.056 1.206 2.363 4 6 26.658 20.972 15.004 1.5 2.941 3 3 25.585 20.261 14.221 1.422 2.787 5 2 24.715 17.936 14.895 1.49 2.919 6 4 25.702 20.83 13.7 1.37 2.685 5 3 25.068 20.412 13.594 1.359 2.664 3 2 24.818 19.743 12.765 1.276 2.502 6 5 25.248 19.849 13.519 1.352 2.65 4 7 25.575 20.738 13.058 1.306 2.559 2 6 26.547 19.824 18.049 1.805 3.538 3 5 25.981 21.053 14.717 1.472 2.884 5 4 22.448 19.169 9.818 0.982 1.924 6 2 25.42 21.081 13.058 1.306 2.559 7 6 25.896 21.763 13.71 1.371 2.687 5 6 24.591 20.216 11.451 1.145 2.244 3 7 27.091 21.161 16.333 1.633 3.201 4 2 24.942 19.908 14.754 1.475 2.892 2 3 27.037 22.861 15.293 1.529 2.998 7 4 23.207 20.076 10.056 1.006 1.971 6 7 24.339 19.598 14.347 1.435 2.812 2 4 25.808 21.373 12.557 1.256 2.461 4 5 23.835 20.122 12.784 1.278 2.506 7 3 23.903 18.896 12.719 1.272 2.493 4 4 26.242 19.727 15.212 1.521 2.981 2 5 23.859 19.501 12.345 1.235 2.42 7 2 28.675 21.757 16.916 1.692 3.316 6 6 25.036 18.947 15.966 1.597 3.129 2 2 27.189 20.663 18.874 1.887 3.699 4 3 25.544 19.677 16.136 1.614 3.163 7 5 25.562 22.07 11.293 1.129 2.213 3 6 22.674 19.328 11.446 1.145 2.243 5 7 27.987 21.878 14.709 1.471 2.883 4 1 27.387 19.699 18.335 1.833 3.594 7 1 25.472 19.678 15.931 1.593 3.122 2 1 25.017 19.888 13.056 1.306 2.559 5 1 27.23 21.333 14.808 1.481 2.902 3 1 24.774 20.309 13.953 1.395 2.735 6 1 25.754 21.662 13.037 1.304 2.555 1 6 27.018 20.487 15.352 1.535 3.009 1 7 27.344 22.133 17.468 1.747 3.424 1 5 24.433 20.863 12.414 1.241 2.433 1 2 22.668 19.844 10.447 1.045 2.048 1 3 25.761 22.225 11.917 1.192 2.336 1 4 26.292 19.741 18.548 1.855 3.635 1 1 26.693 21.201 16.266 1.627 3.188

Table 9: Ontology Parameter Pair Evaluation for Children Degree and Partial Methods.

cd prop ms (mean) ms (median) ms (std. deviation) Std. Error 95 4 7 28.003 23.184 16.631 1.663 3.26 6 6 26.979 22.098 13.899 1.39 2.724 2 4 27.784 22.023 15.153 1.515 2.97 2 2 28.94 21.374 20.054 2.005 3.931 3 6 25.24 21.037 13.005 1.301 2.549 7 2 26.711 20.611 15.23 1.523 2.985 5 5 25.496 21.377 12.674 1.267 2.484 5 3 24.291 18.692 13.801 1.38 2.705 7 4 26.305 21.135 15.121 1.512 2.964 3 5 24.468 19.548 13.32 1.332 2.611 3 3 28.01 19.79 18.822 1.882 3.689 5 6 25.837 22.163 13.114 1.311 2.57 7 7 24.882 19.69 14.688 1.469 2.879 4 4 25.806 20.43 13.08 1.308 2.564 6 3 26.034 21.175 13.343 1.334 2.615 6 5 23.986 19.445 12.717 1.272 2.493 4 2 26.556 21.727 13.276 1.328 2.602 2 7 25.754 21.614 13.689 1.369 2.683 2 6 27.104 20.584 18.109 1.811 3.549 6 2 33.991 19.503 117.826 11.783 23.094 4 5 26.969 20.471 17.288 1.729 3.388 4 3 25.719 20.785 14.546 1.455 2.851 6 4 25.11 19.755 13.368 1.337 2.62 5 7 25.745 22.0 12.416 1.242 2.434 7 6 24.834 20.83 12.569 1.257 2.464 3 4 25.5 19.883 16.15 1.615 3.165 3 2 25.387 20.645 13.824 1.382 2.709 5 4 24.116 20.629 11.335 1.133 2.222 7 3 24.113 21.287 10.288 1.029 2.016 7 5 24.272 20.248 12.528 1.253 2.455 5 2 22.941 18.371 12.005 1.2 2.353 3 7 26.703 19.965 16.181 1.618 3.171 2 5 23.982 18.673 13.974 1.397 2.739 2 3 24.343 19.965 13.73 1.373 2.691 4 6 23.861 19.665 13.076 1.308 2.563 6 7 26.984 20.547 19.175 1.918 3.758 4 1 27.387 19.699 18.335 1.833 3.594 7 1 25.472 19.678 15.931 1.593 3.122 2 1 25.017 19.888 13.056 1.306 2.559 5 1 27.23 21.333 14.808 1.481 2.902 3 1 24.774 20.309 13.953 1.395 2.735 6 1 25.754 21.662 13.037 1.304 2.555 1 7 25.905 21.319 12.591 1.259 2.468 1 2 26.378 20.754 15.755 1.576 3.088 1 4 26.304 23.486 12.127 1.213 2.377 1 3 25.802 22.086 11.821 1.182 2.317 1 5 25.248 20.017 16.569 1.657 3.248 1 6 23.648 20.627 10.668 1.067 2.091 1 1 26.693 21.201 16.266 1.627 3.188

Table 10: Ontology Parameter Pair Evaluation for Children Degree and Properties and Relationships.

cd d ms (mean) ms (median) ms (std. deviation) Std. Error 95 2 5 49.581 29.839 99.575 9.957 19.517 7 6 46.176 33.558 70.571 7.057 13.832 7 7 58.895 40.547 106.664 10.666 20.906 2 4 33.825 27.741 17.721 1.772 3.473 2 6 51.339 34.384 82.608 8.261 16.191 4 2 29.063 24.505 13.212 1.321 2.589 7 5 44.67 33.986 62.453 6.245 12.241 7 4 32.105 27.691 13.432 1.343 2.633 4 3 28.252 24.233 14.068 1.407 2.757 2 7 58.114 40.054 97.224 9.722 19.056 2 3 31.231 25.518 14.911 1.491 2.923 4 7 48.781 36.423 61.63 6.163 12.079 4 6 53.942 36.202 95.638 9.564 18.745 2 2 29.492 24.949 13.026 1.303 2.553 4 4 32.59 27.491 13.071 1.307 2.562 7 3 31.581 24.861 16.799 1.68 3.293 7 2 28.205 22.929 13.365 1.336 2.619 4 5 45.584 33.184 70.616 7.062 13.841 3 6 55.332 37.916 99.901 9.99 19.581 5 2 26.782 21.888 15.482 1.548 3.034 6 5 45.145 31.534 70.552 7.055 13.828 6 4 34.253 29.187 17.502 1.75 3.43 5 3 32.588 25.194 19.046 1.905 3.733 3 7 52.512 38.495 72.074 7.207 14.127 3 5 44.006 29.612 81.515 8.152 15.977 6 6 51.188 33.259 99.394 9.939 19.481 6 7 52.936 38.915 72.201 7.22 14.151 3 4 32.019 26.7 12.855 1.286 2.52 5 4 32.597 27.683 15.86 1.586 3.108 6 3 26.369 23.551 9.846 0.985 1.93 6 2 28.974 22.172 15.195 1.52 2.978 5 5 50.391 31.721 101.206 10.121 19.836 3 3 29.832 24.272 13.692 1.369 2.684 5 7 63.059 38.519 166.095 16.609 32.555 5 6 53.337 36.673 91.464 9.146 17.927 3 2 26.233 21.638 13.348 1.335 2.616 4 1 27.387 19.699 18.335 1.833 3.594 7 1 25.472 19.678 15.931 1.593 3.122 2 1 25.017 19.888 13.056 1.306 2.559 5 1 27.23 21.333 14.808 1.481 2.902 3 1 24.774 20.309 13.953 1.395 2.735 6 1 25.754 21.662 13.037 1.304 2.555 1 2 27.13 22.629 12.824 1.282 2.513 1 3 30.331 26.109 12.726 1.273 2.494 1 4 31.451 27.463 12.169 1.217 2.385 1 5 43.329 31.582 73.906 7.391 14.486 1 7 60.637 41.891 94.306 9.431 18.484 1 6 61.827 34.954 142.661 14.266 27.962 1 1 26.693 21.201 16.266 1.627 3.188

Table 11: Ontology Parameter Pair Evaluation for Children Degree and Inheritance Depth.

pd pc ms (mean) ms (median) ms (std. deviation) Std. Error 95 5 7 2127.316 1971.024 581.012 58.101 113.878 7 5 1842.983 1698.622 479.46 47.946 93.974 5 6 1079.287 964.206 286.77 28.677 56.207 7 4 834.731 743.209 314.467 31.447 61.636 7 7 6045.152 5645.861 1364.732 136.473 267.487 5 5 590.018 518.636 266.156 26.616 52.167 3 3 75.031 60.71 80.693 8.069 15.816 7 6 3383.872 3298.029 677.773 67.777 132.844 3 2 57.769 38.504 112.719 11.272 22.093 5 4 301.07 270.102 109.803 10.98 21.521 5 2 85.02 62.552 115.008 11.501 22.542 3 4 109.862 95.768 81.589 8.159 15.992 3 5 177.631 155.438 90.151 9.015 17.67 5 3 147.397 129.427 98.868 9.887 19.378 3 6 288.207 249.646 144.036 14.404 28.231 7 2 131.777 113.436 103.439 10.344 20.274 3 7 482.514 430.065 212.92 21.292 41.732 7 3 366.032 292.324 277.807 27.781 54.45 4 2 68.997 50.615 108.057 10.806 21.179 2 4 84.155 58.421 148.555 14.856 29.117 2 5 102.793 83.312 101.56 10.156 19.906 4 3 95.375 88.965 24.206 2.421 4.744 2 6 144.218 123.502 98.537 9.854 19.313 6 2 107.812 81.691 121.442 12.144 23.803 2 7 204.585 171.08 156.089 15.609 30.593 6 3 243.711 208.524 144.456 14.446 28.313 4 7 988.225 889.384 299.657 29.966 58.733 6 5 1137.417 999.718 428.568 42.857 83.999 4 6 581.705 510.781 265.846 26.585 52.106 6 4 549.928 461.695 258.969 25.897 50.758 6 7 3770.878 3566.996 793.109 79.311 155.449 4 5 340.333 308.52 153.988 15.399 30.182 2 3 55.94 40.739 80.894 8.089 15.855 6 6 2152.933 2043.929 535.53 53.553 104.964 2 2 48.736 31.988 82.165 8.217 16.104 4 4 175.67 157.529 83.174 8.317 16.302 3 1 30.155 26.558 14.643 1.464 2.87 7 1 50.863 40.608 58.155 5.816 11.398 5 1 51.489 32.733 95.36 9.536 18.69 6 1 58.575 39.758 102.669 10.267 20.123 4 1 37.489 30.448 19.729 1.973 3.867 2 1 29.267 23.728 14.493 1.449 2.841 1 3 51.959 29.475 133.372 13.337 26.141 1 2 30.122 24.189 14.702 1.47 2.882 1 6 75.045 54.983 110.475 11.047 21.653 1 7 114.146 70.607 212.429 21.243 41.636 1 4 63.127 37.829 138.158 13.816 27.079 1 5 66.557 48.654 93.846 9.385 18.394 1 1 26.693 21.201 16.266 1.627 3.188

Table 12: Ontology Parameter Pair Evaluation for Parent Degree and Partial Classes.

pd pm ms (mean) ms (median) ms (std. deviation) Std. Error 95 7 7 70.936 55.931 119.815 11.982 23.484 2 4 19.231 17.418 7.694 0.769 1.508 4 5 37.782 37.95 14.236 1.424 2.79 6 2 59.006 41.882 99.799 9.98 19.561 5 7 54.892 40.946 106.457 10.646 20.866 3 6 34.847 35.714 12.346 1.235 2.42 6 5 63.838 44.808 122.291 12.229 23.969 4 2 40.434 37.577 18.34 1.834 3.595 2 3 20.467 18.335 8.424 0.842 1.651 3 7 34.591 34.67 11.563 1.156 2.266 5 6 49.531 42.152 73.432 7.343 14.393 2 2 21.296 17.864 12.231 1.223 2.397 4 3 38.843 38.774 15.23 1.523 2.985 6 4 62.848 45.854 104.378 10.438 20.458 7 6 71.323 50.783 111.709 11.171 21.895 6 3 53.397 46.937 70.99 7.099 13.914 4 4 35.928 32.069 15.637 1.564 3.065 2 5 19.357 17.253 7.307 0.731 1.432 2 7 29.428 27.182 15.321 1.532 3.003 4 6 38.955 37.366 14.801 1.48 2.901 3 2 34.252 32.724 14.078 1.408 2.759 7 4 66.018 52.818 91.463 9.146 17.927 5 3 51.559 40.145 82.497 8.25 16.169 6 6 62.292 49.666 84.343 8.434 16.531 5 4 58.164 42.208 97.032 9.703 19.018 7 3 66.083 59.507 76.509 7.651 14.996 3 5 32.145 29.042 12.362 1.236 2.423 6 7 59.808 45.077 91.978 9.198 18.028 3 4 33.7 30.572 15.527 1.553 3.043 7 2 69.268 55.28 97.354 9.735 19.081 5 5 57.549 42.052 99.774 9.977 19.556 4 7 38.13 34.281 22.198 2.22 4.351 2 6 22.412 18.823 9.266 0.927 1.816 5 2 57.573 41.784 113.759 11.376 22.297 7 5 69.461 49.462 132.592 13.259 25.988 3 3 32.101 27.884 14.463 1.446 2.835 3 1 30.155 26.558 14.643 1.464 2.87 7 1 50.863 40.608 58.155 5.816 11.398 5 1 51.489 32.733 95.36 9.536 18.69 6 1 58.575 39.758 102.669 10.267 20.123 4 1 37.489 30.448 19.729 1.973 3.867 2 1 29.267 23.728 14.493 1.449 2.841 1 6 27.018 20.487 15.352 1.535 3.009 1 7 27.344 22.133 17.468 1.747 3.424 1 5 24.433 20.863 12.414 1.241 2.433 1 2 22.668 19.844 10.447 1.045 2.048 1 3 25.761 22.225 11.917 1.192 2.336 1 4 26.292 19.741 18.548 1.855 3.635 1 1 26.693 21.201 16.266 1.627 3.188

Table 13: Ontology Parameter Pair Evaluation for Parent Degree and Partial Methods.

pd prop ms (mean) ms (median) ms (std. deviation) Std. Error 95 6 3 68.228 50.401 146.016 14.602 28.619 6 5 58.791 42.489 99.818 9.982 19.564 3 7 33.684 31.494 14.059 1.406 2.756 5 2 51.925 39.859 83.542 8.354 16.374 5 4 58.689 44.198 96.799 9.68 18.973 4 3 38.299 36.844 16.539 1.654 3.242 4 5 36.847 35.18 14.424 1.442 2.827 7 2 59.068 56.685 36.559 3.656 7.166 2 6 27.364 26.599 10.903 1.09 2.137 7 4 60.937 54.821 56.003 5.6 10.977 4 6 37.581 36.634 13.504 1.35 2.647 2 3 28.357 25.171 14.654 1.465 2.872 7 7 70.219 49.433 121.617 12.162 23.837 2 5 27.674 28.251 9.216 0.922 1.806 3 2 33.212 31.366 12.59 1.259 2.468 3 4 32.059 30.575 12.696 1.27 2.488 6 6 65.759 48.268 111.414 11.141 21.837 5 7 51.713 44.492 62.414 6.241 12.233 5 6 52.419 40.762 79.452 7.945 15.573 3 3 33.573 33.088 13.607 1.361 2.667 6 7 59.699 42.614 102.338 10.234 20.058 3 5 33.545 31.626 12.255 1.226 2.402 2 2 28.712 25.89 13.264 1.326 2.6 2 4 30.414 30.004 13.075 1.308 2.563 7 6 55.811 57.324 22.147 2.215 4.341 4 7 38.568 35.489 15.839 1.584 3.104 7 3 71.541 59.86 94.393 9.439 18.501 7 5 65.654 49.05 96.622 9.662 18.938 2 7 27.411 26.409 9.913 0.991 1.943 4 2 35.618 33.483 12.856 1.286 2.52 4 4 37.229 35.619 15.414 1.541 3.021 5 3 57.128 41.83 104.371 10.437 20.457 5 5 48.822 39.151 58.671 5.867 11.499 6 2 73.545 46.026 194.934 19.493 38.207 3 6 32.596 30.684 13.305 1.33 2.608 6 4 73.422 44.39 154.568 15.457 30.295 3 1 30.155 26.558 14.643 1.464 2.87 7 1 50.863 40.608 58.155 5.816 11.398 5 1 51.489 32.733 95.36 9.536 18.69 6 1 58.575 39.758 102.669 10.267 20.123 4 1 37.489 30.448 19.729 1.973 3.867 2 1 29.267 23.728 14.493 1.449 2.841 1 7 25.905 21.319 12.591 1.259 2.468 1 2 26.378 20.754 15.755 1.576 3.088 1 4 26.304 23.486 12.127 1.213 2.377 1 3 25.802 22.086 11.821 1.182 2.317 1 5 25.248 20.017 16.569 1.657 3.248 1 6 23.648 20.627 10.668 1.067 2.091 1 1 26.693 21.201 16.266 1.627 3.188

Table 14: Ontology Parameter Pair Evaluation for Parent Degree and Properties and Relationships.

pc pm ms (mean) ms (median) ms (std. deviation) Std. Error 95 7 7 118.577 104.518 125.189 12.519 24.537 6 7 102.134 73.349 183.228 18.323 35.913 3 6 52.801 38.541 108.991 10.899 21.362 5 7 74.886 58.401 109.06 10.906 21.376 2 6 29.781 29.754 9.105 0.911 1.785 4 7 51.857 48.215 54.436 5.444 10.669 4 6 54.788 44.507 61.222 6.122 11.999 2 7 27.922 26.364 10.008 1.001 1.961 5 6 69.756 63.373 83.011 8.301 16.27 3 7 49.274 36.523 83.156 8.316 16.299 6 6 94.261 77.652 129.948 12.995 25.47 7 6 106.067 98.472 102.415 10.241 20.073 2 2 30.356 28.544 12.763 1.276 2.502 4 3 61.945 46.887 100.769 10.077 19.751 6 4 110.254 79.604 201.056 20.106 39.407 3 2 50.495 34.289 101.854 10.185 19.963 5 3 75.036 57.531 110.15 11.015 21.589 7 4 103.834 82.289 99.296 9.93 19.462 4 4 54.377 47.801 72.734 7.273 14.256 2 5 30.034 29.577 13.514 1.351 2.649 6 3 101.817 77.842 163.827 16.383 32.11 5 4 70.655 54.514 90.114 9.011 17.662 3 5 53.041 39.153 100.495 10.05 19.697 7 3 100.736 90.103 91.367 9.137 17.908 3 4 53.955 33.275 133.078 13.308 26.083 5 5 74.865 59.359 108.508 10.851 21.268 7 2 108.046 86.484 117.183 11.718 22.968 2 4 30.881 29.721 13.393 1.339 2.625 4 5 63.293 48.243 101.668 10.167 19.927 6 2 91.115 73.778 107.532 10.753 21.076 5 2 73.041 62.664 86.292 8.629 16.913 3 3 46.162 36.245 71.229 7.123 13.961 7 5 112.583 93.667 146.432 14.643 28.701 4 2 60.056 43.891 107.686 10.769 21.106 2 3 31.787 29.492 16.943 1.694 3.321 6 5 84.745 66.278 92.541 9.254 18.138 3 1 51.959 29.475 133.372 13.337 26.141 2 1 30.122 24.189 14.702 1.47 2.882 6 1 75.045 54.983 110.475 11.047 21.653 7 1 114.146 70.607 212.429 21.243 41.636 4 1 63.127 37.829 138.158 13.816 27.079 5 1 66.557 48.654 93.846 9.385 18.394 1 6 27.018 20.487 15.352 1.535 3.009 1 7 27.344 22.133 17.468 1.747 3.424 1 5 24.433 20.863 12.414 1.241 2.433 1 2 22.668 19.844 10.447 1.045 2.048 1 3 25.761 22.225 11.917 1.192 2.336 1 4 26.292 19.741 18.548 1.855 3.635 1 1 26.693 21.201 16.266 1.627 3.188

Table 15: Ontology Parameter Pair Evaluation for Partial Classes and Partial Methods.

pc prop ms (mean) ms (median) ms (std. deviation) Std. Error 95 4 4 64.604 53.628 89.613 8.961 17.564 3 7 53.816 39.11 94.234 9.423 18.47 4 2 62.736 48.358 95.952 9.595 18.806 2 7 31.565 30.154 14.725 1.473 2.886 5 4 67.746 58.823 73.431 7.343 14.392 5 2 79.82 57.447 153.418 15.342 30.07 6 2 85.489 74.303 83.209 8.321 16.309 6 4 87.984 68.411 117.031 11.703 22.938 7 2 127.325 85.154 226.787 22.679 44.45 7 4 97.544 93.606 67.021 6.702 13.136 6 7 80.759 76.192 68.695 6.869 13.464 7 7 106.447 101.146 96.004 9.6 18.817 3 4 52.667 40.071 85.563 8.556 16.77 4 7 61.214 48.89 81.875 8.188 16.048 3 2 51.906 36.956 101.325 10.132 19.86 5 7 70.902 59.611 81.074 8.107 15.891 2 4 35.963 34.91 16.57 1.657 3.248 2 2 31.794 31.357 13.576 1.358 2.661 2 5 32.078 32.274 12.746 1.275 2.498 5 6 75.645 62.845 101.335 10.134 19.862 2 3 33.807 30.132 17.732 1.773 3.475 4 6 78.05 52.919 166.06 16.606 32.548 3 5 49.401 36.251 79.566 7.957 15.595 3 3 56.354 42.506 101.841 10.184 19.961 7 6 129.187 97.101 206.118 20.612 40.399 6 6 88.955 74.676 105.607 10.561 20.699 7 3 129.251 97.046 225.203 22.52 44.14 7 5 111.586 99.447 120.557 12.056 23.629 6 3 90.485 80.353 97.832 9.783 19.175 6 5 84.426 68.114 93.11 9.311 18.25 5 5 74.743 65.407 98.497 9.85 19.305 2 6 31.914 31.709 10.903 1.09 2.137 5 3 75.226 57.992 113.4 11.34 22.226 3 6 51.689 39.07 84.538 8.454 16.569 4 5 62.859 50.989 91.759 9.176 17.985 4 3 56.5 53.332 68.926 6.893 13.51 3 1 51.959 29.475 133.372 13.337 26.141 2 1 30.122 24.189 14.702 1.47 2.882 6 1 75.045 54.983 110.475 11.047 21.653 7 1 114.146 70.607 212.429 21.243 41.636 4 1 63.127 37.829 138.158 13.816 27.079 5 1 66.557 48.654 93.846 9.385 18.394 1 7 25.905 21.319 12.591 1.259 2.468 1 2 26.378 20.754 15.755 1.576 3.088 1 4 26.304 23.486 12.127 1.213 2.377 1 3 25.802 22.086 11.821 1.182 2.317 1 5 25.248 20.017 16.569 1.657 3.248 1 6 23.648 20.627 10.668 1.067 2.091 1 1 26.693 21.201 16.266 1.627 3.188

Table 16: Ontology Parameter Pair Evaluation for Partial Classes and Properties and Relationships.

pc d ms (mean) ms (median) ms (std. deviation) Std. Error 95 5 5 723.776 684.833 270.678 27.068 53.053 6 7 3697.346 3711.853 685.251 68.525 134.309 4 4 208.515 198.297 96.49 9.649 18.912 7 6 3984.776 3962.702 748.631 74.863 146.732 7 7 6924.042 6780.342 978.16 97.816 191.719 4 5 358.557 340.453 131.841 13.184 25.841 6 6 2264.669 2176.415 571.82 57.182 112.077 5 4 401.835 392.618 161.238 16.124 31.603 2 3 63.199 51.848 90.477 9.048 17.734 5 6 1198.823 1160.685 355.452 35.545 69.668 6 4 724.546 692.515 261.062 26.106 51.168 4 7 922.594 882.259 351.328 35.133 68.86 3 2 67.849 57.748 92.692 9.269 18.168 7 5 2247.46 2179.538 530.212 53.021 103.922 7 4 1104.126 1050.621 385.307 38.531 75.52 3 3 86.46 77.455 74.731 7.473 14.647 4 6 582.12 577.809 204.357 20.436 40.054 6 5 1364.031 1317.485 376.416 37.642 73.778 5 7 1856.813 1801.828 444.379 44.438 87.098 2 2 47.041 37.275 71.117 7.112 13.939 5 3 215.847 217.958 88.16 8.816 17.279 2 6 112.256 96.037 119.195 11.919 23.362 3 7 390.09 363.759 242.064 24.206 47.444 4 2 105.965 75.792 203.741 20.374 39.933 4 3 144.37 133.774 117.442 11.744 23.019 3 6 235.707 214.685 95.972 9.597 18.811 2 7 133.909 137.394 58.779 5.878 11.521 5 2 121.731 107.751 109.843 10.984 21.529 2 5 92.014 74.38 120.877 12.088 23.692 6 2 164.123 150.769 78.692 7.869 15.424 3 4 128.697 119.47 122.62 12.262 24.033 7 3 528.464 456.508 253.462 25.346 49.679 7 2 243.318 250.942 117.287 11.729 22.988 3 5 175.292 177.88 70.087 7.009 13.737 6 3 358.39 339.103 189.506 18.951 37.143 2 4 76.864 60.076 100.767 10.077 19.75 3 1 51.959 29.475 133.372 13.337 26.141 2 1 30.122 24.189 14.702 1.47 2.882 6 1 75.045 54.983 110.475 11.047 21.653 7 1 114.146 70.607 212.429 21.243 41.636 4 1 63.127 37.829 138.158 13.816 27.079 5 1 66.557 48.654 93.846 9.385 18.394 1 2 27.13 22.629 12.824 1.282 2.513 1 3 30.331 26.109 12.726 1.273 2.494 1 4 31.451 27.463 12.169 1.217 2.385 1 5 43.329 31.582 73.906 7.391 14.486 1 7 60.637 41.891 94.306 9.431 18.484 1 6 61.827 34.954 142.661 14.266 27.962 1 1 26.693 21.201 16.266 1.627 3.188

Table 17: Ontology Parameter Pair Evaluation for Partial Classes and Inheritance Depth.

pm prop ms (mean) ms (median) ms (std. deviation) Std. Error 95 5 5 25.846 24.808 10.98 1.098 2.152 5 3 26.855 25.889 12.345 1.235 2.42 6 7 23.697 21.253 10.598 1.06 2.077 7 2 24.875 23.867 9.716 0.972 1.904 4 6 26.26 23.96 14.771 1.477 2.895 7 4 24.65 23.661 12.536 1.254 2.457 2 5 27.079 25.119 15.731 1.573 3.083 2 3 26.198 25.809 11.382 1.138 2.231 3 6 27.662 25.363 13.775 1.378 2.7 2 6 26.843 25.404 13.598 1.36 2.665 3 5 24.182 22.081 9.569 0.957 1.876 3 3 23.624 22.609 9.102 0.91 1.784 5 6 25.574 23.862 12.178 1.218 2.387 6 2 27.455 26.395 14.836 1.484 2.908 6 4 26.002 24.725 13.041 1.304 2.556 4 5 26.261 24.981 11.52 1.152 2.258 7 7 21.773 21.174 9.964 0.996 1.953 4 3 25.363 25.277 10.876 1.088 2.132 4 4 25.101 21.691 14.57 1.457 2.856 4 2 28.252 26.868 13.414 1.341 2.629 7 6 21.853 21.294 8.624 0.862 1.69 6 3 25.775 25.54 11.159 1.116 2.187 5 7 26.468 26.703 11.97 1.197 2.346 6 5 24.329 22.428 10.3 1.03 2.019 3 4 26.339 27.25 7.793 0.779 1.527 3 2 25.626 25.039 10.949 1.095 2.146 2 7 28.711 26.276 13.768 1.377 2.698 3 7 26.595 26.315 10.835 1.083 2.124 2 4 25.888 26.389 9.32 0.932 1.827 2 2 28.334 25.911 15.835 1.583 3.104 4 7 28.194 25.282 13.356 1.336 2.618 7 3 27.097 23.94 14.326 1.433 2.808 7 5 22.778 21.825 10.639 1.064 2.085 5 4 26.341 25.832 10.985 1.099 2.153 6 6 24.531 22.695 12.637 1.264 2.477 5 2 25.334 24.714 13.097 1.31 2.567 6 1 27.018 20.487 15.352 1.535 3.009 7 1 27.344 22.133 17.468 1.747 3.424 5 1 24.433 20.863 12.414 1.241 2.433 2 1 22.668 19.844 10.447 1.045 2.048 3 1 25.761 22.225 11.917 1.192 2.336 4 1 26.292 19.741 18.548 1.855 3.635 1 7 25.905 21.319 12.591 1.259 2.468 1 2 26.378 20.754 15.755 1.576 3.088 1 4 26.304 23.486 12.127 1.213 2.377 1 3 25.802 22.086 11.821 1.182 2.317 1 5 25.248 20.017 16.569 1.657 3.248 1 6 23.648 20.627 10.668 1.067 2.091 1 1 26.693 21.201 16.266 1.627 3.188

Table 18: Ontology Parameter Pair Evaluation for Partial Methods and Properties and Relationships.

pm d ms (mean) ms (median) ms (std. deviation) Std. Error 95 7 5 45.225 39.054 44.713 4.471 8.764 7 4 37.232 36.08 14.626 1.463 2.867 6 2 29.672 29.585 11.01 1.101 2.158 7 6 68.239 45.014 169.693 16.969 33.26 7 7 58.688 49.065 87.439 8.744 17.138 6 3 35.764 34.453 15.336 1.534 3.006 6 7 67.273 51.037 108.012 10.801 21.17 7 3 32.76 30.512 13.926 1.393 2.73 7 2 28.754 29.917 10.674 1.067 2.092 6 6 53.653 44.608 75.446 7.545 14.787 6 4 36.585 34.045 16.0 1.6 3.136 6 5 56.668 38.982 114.176 11.418 22.379 2 3 35.17 35.318 12.532 1.253 2.456 3 7 68.257 52.575 110.642 11.064 21.686 4 4 36.708 36.201 13.737 1.374 2.693 4 5 60.921 43.102 123.304 12.33 24.168 3 6 59.791 44.954 107.46 10.746 21.062 2 2 27.926 25.477 11.295 1.129 2.214 5 3 33.039 33.298 11.206 1.121 2.196 3 4 38.297 35.421 18.245 1.824 3.576 4 7 66.876 48.31 114.415 11.441 22.425 4 6 61.506 43.924 107.178 10.718 21.007 3 5 80.978 39.397 213.301 21.33 41.807 5 2 29.034 26.244 15.449 1.545 3.028 2 5 64.467 41.553 165.221 16.522 32.383 5 6 50.946 44.482 55.925 5.593 10.961 4 2 28.24 25.762 10.889 1.089 2.134 4 3 33.956 31.852 14.747 1.475 2.89 5 7 67.595 52.443 107.304 10.73 21.032 2 4 36.567 34.773 14.5 1.45 2.842 2 6 59.094 44.296 97.932 9.793 19.195 5 5 48.254 41.204 67.854 6.785 13.299 3 2 28.854 28.674 12.027 1.203 2.357 3 3 33.569 32.11 15.315 1.531 3.002 5 4 37.154 34.868 15.667 1.567 3.071 2 7 59.664 49.315 84.493 8.449 16.561 6 1 27.018 20.487 15.352 1.535 3.009 7 1 27.344 22.133 17.468 1.747 3.424 5 1 24.433 20.863 12.414 1.241 2.433 2 1 22.668 19.844 10.447 1.045 2.048 3 1 25.761 22.225 11.917 1.192 2.336 4 1 26.292 19.741 18.548 1.855 3.635 1 2 27.13 22.629 12.824 1.282 2.513 1 3 30.331 26.109 12.726 1.273 2.494 1 4 31.451 27.463 12.169 1.217 2.385 1 5 43.329 31.582 73.906 7.391 14.486 1 7 60.637 41.891 94.306 9.431 18.484 1 6 61.827 34.954 142.661 14.266 27.962 1 1 26.693 21.201 16.266 1.627 3.188

Table 19: Ontology Parameter Pair Evaluation for Partial Methods and Inheritance Depth.

prop d ms (mean) ms (median) ms (std. deviation) Std. Error 95 2 7 63.146 52.3 82.43 8.243 16.156 7 2 29.309 27.386 16.941 1.694 3.32 4 7 67.495 51.686 99.718 9.972 19.545 4 6 58.978 46.352 77.706 7.771 15.23 7 3 33.232 30.313 18.079 1.808 3.543 2 6 61.679 53.07 93.041 9.304 18.236 2 4 36.597 36.475 15.192 1.519 2.978 4 4 40.883 38.4 21.243 2.124 4.164 4 5 55.609 45.432 89.331 8.933 17.509 2 5 58.128 41.17 95.563 9.556 18.73 7 4 37.262 35.65 14.518 1.452 2.846 7 5 56.45 44.583 102.746 10.275 20.138 2 2 27.181 27.931 8.777 0.878 1.72 7 7 59.071 51.214 68.786 6.879 13.482 4 2 26.415 25.542 8.589 0.859 1.683 4 3 35.057 33.285 15.394 1.539 3.017 7 6 60.787 47.746 93.184 9.318 18.264 2 3 31.967 30.088 12.995 1.3 2.547 6 6 61.231 49.544 105.751 10.575 20.727 5 3 34.329 33.952 15.701 1.57 3.077 3 3 34.524 35.903 14.306 1.431 2.804 3 2 26.689 26.295 9.37 0.937 1.837 5 2 28.179 27.031 10.047 1.005 1.969 6 7 64.103 43.717 108.144 10.814 21.196 6 5 56.721 41.697 105.055 10.505 20.591 6 4 37.774 37.828 17.949 1.795 3.518 5 5 60.382 43.148 108.771 10.877 21.319 3 5 44.204 38.782 54.618 5.462 10.705 3 4 40.084 37.317 21.303 2.13 4.175 5 4 37.18 33.058 19.242 1.924 3.771 6 3 32.992 32.359 14.417 1.442 2.826 5 6 59.552 45.964 91.98 9.198 18.028 3 6 59.952 46.417 94.945 9.494 18.609 3 7 78.209 52.939 151.93 15.193 29.778 5 7 67.851 56.419 81.511 8.151 15.976 6 2 29.438 27.407 13.845 1.384 2.714 7 1 25.905 21.319 12.591 1.259 2.468 2 1 26.378 20.754 15.755 1.576 3.088 4 1 26.304 23.486 12.127 1.213 2.377 3 1 25.802 22.086 11.821 1.182 2.317 5 1 25.248 20.017 16.569 1.657 3.248 6 1 23.648 20.627 10.668 1.067 2.091 1 2 27.13 22.629 12.824 1.282 2.513 1 3 30.331 26.109 12.726 1.273 2.494 1 4 31.451 27.463 12.169 1.217 2.385 1 5 43.329 31.582 73.906 7.391 14.486 1 7 60.637 41.891 94.306 9.431 18.484 1 6 61.827 34.954 142.661 14.266 27.962 1 1 26.693 21.201 16.266 1.627 3.188

Table 20: Ontology Parameter Pair Evaluation for Properties and Relationships and Inheritance Depth.

BIBLIOGRAPHY

[1] Ben Adida, Ivan Herman, Manu Sporny, and Mark Birbeck. RDFa 1.1 Primer - Third Edition. W3C Note. Mar. 2015. url: http://www.w3.org/TR/2015/NOTE-rdfa-primer-20150317/.
[2] Patrick Aichroth et al. “MICO - Media in Context.” In: 2015 IEEE International Conference on Multimedia & Expo Workshops, ICME Workshops 2015, Turin, Italy, June 29 - July 3, 2015. IEEE Computer Society, 2015, pp. 1–4.
[3] Patrick Aichroth, Henrik Björklund, Johanna Björklund, Kai Schlegel, Thomas Kurz, and Antonio Perez. Volume 3: MICO Enabling Technology Modules: Version 1. Deliverable, Technical Report. Media in Context - MICO, Nov. 2015.
[4] Patrick Aichroth, Marcel Sieland, Luca Cuccovillo, and Thomas Köllmer. “The MICO Broker: An Orchestration Framework for Linked Data Extractors.” In: Joint Proceedings of the 4th International Workshop on Linked Media and the 3rd Developers Hackshop co-located with the 13th Extended Semantic Web Conference ESWC 2016, Heraklion, Crete, Greece, May 30, 2016. 2016.
[5] Patrick Aichroth, Johanna Björklund, Emanuel Berndl, Thomas Kurz, and Thomas Köllmer. Volume 5: MICO Enabling Technology Modules - Final Version. Deliverable, Technical Report. Media in Context - MICO, Apr. 2016.
[6] Ian F. Akyildiz, Weilian Su, Yogesh Sankarasubramaniam, and Erdal Cayirci. “Wireless Sensor Networks: A Survey.” In: Computer Networks 38.4 (2002), pp. 393–422.
[7] Richard Arndt, Raphaël Troncy, Steffen Staab, Lynda Hardman, and Miroslav Vacura. “COMM: Designing a Well-Founded Multimedia Ontology for the Web.” In: The Semantic Web: 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference, ISWC 2007 + ASWC 2007, Busan, Korea, November 11-15, 2007. Proceedings. Ed. by Karl Aberer et al. Berlin, Heidelberg: Springer Berlin Heidelberg, 2007, pp. 30–43.
[8] Richard Arndt, Raphaël Troncy, Steffen Staab, and Lynda Hardman. “Comm: A Core Ontology for Multimedia Annotation.” In: Handbook on Ontologies. Springer, 2009.
[9] Ahmad Assaf, Aline Senart, and Raphaël Troncy. “Roomba: Automatic Validation, Correction and Generation of Dataset Metadata.” In: Proceedings of the 24th International Conference on World Wide Web. WWW ’15 Companion. Florence, Italy: ACM, 2015, pp. 159–162.


[10] Ahmad Assaf, Raphaël Troncy, and Aline Senart. “What’s up LOD Cloud?” In: The Semantic Web: ESWC 2015 Satellite Events. Ed. by Fabien Gandon, Christophe Guéret, Serena Villata, John Breslin, Catherine Faron-Zucker, and Antoine Zimmermann. Cham: Springer International Publishing, 2015, pp. 247–254.
[11] Sören Auer. “Creating Knowledge out of Interlinked Data: Making the Web a Data Washing Machine.” In: Proceedings of the International Conference on Web Intelligence, Mining and Semantics. WIMS ’11. Sogndal, Norway: ACM, 2011, 4:1–4:8.
[12] Sören Auer, Jens Lehmann, and Axel-Cyrille Ngonga Ngomo. “Introduction to Linked Data and Its Lifecycle on the Web.” In: Proceedings of the 7th International Conference on Reasoning Web: Semantic Technologies for the Web of Data. RW’11. Galway, Ireland: Springer-Verlag, 2011, pp. 1–75.
[13] Sören Auer et al. “Managing the Life-Cycle of Linked Data with the LOD2 Stack.” In: The Semantic Web – ISWC 2012. Ed. by Philippe Cudré-Mauroux et al. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012, pp. 1–16.
[14] Franz Baader, Ian Horrocks, and Ulrike Sattler. “Description Logics as Ontology Languages for the Semantic Web.” In: Mechanizing Mathematical Reasoning: Essays in Honor of Jörg H. Siekmann on the Occasion of His 60th Birthday. Ed. by Dieter Hutter and Werner Stephan. Berlin, Heidelberg: Springer Berlin Heidelberg, 2005, pp. 228–248.
[15] Steve Battle, David Wood, James Leigh, and Luke Ruth. “The Callimachus Project: RDFa As a Web Template Language.” In: Proceedings of the Third International Conference on Consuming Linked Data - Volume 905. COLD’12. Boston, MA, 2012, pp. 1–14.
[16] Christian Bauer and Gavin King. Hibernate in Action. Greenwich, CT: Manning, 2005.
[17] Sean Bechhofer and Alistair Miles. SKOS Simple Knowledge Organization System Reference. W3C Recommendation. Aug. 2009. url: http://www.w3.org/TR/2009/REC-skos-reference-20090818/.
[18] Diana Benito-Osorio, Marta Peris-Ortiz, Carlos Rueda Armengot, and Alberto Colino. “Web 5.0: the future of emotional competences in higher education.” In: Global Business Perspectives 1.3 (2013), pp. 274–287.
[19] Ansgar Bernardi, Harald Holz, Heiko Maus, and Ludger van Elst. “Komplexe Arbeitswelten in der Wissensgesellschaft.” In: Semantic Web: Wege zur vernetzten Wissensgesellschaft. Ed. by Tassilo Pellegrini and Andreas Blumauer. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006, pp. 27–45.

[20] Emanuel Berndl, Kai Schlegel, and Andreas Eisenkolb. MICO Metadata Model - Terms. Model Specification Recommendation. Oct. 2016. url: https://mico-project.bitbucket.io/vocabs/mmmterms/2.0/documentation/.
[21] Emanuel Berndl, Kai Schlegel, Andreas Eisenkolb, Thomas Weißgerber, and Harald Kosch. “Anno4j - Idiomatic Access to the W3C Web Annotation Data Model.” In: The Semantic Web - ESWC 2016 Satellite Events, Heraklion, Crete, Greece, May 29 - June 2, 2016, Revised Selected Papers. Ed. by Harald Sack, Giuseppe Rizzo, Nadine Steinmetz, Dunja Mladenic, Sören Auer, and Christoph Lange. Vol. 9989. Lecture Notes in Computer Science. 2016, pp. 257–270.
[22] Emanuel Berndl, Kai Schlegel, Andreas Eisenkolb, and Harald Kosch. “Idiomatic Persistence and Querying for the W3C Web Annotation Data Model.” In: Joint Proceedings of the 4th International Workshop on Linked Media and the 3rd Developers Hackshop co-located with the 13th Extended Semantic Web Conference ESWC 2016, Heraklion, Crete, Greece, May 30, 2016. Ed. by Raphaël Troncy, Ruben Verborgh, Lyndon J. B. Nixon, Thomas Kurz, Kai Schlegel, and Miel Vander Sande. Vol. 1615. CEUR Workshop Proceedings. CEUR-WS.org, 2016.
[23] Emanuel Berndl, Kai Schlegel, Andreas Eisenkolb, and Thomas Weißgerber. MICO Metadata Model. Model Specification Recommendation. Oct. 2016. url: https://mico-project.bitbucket.io/vocabs/mmm/2.0/documentation/.
[24] T. Berners-Lee, R. Fielding, and L. Masinter. Uniform Resource Identifier (URI): Generic Syntax. RFC 3986. RFC Editor, 2005, pp. 1–60. url: https://tools.ietf.org/rfc/rfc3986.txt.
[25] T. Berners-Lee, L. Masinter, and M. McCahill. Uniform Resource Locators (URL). RFC 1738. RFC Editor, 1994, pp. 1–24. url: https://www.ietf.org/rfc/rfc1738.txt.
[26] Tim Berners-Lee. Linked Data. Website. Last visited 03/12/2018. 2006. url: http://www.w3.org/DesignIssues/LinkedData.html.
[27] Tim Berners-Lee, James Hendler, Ora Lassila, et al. “The Semantic Web.” In: Scientific American 284.5 (2001), pp. 28–37.
[28] Tim Berners-Lee et al. Semantic Web Road Map. Website. Last visited 03/12/2018. 1998. url: https://www.w3.org/DesignIssues/Semantic.html.
[29] Tim Berners-Lee et al. What the Semantic Web can represent. Website. Last visited 03/12/2018. 1998. url: https://www.w3.org/DesignIssues/RDFnot.html.

[30] Christian Bizer, Tom Heath, and Tim Berners-Lee. “Linked Data - The Story So Far.” In: International Journal on Semantic Web and Information Systems 5.3 (2009), pp. 1–22.
[31] Christian Bizer and Andreas Schultz. “The Berlin SPARQL Benchmark.” In: International Journal on Semantic Web and Information Systems (IJSWIS) 5.2 (2009), pp. 1–24.
[32] Christian Bizer, Tom Heath, Kingsley Idehen, and Tim Berners-Lee. “Linked Data on the Web (LDOW2008).” In: Proceedings of the 17th International Conference on World Wide Web. WWW ’08. Beijing, China: ACM, 2008, pp. 1265–1266.
[33] Christian Bizer, Kai Eckert, Robert Meusel, Hannes Mühleisen, Michael Schuhmacher, and Johanna Völker. “Deployment of RDFa, Microdata, and Microformats on the Web - A Quantitative Analysis.” In: The Semantic Web - ISWC 2013 - 12th International Semantic Web Conference, Sydney, NSW, Australia, October 21-25, 2013, Proceedings, Part II. Ed. by Harith Alani, Lalana Kagal, Achille Fokoue, Paul T. Groth, Chris Biemann, Josiane Xavier Parreira, Lora Aroyo, Natasha F. Noy, Chris Welty, and Krzysztof Janowicz. Vol. 8219. Lecture Notes in Computer Science. Springer, 2013, pp. 17–32.
[34] Dublin Core Usage Board. DCMI Metadata Terms. DCMI Recommendation. Dublin Core Metadata Initiative, June 2012. url: http://dublincore.org/documents/dcmi-terms/.
[35] Barry W. Boehm. “Software Engineering: As it is.” In: Proceedings of the 4th International Conference on Software Engineering, Munich, Germany, September 1979. 1979, pp. 11–21.
[36] Harold Boley, Said Tabet, and Gerd Wagner. “Design Rationale of RuleML: A Markup Language for Semantic Web Rules.” In: Proceedings of the First International Conference on Semantic Web Working. SWWS’01. California: CEUR-WS.org, 2001, pp. 381–401.
[37] Dan Brickley and Ramanathan Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation. W3C, Feb. 2004. url: http://www.w3.org/TR/2004/REC-rdf-schema-20040210/.
[38] Dan Brickley and Libby Miller. FOAF Vocabulary Specification. Technical Report. xmlns, Jan. 2014. url: http://xmlns.com/foaf/spec/20140114.html.
[39] Mark Butler. “Barriers to Real World Adoption of Semantic Web Technologies.” In: Proceedings of the 2002 International Conference on Semantic Web, RDF, and XML. 2002, p. 7.

[40] Lorna M. Campbell and Sheila MacNeill. “The Semantic Web, Linked and Open Data.” In: A Briefing Paper of the Publications from the Centre for Educational Technology, Interoperability and Standards (2010).
[41] J. Cardoso. “The Semantic Web Vision: Where Are We?” In: IEEE Intelligent Systems 22.5 (2007), pp. 84–88.
[42] Gavin Carothers. RDF 1.1 N-Quads. W3C Recommendation. W3C, Feb. 2014. url: http://www.w3.org/TR/2014/REC-n-quads-20140225/.
[43] Gavin Carothers and Andy Seaborne. RDF 1.1 N-Triples. W3C Recommendation. W3C, Feb. 2014. url: http://www.w3.org/TR/2014/REC-n-triples-20140225/.
[44] Gavin Carothers and Andy Seaborne. RDF 1.1 TriG. W3C Recommendation. W3C, Feb. 2014. url: http://www.w3.org/TR/2014/REC-trig-20140225/.
[45] Scott Chacon and Ben Straub. Pro Git. 2nd Edition. Berkeley, CA, USA: Apress, 2014.
[46] Shih-Fu Chang, T. Sikora, and A. Puri. “Overview of the MPEG-7 Standard.” In: IEEE Transactions on Circuits and Systems for Video Technology 11.6 (2001), pp. 688–695.
[47] Peter Pin-Shan Chen. “The Entity-Relationship Model—Toward a Unified View of Data.” In: Readings in Artificial Intelligence and Databases. Ed. by John Mylopoulos and Michael Brodie. San Francisco (CA): Morgan Kaufmann, 1989, pp. 98–111.
[48] L. Cheng, S. Kotoulas, T. Ward, and G. Theodoropoulos. “Runtime Characterization of Triple Stores.” In: 2012 IEEE 15th International Conference on Computational Science and Engineering. 2012, pp. 66–73.
[49] James Clark and Steve DeRose. XML Path Language XPath. W3C Recommendation. W3C, Nov. 1999. url: https://www.w3.org/TR/1999/REC-xpath-19991116/.
[50] Daniel D. Corkill. “Blackboard systems.” In: AI Expert 6.9 (1991).
[51] Carlos Coronel and Steven Morris. Database Systems: Design, Implementation, & Management. 12th ed. Cengage Learning, 2016.
[52] Richard Cyganiak, David Wood, and Markus Lanthaler. RDF 1.1 Concepts and Abstract Syntax. W3C Recommendation. W3C, Feb. 2014. url: http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/.
[53] DBpedia Facts and Figures. Last visited 03/12/2018. url: http://wiki.dbpedia.org/about/facts-figures.
[54] DCMI Usage Board. DCMI Metadata Terms. DCMI Recommendation. Dublin Core Metadata Initiative, 2006. url: http://dublincore.org/documents/2006/12/18/dcmi-terms/.

[55] Neil F. Day. “MPEG-7 Applications.” In: Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series. Vol. 4209. Mar. 2001, pp. 39–47.
[56] Ben De Meester, Ruben Verborgh, Pieter Pauwels, Wesley De Neve, Erik Mannens, and Rik Van de Walle. “Towards Robust and Reliable Multimedia Analysis Through Semantic Integration of Services.” In: Multimedia Tools and Applications 75.22 (2016), pp. 14019–14038.
[57] Li Ding and Tim Finin. “Characterizing the Semantic Web on the Web.” In: The Semantic Web - ISWC 2006. Ed. by Isabel Cruz, Stefan Decker, Dean Allemang, Chris Preist, Daniel Schwabe, Peter Mika, Mike Uschold, and Lora M. Aroyo. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006, pp. 242–257.
[58] Martin Doerr. “The CIDOC Conceptual Reference Module: An Ontological Approach to Semantic Interoperability of Metadata.” In: AI Magazine 24.3 (2003), p. 75.
[59] Mario Döller and Harald Kosch. “MPEG-7 Multimedia Data Cartridge.” In: Multimedia Computing and Networking 2003. Vol. 5019. International Society for Optics and Photonics. 2003, pp. 126–138.
[60] M. Duerst and M. Suignard. Internationalized Resource Identifiers (IRIs). RFC 3987. RFC Editor, 2005, pp. 1–46. url: https://tools.ietf.org/rfc/rfc3987.txt.
[61] Kévin Dunglas. Persistence in PHP with the Doctrine ORM. Birmingham: Packt Publishing Ltd., 2013.
[62] Daniel Eißing, Ansgar Scherp, and Carsten Saathoff. “Integration of Existing Multimedia Metadata Formats and Metadata Standards in the M3O.” In: Semantic Multimedia - 5th International Conference on Semantic and Digital Media Technologies, SAMT 2010, Saarbrücken, Germany, December 1-3, 2010. Revised Selected Papers. Ed. by Thierry Declerck, Michael Granitzer, Marcin Grzegorzek, Massimo Romanelli, Stefan M. Rüger, and Michael Sintek. Vol. 6725. Lecture Notes in Computer Science. Springer, 2010, pp. 48–63.
[63] L. van Elst, F. R. Aschoff, A. Bernardi, and S. Schwarz. “Weakly-structured Workflows for Knowledge-intensive Tasks: An Experimental Evaluation.” In: WET ICE 2003. Proceedings. Twelfth IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises, 2003, pp. 340–345.
[64] Wolfgang Ertel. “First-order Predicate Logic.” In: Introduction to Artificial Intelligence. Cham: Springer International Publishing, 2017, pp. 39–64.
[65] Wolfgang Ertel. Introduction to Artificial Intelligence, Second Edition. Undergraduate Topics in Computer Science. Springer, 2017.

[66] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. “Object Detection with Discriminatively Trained Part-Based Models.” In: IEEE Transactions on Pattern Analysis and Machine Intelligence 32.9 (2010), pp. 1627–1645.
[67] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley Professional Computing Series. Pearson Education, 1994.
[68] Mike Graves, Adam Constabaris, and Dan Brickley. “FOAF: Connecting People on the Semantic Web.” In: Cataloging & Classification Quarterly 43.3-4 (2007), pp. 191–202.
[69] Jim Gray et al. “The Transaction Concept: Virtues and Limitations.” In: VLDB. Vol. 81. 1981, pp. 144–154.
[70] W3C OWL Working Group. OWL 2 Web Ontology Language Document Overview (Second Edition). Technical Report. W3C, Dec. 2012. url: http://www.w3.org/TR/2012/REC-owl2-overview-20121211/.
[71] W3C SPARQL Working Group. SPARQL 1.1 Overview. W3C Recommendation. W3C, Mar. 2013. url: http://www.w3.org/TR/2013/REC-sparql11-overview-20130321/.
[72] Ramanathan Guha and Dan Brickley. RDF Schema 1.1. W3C Recommendation. W3C, Feb. 2014. url: http://www.w3.org/TR/2014/REC-rdf-schema-20140225/.
[73] Theo Haerder and Andreas Reuter. “Principles of Transaction-oriented Database Recovery.” In: ACM Comput. Surv. 15.4 (Dec. 1983), pp. 287–317.
[74] Maurice Howard Halstead. Elements of Software Science. Vol. 7. Elsevier New York, 1977.
[75] Jonathon S. Hare, Paul H. Lewis, Peter G. B. Enser, and Christine J. Sandom. “Mind the Gap: Another Look at the Problem of the Semantic Gap in Image Retrieval.” In: Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series. Vol. 6073. Jan. 2006.
[76] Frank van Harmelen and Deborah McGuinness. OWL Web Ontology Language Overview. W3C Recommendation. W3C, Feb. 2004. url: http://www.w3.org/TR/2004/REC-owl-features-20040210/.
[77] Olaf Hartig. “How Caching Improves Efficiency and Result Completeness for Querying Linked Data.” In: WWW2011 Workshop on Linked Data on the Web, Hyderabad, India, March 29, 2011. 2011.

[78] Michael Hausenblas, Raphaël Troncy, Tobias Bürger, and Yves Raimond. “Interlinking Multimedia: How to Apply Linked Data Principles to Multimedia Fragments.” In: Proceedings of the WWW2009 Workshop on Linked Data on the Web, LDOW 2009, Madrid, Spain, April 20, 2009.
[79] Patrick Hayes. RDF Semantics. W3C Recommendation. W3C, Feb. 2004. url: http://www.w3.org/TR/2004/REC-rdf-mt-20040210/.
[80] Patrick Hayes and Peter Patel-Schneider. RDF 1.1 Semantics. W3C Recommendation. W3C, Feb. 2014. url: http://www.w3.org/TR/2014/REC-rdf11-mt-20140225/.
[81] Tom Heath and Christian Bizer. Linked Data: Evolving the Web into a Global Data Space. Synthesis Lectures on the Semantic Web. Morgan & Claypool Publishers, 2011.
[82] Sallie M. Henry and Dennis G. Kafura. “Software Structure Metrics Based on Information Flow.” In: IEEE Trans. Software Eng. 7.5 (1981), pp. 510–518.
[83] Pascal Hitzler, Markus Krötzsch, Sebastian Rudolph, and York Sure. Semantic Web: Grundlagen. Springer-Verlag, 2007.
[84] Aidan Hogan, Andreas Harth, Alexandre Passant, Stefan Decker, and Axel Polleres. “Weaving the Pedantic Web.” In: Proceedings of the WWW2010 Workshop on Linked Data on the Web, LDOW 2010, Raleigh, USA, April 27, 2010. Ed. by Christian Bizer, Tom Heath, Tim Berners-Lee, and Michael Hausenblas. Vol. 628. CEUR Workshop Proceedings. CEUR-WS.org, 2010.
[85] Ian Horrocks. “DAML+OIL: A Description Logic for the Semantic Web.” In: Bulletin of the IEEE Computer Society Technical Committee on Data Engineering 25.1 (Apr. 2002), pp. 4–9.
[86] Ian Horrocks, Peter F. Patel-Schneider, Harold Boley, Said Tabet, Benjamin Grosof, Mike Dean, et al. “SWRL: A Semantic Web Rule Language Combining OWL and RuleML.” In: W3C Member Submission 21 (2004), p. 79.
[87] C.-W. Hsu, C.-C. Chang, and C.-J. Lin. A Practical Guide to Support Vector Classification. Tech. rep. Jan. 2003.
[88] Wei Hu, Cunxin Jia, Lei Wan, Liang He, Lixia Zhou, and Yuzhong Qu. “CAMO: Integration of Linked Open Data for Multimedia Metadata Enrichment.” In: The Semantic Web – ISWC 2014: 13th International Semantic Web Conference, Riva del Garda, Italy, October 19-23, 2014. Proceedings, Part I. Ed. by Peter Mika, Tania Tudorache, Abraham Bernstein, Chris Welty, Craig Knoblock, Denny Vrandečić, Paul Groth, Natasha Noy, Krzysztof Janowicz, and Carole Goble. Cham: Springer International Publishing, 2014, pp. 1–16.

[89] Jane Hunter. “Adding Multimedia to the Semantic Web: Building an MPEG-7 Ontology.” In: Proceedings of SWWS’01, The first Semantic Web Working Symposium, Stanford University, California, USA, July 30 - August 1, 2001. Ed. by Isabel F. Cruz, Stefan Decker, Jérôme Euzenat, and Deborah L. McGuinness. 2001, pp. 261–283.
[90] ISO 10646:2014(E): Information Technology – Universal Coded Character Set (UCS). Standard. International Organization for Standardization, Mar. 2014.
[91] ISO 21127:2006 - Information and Documentation – A Reference Ontology for the Interchange of Cultural Heritage Information. Standard. International Organization for Standardization, Sept. 2006.
[92] ISO 21127:2014 - Information and Documentation – A Reference Ontology for the Interchange of Cultural Heritage Information. Standard. International Organization for Standardization, Sept. 2014.
[93] ISO/IEC 17203:2017 (INCITS 469:2015) Information Technology – Open Virtualization Format (OVF) Specification. Standard. International Organization for Standardization, Sept. 2017.
[94] Ian Jacobs and Norman Walsh. Architecture of the World Wide Web, Volume One. W3C Recommendation. W3C, Dec. 2004. url: http://www.w3.org/TR/2004/REC-webarch-20041215/.
[95] Dennis G. Kafura and Geereddy R. Reddy. “The Use of Software Complexity Metrics in Software Maintenance.” In: IEEE Trans. Software Eng. 13.3 (1987), pp. 335–343.
[96] E. Kasutani and A. Yamada. “The MPEG-7 Color Layout Descriptor: A Compact Image Feature Description for High-speed Image/Video Segment Retrieval.” In: Proceedings 2001 International Conference on Image Processing (Cat. No.01CH37205). Vol. 1. 2001, pp. 674–677.
[97] Isaak Kavasidis, Simone Palazzo, Roberto Di Salvo, Daniela Giordano, and Concetto Spampinato. “An Innovative Web-based Collaborative Platform for Video Annotation.” In: Multimedia Tools and Applications 70.1 (2014), pp. 413–432.
[98] Taghi M. Khoshgoftaar and John C. Munson. “Predicting Software Development Errors Using Software Complexity Metrics.” In: IEEE Journal on Selected Areas in Communications 8.2 (1990), pp. 253–261.
[99] Graham Klyne and Jeremy Carroll. Resource Description Framework (RDF): Concepts and Abstract Syntax. W3C Recommendation. W3C, Feb. 2004. url: http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/.

[100] Johannes Koch, Philip Ackermann, and Carlos A. Velasco. Representing Content in RDF 1.0. W3C Note. W3C, Feb. 2017. url: https://www.w3.org/TR/2017/NOTE-Content-in-RDF10-20170202/.
[101] Thomas Köllmer, Emanuel Berndl, Thomas Weißgerber, Patrick Aichroth, and Harald Kosch. “A Workflow for Cross Media Recommendations based on Linked Data Analysis.” In: Joint Proceedings of the 4th International Workshop on Linked Media and the 3rd Developers Hackshop co-located with the 13th Extended Semantic Web Conference ESWC 2016, Heraklion, Crete, Greece, May 30, 2016. Ed. by Raphaël Troncy, Ruben Verborgh, Lyndon J. B. Nixon, Thomas Kurz, Kai Schlegel, and Miel Vander Sande. Vol. 1615. CEUR Workshop Proceedings. CEUR-WS.org, 2016.
[102] H. Kosch, L. Boszormenyi, M. Doller, M. Libsie, P. Schojer, and A. Kofler. “The Life Cycle of Multimedia Metadata.” In: IEEE MultiMedia 12.1 (2005), pp. 80–86.
[103] Harald Kosch. Distributed Multimedia Database Technologies Supported by MPEG-7 and MPEG-21. CRC Press, 2003.
[104] Thomas Kurz, Kai Schlegel, and Harald Kosch. “Enabling Access to Linked Media with SPARQL-MM.” In: Proceedings of the 24th International Conference on World Wide Web (WWW2015) Companion (LIME15). 2015.
[105] Markus Lanthaler, Gregg Kellogg, and Manu Sporny. JSON-LD 1.0. W3C Recommendation. W3C, Jan. 2014. url: http://www.w3.org/TR/2014/REC-json-ld-20140116/.
[106] O. Lassila. “Web Metadata: A Matter of Semantics.” In: IEEE Internet Computing 2.4 (1998), pp. 30–37.
[107] Didier Le Gall. “MPEG: A Video Compression Standard for Multimedia Applications.” In: Commun. ACM 34.4 (Apr. 1991), pp. 46–58.
[108] P. M. Lerman. “Fitting Segmented Regression Models by Grid Search.” In: Journal of the Royal Statistical Society. Series C (Applied Statistics) 29.1 (1980), pp. 77–84.
[109] Qin Li, Zhao Lu, Yunfei Yu, and Lu Liang. “Multimedia Ontology Modeling: An Approach Based on MPEG-7.” In: 2011 3rd International Conference on Advanced Computer Control. 2011, pp. 351–356.
[110] Sheng Liang. Java Native Interface: Programmer’s Guide and Reference. 1st. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1999.
[111] Eve Maler, Michael Sperberg-McQueen, Jean Paoli, François Yergeau, and Tim Bray. Extensible Markup Language (XML) 1.0 (Fifth Edition). W3C Recommendation. W3C, Nov. 2008. url: http://www.w3.org/TR/2008/REC-xml-20081126/.

[112] Ashok Malhotra, David Peterson, Sandy Gao, Paul V. Biron, Michael Sperberg-McQueen, and Henry Thompson. W3C XML Schema Definition Language (XSD) 1.1 Part 2: Datatypes. W3C Recommendation. W3C, Apr. 2012. url: http://www.w3.org/TR/2012/REC-xmlschema11-2-20120405/.
[113] Frank Manola and Eric Miller. RDF Primer. W3C Recommendation. W3C, Feb. 2004. url: http://www.w3.org/TR/2004/REC-rdf-primer-20040210/.
[114] D. Marr. “Artificial Intelligence—A Personal View.” In: Artificial Intelligence 9.1 (1977), pp. 37–48.
[115] J. M. Martinez, R. Koenen, and F. Pereira. “MPEG-7: The Generic Multimedia Content Description Standard, Part 1.” In: IEEE MultiMedia 9.2 (2002), pp. 78–87.
[116] Thomas J. McCabe. “A Complexity Measure.” In: IEEE Trans. Software Eng. 2.4 (1976), pp. 308–320.
[117] Carma L. McClure. “A Model for Program Complexity Analysis.” In: Proceedings of the 3rd International Conference on Software Engineering, Atlanta, Georgia, USA, May 10-12. 1978, pp. 149–157.
[118] Noah Mendelsohn. The Self-Describing Web. Technical Report. W3C, Feb. 2009. url: https://www.w3.org/2001/tag/doc/selfDescribingDocuments-2009-02-07.html.
[119] Pablo N. Mendes, Max Jakob, and Christian Bizer. “DBpedia: A Multilingual Cross-domain Knowledge Base.” In: LREC. 2012, pp. 1813–1817.
[120] Alistair Miles, Brian Matthews, Michael D. Wilson, and Dan Brickley. “SKOS Core: Simple Knowledge Organisation for the Web.” In: Vocabularies in Practice: Proceedings of the 2005 International Conference on Dublin Core and Metadata Applications, DC 2005, Madrid, Spain, September 12-15, 2005. Ed. by Thomas Baker and Eva Méndez. Dublin Core Metadata Initiative, 2005, pp. 3–10.
[121] Mark A. Musen. “The Protégé Project: A Look Back and a Look Forward.” In: AI Matters 1.4 (June 2015), pp. 4–12.
[122] Axel-Cyrille Ngonga Ngomo, Sören Auer, Jens Lehmann, and Amrapali Zaveri. “Introduction to Linked Data and Its Lifecycle on the Web.” In: Reasoning Web. Reasoning on the Web in the Big Data Era: 10th International Summer School 2014, Athens, Greece, September 8-13, 2014. Proceedings. Cham: Springer International Publishing, 2014, pp. 1–99.

[123] Elizabeth J. O’Neil. “Object/Relational Mapping 2008: Hibernate and the Entity Data Model (Edm).” In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. SIGMOD ’08. Vancouver, Canada: ACM, 2008, pp. 1351–1356.
[124] Martin O’Connor, Holger Knublauch, Samson Tu, Benjamin Grosof, Mike Dean, William Grosso, and Mark Musen. “Supporting Rule System Interoperability on the Semantic Web with SWRL.” In: The Semantic Web–ISWC 2005 (2005), pp. 974–986.
[125] Guy Paré. “Investigating Information Systems with Positivist Case Research.” In: The Communications of the Association for Information Systems 13.1 (2004), pp. 233–364.
[126] Alan P. Parkes. “First Order Predicate Logic.” In: Introduction to Languages, Machines and Logic: Computable Languages, Abstract Machines and Formal Logic. London: Springer London, 2002, pp. 291–306.
[127] Karan Patel. “Incremental Journey for World Wide Web: Introduced with Web 1.0 to Recent Web 5.0–A Survey Paper.” In: International Journal of Advanced Research in Computer Science and Software Engineering 3.10 (2013).
[128] Pieter Pauwels and Rens Bod. “Including the Power of Interpretation Through a Simulation of Peirce’s Process of Inquiry.” In: Literary and Linguistic Computing 28.3 (2013), pp. 452–460.
[129] Christian Petersohn. “Temporal Video Segmentation.” PhD thesis. Berlin Institute of Technology, 2010.
[130] Ajinkya Prabhune, Rainer Stotzka, Vaibhav Sakharkar, Jürgen Hesser, and Michael Gertz. “MetaStore: An Adaptive Metadata Management Framework for Heterogeneous Metadata Models.” In: Distributed and Parallel Databases 36.1 (2018), pp. 153–194.
[131] Eric Prud’hommeaux and Gavin Carothers. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. url: http://www.w3.org/TR/2014/REC-turtle-20140225/.
[132] Matthias Quasthoff. “Effizientes Entwickeln von Semantic-Web-Software mit Object Triple Mapping.” PhD thesis. University of Potsdam, 2011.
[133] Matthias Quasthoff and Christoph Meinel. “Semantic Web Admission Free–Obtaining RDF and OWL Data from Application Source Code.” In: Proc. of the 4th Int. Workshop on Semantic Web Enabled Software Engineering. 2008, pp. 17–25.

[134] Matthias Quasthoff, Harald Sack, and Christoph Meinel. “How to Simplify Building Semantic Web Applications.” In: Proceedings of the 5th International Workshop on Semantic Web Enabled Software Engineering (2009), pp. 45–57.
[135] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. “You Only Look Once: Unified, Real-Time Object Detection.” In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016, pp. 779–788.
[136] George Reese. Database Programming with JDBC and JAVA. O’Reilly Media, Inc., 2000.
[137] Stuart J. Russell and Peter Norvig. Artificial Intelligence - A Modern Approach (3. internat. ed.). Pearson Education, 2010.
[138] Carsten Saathoff and Ansgar Scherp. “M3O: The Multimedia Metadata Ontology.” In: Proceedings of the Workshop on Semantic Multimedia Database Technologies, 10th International Workshop of the Multimedia Metadata Community (SeMuDaTe 2009), Graz, Austria. 2009.
[139] Carsten Saathoff and Ansgar Scherp. “Unlocking the Semantics of Multimedia Presentations in the Web with the Multimedia Metadata Ontology.” In: Proceedings of the 19th International Conference on World Wide Web. WWW ’10. Raleigh, North Carolina, USA: ACM, 2010, pp. 831–840.
[140] Satya Sahoo, Timothy Lebo, and Deborah McGuinness. PROV-O: The PROV Ontology. W3C Recommendation. W3C, Apr. 2013. url: http://www.w3.org/TR/2013/REC-prov-o-20130430/.
[141] Robert Sanderson. Web Annotation Protocol. W3C Recommendation. W3C, Feb. 2017. url: https://www.w3.org/TR/2017/REC-annotation-protocol-20170223/.
[142] Robert Sanderson, Paolo Ciccarese, and Herbert Van de Sompel. OADM Open Annotation Data Model. Community Draft. W3C, Feb. 2013. url: http://www.openannotation.org/spec/core/.
[143] Robert Sanderson, Paolo Ciccarese, and Benjamin Young. WADM Web Annotation Data Model. W3C Recommendation. W3C, Feb. 2017. url: https://www.w3.org/TR/2017/REC-annotation-model-20170223/.
[144] Robert Sanderson, Benjamin Young, and Paolo Ciccarese. Web Annotation Vocabulary. W3C Recommendation. W3C, Feb. 2017. url: https://www.w3.org/TR/2017/REC-annotation-vocab-20170223/.

[145] Sebastian Schaffert, Christoph Bauer, Thomas Kurz, Fabian Dorschel, Dietmar Glachs, and Manuel Fernandez. “The Linked Media Framework: Integrating and Interlinking Enterprise Media Content and Data.” In: Proceedings of the 8th International Conference on Semantic Systems. I-SEMANTICS ’12. Graz, Austria: ACM, 2012, pp. 25–32.
[146] Bernhard Schandl, Bernhard Haslhofer, Tobias Bürger, Andreas Langegger, and Wolfgang Halb. “Linked Data and Multimedia: The State of Affairs.” In: Multimedia Tools and Applications 59.2 (2012), pp. 523–556.
[147] Ansgar Scherp, Daniel Eißing, and Carsten Saathoff. “A Method for Integrating Multimedia Metadata Standards and Metadata Formats with the Multimedia Metadata Ontology.” In: Int. J. Semantic Computing 6.1 (2012), pp. 25–50.
[148] Manfred Schied, Anton Wolff, and Christian Köstlbacher. “Connecting Semantic MediaWiki to different Triple Stores Using RDF2Go.” In: Fifth Workshop on Semantic Wikis Linking Data and People, 7th Extended Semantic Web Conference, Hersonissos, Crete, Greece, June 2010. 2010, pp. 159–163.
[149] Kai Schlegel, Emanuel Berndl, Michael Granitzer, Harald Kosch, and Thomas Kurz. “A Platform for Contextual Multimedia Data: Towards a Unified Metadata Model and Querying.” In: Proceedings of the 15th International Conference on Knowledge Technologies and Data-driven Business, I-KNOW ’15, Graz, Austria, October 21-23, 2015. Ed. by Stefanie N. Lindstaedt, Tobias Ley, and Harald Sack. ACM, 2015, 1:1–1:8.
[150] Guus Schreiber and Fabien Gandon. RDF 1.1 XML Syntax. W3C Recommendation. W3C, Feb. 2014. url: http://www.w3.org/TR/2014/REC-rdf-syntax-grammar-20140225/.
[151] Guus Schreiber and Yves Raimond. RDF 1.1 Primer. W3C Note. W3C, June 2014. url: http://www.w3.org/TR/2014/NOTE-rdf11-primer-20140624/.
[152] Andy Seaborne. SPARQL 1.1 Property Paths. W3C Working Draft. W3C, Jan. 2010. url: http://www.w3.org/TR/2010/WD-sparql11-property-paths-20100126/.
[153] Julian Seidenberg and Alan L. Rector. “Web Ontology Segmentation: Analysis, Classification and Use.” In: Proceedings of the 15th International Conference on World Wide Web, WWW 2006, Edinburgh, Scotland, UK, May 23-26, 2006. 2006, pp. 13–22.
[154] N. Shadbolt, T. Berners-Lee, and W. Hall. “The Semantic Web Revisited.” In: IEEE Intelligent Systems 21.3 (2006), pp. 96–101.
[155] Amit P. Sheth and Wolfgang Klas, eds. Multimedia Data Management: Using Metadata to Integrate and Apply Digital Media. McGraw-Hill, 1998.

[156] T. Sikora. “The MPEG-7 Visual Standard for Content Description - An Overview.” In: IEEE Transactions on Circuits and Systems for Video Technology 11.6 (2001), pp. 696–702.
[157] Leslie F. Sikos. “The Semantic Gap.” In: Description Logics in Multimedia Reasoning. Cham: Springer International Publishing, 2017, pp. 51–66.
[158] Evren Sirin and Bijan Parsia. “Pellet: An OWL DL Reasoner.” In: Proceedings of the 2004 International Workshop on Description Logics (DL2004), Whistler, British Columbia, Canada, June 6-8, 2004. Ed. by Volker Haarslev and Ralf Möller. Vol. 104. CEUR Workshop Proceedings. CEUR-WS.org, 2004.
[159] Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. “Practical Bayesian Optimization of Machine Learning Algorithms.” In: Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012, Lake Tahoe, Nevada, United States. Ed. by Peter L. Bartlett, Fernando C. N. Pereira, Christopher J. C. Burges, Léon Bottou, and Kilian Q. Weinberger. 2012, pp. 2960–2968.
[160] Concetto Spampinato, Simone Palazzo, Bastian Boom, Jacco van Ossenbruggen, Isaak Kavasidis, Roberto Di Salvo, Fang-Pang Lin, Daniela Giordano, Lynda Hardman, and Robert B. Fisher. “Understanding Fish Behavior During Typhoon Events in Real-life Underwater Environments.” In: Multimedia Tools and Applications 70.1 (2014), pp. 199–236.
[161] Lutz Suhrbier, Wolf-Henning Kusber, Okka Tschöpe, Anton Güntsch, and Walter G. Berendsohn. “AnnoSys - Implementation of a Generic Annotation System for Schema-based Data Using the Example of Biodiversity Collection Data.” In: Database 2017 (2017), bax018.
[162] Robert Endre Tarjan. “Depth-First Search and Linear Graph Algorithms.” In: SIAM J. Comput. 1.2 (1972), pp. 146–160.
[163] Jeni Tennison. Best Practices for Fragment Identifiers and Media Type Definitions. W3C Working Draft. W3C, Oct. 2012. url: http://www.w3.org/TR/2012/WD-fragid-best-practices-20121025/.
[164] Ronald A. Thisted. Elements of Statistical Computing: Numerical Computation. London, UK: Chapman & Hall, Ltd., 1988.
[165] Raphaël Troncy, Silvia Pfeiffer, Davy Van Deursen, and Erik Mannens. Media Fragments URI 1.0 (basic). W3C Recommendation. W3C, Sept. 2012. url: http://www.w3.org/TR/2012/REC-media-frags-20120925/.

[166] J. Utecht and M. Brochhausen. “Measuring the Usability of Triple Stores for Knowledge Management on Trauma Care Organizations.” In: CEUR Workshop Proceedings. Vol. 1546. 2015, pp. 241–242.
[167] M. Vacura, V. Svatek, C. Saathoff, T. Franz, and R. Troncy. “Describing Low-level Image Features Using the COMM Ontology.” In: 2008 15th IEEE International Conference on Image Processing. 2008, pp. 49–52.
[168] Ruben Verborgh, Davy Van Deursen, Erik Mannens, Chris Poppe, and Rik Van de Walle. “Enabling Context-aware Multimedia Annotation by a Novel Generic Semantic Problem-solving Platform.” In: Multimedia Tools Appl. 61.1 (2012), pp. 105–129.
[169] Ruben Verborgh, Thomas Steiner, Davy Van Deursen, Jos De Roo, Rik Van de Walle, and Joaquim Gabarro Valles. “Capturing the Functionality of Web Services with Functional Descriptions.” In: Multimedia Tools and Applications 64.2 (2013), pp. 365–387.
[170] Ruben Verborgh, Thomas Steiner, Rik Van de Walle, and Joaquim Gabarro. “Linked Data and Linked APIs: Similarities, Differences, and Challenges.” In: The Semantic Web: ESWC 2012 Satellite Events. Ed. by Elena Simperl, Barry Norton, Dunja Mladenic, Emanuele Della Valle, Irini Fundulaki, Alexandre Passant, and Raphaël Troncy. Berlin, Heidelberg: Springer Berlin Heidelberg, 2015, pp. 272–284.
[171] Max Völkel and York Sure. “RDFReactor - From Ontologies to Programmatic Data Access.” In: Poster Proceedings of the Fourth International Semantic Web Conference. 2005, pp. 55–65.
[172] Taowei David Wang, Bijan Parsia, and James A. Hendler. “A Survey of the Web Ontology Landscape.” In: The Semantic Web - ISWC 2006, 5th International Semantic Web Conference, ISWC 2006, Athens, GA, USA, November 5-9, 2006, Proceedings. Ed. by Isabel F. Cruz, Stefan Decker, Dean Allemang, Chris Preist, Daniel Schwabe, Peter Mika, Michael Uschold, and Lora Aroyo. Vol. 4273. Lecture Notes in Computer Science. Springer, 2006, pp. 682–694.
[173] Taowei David Wang, Bijan Parsia, and James Hendler. “A Survey of the Web Ontology Landscape.” In: The Semantic Web - ISWC 2006. Ed. by Isabel Cruz, Stefan Decker, Dean Allemang, Chris Preist, Daniel Schwabe, Peter Mika, Mike Uschold, and Lora M. Aroyo. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006, pp. 682–694.
[174] Roy Want. “Near Field Communication.” In: IEEE Pervasive Computing 10.3 (2011), pp. 4–7.

[175] Gerhard Weikum and Gottfried Vossen. Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2002.
[176] Ken Wenzel. “Komma: An Application Framework for Ontology-based Software Systems.” In: Semantic Web–Interoperability, Usability, Applicability (2010).
[177] Elaine J. Weyuker. “Evaluating Software Complexity Measures.” In: IEEE Trans. Software Eng. 14.9 (1988), pp. 1357–1365.
[178] Andrew Whitmore, Anurag Agarwal, and Li Da Xu. “The Internet of Things—A Survey of Topics and Trends.” In: Information Systems Frontiers 17.2 (2015), pp. 261–274.
[179] Leland Wilkinson and Michael Friendly. “The History of the Cluster Heat Map.” In: The American Statistician 63.2 (2009), pp. 179–184.
[180] Scott Norman Woodfield. “Enhanced Effort Estimation by Extending Basic Programming Models to Include Modularity Factors.” PhD thesis. West Lafayette, IN, USA, 1980.
[181] Haining Yao, Anthony Mark Orme, and Letha Etzkorn. “Cohesion Metrics for Ontology Design and Application.” In: Journal of Computer Science 1.1 (2005), pp. 107–113.
[182] Stephen S. Yau and James S. Collofello. “Some Stability Measures for Software Maintenance.” In: IEEE Trans. Software Eng. 6.6 (1980), pp. 545–552.
[183] Deze Zeng, Song Guo, and Zixue Cheng. “The Web of Things: A Survey.” In: Journal of Communications 6.6 (2011), pp. 424–438.
[184] Hongyu Zhang, Yuan-Fang Li, and Hee Beng Kuan Tan. “Measuring Design Complexity of Semantic Web Ontologies.” In: Journal of Systems and Software 83.5 (2010), pp. 803–814.