An Approach for Collaborative Ontology Development in Distributed and Heterogeneous Environments

Dissertation for the attainment of the doctoral degree (Dr. rer. nat.) at the Mathematisch-Naturwissenschaftliche Fakultät of the Rheinische Friedrich-Wilhelms-Universität Bonn

by Lavdim Halilaj from Kosovo

Bonn, 25.10.2018

This research report was accepted as a dissertation by the Mathematisch-Naturwissenschaftliche Fakultät of the Universität Bonn and is published electronically on the Hochschulschriftenserver of the ULB Bonn at http://hss.ulb.uni-bonn.de/diss_online.

First referee (1. Gutachter): Prof. Dr. Sören Auer
Second referee (2. Gutachter): Prof. Dr. Jens Lehmann

Date of the doctoral examination (Tag der Promotion): 10.12.2018
Year of publication (Erscheinungsjahr): 2019

Abstract

The era of digitalization poses high demands on capturing knowledge generated in everyday life and processing it in formal models. Ontologies provide common means for formally capturing and modeling the knowledge of a universe of discourse. Developing ontologies, however, can be a complex, time-consuming and expensive process which requires a significant investment of resources. Different stakeholders, such as ontology engineers, domain experts and, ultimately, users are usually involved in the development process; they may be geographically distributed and work independently in isolated environments while typically having to synchronize their contributions. Supporting the entire development life-cycle of ontology modeling poses a number of challenges. Stakeholders may have different working behaviors and practices that should be accommodated. Concurrent ontology modifications performed using various authoring editors have to be tracked and integrated, and may result in synchronization conflicts. Further, ensuring ontology quality according to the domain requirements is another challenge to be tackled.

In the past years, several methodologies and tools have been created to enable ontology development for a number of different purposes and applications. Albeit designed to cover a range of development aspects, existing approaches lack comprehensive support for the ontology life-cycle, in particular for independent work in disparate environments. In this thesis, we tackle the problem of collaborative ontology development in distributed and heterogeneous environments, and present a stakeholder-oriented approach able to holistically assist the development of ontologies in diverse and independent scenarios. First, we define Git4Voc, a lightweight methodology comprising a set of guidelines and practices to be followed by stakeholders while modeling ontologies. We then conceive VoCol, a flexible and integrated development platform to address critical requirements from a technical perspective. Moreover, techniques for reducing the number of conflicts and allowing the efficient evaluation of test cases have been designed and implemented. VoCol can be adopted in numerous scenarios and accommodate additional tools in a well-designed and semi-automatic ontology development workflow. The benefits of this flexibility are two-fold: 1) stakeholders do not need to strictly follow a specific methodology; on the contrary, they can organize their work in small and incremental development steps; and 2) consumers may provide their feedback, even though they are not directly part of the active development team. VoCol can thus be utilized to efficiently ensure the quality of ontologies with respect to pre-defined requirements.

We conducted several empirical evaluations to assess the effectiveness and efficiency of our holistic approach. More importantly, ontologies for various domains, such as Industry 4.0, life sciences and education, have been successfully developed and managed following our approach. The results from the empirical evaluations and concrete applications provide evidence that the methodology and techniques presented in this thesis comply with stakeholders' needs and effectively support the entire ontology development life-cycle in distributed and heterogeneous environments.


Contents

I Preliminaries

1 Introduction
  1.1 Motivation
  1.2 Problem Definition and Challenges
  1.3 Research Questions
  1.4 Thesis Overview
    1.4.1 Contributions
    1.4.2 List of Publications
  1.5 Thesis Structure

2 Background
  2.1 Ontologies
    2.1.1 The Resource Description Framework
    2.1.2 Expressiveness of Ontologies
    2.1.3 The SPARQL Protocol and RDF Query Language
    2.1.4 The Semantic Web
  2.2 Ontology Development
    2.2.1 Collaborative Ontology Development
    2.2.2 Test-driven Development
  2.3 Version Control Systems
    2.3.1 Centralized Version Control Systems
    2.3.2 Distributed Version Control Systems

3 Related Work
  3.1 Methodologies for Collaborative Ontology Development
    3.1.1 Workflow-dependent Methodologies
    3.1.2 Workflow-independent Methodologies
  3.2 Platforms for Collaborative Ontology Development
    3.2.1 Integrated Environments with own Version Control
    3.2.2 Integrated Environments based on Generic Version Control Systems
  3.3 Conflict Prevention during Change Synchronization
  3.4 Test-driven Approaches for Ontology Development

II Collaboratively Developing Ontologies

4 Requirements for Collaborative Ontology Development
  4.1 Method
    4.1.1 Important Roles
    4.1.2 Analysis of Widely used Ontologies
  4.2 Requirements
    4.2.1 Methodological Requirements
    4.2.2 Technical Requirements
  4.3 Summary

5 A Lightweight Methodology for Developing Ontologies in Distributed Environments
  5.1 The Git4Voc Approach
    5.1.1 Governing Aspects
    5.1.2 Development Practices
  5.2 Evaluation
    5.2.1 Schema.org Use Case
    5.2.2 Survey and Result Discussion
  5.3 Summary

6 An Integrated Environment for Collaborative Ontology Development
  6.1 The VoCol Approach
    6.1.1 Contributor Workflow
  6.2 Implementation
    6.2.1 Configuration
    6.2.2 Client-side Tasks
    6.2.3 Server-side Tasks
    6.2.4 Deployment
  6.3 Evaluation
    6.3.1 Industry Application
    6.3.2 User Study
  6.4 Summary

III Quality Assurance for Ontology Development

7 Serialization Agnostic Ontology Development in Distributed Settings
  7.1 Motivating Example
  7.2 Problem Definition
  7.3 The SerVCS Approach
  7.4 Implementation
    7.4.1 Version Control System
    7.4.2 UniSer
  7.5 Empirical Evaluation
    7.5.1 Impact of the Ontology Size
    7.5.2 Impact of the Sorting Criteria
  7.6 Summary

8 A Dependency-aware Approach for Test-driven Ontology Development
  8.1 Motivating Example
  8.2 Problem Definition
  8.3 The EffTE Approach
  8.4 Implementation
    8.4.1 Version Control System
    8.4.2 Integrated Validation Service
  8.5 Empirical Evaluation
    8.5.1 Impact of the Ontology Size
    8.5.2 Impact of the Topology of TCG_O^φ
    8.5.3 Impact of the Number of the Test Cases
    8.5.4 Discussion
  8.6 Summary

IV Applications and Conclusions

9 Establishing Semantic Interoperability between Industry 4.0 Models
  9.1 A Semantic Administrative Shell for Industry 4.0 Components
    9.1.1 Background
    9.1.2 Challenges
    9.1.3 An RDF-based Approach for Semantifying I4.0 Components
    9.1.4 Use Case
  9.2 A Semantic Integration Perspective for Industry 4.0 Standards
    9.2.1 Background
    9.2.2 Methodology
    9.2.3 An RDF-based Approach for the I4.0 Standards Landscape
    9.2.4 Use Case
  9.3 Summary

10 Establishing Semantic Interoperability between Industry 4.0 Data
  10.1 Motivating Example
  10.2 Realizing an RDF-based Information Model
    10.2.1 Development Methodology
    10.2.2 Information Model Governance
  10.3 Architecture and Implementation
  10.4 Use Cases
    10.4.1 Tool Management
    10.4.2 Energy Consumption
  10.5 Evaluation and Lessons Learned
    10.5.1 Stakeholder Feedback
    10.5.2 Lessons Learned
  10.6 Summary

11 Collaborative Development of Ontologies in Real-world Scenarios
  11.1 Architecture
  11.2 Ontologies in VoColReg
  11.3 Analysis and Discussions
  11.4 Summary

12 Conclusions and Future Direction
  12.1 Revisiting the Research Questions
  12.2 Future Work
    12.2.1 Research Perspective
    12.2.2 Technical Perspective

Bibliography

A List of Publications

List of Figures

List of Tables

Part I

Preliminaries


CHAPTER 1

Introduction

In an era where data is considered the new oil, a lack of meaning and well-defined structure leads to interoperability and usability problems. To overcome these problems, ontologies serve as a representation paradigm to capture the knowledge of a universe of discourse and structure it in a machine-comprehensible format. Ontologies are an integral part of a wide range of techniques, such as data integration, entity annotation, and search optimization [1]. Developing ontologies can be a time-consuming and expensive process and requires a significant investment, which is difficult to make for a single person or organization. An effective way to tackle this problem is building ontologies collaboratively, involving all interested stakeholders. This includes identifying the main terms and concepts by finding a consensus among stakeholders while defining a shared terminology and formalizing it for the intended domain.

The process of jointly building ontologies in distributed environments, which we refer to as collaborative ontology development, can be complex. In fact, the main challenge for stakeholders is to work collaboratively on a shared objective in a constructive and efficient way, while avoiding misunderstandings, uncertainty, and ambiguity. The involved stakeholders, who may be geographically distributed, should be able to easily express and integrate their diverse views and ideas without risking the loss of the original intention. Researchers have presented methodologies and platforms that allow ontology construction in various scenarios. However, these lack support for scenarios where stakeholders work independently in isolated and heterogeneous environments. On the other hand, version control systems, such as Subversion (SVN) or Git, are becoming increasingly popular for ontology development. Several aspects of ontology development, in particular with regard to revision management, access control, and governance, are already well covered by version control. Nevertheless, support for other crucial development aspects, such as validation, consistency checking, documentation, and visualization, is missing.

This thesis intends to bridge between the conceptual development process of ontologies in distributed and heterogeneous environments and concrete guidelines and practices to guide this process.1 Further, our objective is to efficiently ensure quality ontologies, where the term "quality" is used as "the indicator of matching between a developed product and user requirements" [3]. Another contribution of the thesis is a semi-automatic approach for mapping ontology development workflows onto concurrent version control methods. The approach features a modular and extensible architecture to allow the integration of additional components or services for addressing specific aspects of ontology development.

1 In this work, we focus on lightweight ontologies or "vocabularies", as they are developed in initiatives like Schema.org and defined by the W3C [2].


Figure 1.1: Ontology development in distributed and heterogeneous environments. A distributed and heterogeneous environment typically consists of several layers. Different stakeholders, such as domain experts and ontology engineers, are located in the Contributors layer. The Local Ontologies layer contains working replicas of ontologies on local machines. The Shared Repositories layer allows the distribution and synchronization of changes among replicas. Applications located in the Shared Applications layer offer additional possibilities for exploration, visualization and analysis. Third-party users or services are located in the Consumers layer. The ontology can be developed and deployed according to various scenarios: in Scenario 1, there is no connection between remote Shared Repositories and Shared Applications; in Scenario 2, a unidirectional connection exists between Shared Repositories and Shared Applications; and in Scenario 3, the ontology is not deployed or cannot be used in Shared Applications.

1.1 Motivation

Let us suppose a team composed of ontology engineers and domain experts working together to develop an ontology for a specific domain, as depicted in Figure 1.1, Contributors layer. After defining the ontology scope, the stakeholders collect the requirements, which are later used to define competency questions, i.e., questions that the ontology should be able to answer [4]. The stakeholders use a Version Control System, such as Git, to track the changes and manage different versions of the ontology. Initially, the stakeholders work on their local machines, performing additions and modifications based on the defined requirements. The Local Ontologies layer refers to the working replicas of ontologies on the contributors' local machines. To ensure that changes conform to the requirements, a suite of test cases can be executed. In addition, the test cases can prevent constraint violations and bad modeling practices. However, implementing such a quality assurance task is tedious and requires extra effort. Furthermore,

contributors might use different tools to execute the test suite or automate this task using the flexibility of version control systems. Without such quality assurance measures, the ontology may suffer from quality issues and not cover the requirements, which requires additional resources to add the missing concepts or correct the errors in later phases.

Additional mechanisms, such as Remote Hosting Platforms located in the Shared Repositories layer, enable the exchange of local changes. Therefore, changes are distributed and synchronized between local and remote ontology replicas. Ontologies may be built using various ontology authoring editors, such as Protégé, TopBraid Composer or even any plain text editor. Commonly, ontologies are represented in the RDF format and composed of a set of triples. The ordering of the triples within the file does not play any role, thus editors using different sorting criteria can produce arbitrarily ordered files that are semantically equivalent. During the synchronization process, a number of conflicts are generated, which are difficult to resolve, in particular when the ontology is sufficiently large.

The Shared Applications layer integrates external tools to offer additional possibilities for exploration, visualization and analysis. Further, this layer enables ontology publishing, a very important factor for encouraging potential reuse. In the first scenario, there is no direct connection from the Shared Repositories to this layer. Thus, ontology engineers have to set up a publishing platform to which each new version of the ontology is manually deployed. Consequently, there is always a possibility that the ontology is not well synchronized and does not reflect the latest version. In the second scenario, a link between the Remote Hosting Platforms and the Shared Applications layer is established. Although the latest version of the ontology is offered, customized applications still have to be developed and manually managed. The third scenario illustrates the case where the ontology is not published in any format apart from RDF. This forces any potential consumer to go directly to the Remote Hosting Platform and try to understand the ontology in its original format in order to reuse it. In all aforementioned scenarios, stakeholders are required to cope with a number of tools for performing their modeling tasks. As a consequence, divergent views of the same ontology may exist, making development and discussion activities among team members a challenge.

As depicted in the Consumers layer, third-party users or machines should be able to exploit and reuse the ontology via specialized services or rich interfaces. This allows the ontology to be adopted and extended to cover other domains or more specific requirements. Moreover, the need for methodological governance stands orthogonal to the entire development life-cycle. The team should clearly organize its work and follow recommendations for tasks such as naming conventions, ontology modularization, branching strategies, and the utilization of existing ontologies. The lack of methodological support may cause misunderstandings or uncertainty as well as difficulties in maintaining and reusing the ontology being developed.

1.2 Problem Definition and Challenges

Ideally, ontology engineers and domain experts should be able to develop ontologies using version control systems and different ontology authoring editors. At the conceptual level, this thesis addresses the problem of ontology development in distributed environments, with a focus on the synchronization of concurrent ontology changes while continuously enforcing quality checks on the ontology being developed in a time-efficient manner. Therefore, the approach should integrate and facilitate the activities performed during the entire development life-cycle, associated with interrelated discussions across various stakeholders.


Figure 1.2 illustrates the four main cross-layer challenges identified by this thesis. The first challenge is to ascertain guidelines and practices for supporting collaborative ontology development and publication in distributed settings. The community may consist of geographically spread stakeholders with different ways of thinking and working, as well as various interests. Another important factor to be considered in this regard is the ability to contribute in a workflow-independent fashion and to develop in multiple branches in parallel. The second challenge is to allow non-expert users and client applications to reuse and consume the developed ontologies in human-friendly representations and machine-understandable formats, respectively. The third challenge is to effectively ensure the synchronization of changes among ontology replicas. Finally, the last challenge is to efficiently execute a set of test cases that check for compliance with competency questions and help avoid constraint violations. As the problem of collaborative ontology development is much larger and poses many issues and obstacles in different scenarios, we consider the following challenges and problems out of scope for this thesis: logically expressive and very large-scale ontologies; heavyweight methodologies; detection and resolution of semantic conflicts; and real-time collaboration support. However, we deem that the results presented in this thesis form the foundation for extending the work in these directions as well.

Challenge 1: Supporting collaborative development of ontologies in distributed environments. There exist a number of methodologies supporting ontology development in various scenarios. According to [5], these methodologies can be split into two main categories: centralized methodologies and decentralized methodologies. The first category includes methodologies which focus on activities performed by a number of co-located participants working together to develop ontologies for a very specific domain or organization. The second category applies to a larger and distributed community working jointly to define ontologies for more general purposes and usage. Other important aspects of the development process are considered by lightweight methodologies which provide generic guidelines and practices. Apart from ontology engineers, the stakeholder community, including domain experts and other third-party users, is growing. Various platforms, such as version control systems, are used to accommodate their needs and enable ontology modeling in these scenarios. These platforms are typically workflow-independent, allowing participants to work without following a specific methodology. The challenge that arises here is facilitating distributed development by providing best practices and guidelines. In particular, crucial guidelines include the modularization of ontologies, communication threads and strategies for branching. Additionally, practices for reusability, naming conventions and the labeling of versions are of paramount importance.

Challenge 2: Enabling the availability and reusability of ontologies. Publishing and consuming ontologies by third-party users and client applications requires that ontologies are ready to use and easy to explore through rich visualizations and human-friendly documentation, as well as available in various machine-comprehensible formats. Using individual tools or services for ontology development poses additional overhead for the team, leading to inconsistent results and diverse views. Furthermore, the reusability of the ontology for other purposes or other domains is hampered. The challenge here is to provide an integrated and at the same time extensible platform for developing ontologies in distributed and heterogeneous environments.


Figure 1.2: Main Challenges and Research Questions. This thesis identifies four main challenges to support ontology development in distributed and heterogeneous environments. Four research questions are proposed with the objective of addressing these challenges. Each challenge and research question is located at the respective layer of the problem domain.

This platform should be able to accommodate a number of built-in components and services, along with off-the-shelf tools, to cover particular aspects of the development process. Moreover, a number of tasks that help team members with modeling activities have to be semi-automatically initiated, and their output should be easily accessible.

Challenge 3: Improving conflict resolution during change synchronization.

Resolving conflicts during change synchronization between local and remote replicas of the ontology is a very time-consuming task. This is exacerbated when heterogeneous ontology authoring editors are used for modeling purposes. Git is able to detect lines within the ontology file that have been changed from the previous version. Ontologies are written in the RDF format and composed of a set of triples whose position in the file does not play any role. Therefore, editors can produce very different serializations for semantically equivalent RDF files, for example, by ordering triples alphabetically or grouping them by categories, such as classes, properties and individuals, and then sorting them within each category. During change synchronization, users will receive many conflicts detected by Git as a result of different sorting criteria. These conflicts have to be resolved manually by comparing two versions of the same ontology file. In many cases, the synchronization problem is time-consuming and can result in: 1) duplication of RDF triples and several syntactic errors; 2) inconsistencies and violated design decisions; or 3) the loss of an entire change set, such as classes, properties or individuals. The challenge here is to prevent the version control system from wrongly identifying conflicts.
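To make this concrete, consider the following minimal sketch, which assumes the rdflib Python library and a purely illustrative ex: namespace: the same two triples are serialized in different orders, so a line-based comparison reports changes although both versions denote identical RDF graphs.

import difflib
from rdflib import Graph
from rdflib.compare import isomorphic

# Two serializations of the same graph, as two different editors might
# produce them; the ex: namespace is illustrative only.
v1 = """@prefix ex: <http://example.org/> .
ex:Lavdim ex:studyAt ex:UniversityOfBonn .
ex:UniversityOfBonn ex:locatedIn ex:Bonn .
"""
v2 = """@prefix ex: <http://example.org/> .
ex:UniversityOfBonn ex:locatedIn ex:Bonn .
ex:Lavdim ex:studyAt ex:UniversityOfBonn .
"""

# A line-based comparison, as performed by Git, reports differences ...
diff = list(difflib.unified_diff(v1.splitlines(), v2.splitlines(), lineterm=""))
print(len(diff) > 0)  # True: the reordering shows up as textual changes

# ... although both versions describe exactly the same RDF graph.
g1 = Graph().parse(data=v1, format="turtle")
g2 = Graph().parse(data=v2, format="turtle")
print(isomorphic(g1, g2))  # True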


Challenge 4: Efficiently enforcing compliance with domain requirements and competency questions during ontology development. The majority of ontologies are dynamic and evolve over time, in order to reflect a changing world. Thus, checking for completeness, represented by competency questions, and certain quality assessment indicators should be performed continuously, to avoid any possible divergence and mistake. To reduce the cost of later corrections, this task should be realized in advance in the spirit of test-driven development, which means that a defined set of test cases is executed before any ontology concept is modeled. Current techniques enable the execution of the entire set of test cases after each change. Due to the high demands on processing time, many stakeholders may avoid this task altogether, thus negatively impacting the quality of the ontology with regard to conformance to the initially defined requirements. Therefore, the challenge here is to provide a technique for the execution of a set of test cases in a time-efficient manner. Moreover, in the distributed scenario, where the team is composed of several contributors, another issue is to associate particular test cases with specific roles or contributors.
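As an illustration of what such a test case could look like, the following sketch assumes the rdflib Python library and a hypothetical working copy ontology.ttl; the requirement "every declared property must have a domain and a range" is encoded as a SPARQL ASK query.

from rdflib import Graph

g = Graph().parse("ontology.ttl", format="turtle")  # hypothetical working copy

# The ASK query matches only if some property lacks a domain or a range,
# i.e., an answer of True means the test case fails.
TEST_CASE = """
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
ASK {
  ?p rdf:type rdf:Property .
  FILTER NOT EXISTS { ?p rdfs:domain ?d . ?p rdfs:range ?r . }
}
"""
failed = g.query(TEST_CASE).askAnswer
print("test case failed" if failed else "test case passed")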

1.3 Research Questions

Based on the discussion in the motivation section, where various scenarios are presented, as well as on the challenges identified above, we define four main research questions. Figure 1.2 illustrates these questions and their mapping to the corresponding layers.

RQ1: Which guidelines and best practices facilitate collaborative ontology development in distributed and heterogeneous scenarios?

To address this question, we analyze the nature of ontology development processes in distributed environments. The result of this step is a collection of important requirements that should be considered. Driven by the idea and success of Git as a version control system in the context of software development, we study its applicability to the collaborative ontology development process. Next, we investigate strategies for enabling ontology construction even in cases where no specific workflow or methodology is followed. Finally, we examine and evaluate a number of governing aspects and development practices with respect to their application and the impact they have on concrete use cases.

RQ2: How can ontology development workflows be mapped on and supported by distributed version control methods?

With the objective of answering this research question, we devise a semi-automatic approach that is able to perform actions and tasks after the occurrence of specific events. As a result, a conceptual architecture for an integrated development environment is designed. To support various scenarios and team compositions, we dive deeper into the following important aspects (a minimal sketch of such an event-driven workflow follows the list):

• extensible - enable the integration of additional tools and services to provide support for specific aspects of ontology development;
• interoperable - work on different operating systems and allow the synchronization of results produced by heterogeneous ontology authoring editors;
• customizable - based on their requirements, stakeholders are able to select a subset of tools and services to be provided by the platform.
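The following minimal sketch illustrates the idea of such an event-driven workflow; it is not VoCol's actual implementation, and the event names and handlers are hypothetical.

from typing import Callable, Dict, List

class Workflow:
    """Dispatches repository events to pluggable services."""

    def __init__(self) -> None:
        self.handlers: Dict[str, List[Callable[[str], None]]] = {}

    def on(self, event: str, handler: Callable[[str], None]) -> None:
        # Extensibility: new tools register without touching the core.
        self.handlers.setdefault(event, []).append(handler)

    def fire(self, event: str, repo: str) -> None:
        for handler in self.handlers.get(event, []):
            handler(repo)

workflow = Workflow()
# Customizability: stakeholders select which services run after a push.
workflow.on("push", lambda repo: print(f"validating syntax of {repo}"))
workflow.on("push", lambda repo: print(f"regenerating documentation of {repo}"))
workflow.fire("push", "my-ontology")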


RQ3: How can concurrent changes from heterogeneous ontology authoring editors be effectively synchronized?

To answer this research question, we investigate how parallel changes performed by stakeholders in their local ontology replicas can be effectively synchronized. In particular, we focus on reducing the number of conflicts resulting from ontology authoring editors that use different ordering criteria to sort triples in the serialization output. As a result, only the actual conflicts, i.e., overlapping triples, are identified and can easily be resolved with human intervention. Next, we study the behavior of version control systems, such as Git, with respect to conflict detection whenever the ordering criteria or the size of the ontology changes.
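One way to sketch this direction, assuming the rdflib library and leaving aside blank-node relabeling (which a complete solution has to handle), is to re-serialize the ontology into a single deterministic form before changes are committed, so that only genuinely overlapping triples can surface as conflicts.

from rdflib import Graph

def canonical_serialization(path: str) -> str:
    # Re-serialize into sorted N-Triples so that semantically equal
    # versions become textually identical (blank nodes aside).
    g = Graph().parse(path, format="turtle")
    lines = g.serialize(format="nt").splitlines()
    return "\n".join(sorted(line for line in lines if line.strip()))

# Invoked, e.g., from a pre-commit hook on a hypothetical ontology.ttl,
# so that the version control system only compares the unified form.
print(canonical_serialization("ontology.ttl"))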

RQ4: How can the quality and efficiency in distributed and heterogeneous ontology development be ensured?

Here, we study how to efficiently enforce the development of quality ontologies by ensuring that the pre-defined requirements important for the representation of the domain are fulfilled. In particular, we investigate how the dependencies between test cases derived from the requirements can be modeled in a directed acyclic graph. As a result, the ontology can be evaluated in an efficient manner by excluding any test case whose parents have already failed. Considering the distributed scenario, where multiple stakeholders are involved, and the fact that an ontology may comprise several files, we analyze how to evaluate test cases dedicated to specific users or files.

1.4 Thesis Overview

With the aim of preparing the reader for the rest of the document, we present an overview of our main contributions and the research areas investigated in this thesis, references to the scientific publications covering this work, and the structure of the thesis.

1.4.1 Contributions

Figure 1.3 illustrates the main research contributions of the approach presented in this thesis, which provides methodological and technical support for ontology development in distributed and heterogeneous environments. In the following, these contributions are described in detail:

1. A lightweight methodology for collaborative ontology development based on version control (Contribution for RQ1). We propose a methodology for facilitating the collaborative ontology development process in distributed environments. It enables experts with different backgrounds, system understanding and domain knowledge to work together while avoiding misunderstandings and a lack of communication. The complexity of this process increases with the number of people involved, the variety of systems to be used and the dynamics of their domain. Therefore, we collect a number of crucial requirements by analyzing the nature of the collaborative ontology development process and state-of-the-art methods. Drawing from these findings, we define the Git4Voc methodology, which comprises guidelines and practices on how Git can be adopted for ontology development. In addition, we demonstrate strategies for a clear separation between different branches, as well as issues associated with, and practices for, naming, reuse, and documentation.


Figure 1.3: Thesis Contributions. Four main contributions are encapsulated in our holistic approach: a lightweight methodology (Git4Voc), an integrated environment for the distributed development of ontologies (VoCol), a mechanism for producing unique serializations (SerVCS), and a method for efficiently ensuring the quality of ontologies based on the domain requirements (EffTE). The outcome of this thesis has been used to support the modeling of ontologies in a number of different domains, such as manufacturing, health care and education.

2. An integrated environment to support version-controlled ontology development (Contribution for RQ2). We devise VoCol, a semi-automatic and modular approach for supporting ontology development in distributed and heterogeneous environments. It is based on a fundamental model of ontology development consisting of three core activities: modeling, population, and testing. The underlying layer of VoCol is a pre-defined workflow which orchestrates the execution of tasks after the occurrence of specific events. The platform has been implemented following principles of extensibility, interoperability and customizability. Its modular architecture allows the platform to be extended by adding or exchanging components or services with the aim of addressing specific development aspects. We demonstrate the applicability of VoCol in a number of concrete use cases where ontologies representing different domains are collaboratively developed.

3. An approach for serialization agnostic ontology development in distributed settings (Contribution for RQ3). This approach tackles the problem of synchronizing changes performed with heterogeneous authoring editors. VCSs collect metadata describing changes and allow for their propagation to different ontology replicas. For conflict detection, VCSs usually apply techniques where files are compared line by line. We design the SerVCS approach to enhance VCSs to cope with different serializations of the same ontology. Following the principle that prevention is better than cure, SerVCS produces a unified representation of the concepts before modifications are committed. As a result, the number


of false-positive conflicts, i.e., conflicts that do not result from ontology changes but from the fact that two ontology versions are serialized differently, is significantly decreased, thus allowing stakeholders to effectively synchronize the ontology replicas.

4. A dependency-aware approach for test-driven ontology development (Contribution for RQ4). Following the principles of the test-driven development technique, we devise a requirement-driven approach for enforcing quality-assured and efficient ontology construction. A set of test cases prevents non-intended ontology changes and avoids constraint violations. However, since the number of test cases can be large and their evaluation time may be high, the ontology development process can be negatively impacted. We present EffTE, an approach for efficient test-driven ontology development relying on a graph-based model of dependencies between test cases. It enables the prioritization and selection of test cases to be evaluated. Traversing the dependency graph is realized via the breadth-first search algorithm, along with a mechanism that tracks tabu test cases, i.e., test cases to be ignored for further evaluation due to faulty parent test cases (a condensed sketch of this traversal is given below). As a result, the number of evaluated test cases is minimized, thus reducing the time required for validating the ontology after each modification. Additionally, the approach supports user- or file-based evaluations, reflecting diverse team compositions and ontology modularity.
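The sketch below condenses this traversal; it assumes the dependency graph is given as a mapping from each test case to its children and that evaluate() runs a single test case, both of which are placeholders for the actual EffTE machinery.

from collections import deque
from typing import Callable, Dict, List, Set

def dependency_aware_evaluation(dag: Dict[str, List[str]], roots: List[str],
                                evaluate: Callable[[str], bool]) -> Set[str]:
    failed: Set[str] = set()
    tabu: Set[str] = set()      # test cases skipped due to failed ancestors
    queue, seen = deque(roots), set(roots)
    while queue:
        tc = queue.popleft()
        if tc not in tabu and not evaluate(tc):
            failed.add(tc)
            # Mark the entire sub-graph below the failed test case as tabu.
            stack = list(dag.get(tc, []))
            while stack:
                node = stack.pop()
                if node not in tabu:
                    tabu.add(node)
                    stack.extend(dag.get(node, []))
        for child in dag.get(tc, []):  # breadth-first traversal
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return failed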

1.4.2 List of Publications

Part of the work in this thesis is based on the following publications:

1. Lavdim Halilaj, Irlán Grangel-González, Steffen Lohmann, Maria-Esther Vidal, Sören Auer. EffTE: A Dependency-aware Approach for Test-Driven Ontology Development. In 33rd ACM/SIGAPP Symposium On Applied Computing (ACM SAC) 2018 Proceedings, ACM. This article is a joint work with Irlán Grangel-González, a PhD student at the University of Bonn. In this article, I devised the formalization of the problem, led the definition and implementation of the proposed approach, reviewed related work, and prepared the experiments and the analysis of the obtained results;

2. Lavdim Halilaj, Irlán Grangel-González, Maria-Esther Vidal, Steffen Lohmann, Sören Auer. DemoEffTE: A Demonstrator of Dependency-aware Evaluation of Test Cases over Ontology. In 13th International Conference on Semantic Systems (Semantics) - Posters and Demo Track, 2017. This demonstration article is a joint work with Irlán Grangel-González, a PhD student at the University of Bonn. In this article, I conducted the description of the architecture, the implementation, and the demonstration of the prototype;

3. Niklas Petersen, Lavdim Halilaj, Irlán Grangel-González, Steffen Lohmann, Christoph Lange, Sören Auer. Realizing an RDF-based Information Model for a Manufacturing Company – A Case Study. (One of the two nominees for the Best In-Use Paper Award) In 16th International Semantic Web Conference (ISWC) 2017 Proceedings, 350-366, Springer. This article is a joint work with Niklas Petersen and Irlán Grangel-González, PhD students at the University of Bonn. My contributions focused on the implementation of the proposed approach, the preparation and presentation of the use cases, and the analysis of different strategies for data integration for the given scenario;

4. Irlán Grangel-González, Paul Baptista, Lavdim Halilaj, Steffen Lohmann, Maria-Esther Vidal, Christian Mader, Sören Auer. The Industry 4.0 Standards Landscape from a Semantic


Integration Perspective. In 22nd IEEE International Conference on Emerging Technologies And Factory Automation (ETFA) 2017 Proceedings. This article is a joint work with Irlán Grangel-González, a PhD student at the University of Bonn, and Paul Baptista, a student assistant at Fraunhofer IAIS. In this article, I contributed to the implementation of the proposed approach, the review of related work and the analysis of the obtained results;

5. Lavdim Halilaj, Irlán Grangel-González, Maria-Esther Vidal, Steffen Lohmann, Sören Auer. SerVCS: Serialization Agnostic Ontology Development in Distributed Settings. In Communications in Computer and Information Science (CCIS) 914 - Revised Selected Papers from 8th International Joint Conference, IC3K 2016, Porto, Portugal, 213-232, Springer. This article is a joint work with Irlán Grangel-González, a PhD student at the University of Bonn. In this article, I conducted the formalization of the problem, the implementation of the approach, the revision of the state-of-the-art approaches, the presentation of the use cases, as well as the analysis of the results;

6. Lavdim Halilaj, Irlán Grangel-González, Gökhan Coskun, Steffen Lohmann, Sören Auer. Git4Voc: Collaborative Vocabulary Development Based on Git. In International Journal of Semantic Computing (IJSC), 1-24, World Scientific. This article is a joint work with Irlán Grangel-González, a PhD student at the University of Bonn. In this article, my contributions are related to collecting the requirements for collaborative ontology development, the description of the approach, the revision of the related work and the presentation of the use case evaluation;

7. Lavdim Halilaj, Irlán Grangel-González, Maria-Esther Vidal, Steffen Lohmann, Sören Auer. Proactive Prevention of False-Positive Conflicts in Distributed Ontology Development. (Best Paper Award) In 8th International Conference on Knowledge Engineering and Ontology Development (KEOD) 2016 Proceedings, 43-51, SciTePress. This article is a joint work with Irlán Grangel-González, a PhD student at the University of Bonn. In this article, I led the formalization of the problem, the implementation of the approach, the revision of the related work, the presentation of the use cases, as well as the analysis of the results;

8. Lavdim Halilaj, Niklas Petersen, Irlán Grangel-González, Christoph Lange, Sören Auer, Gökhan Coskun, Steffen Lohmann. VoCol: An Integrated Environment to Support Version-Controlled Vocabulary Development. (Paper of the Month - Fraunhofer IAIS) In 20th International Conference on Knowledge Engineering and Knowledge Management (EKAW) 2016 Proceedings, 303-319, Springer. This article is a joint work with Niklas Petersen and Irlán Grangel-González, PhD students at the University of Bonn. In this article, I conducted the problem description, the definition and implementation of the conceptual architecture, the revision of the state-of-the-art approaches, the presentation of the use cases, and the realization of the user study evaluation;

9. Irlán Grangel-González, Lavdim Halilaj, Sören Auer, Steffen Lohmann, Christoph Lange, Diego Collarana. An RDF-based Approach for Implementing Industry 4.0 Components with Administration Shells. In IEEE Emerging Technologies and Factory Automation (ETFA) 2016 Proceedings, 1-8, IEEE. This article is a joint work with Irlán Grangel-González, a PhD student at the University of Bonn. My contributions to this article are related to the revision of the state-of-the-art approaches, the implementation of the proposed approach, the presentation of the use cases, as well as the analysis of the results;


10. Lavdim Halilaj, Irlán Grangel-González, Gökhan Coskun, Sören Auer. Git4Voc: Git-based Versioning for Collaborative Vocabulary Development. In IEEE Tenth International Conference on Semantic Computing (ICSC) 2016 Proceedings, 285-292, IEEE. This article is a joint work with Irlán Grangel-González, a PhD student at the University of Bonn. In this article, I contributed to collecting the requirements for the collaborative development of ontologies, the definition of the approach, the analysis of the related work and the presentation of the use cases in real-world scenarios;

11. Irlán Grangel-González, Lavdim Halilaj, Gökhan Coskun, Sören Auer. Towards Vocabulary Development by Convention. In 7th Knowledge Engineering and Ontology Development (KEOD) 2015 Proceedings, 334-343, SciTePress. This article is a joint work with Irlán Grangel-González, a PhD student at the University of Bonn. My contributions focused on the review of state-of-the-art approaches, devising, conducting and analyzing the user study, and the presentation of the outcomes.

The entire list of publications completed during the PhD studies can be found in Appendix A.

1.5 Thesis Structure

This thesis is composed of four main parts, each having several chapters. It starts by providing the context, motivation, research questions and challenges in Part I. Further, this part includes the preliminaries and the related work, to help the reader better understand the thesis. Part II initially explains the collected requirements necessary for supporting collaborative ontology development in distributed and heterogeneous environments. Next, a lightweight methodology comprising a set of best practices and guidelines to be followed by the stakeholders involved in the development process is described. We then continue with the presentation of a conceptual architecture of the integrated environment and its implementation, developed to support ontology development centered around version control systems. Part III dives deeper into specific aspects of ontology development, such as synchronization and quality assurance with respect to the defined requirements. An approach for enabling the synchronization of changes from heterogeneous ontology authoring editors by providing a unique serialization is presented. The next chapter describes an approach for efficiently ensuring the quality of ontologies by proactively preventing non-intended ontology modifications. In Part IV, using the results from the previous parts, we present and discuss three real-world applications with respect to our holistic approach. Further, we provide usage statistics and other insights on how the VoCol platform is applied in a larger number of projects. Finally, this thesis is concluded by revisiting the research questions and presenting possible future directions from two perspectives: research and technical. An overview at the beginning of each part gives relevant information about the included topics.


CHAPTER 2

Background

In this chapter, we describe relevant terminology that serves as a foundation for the research realized in this thesis. In Section 2.1, we initially discuss important concepts related to ontologies as well as the languages and syntaxes used to model them. We then look into the main topics of the ontology development process and its essential activities in Section 2.2. Finally, in Section 2.3, we discuss techniques of version control systems exploited and adopted in our approach to enable collaborative ontology development in distributed and heterogeneous environments.

2.1 Ontologies

The field of graph data models has become very active in recent years. Its main objective is to enable the modeling of data and their relationships in a naturally oriented fashion. Angles et al. [6] define graph data models as "those in which data structures for the schema and instances are modeled as graphs or generalizations of them, and data manipulation is expressed by graph-oriented operations and type constructors". Over the years, many graph data models have been proposed, such as the Logical Data Model (LDM), GROOVY, Simatic-XT and Gram. In addition, there exist other models, such as GraphDB, the Object Exchange Model (OEM), Database Graph Views and the Resource Description Framework (RDF), that inherit graph-like features. A more detailed survey of the most representative graph data models and similar proposals is presented in [6]. In this thesis, we focus on ontologies, which offer the means to conceptualize the knowledge of a universe of discourse, and on RDF as a modeling language for ontologies.

The term originally comes from the field of philosophy, where an Ontology is considered a systematic account of being and existence. It focuses on the study of things, their categorizations and their relations with each other for a particular domain, irrespective of whether they physically exist or not. The earliest notable work was done by Aristotle in his Metaphysics, where he dealt with the study of being qua being, which involves: 1) a study; 2) a subject matter (being); and 3) a manner in which the subject matter is studied (qua being) [7]. On the other hand, many definitions have been given in the field of computer science. The most known definition comes from [8], where an ontology is an explicit specification of a conceptualization. Another definition is given by [9], where an ontology is a formal and explicit specification of a shared conceptualization; this definition is considered to be one of the most complete by [10].

According to [11], a conceptualization is an abstract and simplified view of the world that we wish to represent for some purpose. During this process, a number of relevant concepts, their attributes and the relations between each other are captured. The meaning of these concepts should


be explicit, thus avoiding any potential ambiguity among them. In order to enable machines to understand and process the identified concepts, the concepts should be represented in a formal language. Furthermore, since the ontology encapsulates the joint efforts of a community, all definitions should derive from a consensual agreement among the different stakeholders.

2.1.1 The Resource Description Framework

The Resource Description Framework (RDF) [12] is a W3C standard for representing information about resources that exist across the Web. Originally designed to allow the annotation and reuse of resource metadata, or data about data, it has become a data modeling language for encoding information from many different areas. Through its design mechanisms, RDF is intended to support the publishing and interlinking of data in human-readable and machine-understandable formats. It enables interoperability among intelligent agents by providing a standardized way of encoding and exchanging the semantics of information, regardless of platform or domain. Thus, RDF bridges between the conceptual and operational levels of information and data representation. The basic structure of RDF is a statement formed by three elements, subject, predicate and object, called a triple, formally defined in Definition 2.1.

Definition 2.1: RDF Triple [13]

Let I, B, L be disjoint infinite sets of URIs, blank nodes, and literals, respectively. A tuple (s, p, o) ∈ (I ∪ B) × I × (I ∪ B ∪ L) is denominated an RDF triple, where s is called the subject, p the predicate, and o the object.

The subject represents a resource, the object denotes either a resource or a literal value, whereas the relationship between them is established through the predicate. RDF resources are typed by simply adding a triple with the rdf:type property as the predicate and a suitable object representing the class to which the resource belongs. Uniform Resource Identifiers (URIs) are used to identify resources unambiguously, while literals (consisting of either a string and its language tag or a value and its datatype) describe concrete data values. RDF can be represented as a directed graph composed of vertices (representing subjects and objects) and edges (representing predicates). Properties and classes required to describe and structure data of a certain domain can be defined in RDF using taxonomic relations. Such descriptions of classes and properties are called vocabularies, RDF schemas, or ontologies, and are encoded in an RDF document. An RDF document D is defined as a set of triples: D ⊂ (I ∪ B) × I × (I ∪ B ∪ L), where I represents the set of URIs, B the set of blank nodes, and L the set of literals, as in Definition 2.1. Therefore, RDF can be used to easily represent various types of information and data, including taxonomic/tree data, tabular/relational data, and logical axioms. Since all schema and data entities have associated URI identifiers that are worldwide unique, it is easy to link to other data (instance level) or to reuse vocabulary elements from existing ontologies (schema level).

Let us consider the following natural language statement: Lavdim studies at University of Bonn. The RDF representation of this statement would start with the definition of Lavdim and UniversityOfBonn as resources, where the first is denoted as a subject and the second as an object. The relationship between those two is denoted by study at, which is a predicate or property. The statement is encoded as follows: http://sda.uni-bonn.de/onto/staff/Lavdim, http://sda.uni-bonn.de/onto/staff/studyAt, http://uni-bonn.de/University, where the URIs are used to provide a unique identification for each of the above elements. Further, this statement is


extended with another property, such as locatedIn, to connect it with the city of Bonn. In addition to concrete instances, which belong to the A-Box or Assertional Knowledge, RDF allows the modeling of the conceptual level in the T-Box or Terminological Knowledge. More information, represented by classes and properties, is added to better describe the knowledge about this scenario. For instance, Lavdim can be seen as a Student, UniversityOfBonn as a University and Bonn as a City. Since one of the important characteristics of ontologies is reusability, concepts such as Person, Organization and Place can be imported from other external ontologies and interlinked as the super classes of Student, University and City, respectively. Moreover, domain denotes the type of a subject and range denotes the type of an object that a property can have. The domain values of the property studyAt are from the Student class and its range values are from the University class, whereas the property locatedIn has University as domain and City as range. Figure 2.1 illustrates the graph representation of the above example.

Figure 2.1: Example of an RDF graph. The T-Box represents the conceptual level of entities and their inter-relationships, whereas the A-Box represents concrete instantiations of the defined concepts.
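For illustration, the example of Figure 2.1 can also be constructed programmatically; the following sketch assumes the rdflib Python library and the sda namespace used in the listings of this section.

from rdflib import Graph, Literal, Namespace, RDF, RDFS

SDA = Namespace("http://sda.uni-bonn.de/ontologies/staff#")
g = Graph()
g.bind("sda", SDA)

# T-Box: class hierarchy and the typed relationship.
g.add((SDA.Student, RDFS.subClassOf, SDA.Person))
g.add((SDA.studyAt, RDFS.domain, SDA.Student))
g.add((SDA.studyAt, RDFS.range, SDA.University))

# A-Box: concrete instantiations of the defined concepts.
g.add((SDA.Lavdim, RDF.type, SDA.Student))
g.add((SDA.Lavdim, SDA.studyAt, SDA.UniversityOfBonn))
g.add((SDA.Lavdim, RDFS.label, Literal("Lavdim")))

print(g.serialize(format="turtle"))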

RDF Syntax

In order to enable ontology modeling using RDF, various serialization formats have been proposed. In the following, we list some of the most well-known formats:

RDF/XML1 is a W3C recommendation used to represent the RDF data model. It is based on the XML syntax, imposing a hierarchical structure for the representation of resources. The root node of an RDF/XML document is an rdf:RDF element, which is followed by a number of namespace declarations. Multiple nested elements can be encoded within the rdf:RDF node. Triples encoded in XML constructs are grouped according to their subject. The XML element rdf:Description is used to describe subjects and objects of the RDF triples. The unique identifier of a resource is handled by an rdf:about attribute, whereas literal values are encoded in separate tags. Predicates can either be defined via XML attributes or as separate resources. Listing 2.1 shows a shortened RDF/XML representation of the example in Figure 2.1. The main advantage of this syntax is that it can be parsed by many mature tools and libraries compatible with XML. On the other hand, RDF/XML is not very human-readable, making it unattractive for people who model ontologies using plain text editors, in particular as the ontology grows. Moreover, due to the number of XML-encoded constructs, an RDF/XML document tends to grow very quickly, increasing memory and processing requirements.

1 http://www.w3.org/TR/rdf-syntax-grammar/. Date Accessed: 28 March 2018.


<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:sda="http://sda.uni-bonn.de/ontologies/staff#">
  <rdf:Description rdf:about="http://sda.uni-bonn.de/ontologies/staff#Lavdim">
    <rdf:type rdf:resource="http://sda.uni-bonn.de/ontologies/staff#Student"/>
    <sda:studyAt rdf:resource="http://sda.uni-bonn.de/ontologies/staff#UniversityOfBonn"/>
    <rdfs:label>Lavdim</rdfs:label>
  </rdf:Description>
  <rdf:Description rdf:about="http://sda.uni-bonn.de/ontologies/staff#UniversityOfBonn">
    <rdf:type rdf:resource="http://sda.uni-bonn.de/ontologies/staff#University"/>
    <sda:locatedIn rdf:resource="http://sda.uni-bonn.de/ontologies/staff#Bonn"/>
  </rdf:Description>
</rdf:RDF>
Listing 2.1: RDF/XML Syntax. RDF/XML serialization of the example in Figure 2.1.

Turtle2, which stands for Terse RDF Triple Language, aims at improving the human readability of RDF documents. Using namespaces, it is possible to abbreviate the URIs of internally defined concepts or externally used ones. The namespaces are normally written at the beginning of a Turtle document, in the form of a header. In order to enable their use within the document, a unique prefix is associated with each namespace. The position and order of triples within the document is not important. Triples are separated from each other using dots, whereas spaces and line breaks outside of URIs are irrelevant and ignored by parsers. Additionally, so-called syntactic sugar allows triples to be grouped according to the same subject. In this case, the subject is written only once, whereas its predicate-object pairs are terminated by a semicolon. Furthermore, objects sharing the same subject and predicate can be grouped together and separated from each other using a comma. As a result, the Turtle format, in addition to its human readability, is even more space-efficient compared to RDF/XML. Listing 2.2 shows the serialization of the example depicted in Figure 2.1 in the Turtle syntax.

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sda: <http://sda.uni-bonn.de/ontologies/staff#> .

sda:Bonn rdf:type sda:City .

sda:UniversityOfBonn rdf:type sda:University ;
    sda:locatedIn sda:Bonn ;
    rdfs:label "University of Bonn" .

sda:Lavdim rdf:type sda:Student ;
    sda:studyAt sda:UniversityOfBonn ;
    rdfs:label "Lavdim" .

Listing 2.2: Turtle Syntax. Representation of the example in Figure 2.1 using the Turtle serialization.

2 https://www.w3.org/TR/turtle/. Date Accessed: 30 March 2018.


JSON-LD3 is designed with the primary objective of enabling the serialization of Linked Data into the JSON format. Thus, RDF information encoded as Linked Data can be processed by any JSON parser and library. Using JSON-LD, applications are able to consume Linked Data and navigate across data sources on the Web simply by following the embedded links. Moreover, JSON-LD is intended to be easily converted into Turtle or queried with the SPARQL language. The important elements that compose a JSON-LD document are: 1) JSON object - a compact structure representing an object, surrounded by curly brackets, with zero or more key-value pairs; 2) array - a structure to represent zero or more values, surrounded by square brackets; and 3) string, number, boolean and null - to allow the encoding of values in the respective formats. JSON-LD introduces additional features to those inherited from JSON, such as the unique identification of JSON objects via URIs, the annotation of strings in different languages, and the ability to express more than one directed graph in a single document. Listing 2.3 shows an excerpt of the example graph in Figure 2.1 in the JSON-LD format.

# prefix declaration

[ { "@id":"http://sda.uni-bonn.de/ontologies/staff#Bonn", " @type ":[ "http://sda.uni-bonn.de/ontologies/staff#City" ] }, { "@id":"http://sda.uni-bonn.de/ontologies/staff#Lavdim", " @type ":[ "http://sda.uni-bonn.de/ontologies/staff#Student" ], "http://sda.uni-bonn.de/ontologies/staff#studyAt":[ { "@id":"http://sda.uni-bonn.de/ontologies/staff#UniversityOfBonn" } ] }, { "@id":"http://sda.uni-bonn.de/ontologies/staff#UniversityOfBonn", " @type ":[ "http://sda.uni-bonn.de/ontologies/staff#University" ], "http://sda.uni-bonn.de/ontologies/staff#locatedIn":[ { "@id":"http://sda.uni-bonn.de/ontologies/staff#Bonn" } ] } ] Listing 2.3: JSON-LD Syntax. Representation of the graph in Figure 2.1 into JSON-LD format.

There exist a number of other well-known serialization syntaxes used in different domains or to address specific requirements. RDFa4 is a W3C recommendation that allows for the addition

3 http://www.w3.org/TR/json-ld/. Date Accessed: 30 March 2018. 4 http://www.w3.org/TR/rdfa-syntax/. Date Accessed: 02 April 2018.

of rich metadata with the purpose of providing more information about the resources. As a result, subject-predicate-object expressions can be embedded within the structure of XHTML documents and easily extracted by application agents. TriG5 is another important syntax, which consists of directives for grouping sets of triples into named graphs within an RDF document. A URI is associated with each named graph, allowing its identification and referencing from inside or outside of the RDF document. TriG is an extension of the Turtle format, inheriting its features for formatting triples as well as its improved human readability. Furthermore, it facilitates the encoding of metadata related to provenance and maintenance.
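To illustrate, a minimal TriG document placing part of the running example into a named graph might look as follows (the graph URI sda:staffGraph is an assumed name, introduced here only for illustration):

@prefix sda: <http://sda.uni-bonn.de/ontologies/staff#> .

# All triples inside the braces belong to the named graph sda:staffGraph.
sda:staffGraph {
    sda:Lavdim sda:studyAt sda:UniversityOfBonn .
}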

2.1.2 Expressiveness of Ontologies

The RDF model comprises only a minimal number of modeling constructs, allowing for the definition of triples and the provision of unique identifiers. In order to encapsulate the meaning of resources as well as to describe their relations in detail, additional schemas have to be used. As a result, the description is enriched with a formal specification of the resources and their relationships, enabling intelligent agents to comprehend this information. In the following, the two most important schemas and their features are described.

RDF Schema

The RDF Schema (RDFS) [14] is built on top of the RDF model, extending it with additional modeling constructs, such as rdfs:Class, rdfs:subClassOf, rdfs:subPropertyOf, rdfs:domain, and rdfs:range. The rdfs:Class construct allows for the specification of a particular resource as a class. Establishing a hierarchical structure between classes is realized using rdfs:subClassOf. Relationships between classes are defined by properties, i.e., rdf:Property. Furthermore, properties use rdfs:domain and rdfs:range to allow a more detailed specification of these relationships. The domain of a property specifies the type of its subject within a triple, whereas the range defines the type of the object. By doing so, the potential values for a relationship represented by a particular property are restricted to specific types, as defined in the domain and range of that property. Using rdfs:subPropertyOf, properties can be organized in a hierarchical fashion. Modeling constructs such as rdfs:comment and rdfs:label are used as annotation properties to improve the human readability of the given concepts. The RDF Schema serves as a foundation for building classifications of typed hierarchies according to the relationships defined by rdfs:subClassOf and rdfs:subPropertyOf, as well as the restrictions specified using rdfs:domain and rdfs:range [15].
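These constructs can be illustrated by extending the running example from Listing 2.2; the class sda:Organization and the annotation texts are assumptions introduced here only for illustration:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sda: <http://sda.uni-bonn.de/ontologies/staff#> .

sda:Organization rdf:type rdfs:Class .

sda:University rdf:type rdfs:Class ;
    rdfs:subClassOf sda:Organization ;           # class hierarchy
    rdfs:label "University" ;                    # human-readable annotations
    rdfs:comment "An institution of higher education." .

sda:studyAt rdf:type rdf:Property ;
    rdfs:domain sda:Student ;                    # subjects must be students
    rdfs:range sda:University .                  # objects must be universities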

The Web Ontology Language

The Web Ontology Language (OWL) [16] is a standard recommended by the W3C, which provides a rich set of constructs for modeling complex ontologies. Thus, OWL is more expressive than RDFS, offering a comprehensive variety of axioms to represent the knowledge of an intended domain. Apart from taxonomic relations and other simpler relations that can be modeled using RDFS, OWL allows for the specification of restrictions and cardinalities for objects and literal values. Rigidity is an important feature of OWL, enabling ontology engineers to specify not only how particular axioms can be used but also how they cannot be used. OWL offers special axioms to support the modeling of ontology metadata related to provenance, versioning,

5 http://www.w3.org/TR/trig/. Date Accessed: 15 April 2018.

and maintenance. Apart from allowing knowledge to be expressed in a formal language, OWL enables a reasoning process through which new implicit facts can be inferred from the explicitly modeled ones. Some of the most relevant axioms of OWL are listed as follows:

1. define the expected value type of properties, using owl:ObjectProperty for resources as values and owl:DatatypeProperty for literals as values;
2. distinguish between two different classes and avoid their overlapping with owl:disjointWith;
3. use logic operators, such as AND, OR and NOT, with owl:intersectionOf, owl:unionOf or owl:complementOf;
4. define flat relationships, such as synonyms, on the concept level via owl:equivalentClass or on the instance level via owl:sameAs;
5. restrict the type and number of individuals to be used via owl:allValuesFrom, owl:someValuesFrom or owl:cardinality;
6. specify complex relationships using owl:SymmetricProperty, owl:TransitiveProperty or owl:InverseFunctionalProperty.

Several of these axioms are illustrated in the sketch after this list.
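The following Turtle fragment sketches some of these axioms on the running example; the property sda:numberOfStudents and the disjointness assertion are assumptions made here only for illustration:

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sda: <http://sda.uni-bonn.de/ontologies/staff#> .

sda:locatedIn a owl:ObjectProperty ;            # expects a resource as value
    rdfs:domain sda:University ;
    rdfs:range sda:City .

sda:numberOfStudents a owl:DatatypeProperty .   # expects a literal as value

sda:City owl:disjointWith sda:University .      # no individual may be both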

OWL Profiles

To adjust the level of expressivity for various application scenarios, OWL 2 [17] provides the following three profiles: OWL EL, OWL QL, and OWL RL. With the objective of facilitating reasoning, each profile restricts the set of modeling axioms that may be used.

OWL EL - is suitable for scenarios where ontologies with a large number of classes and/or properties need to be modeled. Algorithms dedicated to this profile are highly scalable and can perform reasoning in polynomial time.

OWL QL - is tailored for scenarios where a large amount of instance data needs to be handled and querying it is the most important task. To ensure polynomial reasoning time, its level of expressivity is rather limited, although the main modeling features are included.

OWL RL - is designed to balance the requirements for scalable reasoning and expressive power. Checking consistency, satisfiability and subsumption of class expressions, as well as answering conjunctive queries, is realized in polynomial time w.r.t. the ontology size.

Therefore, while the ontology is being modeled, the trade-off between the appropriate profile and the reasoning performance should always be considered. An overly expressive ontology can render the reasoning process computationally intractable.

2.1.3 The SPARQL Protocol and RDF Query Language

The SPARQL Protocol and RDF Query Language (SPARQL) is the standard language to query information represented according to the RDF format [18]. Similar to the Structured Query Language (SQL), which is used to query and manipulate data in relational databases, SPARQL supports graph databases with functionalities such as aggregation, filtering, and nested queries. It enables the simultaneous execution of queries over multiple data sources using a federated query mechanism. Moreover, SPARQL allows for querying other types of data sources, like relational databases, XML or CSV, whenever they can be virtually represented as RDF. With SPARQL, it is possible to explore the RDF graph by traversing the relations between concepts. This is realized by matching basic graph patterns, defined in a query, where each element of the given triples can be a variable. These variables are then bound to concrete values whenever the basic graph pattern matches triples from the RDF graph.

Definition 2.2: Triple Pattern [19]

Let U, B, L be disjoint infinite sets of URIs, blank nodes, and literals, respectively. Let V be a set of variables such that V ∩ (U ∪ B ∪ L) = ∅. A triple pattern tp is a member of the set (U ∪ V) × (U ∪ V) × (U ∪ L ∪ V). Let tp1, tp2, . . . , tpn be triple patterns. A Basic Graph Pattern (BGP) B is the conjunction of triple patterns, i.e., B = tp1 AND tp2 AND . . . AND tpn.

The SPARQL query language provides four forms of queries: 1) ASK; 2) SELECT; 3) CONSTRUCT; and 4) DESCRIBE. The ASK form returns a boolean value, either true or false, depending on whether the given pattern matches any triple in the RDF graph. SELECT retrieves all triples that match the pattern (cf. Definition 2.2) defined in the query, considering clauses for constraints and filters. The result can further be shaped by aggregation and ordering functions, including limit and pagination clauses. A new RDF graph can be created via the CONSTRUCT form, corresponding to the query template. The DESCRIBE form returns all information contained in an RDF graph about a particular resource specified in the query. Listing 2.4 depicts a simple SELECT query used to retrieve the university at which the resource named 'Lavdim' is studying, together with its number of students.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX sda: <http://sda.uni-bonn.de/ontologies/staff#>
PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT ?university ?numberOfStudents
WHERE {
    ?university a sda:University ;
        sda:locatedIn ?city ;
        dbo:numberOfStudents ?numberOfStudents .

    ?student sda:studyAt ?university ;
        rdfs:label ?studentName .
    FILTER(?studentName = 'Lavdim')
}

Listing 2.4: A SPARQL query. A query to retrieve the university and its number of students where the resource named 'Lavdim' is studying.
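As a complement, a CONSTRUCT query can derive a new graph from the matched patterns. The following sketch (the property sda:livesIn is an assumed name used only for illustration) connects each student directly to the city of his or her university:

PREFIX sda: <http://sda.uni-bonn.de/ontologies/staff#>

CONSTRUCT { ?student sda:livesIn ?city }
WHERE {
    ?student sda:studyAt ?university .
    ?university sda:locatedIn ?city .
}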

2.1.4 The Semantic Web

The idea of extending the Web of Documents towards a Semantic Web, where things are interconnected and able to exchange information with each other, was introduced by Tim Berners-Lee et al. [20]. The Semantic Web vision can be achieved by structuring the existing information, as well as the information being generated, using ontologies as one of the important means. As a result, the current unstructured and heterogeneous Web of Documents, which is easily understood and explored by humans, is transformed into a structured and integrated global information source. As described by the authors, different agents are able to understand the meaning of data and the context in which they are going to be used. Intelligent agents collaborate with each other in an autonomous fashion to conduct a particular task at a specific time. As a result, we obtain a global information source, where heterogeneous data coming from different sources are integrated and ready to be used. However, in order to enable this integration, a flexible and simple, yet sufficiently powerful formal language should be used for modeling purposes. For this reason, the Resource Description Framework (RDF), created by the World Wide Web Consortium (W3C)6 as a generic framework for representing information on the Web, is utilized to develop ontologies. Ontologies are used for structuring and sharing knowledge among different organizations, which might be dispersed across the world. They are developed by many people with various interests who contribute to the conceptualization of a particular domain by defining the concepts, attributes and relationships. The opinions and interests of all stakeholders should be integrated in order to avoid disagreements and inadequate contributions.

Figure 2.2: Activities during ontology development. Core activities performed typically in sequence: specification, conceptualization, formalization, implementation and maintenance. In addition, three activities are performed in parallel: knowledge acquisition, evaluation and documentation [21].

2.2 Ontology Development

Ontological engineering refers to “the set of activities that concern the ontology development process, the ontology life-cycle, principles, methods and methodologies for building ontologies, as well as the tool suites and languages that support them” [10]. During this process, a number of people with different roles and areas of expertise are involved, of which two are the most important: 1) Ontology Engineers (OEs); and 2) Domain Experts (DEs). Ontology engineers are skilled people with technical expertise in the formal languages and tools necessary to model a domain of interest. Domain experts are professionals with expertise in the given domain and its requirements. Figure 2.2 illustrates five core activities and three additional ones performed in parallel during the entire ontology development life-cycle [21]:

Specification During this activity, the objective and the scope of the ontology are specified. Usually, questions such as "What is the purpose of the ontology being developed?", "Who are the stakeholders that will be involved in development and maintenance?", and "Who will use the ontology?" are clarified. The intention here is to identify the goals and define the granularity of the domain knowledge to be modeled while limiting the complexity. Commonly, a list of competency questions derived from the requirements is

6 https://www.w3.org/TR/rdf-concepts/


used to determine the ontology scope. Later on, these questions are used to evaluate the completeness of the ontology and to avoid any negative impact of modifications to the core concepts.

Conceptualization A number of concepts are derived as an outcome of the previous activity. On this basis, a conceptual model is built to represent the concepts and their relationships in a language-independent fashion. A number of clusters may emerge, since several concepts are strongly connected with each other. These clusters can be aligned as separate modules into which the ontology can further be decomposed.

Formalization Next, the conceptual model is converted into a formal model using the axioms necessary to reduce the potential interpretations of the meaning of the defined concepts. Usually, a hierarchical structure is created by specifying relationships between concepts, such as "sub-class-of" or "part-of".

Implementation In this step, the formal model is implemented in a particular language with a set of strictly defined axioms. As a result, the developed ontology is machine-processable and a reasoner can be employed to infer new facts.

Maintenance Commonly, an ontology evolves over time and is subject to changes even after the implementation process is finished. These modifications are performed to reflect new requirements that may come from the stakeholders or to repair possible bugs.

In addition to the above mentioned activities, there are some activities that are orthogonally performed during the entire ontology life-cycle:

Knowledge acquisition As the development process continues, ontology engineers and domain experts gather new knowledge about the domain. As a result, the ontology is refined according to the acquired knowledge.

Evaluation Ensures that a certain quality is met by continuously validating the ontology. Moreover, the completeness with respect to the competency questions can be assessed, and any unintended change to the core concepts and relationships should be avoided.

Documentation Tracks the tasks that have been realized, the way they were realized, and the design decisions that have been taken. In addition to improving the clarity and reusability of the defined concepts, it facilitates the maintenance of the ontology itself.

Figure 2.3: Life-cycle models. Comparison of the sequence of activities performed in different models: a) Waterfall; b) Iterative; and c) Evolving [22].

As depicted in Figure 2.3, the above mentioned activities can be performed following various life-cycle models, such as waterfall, iterative and evolving. According to [22], the majority of methodologies for ontology development are based on an evolving prototyping model. This allows switching back and forth between activities with the objective of improving the prototype, until the defined requirements and the evaluation criteria are fulfilled.


2.2.1 Collaborative Ontology Development

A collaborative ontology development process involves many stakeholders working together to model a domain of interest. This process should be supported by specialized methods and tools to accommodate stakeholders with different skills, experience and responsibilities, as well as potentially divergent agendas [5]. Typically, the activities of specification, conceptualization, formalization, implementation and maintenance are performed during collaborative development as well. Stakeholders need to jointly define the domain requirements while proposing and negotiating various use cases. Next, a number of constraints and modeling decisions, such as the level of granularity, naming conventions, and the reuse of existing ontologies, are defined. To avoid misunderstandings, stakeholders may define a common glossary including technical terms. They further investigate current tools and frameworks for collaborative development and decide which one to use, considering the current team composition and its skills. Another important aspect to be carefully analyzed is the definition of quality metrics, which should be performed in parallel to each activity during the entire development process. In such scenarios where many people are involved, the role of each of them has to be clearly defined and the mechanisms to monitor their contributions should be specified.

2.2.2 Test-driven Development

Test-driven Software Development (TDD) is a programming approach where a set of test cases is defined before the code is actually written [23]. There are three main steps based on the TDD principles: 1) add a test; 2) write the code; and 3) refactor the code. First, test cases which represent the requirements and cover different aspects of the software being developed are defined. Next, these test cases are executed and initially fail. The developer should then write only the code necessary to pass the tests. The execution process is repeated until no test case fails. The refactoring step ensures that the written code meets certain quality criteria, such as: a) the purpose and use of each module, class and variable is clearly defined; b) duplicates are avoided; and c) large modules, classes or methods are split into smaller ones to support maintainability and readability. The development of domain-specific ontologies, in turn, requires joint efforts among different groups of stakeholders, such as ontology engineers and domain experts. By following the principles of TDD, i.e., by continuously running a set of test cases, the ontology development process ensures that any ontology modification has only the expected effects. This set of test cases can be defined based on the competency questions and the constraints derived from the domain requirements; a sketch of such a test case is given after Figure 2.4. Figure 2.4 illustrates SAMOD, a simplified agile methodology for ontology development [24]. It is based on iterations over a modelet, which is an isolated model used to conceptualize a specific aspect of the domain with respect to the motivating scenario. SAMOD follows the TDD principles, consisting of three main steps: 1) ontology engineers together with domain experts collect domain requirements, build a modelet and create a test case; 2) ontology engineers merge the current model with the modelet of the new test case from the last iteration; and 3) ontology engineers perform refactoring based on the outcome of the previous step [24].

Figure 2.4: SAMOD - A Simplified Agile Methodology for Ontology Development. SAMOD follows the TDD principles: 1) collecting the domain requirements; 2) merging models from two different iterations; and 3) refactoring based on the outcome of the previous step [24].
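For instance, a constraint over the running example, "only universities may be the object of sda:studyAt", could be encoded as a SPARQL ASK query (a sketch; the sda namespace is taken from Listing 2.2):

PREFIX sda: <http://sda.uni-bonn.de/ontologies/staff#>

# The test fails if any resource studies at something
# that is not typed as a university.
ASK {
    ?someone sda:studyAt ?place .
    FILTER NOT EXISTS { ?place a sda:University }
}

Executed after every modification, the query returning true signals a violation, whereas false means the constraint still holds.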

2.3 Version Control Systems

Version Control Systems (VCSs) are software systems used to manage the changes made to a repository over time. Typically, a repository comprises a file or set of files organized into directories. Any performed change is tracked, enabling the VCS to create regular snapshots of the repository. The snapshots represent different versions of the files and allow reverting to a previous state of a file in case of identified mistakes. The developers of a team can work collaboratively while simultaneously performing changes to the same project. If properly used, version control prevents the potential loss of the work realized over time, and it avoids the gradual deterioration of the work through unintended actions. A hosting platform is chosen to maintain and propagate the changes performed to a repository by different developers. There are two major types of version control: 1) Centralized Version Control Systems; and 2) Distributed Version Control Systems [25].

2.3.1 Centralized Version Control Systems

The main concept in centralized version control is a central repository. It primarily serves as a shared space that stores the entire history of changes and enables developers to access it following the principles of a client-server architecture. A new version of the repository is created after each change or set of changes. The new version contains metadata describing it in more detail, and an identification number, which is automatically incremented or uniquely generated according to a special pattern. Users are able to retrieve the latest or a specific version of the repository using the checkout command while providing the version id. Thus, they can work on a personal working copy, which does not include the entire history of changes. Further, users share their local modifications with other team members via the commit operation. The update operation is used to receive the latest changes realized by other users. Changes between different versions of the repository are calculated as deltas using various diff algorithms. Parallel development is supported by branches, acting as independent lines of the main repository, where users can fix bugs, implement new features or release mature versions. The central repository acts as a single point of failure: in case of any mistake, the entire work and the history of changes can be lost. Another drawback of centralized version control systems is the need to always be connected in order to perform specific operations, such as commit.


Figure 2.5: Version Control Systems: Centralized vs Distributed. Collaboration workflows: a) several users work together on the same central repository; and b) several users work on their local repositories and synchronize the changes with the others via a shared repository.

2.3.2 Distributed Version Control Systems

To address some limitations of centralized version control, a new generation of VCSs, the Distributed Version Control Systems, has emerged. Instead of having access to only one central repository, users always retrieve the entire repository, including the complete history of changes. This allows many operations, such as commit, to be performed even without being connected to a network. As illustrated in Figure 2.5(b), working offline is enabled using the clone command, by which the remote repository is mirrored as a local repository on the user's machine. The user continues to edit files in the working copy until it reaches a state where the changes should be committed. During the commit process, a message is provided to describe the reason for the changes that have been realized. A new version of the repository is created, comprising additional metadata about the user who performed the modifications and the timestamp information.

Synchronization

A VCS assists users in working collaboratively on shared artifacts, and helps them avoid overwriting each other's changes. Basically, mechanisms to avoid change overwriting can be classified into pessimistic and optimistic approaches [26]. The former are based on the lock-modify-unlock paradigm, which implies that modifications to an artifact are permitted for only one user at a time. The latter are based on the copy-modify-merge paradigm, where users work on personal copies, each reflecting the remote repository at a certain time. After the work is completed, the local changes are merged into the remote repository by an update command, comprising the phases: comparison, conflict detection, conflict resolution, and merge. Users can perform many changes and commit them to the repository. Since the entire history of changes is logged, it is possible to trace it through unique identifiers and navigate to a specific version. However, in case a user A wants to share her changes with other team members, she has to publish them to the shared remote repository. This is realized using the push command. After the latest version is pushed to the remote repository, other users can retrieve it via the pull command.
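A typical session in a distributed VCS such as Git might proceed as follows (the repository URL, file name and commit message are placeholders used only for illustration):

# Mirror the remote repository, including its full history, as a local repository
git clone https://example.org/team/ontology-repo.git
cd ontology-repo

# ... edit ontology.ttl in the working copy ...

# Record the changes locally, with a message describing them
git commit -am "Add domain and range for sda:locatedIn"

# Integrate changes published by other users in the meantime, then share our own
git pull
git push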


Conflicts

Different techniques, such as line-, tree-, and graph-based ones, can be employed to compare two versions of the same artifact [27]. The line-based technique, which has achieved wide applicability, compares artifacts line by line, with each line being treated as a single unit. This technique is also known as textual or line-based comparison [26]. Examples of VCSs that use the line-based approach are Subversion, CVS, Mercurial, and Git. Line-based comparisons are applicable to any kind of text artifact, as they do not consider syntactical information [27]. Accordingly, line-based approaches also neglect the syntactical information of ontologies, which are commonly represented in some text-based OWL serialization. Challenges arise when two ontology developers simultaneously modify the same artifact in their personal working copies. Changes might contradict each other; for instance, both developers may edit the name of an ontology concept. Such parallel and contradictory modifications can result in conflicts during the merging of two ontology versions. In general, a conflict is defined as “a set of contradicting changes where at least one operation applied by the first developer does not commute with at least one operation applied by the second developer” [27]. Conflicts can be detected by identifying units changed (i.e., added, updated, deleted) in parallel. Conflict resolution can be done automatically or may require users to manually fix the conflicting changes.

False-positive conflicts

A distributed VCS follows the optimistic approach to enable the concurrent modification of ontology artifacts, as well as conflict detection and resolution. From the ontology development point of view, the situation is exacerbated when different ontology authoring editors are used during the development process. This is due to the fact that these editors often produce different serializations of the same ontology, i.e., the ontology concepts are grouped and sorted differently in the files generated by the editors.7 This lowers the ability of the VCS to detect the actual changes in ontologies. As a consequence, the VCS may detect a large number of false-positive conflicts, i.e., conflicts that do not result from ontology changes but from the fact that two ontology versions are differently serialized.
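For instance, the following two Turtle fragments (prefixes as in Listing 2.2) encode exactly the same triples, yet a line-based comparison reports almost every line as changed, because the triples are ordered and grouped differently:

# Serialization as saved by editor A
sda:UniversityOfBonn rdf:type sda:University ;
    sda:locatedIn sda:Bonn .
sda:Bonn rdf:type sda:City .

# The same triples as saved by editor B
sda:Bonn rdf:type sda:City .
sda:UniversityOfBonn sda:locatedIn sda:Bonn ;
    rdf:type sda:University .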

Hooks – An Event Triggered Mechanism

Hooks are special scripts triggered by the occurrence of particular events, such as commit or merge operations. They enable the customization of actions and the creation of specific workflows to achieve a desired result or meet certain requirements. Thus, special mechanisms to automate development and deployment tasks can be designed. According to the scope of the actions covered, Git hooks are divided into two groups: 1) client-side; and 2) server-side hooks. Client-side hooks are invoked prior to the occurrence of specific actions in the local repository. They are split into three categories: committing-workflow hooks, email-workflow scripts, and everything else.8 Among the most important hooks within the committing-workflow category is the pre-commit hook. It is invoked each time a user runs the git commit command along with providing a commit message. This hook can be utilized to inspect various aspects of the changes recently performed on the code. Additional test coverage techniques can be employed to ensure that changes do not break rules or constraints that were defined in advance.
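In an ontology development setting, a pre-commit hook could, for example, reject commits containing syntactically invalid Turtle files. A minimal sketch, assuming the riot validator from Apache Jena is installed, is shown below:

#!/bin/sh
# .git/hooks/pre-commit: validate all staged Turtle files before the commit is created
for f in $(git diff --cached --name-only --diff-filter=ACM | grep '\.ttl$'); do
    if ! riot --validate "$f"; then
        echo "Syntax error in $f - aborting commit." >&2
        exit 1
    fi
done
exit 0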

7 With “different serializations”, we refer to two different ontology files that represent the same ontology using the same syntax (e.g., RDF/XML, Turtle, and Manchester) but use a different structure to list and group the ontology concepts. 8 https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks. Date Accessed: 17 April 2018.


Server-side hooks reside on remote hosting platforms, e.g., GitHub, Bitbucket or GitLab. Typically, these hooks are managed by administrators, who are responsible for maintaining the repository, monitoring activities and managing user permissions. The most important hooks in this group are: pre-receive, update, and post-receive. They are invoked at different phases after running the git push command. Pre-receive and update are executed before the conducted changes are permanently written into the repository. Thus, it is possible to prevent users from pushing changes if the commit or the format of the commit message does not meet certain criteria. Post-receive takes place after the push operation has successfully completed. Tasks such as emailing the team involved in the development process or acting as a continuous integration service are typical scenarios for this hook.

Branching and Merging

Branching and merging are fundamental features of a VCS which allow parallel development within the same repository. Developers quite often want to test a new feature or to fix a bug in an isolated copy, without interfering with the primary development line. The main reason is to avoid any negative impact that a new feature or bug fix might have on a mature version of the project. Typically, a repository comprises the following branches: release (often named master), development and testing, or branches associated with a feature or issue name, respectively. There are two major branching styles that can be followed during the development process [28]. Early Branching is suitable for larger projects with fine-grained requirements for control and safety, but poses additional overhead for the propagation of changes. Deferred Branching fits better in scenarios that can tolerate a small portion of safety risks to gain higher productivity.

On the other hand, the merging operation should be performed carefully, since a wrongly conducted merge may cause the loss of changes or introduce new issues. An important aspect to be considered during multi-branch development is performing regular synchronizations to avoid potential divergences between branches. One way to tackle this is to use continuous integration services that allow the harmonization of changes from different branches. However, this task should be regularly monitored and in many cases requires human intervention to reduce the number of possible issues that might arise. In line with the branching style, a team should also consider the merging style suitable for its use case [28]. The Restricted Merging Style implies stricter policies and constraints, increasing safety and avoiding risks that may lead to unintended consequences. The Relaxed Merging Style sacrifices safety in favor of fewer policies and constraints.


CHAPTER 3

Related Work

In this chapter, we discuss the work related to our research, according to the defined research questions. We initially describe the general classification of methodologies and their characteristics. We then dive deeper into methodologies which focus on the collaborative aspects of ontology development. Here, we elaborate the main aspects of these methodologies and how they are used in practice. We start the next section with a brief presentation of standalone tools used for creating ontologies. We then continue by reviewing the platforms dedicated to collaborative ontology development. Moreover, we look at how ontologies that are concurrently modified on version control systems are synchronized while preventing overlapping conflicts. In the end, we discuss the approaches provided in the literature to support test-driven development of ontologies, with a focus on the efficient execution of a given set of test cases.

3.1 Methodologies for Collaborative Ontology Development

Ontology development has been an active research area in the Semantic Web community for many years [29]. Several methodologies have been proposed with the objective of facilitating the development process in various scenarios. Nowadays, teams involved in ontology creation are composed of many people using different communication channels to assist their collaboration and organize their assigned endeavors. Considering the collaborative aspect, methodologies are classified into two categories: 1) traditional methodologies, which target developing ontologies in organizations where a group of people works together in a closed environment; and 2) collaborative ontology development methodologies, with a strong focus on collaboration aspects, where a large number of people or a community is involved [5]. Another classification can be made according to whether a methodology adheres to a specific workflow. In this regard, the first group comprises methodologies that describe ontology development as a set of activities that should be performed in a particular order. The second group comprises methodologies that are typically agile-oriented and do not require following a strict workflow. These methodologies propose a set of guidelines and practices covering a wide range of aspects of the development process.

3.1.1 Workflow-dependent Methodologies

In the following, some representative workflow-dependent methodologies for supporting collaboration during ontology development are described.


Figure 3.1: The HCOME methodology. Overview of the main phases of the HCOME methodology, their goals and respective tasks. (S) - denotes shared spaces, whereas (P) - denotes private spaces [30].

HCOME

The Human-Centered Ontology Engineering Methodology (HCOME) presented by Kotis et al. [30] is a methodology with a main focus on the creation of ontologies in distributed settings. The two major distinguished roles are: knowledge workers, who are experts from the field, and knowledge engineers, who are responsible for modeling ontologies in a particular formal language. HCOME suggests different spaces where team members can work and save ontologies. The Personal Space is used by members to create and modify ontologies according to their personal opinions, as well as to manage various versions. Once the developed ontology evolves to a mature version, it can be uploaded into the Shared Space, where the other members can access and explore it. The Shared Space allows for the comparison of various versions of the same ontology and supports the discussion of modeling decisions. The authors identify four important issues that should be addressed by the methodology: 1) the work of the community members should be approach-independent, allowing them to integrate different concepts, including those with informal definitions, and to reuse and refine existing ontologies; 2) the contribution barriers for knowledge workers should be lowered by hiding implementation details and enabling conceptualization in a way that is convenient for them; 3) workers should be able to communicate and shape the information synergistically by posting issues and proposing changes, as well as avoiding deadlock situations; and 4) the definitions of concepts should be mapped to other external ontologies to ensure they reflect the intended meaning for the respective domain. Figure 3.1 depicts the three phases of HCOME, each with its associated goals and tasks.


Figure 3.2: The Ontology Maturing Process. The four phases of the ontology maturing process: 1) emergence of ideas; 2) consolidation in communities; 3) formalization; and 4) axiomatization [31].

Specification - during this phase, the domain is specified, including the requirements that restrict the ontology scope, and the team comprising knowledge workers and knowledge engineers is established. The outcomes of this phase are recorded in specification documents to be used later on during the entire development process.

Conceptualization - after the definition of the domain, requirements and team, the process continues with the conceptualization of the ontology. The team members start to review and import existing ontologies for potential reuse. To better understand the domain, generic ontologies are consulted and the informal definitions of their concepts are analyzed. Next, the ontology is developed in parallel by various knowledge engineers, where multiple versions should be mapped and merged in order to achieve a unified view and allow reusability.

Exploitation - in this phase, the ontology is subject to a critical review process, where the involved workers provide their feedback. The ontology is investigated by various team members to identify and critique any wrong or incomplete concepts. The feedback and comments are stored and published to later support and revise the design decisions.

Ontology Maturing

Braun et al. [31] view ontology engineering as an informal learning process, in which the involved team members enhance their comprehension of and competency in a particular domain. Furthermore, as the definition of concepts should derive from a shared understanding, the collaborative work between ontology engineers and domain experts should be facilitated by reducing the level of formality and complexity. The contribution of domain experts therefore increases whenever the methodology and tool support is lightweight and easy to use and understand. Another important aspect observed by the authors is that ontology development is a continuous process in which the ontology evolves over time to reflect new ideas, changes in the real world and a better understanding of the domain. To accommodate the above observations, the authors defined the maturing model illustrated in Figure 3.2, which is composed of the four following phases:

Emergence of ideas - the initial phase, where the proposed ideas for concepts are rather ad hoc and without a clear definition. Usually, tags are used to annotate or look for particular resources; these can later be replaced in case of previously incorrect definitions.

Consolidation in communities - a common terminology is created from the collaborative work of the engaged community. This is achieved by utilizing tags and refining them for the definition of future concepts. Thus, the retrieval of resources is improved, including additional synonyms that are used in daily life.

Formalization - concepts are organized in a hierarchical fashion and their inter-relations are identified. As a result, a lightweight taxonomy is established, allowing users to navigate across a number of broader and narrower levels.

Axiomatization - denotes the last phase of the maturing process, which transfers the outcome of the previous phases into a formal representation language. This enables the captured domain knowledge to improve inference processes, e.g., in question answering systems. Knowledge engineers are highly involved in this phase since it requires the usage of complex axioms.

According to the authors, the proposed ontology maturing process assumes that ontologies are not built from scratch but rather evolve from a set of informal tags into well-defined taxonomies. Furthermore, this is seen as an iterative process over the four defined phases. Potential applications are identified in the areas of semantic annotation and resource retrieval.

DILIGENT

DILIGENT, presented by Pinto et al. [32], is another methodology for supporting collaborative ontology development in distributed settings. It considers the following important properties that the development process should have: 1) it should be realized in a decentralized fashion, where the stakeholders are not necessarily settled in the same geographic location; 2) it should grant users partial autonomy, so that they can personalize their own local copy to reflect their needs; and 3) it should be iterative, providing clear steps on how to iterate between different phases. The authors identify four roles which participate in the development process: ontology engineers, knowledge engineers, domain experts and users. The DILIGENT process depicted in Figure 3.3 comprises the following five activities in order to support the collaborative creation of a shared ontology:

Build - based on the predefined requirements, participants build an initial version of the ontology, which does not need to be complete. The team is even recommended to be small, with the aim of avoiding long discussions and achieving consensus on the specified terms more easily.

Local adaptation - users are eligible to perform modifications on their local ontology copies in order to accommodate new business requirements or rules. However, these modifications are not directly reflected in the shared ontology; rather, they are collected by a responsible board which assesses them for potential adoption in the future.

Analysis - the collected modifications from the locally modified ontologies are investigated by the responsible board to find potential similarities, which are further elaborated before being introduced in the new version of the shared ontology. In addition, new requirements that might come from users are considered and a balanced decision should be taken.

Revision - the shared ontology is revised on a regular basis to prevent a possible deviation between it and its local copies. For this purpose, the board is composed in a well-balanced way of the various kinds of stakeholders involved in the development process.

Local updates - users are able to decide whether to apply the approved modifications of the shared ontology to their local copy. By applying the changes, they benefit from an interoperability point of view with respect to the jointly defined concepts.


Figure 3.3: The DILIGENT Life-cycle. The different stages of the DILIGENT methodology: Build, Local Adaptation, Analysis, Revision, Local Update [33].

3.1.2 Workflow-independent Methodologies

In the following, three well-known methodologies which do not enforce adherence to a particular workflow during the development process are presented.

Guidelines for Constructing Reusable Domain Ontologies

Annamalai et al. [34] present guidelines for Constructing Reusable Domain Ontologies (CRDO), which support the systematic construction of domain ontologies based on a purpose-driven approach. A strategy for developing reusable domain ontologies is outlined: first, identifying the relevant domain knowledge to be encoded; next, consolidating the pieces of knowledge; and finally, selecting the ones that can be reused and creating isolated modules. The purpose of the authors is to facilitate communication, integration and information sharing across the distributed environment of the scientific community, which performs experiments, gathers data and publishes various types of findings. Two types of ontologies are distinguished in this environment: domain ontologies, which capture more generic knowledge from a given domain and are loosely coupled with each other; and purposive ontologies, which encode specific domain knowledge by combining different reusable domain ontologies. Depending on whether existing ontologies are used to develop domain and purposive ontologies, the proposed guidelines are divided into:

Ontology Development Guidelines I - groups the following steps: a) define the purpose and use of the ontology; b) identify the main concepts; c) define competency questions based on the given requirements; d) identify additional terms needed for competency questions and their answers; e) identify new concepts, attributes and their relations, and structure them in a conceptual model; f) evaluate the conceptual model using the competency questions; g) return to c) if more details should be included; h) create potentially reusable submodules by clustering similar


concepts; i) link the submodules to the main model; and j) formalize reusable domain ontologies and purposive ontologies out of the main model and submodules.

Ontology Development Guidelines II - comprises the following steps for developing purposive and domain ontologies: a) define the ontology purpose; b) design the model of the purposive ontology; c) investigate existing ontologies that can be reused; d) develop the unsupported part of the model according to Guidelines I, steps b) - g); e) localize a possible region in the model that can be reused; if any exists, develop an independent and reusable submodule which contains the most relevant concepts (cf. Guidelines I, step h); f) reuse external ontologies (as identified in c) to accommodate specific application requirements; and g) recreate the model of the purposive ontology by inter-linking it with the submodules, and proceed with the formalization of the designed ontologies.

RapidOWL

Figure 3.4: Overview of the RapidOWL methodology. It consists of three building blocks: Values, Principles, and Practices, each of them covering different aspects of ontology modeling [35].

Inspired by eXtreme Programming (XP) for knowledge-based systems, Auer et al. [35] present RapidOWL, an adaptive and lightweight methodology for Collaborative Knowledge Engineering. Furthermore, RapidOWL adopts the Wiki Design Principles1, which are used for text editing in a collaborative fashion. The methodology does not provide any specific life-cycle; rather, it aims at supporting ontology development through rapid and small changes. A number of guidelines are proposed to facilitate the contribution of domain experts to the development process, avoiding the necessity to rely on a considerable involvement of knowledge engineers. The building blocks of RapidOWL, separated into three categories, are illustrated in Figure 3.4:

Values - RapidOWL brings in the values from eXtreme Programming to represent the long-term goals. Communication and Feedback are combined into Community, to cover important aspects of communication and interaction across contributors, including comments, critiques, assessments and suggestions; Simplicity, to ease the maintenance of the ontology and the data part of a knowledge base; and Courage, to facilitate progress and avoid possible modeling dead-ends. RapidOWL introduces Transparency as an additional new value, enabling the community to view the entire history of ontology changes as well as to monitor and track the contributions of the participants.

Principles - RapidOWL suggests various principles to guide the mid-term agile development of ontologies. These principles define characteristics that a single RapidOWL process should have.

1 http://c2.com/cgi/wiki?WikiDesignPrinciples


Some of the relevant principles are: following a uniform method for modeling schema and data; incremental and observable development; and rapid feedback to facilitate maintenance and accommodate the diverse views of the community.

Practices - the third category comprises practices adopted from eXtreme Programming as well as from the collaborative design of knowledge bases [36]. Among the most important ones are: Joint Ontology Design, to facilitate the work between domain experts, knowledge engineers and users; Information Integration, to ensure that the ontology captures adequate domain knowledge; and View Generation, to provide various exploration views for a better understanding of the concept hierarchy, its metadata and the relationships across concepts.

Just Enough Ontology Engineering

Just Enough Ontology Engineering (JEOE) [37] is a set of aspects essential to ontology development based on the principles of "just enough" approaches. It intends to support experts of different fields, who may take part in the process with only fundamental notions of ontology development, without forcing them to follow a specific methodology. JEOE comprises interdependent steps and activities to be adopted in an agile process and performed iteratively. The steps to be followed in JEOE are:

1. Identify stakeholders - the persons and their responsibilities during the development process, and profile them according to common characteristics: goals, interests and requirements.
2. Define the purpose and goal - to have a clear understanding of and expectations for the ontology to be developed, as well as to focus the efforts on accomplishing the identified requirements.
3. Define requirements - these serve as a mechanism for continuously verifying whether the ontology is able to solve the problem for which it is designed.
4. Identify existing knowledge sources - and assess their qualitative and quantitative characteristics for possible reuse and extension, in order to fill the missing gaps.
5. Scope the ontology - by specifying the level of granularity and setting up the boundaries, to help reduce the complexity of modeling and avoid a never-ending process.
6. Quality assurance - using a “quality model” that is built in the initial stages and comprises patterns to measure the qualitative and quantitative parameters of different aspects.
7. Competence - shows whether the developed ontology contains all the constructs necessary to answer the given set of competency questions.
8. Define the ontology artifacts - including the terminology and vocabulary, their meaning, the relationships between them, and the axioms and constraints to be used for consistency checking.
9. Implementation - by transferring the artifacts defined in the previous step into a formal knowledge representation language according to the goal, scope and requirements.
10. Deployment - of the ontology or its modules as a compact and standalone artifact, specifying the access infrastructure and the level of integration to be supported.
11. Testing and validation - of different aspects, such as the syntax, consistency, integrity and redundancy of the ontology or its modules, as part of a comprehensive quality assurance plan.
12. Publish - the ontology to provide access for interested users, describing the mechanism and policies for reaching and using it, as well as specifying the license terms.
13. Maintenance and reuse - ensures that the ontology reflects the latest state of the intended domain. This is realized by regularly updating and maintaining the parts which have changed since the release of the last version.


Table 3.1: Coverage of the ontology development aspects. Comparison between methodologies with respect to coverage of the important aspects for ontology development.

Methodology                Reuse   Naming   Multilinguality   Documentation   Validation   Authoring
METHONTOLOGY [21]          Yes     No       Yes               Yes             Yes          No
CRDO [34]                  Yes     No       No                No              No           No
DILIGENT [32]              Yes     Yes      No                No              No           No
On-To-Knowledge [40]       No      Yes      No                No              Yes          No
RapidOWL [35]              No      Yes      No                No              No           Yes
JEOE [37]                  Yes     No       No                Yes             Yes          Yes
Linked Data Patterns [38]  Yes     No       Yes               Yes             No           Yes
NeOn Methodology [39]      Yes     No       Yes               No              Yes          No

Summary

Generally, both traditional and collaboration-focused methodologies cover a wide range of aspects of the ontology development process. For instance, some of the above mentioned methodologies [21, 32, 35, 37–39] cover the main aspects of ontology development in a top-down approach. Various ontology life-cycle models comprising different phases have been proposed and can be followed depending on the use case and team composition. In addition, a number of complementary guidelines and methods presented by lightweight-oriented methodologies facilitate ontology creation. Despite this, we observe a lack of governance and best practices dedicated to distributed scenarios where the development process is centered around version control. More specifically, guidelines addressing ontology engineering from an organizational point of view, such as the management of parallel development, version releases, communication and monitoring activities among team members, and modularization, are still missing in these scenarios. Furthermore, providing a number of practices from the operational point of view, like reuse, naming conventions, multilinguality, documentation, and validation, is crucial during the development process. Table 3.1 shows the coverage of these practical aspects by the different methodologies. A novel methodology for supporting the ontology development process in distributed and heterogeneous scenarios is needed. This methodology should facilitate a workflow-independent process in a dynamic environment, where the number of team members, development tools, and parallel ontology versions is constantly changing. One central characteristic is to actively support stakeholders when taking organizational and design decisions. Moreover, operational practices facilitating the ontology engineering process regarding reusability, multilinguality, naming conventions, validation and authoring in such scenarios are still missing.

3.2 Platforms for Collaborative Ontology Development

Over the years, much research has been conducted and several approaches have been presented to provide tool support for ontology development in different scenarios. One area of research is concerned with the creation of web applications that lower the access barriers to the development process. Tools such as SOBOLEO [41] foster the collaborative editing of SKOS thesauri, offering a specialized browser to navigate and change the taxonomy as well as a semantic search engine for annotating web resources. SOBOLEO is used in the domain of social networks and offers tag recommendations for describing people based on existing ontologies. TopBraid Enterprise Vocabulary Net (TopBraid EVN)2 is a proprietary tool to ease the collaborative creation of SKOS taxonomies and ontologies. It incorporates change audits, role management, search capabilities

2 http://www.topquadrant.com/products/topbraid-enterprise-vocabulary-net/

as well as data quality rules to check for SKOS and OWL constraints. Moreover, it enables the creation of hierarchy reports through graphical user interfaces. MoKi [42] is a collaborative MediaWiki-based tool to support ontology modeling tailored for business processes. MoKi associates a wiki page, containing both unstructured and structured information, with each entity of the ontology and process model. However, a number of the proposed tools are not available anymore, or the documentation about them is hardly accessible. Therefore, we look more closely into well-known platforms for the ontology engineering community that are currently supported.

3.2.1 Integrated Environments with own Version Control

In the following, several well-known platforms which have their own built-in version control for supporting collaborative ontology development are described.

WebProtégé

One of the most prominent platforms in this area is WebProtégé [43], a lightweight version of the Protégé desktop editor. Its target is to support a wide range of users, from ontology engineers to domain experts, with the aim of lowering the threshold for collaborative ontology development. To this end, WebProtégé is implemented on top of a flexible and extensible infrastructure that can be utilized in various scenarios.

User Interface. WebProtégé is highly customizable and composed of a number of tabs with different functionalities. Each tab contains several portlets, which offer special views of ontology concepts, such as the Class Tree, Property Tree and Instance Table.

Collaboration Support. It offers a variety of collaboration features to support the development of ontologies across team members. Among the most important ones are: tracking of changes, fine-grained access control and contextualized discussions. Changes made to the ontologies are represented as instances of the CHAO ontology [44]. Respective portlets show and visualize the notes and changes made by users at a particular time and concept.

Extensibility. WebProtégé is easily extended with additional functionalities using its plugin infrastructure. Developers can use the Ontology Service to add various backends, such as triple stores, the OWL API or the Jena library, thus avoiding the need to change the user interface. Users first interact through client applications, such as Desktop Protégé or WebProtégé. These applications are connected with the servlet container via Remote Procedure Calls or REST services. Finally, development is supported by the collaboration framework, which enables tracking of changes, administration of access control, and storing and facilitating discussions among team members. The OWL API allows for concurrent modification and offline access to the ontologies, following the principles of version control systems. Moreover, WebProtégé provides a chat service, as well as functionality for annotating vocabulary terms.

VocBench

VocBench [45] is a web application that targets the editing of SKOS and SKOS-XL based thesauri. It supports workflow management, validation, and publication of vocabularies, and provides a full history of changes as well as a dedicated SPARQL query endpoint. The main characteristics of VocBench are described in the following:
User Interface. Several tabs associated with particular functionalities enable a better exploration of the thesauri. Concepts are depicted using the tree view, whereas the details view shows all information of a specific concept, including multilingual labels and synonyms.


Collaboration Support. VocBench implements the separation of responsibilities through a role-based access control mechanism, checking user privileges for the different tasks of thesaurus editing. The editorial workflow management facilitates the collaboration between stakeholders by enabling responsible roles, i.e., content validators, to monitor and approve changes.
Alignment. Since VocBench is dedicated to managing multiple thesauri at the same time, a dedicated feature is implemented to support the creation of alignments between them. Mappings among SKOS concepts in different thesauri can be created manually or assisted by a concept-tree feature associated with an advanced search mechanism. In addition, VocBench allows for the aggregation of concepts that belong to different schemas as well as filtering them based on a specific schema. External vocabularies and data can be imported and exported in various RDF serialization formats. Various statistical reports and metrics related to thesauri, such as hierarchy depth and width or level of uniformity, can be generated. The integration of SPARQL 1.1 enables users to formulate customized queries to get more insights into the thesauri.
The implementation of VocBench is divided into several layers. The layer dealing with presentation and multiuser management is powered by the Google Web Toolkit. It stores the user accounts and tracks their activity in a relational database, accessed via a JDBC connector. The communication with different triple stores is supported through an RDF API abstraction layer based on OWL ART3. The data management layer is an extension of the Semantic Turkey RDF platform with new services necessary for the operation of VocBench. Thus, the management of RDFS and OWL ontologies is facilitated through a number of integrated layers and services.

PoolParty

PoolParty [46] provides a web interface for building and managing SKOS-based thesauri. A user-friendly GUI facilitates the participation of domain experts. It allows for the utilization of information from external Linked Open Data sources and for performing text-based analysis on documents to extract relevant concepts for thesaurus modeling.
User Interface. The GUI of PoolParty utilizes AJAX techniques to ease the merging of concepts via drag & drop as well as the auto-completion of labels while a user is typing. Moreover, a key-phrase extractor helps to semi-automatically expand thesauri by analyzing various documents, e.g., PDF files or web pages. A number of integrated visualization features, such as graph- and tree-based views, are used to display concepts and their inter-relationships.
Collaboration Support. PoolParty has a strong focus on enabling collaborative development for people without background knowledge in Semantic Web technologies. Taxonomists are supported with a rich feature set for creating and editing concepts and for tracking the changes performed over time. Any modification or deletion is captured using versioning mechanisms and can be compared in detail to find potential differences. Workflows can be customized to address project requirements and maintain the information flow across the involved stakeholders.
Publishing Capabilities. Thesauri may be published as Linked Open Data and linked to external data sources, such as the DBpedia lookup service. This enables interlinking the concepts and enriching them with additional information from outside. A SPARQL endpoint offers the capability of executing queries to get more insights into the thesauri. Additionally, an HTML wiki version of the thesauri, enriched with RDFa metadata, is published as an easy alternative for browsing and editing concepts. Thesauri can be imported in various serialization formats, including RDF/XML or Turtle.

3 http://art.uniroma2.it/owlart/


3.2.2 Integrated Environments based on Generic Version Control Systems

Three approaches for collaborative ontology development based on generic version control systems, such as Apache Subversion (SVN) or Git, are presented in the following:

SVoNt

Luczak et al. [47] propose SVoNt, an approach for versioning OWL ontologies based on SVN. Its objective is to enable a revision history of ontology concepts similar to the way SVN handles documents. A separate server is used to store conceptual changes between different versions of ontologies. These versions are generated as the result of a diff operation between the modified and the base ontology. As a result, the backward compatibility of ontology changes made over time is ensured. SVoNt supports conflict detection and resolution by comparing the structure and semantics of the ontologies.
Design and Implementation. SVoNt is implemented following the principles of a client-server architecture. The SVoNt server extends SVN and reuses its built-in features, like tracking, versioning and authentication. In addition, the SVoNt server includes functionalities dedicated to ontologies that are implemented in the pre-commit hook. During the commit procedure, the ontology is checked for consistency, and changes are detected and logged to a metadata repository. An external mechanism is integrated to represent the structural or semantic changes made to the ontology at the concept level. As a result, a specific file is generated using the Ontology Metadata Vocabulary (OMV) after each commit and used to store the ontological changes. The SVoNt client, on the other hand, includes modules adapted to support ontology versioning. Moreover, it is able to maintain the history of local changes to the ontology since the last modification. For this purpose, dedicated modules, such as Change-Detection and Change-Selector, are used to detect and commit changes to the server, respectively.

OnToology

OnToology [48] is a tool for collaborative ontology development based on GitHub. Its objective is to address several requirements related to the development process, covering various aspects of the ontology life-cycle, such as documentation, visual depiction, evaluation and publication of ontologies. Each repository must be registered with OnToology to enable monitoring and the generation of artifacts each time new changes are pushed.
Design and Implementation. OnToology comprises two main parts: 1) the web interface; and 2) the integrator. The web interface is responsible for handling the notifications delivered each time the ontology is changed, as well as for enabling users to configure the platform according to their needs. The integrator executes an automatic workflow by communicating with three external components. It produces an HTML documentation of the ontology using Widoco [49], while a report on possible ontology pitfalls is provided based on the OOPS service [50]. AR2DTool4 is used for creating class and taxonomy diagrams. Finally, all generated artifacts can be attached to the repository after a pull request is performed.

Ontohub

Ontohub [51] supports the development and management of heterogeneous ontologies in distributed environments. It enables sharing and exchanging ideas as well as contributions among

4 https://github.com/idafensp/ar2dtool


the involved communities. Ontologies written in heterogeneous languages can be integrated and easily retrieved. Complex inter-theory mappings of concepts and their relationships, associated with formal semantics, are supported. Furthermore, alignments across a network of ontologies are possible, and new ontologies can be derived as a result. The objective of Ontohub is to satisfy a subset of the requirements defined in the Open Ontology Repository (OOR) initiative5. The most important features of Ontohub include support for: 1) distributed and modular development of ontologies; 2) alignment and combination of various ontologies; 3) different ontology languages, such as RDF and OWL, including translations between them; and 4) federated and collaborative ontology development repositories.
Design and Implementation. The Ontohub architecture is composed of several decentralized services performing specific tasks. Data exchange between Ontohub and other similar platforms is enabled through a federation API. The parsing and static analysis service returns any symbol and sentence of a particular ontology in XML format. Local inference is a RESTful API which encapsulates batch-processing reasoners, such as FaCT, Pellet and SPASS. Ontologies and their changes are stored and managed in the persistence layer, which is based on Git. Therefore, branching and merging as well as local work on user machines are supported.

Summary

The aforementioned platforms, such as WebProtégé [43], VocBench [45] and PoolParty [46], provide a rich feature set fulfilling many requirements for ontology development in collaborative scenarios. In order to manage change synchronization, conflict prevention and ontology evolution in general, these platforms utilize their integrated version control, which is tailored for processing the semantics of ontology constructs. Thus, they are not focused on the direct inclusion of generic version control systems as a core part of the ontology development process. On the other hand, SVoNt, OnToology and Ontohub are centered around generic version control systems, such as SVN and Git. SVoNt leverages SVN as a VCS for the versioning of ontologies. It stores conceptual changes between different versions of ontologies on a separate server. These versions are generated as the result of a diff operation between the modified and base ontology. SVoNt supports conflict detection and resolution by comparing the structure and semantics of the ontologies. OnToology [48] is a tool for ontology development based on Git. It generates documentation using Widoco, while an ontology pitfalls report is provided based on the OOPS service. OnToology uses AR2DTool for creating class and taxonomy diagrams. OnToology is integrated with GitHub as its hosting platform, restricting users to only its basic features. Providing a user-friendly client that hides the complexity of the version control system is not in the focus of these works. Thus, these systems are rather suited for ontology development projects that involve users with a strong technical background. The Ontohub platform has a Git client integrated into its own architecture to manage changes. It provides a number of different views for documentation, visualization, and evolution. In addition, it uses the Distributed Ontology, Modeling and Specification Language (DOL) as a meta-level to allow for the representation and alignment of ontologies that are logically heterogeneous. Ontohub has a dedicated hosting service to store and manage ontologies from different domains. This platform is focused on supporting the development process after the submission of changes to the remote repository by generating a set of artifacts to enable further exploration. Furthermore, there is a lack of support for scenarios where various ontology authoring editors are used in the development process. We foresee the necessity for an integrated environment which encapsulates a number of components to enable ontology development in distributed scenarios. Moreover, stakeholders

5 http://www.oor.net/

using heterogeneous editors should be able to easily synchronize their changes. Apart from GitHub, this environment has to support ontology projects managed on various hosting platforms, such as GitLab and Bitbucket. A number of additional views and services should allow third-party users and applications to explore and reuse ontologies in different formats.

3.3 Conflict Prevention during Change Synchronization

Synchronization of changes among different versions of the same artifact has attracted the attention of researchers for several years. For example, enhancing VCSs with additional information related to the semantics of software code and refactoring, with the objective of improving the merging process, has been proposed in a number of works [52–54]. Asenov et al. [55] and Protzenko et al. [56] propose adding unique IDs to source code elements in order to achieve more precise conflict detection. Brun et al. [57] present an approach called “speculative analysis”, identifying the possible existence of conflicts in a continuous and precise way. Furthermore, they identify several classes of conflicts and provide detailed instructions on how to address them. However, these approaches focus on the source code of software artifacts, where the order of lines is important. Although line order can also matter in our case, the semantics encoded in an ontology is not necessarily affected by the position of the triples in the file.
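To make this distinction concrete, consider the following minimal sketch (an illustration of ours, using Python with the rdflib library and a hypothetical ex: namespace): two Turtle serializations that order the same triples differently produce a textual diff on every data line, although the underlying RDF graphs are identical.

```python
# Minimal sketch: line order differs, graph semantics do not.
# Assumptions: Python with rdflib; ex: is a hypothetical namespace.
from rdflib import Graph
from rdflib.compare import isomorphic

v1 = Graph().parse(data="""
@prefix ex: <http://example.org/vocab#> .
ex:Sensor ex:hasPart ex:Probe .
ex:Probe  ex:measures ex:Temperature .
""", format="turtle")

v2 = Graph().parse(data="""
@prefix ex: <http://example.org/vocab#> .
ex:Probe  ex:measures ex:Temperature .
ex:Sensor ex:hasPart ex:Probe .
""", format="turtle")

# A line-based VCS diff would flag both data lines as changed,
# while a graph-level comparison shows there is no semantic change.
print(isomorphic(v1, v2))  # True
```

A merge procedure that compares graphs instead of text lines can therefore discard such false-positive conflicts before they reach the user.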

Version Management for Model-based Development

We describe works whose main focus is on overcoming the problem of wrongly indicated conflicts in the field of model-based development, where the model is the main artifact. SMoVer, a semantically enhanced version control system for models, is proposed by Altmanninger et al. [58, 59]. Using the semantic view concept to explain aspects of a modeling language, better conflict detection can be achieved and the reason for conflicts is more easily determined. Brosch [60] suggests using a model checker for detecting semantic conflicts of an evolving UML sequence diagram. When an automatic merge is not possible due to conflicting changes, additional redundant information essential for the models is used to rule out invalid solutions. With this technique, it is possible to assert the concrete modifications realized in a sequence diagram. A related technique uses domain-specific metamodels describing syntactic and semantic conflicts associated with their resolution [61]. This technique allows for the identification of different conflict patterns that occur during the modeling phase and are frequently ignored by common structure-based algorithms. Krusche et al. [62] tackle real-time model synchronization and propose EMFStore, a peer-to-peer based solution for the real-time synchronization of changes on model instances with all collaborators of a session. EMFStore borrows concepts from VCSs to synchronize models in real-time but does not include a VCS itself. Zhang et al. [63] investigate composite-level conflict detection in UML model versioning and propose a two-fold approach: in the preprocessing stage, redundant operations are removed from the originally recorded operation lists. During the conflict detection stage, a fragmentation procedure is performed to collect only potentially conflicting operations into the same fragment. Finally, a pattern-matching strategy is followed to solve the conflict detection problem.


Version Management for Ontologies

Approaches that focus on providing version management for ontologies in collaborative development processes are discussed in the following. An ontology for the unique identification of changes between two RDF graphs is presented by Lee et al. [64]. To recognize these changes (or deltas), a pretty-printed version of the RDF graphs is utilized. The authors distinguish two types of deltas that can be applied as patches to RDF graphs: first, weak deltas, which are directly applied to the graph from which they are computed; second, strong deltas, which specify the changes independently of the context. This approach focuses on the semantic representation of changes and their application to RDF graphs. Völkel et al. [65] present SemVersion, an RDF-based system for ontology versioning. The approach is based on two core components: data management and versioning functionality. The first is responsible for the storage and retrieval of data chunks. The second deals with specific features of the ontology language, such as structural and semantic differences. To find semantic differences between two versions, e.g., whether a statement has been added or removed, SemVersion employs a simplified heuristic method for conflict detection. Cassidy et al. [66] propose an approach for realizing a distributed version control system for RDF stores. The approach is based on a semi-formal theory of patches that allows the manipulation of so-called RDF patches with the objective of facilitating revert and merge operations. An implementation of the approach has been realized on top of the VCS Darcs6, enabling linguistic annotation on RDF stores among different users. A holistic approach for collaborative ontology development based on ontology change management is described by Palma et al. [29]. The approach comprises different strategies and techniques to realize collaborative processes in inter-organizational settings, such as centralized, decentralized, and hybrid ones where multiple people are involved. Edwards [67] proposes techniques for managing high-level application-defined conflicts; the introduced mechanisms should be able to handle conflict resolution, and certain types of conflicts can be tolerated while others are forbidden according to the specified application requirements. Furthermore, a multi-editor environment for collaborative ontology editing is presented by Noy et al. [44]. The proposed framework is able to control and maintain ontologies, as well as support users throughout the whole ontology development process.

Summary

The majority of the above-mentioned approaches [29, 44, 65, 67] rely on their own version control mechanisms tailored for ontology development. Other approaches [64, 66] utilize a semi-formal theory of patches to find deltas among versions, thus enabling specific versions to be reverted or merged. Considering the increasing use of generic version control systems for ontology development, we identify a significant need to empower these systems to cope with different serializations. Moreover, the solution should be provided as an integrated service, without forcing users to install it on their local machines. As a result, a large number of false-positive conflicts can be avoided and contributors can use heterogeneous ontology editors for modeling purposes.

3.4 Test-driven Approaches for Ontology Development

Many research approaches have been presented to support ontology engineers in developing ontologies according to domain requirements while at the same time avoiding constraint violations.

6 http://darcs.net/


In this regard, Ren et al. [68] present an approach enabling ontology engineers to formulate machine-processable Competency Questions (CQs). The implemented solution allows users to utilize predefined CQs or to enter new CQs in a controlled natural language. As a result, appropriate Authoring Tests are generated and automatically executed to check whether an ontology meets the predefined requirements. An approach for test-driven development of ontologies is described in [69]. It provides a number of predefined tests which help ontology developers to verify which axiom is missing and how to properly add it. Unit Tests for Ontologies [70] transfers principles of software engineering techniques to the test-based development of ontologies. Two ontologies, a positive test ontology T+ and a negative test ontology T−, are created to force the developed ontology to satisfy the constraints defined in T+ and avoid the constraints defined in T−.
SHACL [71] is a language for validating RDF graphs against a set of conditions provided as shapes and other constructs expressed in the form of an RDF graph. An approach for validating SHACL constraints using two different algorithms is presented in [72]. A methodology for assessing the quality of Linked Data is described in [73]. It uses concepts of test-driven software development, such as test cases, to ensure a basic level of quality. Test cases defined as SPARQL templates are later materialized as concrete test queries for assessing the quality of ontologies and knowledge bases. A parent-child relationship between test cases can be established, such that a failure of the parent prevents the execution of the child test case. OntologyTest [74] is a tool that supports the specification of different types of tests, such as instantiation tests, recovering tests, or satisfaction tests. These tests check the predefined ontology requirements and can be run all at once, grouped into categories, or individually.
In software development, regression testing is the process of retesting the software with a test suite after each modification. Due to the large amount of time needed for regression testing, much research has been conducted in areas such as: minimization, by eliminating redundant test cases; selection, by identifying relevant test cases based on recent changes; and prioritization, by ordering the test cases based on specific criteria [75]. Various techniques have been developed in this regard. Particularly relevant for us are studies focused on so-called graph-walk approaches, such as Control Dependency Graphs (CDGs). A CDG is traversed with a depth-first algorithm to select a subset of test cases according to the modified code [76].
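To illustrate the shape-based validation standardized by SHACL, as mentioned above, the following minimal sketch (our own example, assuming Python with rdflib and the pyshacl library, and a hypothetical ex: namespace) defines a shape requiring every ex:Sensor to carry exactly one rdfs:label and reports the instance that violates it:

```python
# Minimal sketch of SHACL validation.
# Assumptions: Python with rdflib and pyshacl; ex: is a hypothetical namespace.
from rdflib import Graph
from pyshacl import validate

data = Graph().parse(data="""
@prefix ex: <http://example.org/vocab#> .
ex:s1 a ex:Sensor .            # violates the shape: no rdfs:label
""", format="turtle")

shapes = Graph().parse(data="""
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/vocab#> .

ex:SensorShape a sh:NodeShape ;
    sh:targetClass ex:Sensor ;
    sh:property [ sh:path rdfs:label ; sh:minCount 1 ; sh:maxCount 1 ] .
""", format="turtle")

conforms, _, report = validate(data, shacl_graph=shapes)
print(conforms)   # False: ex:s1 has no rdfs:label
print(report)     # human-readable validation report
```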

Summary

Considering the above-mentioned approaches, we focus on an efficient test-driven development technique where the ontology is the main artifact. In order to ensure an efficient evaluation of test cases, a dependency graph between test cases, in which a child test case can have multiple parents, should be established. Additional mechanisms are then necessary to traverse the graph and record failing test cases. As a result, the graph can be pruned with the objective of excluding dependent test cases from further evaluation. The approach should be able to perform a file-oriented evaluation of test cases, respecting the fact that an ontology can be composed of multiple files representing submodules. As a specific functionality, such an approach should consider the fact that a team is composed of many members, making the test case evaluation stakeholder-based.
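A minimal sketch of the intended graph-based pruning is given below (our own illustration in Python; the test identifiers, dependencies and evaluation function are hypothetical placeholders): once a test case fails, all of its direct and transitive children are skipped, following the graph-walk idea of the control dependency graphs discussed above.

```python
# Minimal sketch of pruning a test-case dependency graph.
# Assumptions: test ids, dependencies and evaluate() are hypothetical;
# in practice evaluate() would run a SPARQL test query against the ontology.
from collections import deque

parents = {"t2": {"t1"}, "t3": {"t1"}, "t4": {"t2", "t3"}}  # child -> parents
children = {}
for child, ps in parents.items():
    for p in ps:
        children.setdefault(p, set()).add(child)

def evaluate(test_id):
    """Placeholder for executing the actual test query; t1 is assumed to fail."""
    return test_id != "t1"

def run_suite(roots):
    failed, skipped, results = set(), set(), {}
    queue = deque(roots)
    while queue:
        t = queue.popleft()
        if t in results or t in skipped:
            continue
        if parents.get(t, set()) & (failed | skipped):
            skipped.add(t)          # prune: an ancestor already failed
        else:
            results[t] = evaluate(t)
            if not results[t]:
                failed.add(t)
        queue.extend(children.get(t, ()))
    return results, skipped

print(run_suite(["t1"]))  # t1 fails, so t2, t3 and t4 are pruned
```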


Part II

Collaboratively Developing Ontologies

This part describes a methodology and a platform for enabling stakeholders to develop ontologies in a collaborative way. In Chapter 4, we start with collecting important requirements based on a round-trip development model, which comprises three fundamental steps: modeling, population and testing. In addition, requirements are gathered based on the state-of-the-art review and validated through a survey with ontology experts. These requirements cover the methodological and technical perspectives of the ontology development process. In Chapter 5, we continue with presenting a lightweight methodology to support the construction of ontologies in distributed scenarios. The presented methodology addresses the organizational and operational aspects in detail. Finally, in Chapter 6, an integrated environment for collaborative ontology development based on version control systems is presented. A conceptual architecture and a concrete implementation of this environment are described in detail. It incorporates a number of different built-in and external components to address the technical requirements.


CHAPTER 4

Requirements for Collaborative Ontology Development

Ontologies are powerful means to realize the conceptualization of the knowledge of a particular domain. However, developing ontologies can require a significant investment, which is difficult to make for a single person or organization. Thus, a number of interested stakeholders come together and decide to define a new ontology for the chosen domain. They build a consortium which drives this process, and in periodic meetings, representatives of the different stakeholders come together, communicate their specific needs, and try to find a consensus. If the outcome, which can be a vocabulary of terms, reaches a satisfying maturity level, a specification document is released. This process, which we refer to as collaborative ontology development, is itself a complex problem to be solved. In fact, the main challenge for the ontology engineers is to work collaboratively on a shared objective in a harmonious and efficient way while avoiding misunderstandings, uncertainty and ambiguity. The quality of the produced ontology is another challenge that must be tackled with adequate methods and concrete guidelines. Finding a suitable collaboration methodology is exacerbated by the number and diversity of the involved stakeholders as well as the complexity of the domains. Due to the open, distributed and participatory nature of the Web, a solution to this problem is of paramount interest. In this chapter, we investigate the fundamental activities of the development process, i.e., modeling, population and evaluation. As a result, a round-trip development model is introduced, which suggests an iterative and incremental fashion of moving between the fundamental activities. A number of methodological and technical requirements for enabling ontology development in distributed and heterogeneous environments according to the round-trip model are collected. In addition, to obtain insights into modeling practices, we perform an analysis of some of the most widely used ontologies. We observe common modeling practices regarding reuse, internationalization, documentation and naming present in these ontologies, which improve the understanding of the defined concepts. For the methodological scope, we focus on workflow-independent requirements needed for organizational and modeling purposes. Next, we derive a number of requirements necessary to provide technical support, and group them into four categories: 1) Change Management; 2) Quality Assurance; 3) User Experience; and 4) Ontology Deployment.

Contributions of this chapter are summarized as follows:

• Conception of round-trip ontology development comprising three fundamental activities: 1) modeling; 2) population; and 3) testing;


• An analysis of the modeling practices applied in some of the widely used ontologies;
• A set of requirements needed for the methodological support of the development process;
• A set of technical requirements necessary for designing and implementing an approach to enable the distributed development of ontologies in heterogeneous scenarios.

This chapter is based on the following publications:

• Lavdim Halilaj, Irlán Grangel-González, Gökhan Coskun, Steffen Lohmann, Sören Auer. Git4Voc: Collaborative Vocabulary Development Based on Git. In International Journal of Semantic Computing (IJSC), 1-24, World Scientific. This article is a joint work with Irlán Grangel-González, a PhD student at the University of Bonn. In this article, my contributions are related to collecting requirements for collaborative ontology development, the description of the approach, the revision of the related work and the presentation of the use case evaluation;

• Lavdim Halilaj, Niklas Petersen, Irlán Grangel-González, Christoph Lange, Sören Auer, Gökhan Coskun, Steffen Lohmann. VoCol: An Integrated Environment to Support Version-Controlled Vocabulary Development. In 20th International Conference on Knowledge Engineering and Knowledge Management (EKAW) 2016 Proceedings, 303-319, Springer. This article is a joint work with Niklas Petersen and Irlán Grangel-González, PhD students at the University of Bonn. In this article, my contributions comprise the problem description, the definition and implementation of the conceptual architecture, the revision of the state-of-the-art approaches, the presentation of the use cases, and the realization of the user study evaluation;

• Irlán Grangel-González, Lavdim Halilaj, Gökhan Coskun, Sören Auer. Towards Vocabulary Development by Convention. In 7th Knowledge Engineering and Ontology Development (KEOD) 2015 Proceedings, 334-343, SciTePress. This article is a joint work with Irlán Grangel-González, a PhD student at the University of Bonn. My contributions focused on the review of state-of-the-art approaches, devising, conducting and analyzing the user study, and the presentation of the outcomes.

This chapter is organized as follows: we initially discuss the fundamental phases of the ontology development process in Section 4.1. We then analyze several widely used ontologies and gain insights into the practices used to model them in Subsection 4.1.2. We start Section 4.2 with the foundation of the requirements necessary for ontology construction. Subsection 4.2.1 lists the requirements needed for organizational and modeling purposes. A number of technical requirements to be covered by the platform are described in Subsection 4.2.2. Finally, we summarize the chapter in Section 4.3.

4.1 Method

During a collaborative ontology development process, a group of decentralized stakeholders with a shared objective work together to model a domain of interest in a harmonious and efficient way while avoiding misunderstandings, uncertainty and ambiguity. Deriving requirements for the envisioned methodology and development environment demands a clarification of our understanding of the most fundamental ontology development activities. An ontology comprises a terminology, known as TBox (Terminological Knowledge). The creation of this terminology is realized using a logical formalism during the modeling activities [4]. Modeling comprises the analysis and conceptualization of the domain and the specification of the ontology terms, such as classes, properties, and the relationships between them. Once the ontology modeling has been completed, the next typical activity is population. It includes the addition of actual data in line with the defined classes and properties, also known as ABox (Assertional Knowledge) [77]. To verify whether the created ontology correctly represents the domain, a list of queries can be compiled from competency questions [68] and used for testing purposes. Ontology engineers may iterate in an incremental fashion between the modeling, population, and testing activities during the entire development life-cycle. In fact, these three core activities lead to the conception of the round-trip development, as illustrated in Figure 4.1. However, the order of these activities is not deterministic; a round-trip may just as well start with testing, followed by modeling and population.

Figure 4.1: Round-trip model. A round-trip model covering the fundamental activities of the ontology development life-cycle: modeling, population and evaluation.
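As an illustration of the testing step, the following minimal sketch (our own example, assuming Python with rdflib and a hypothetical ex: vocabulary) shows how a competency question such as “Is there a sensor that measures temperature?” can be translated into a SPARQL ASK query and evaluated against the modeled and populated ontology:

```python
# Minimal sketch of the round-trip activities: modeling (TBox),
# population (ABox) and testing via a competency question.
# Assumptions: Python with rdflib; ex: is a hypothetical namespace.
from rdflib import Graph

ONTOLOGY = """
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/vocab#> .

ex:Sensor   a rdfs:Class .                                  # modeling (TBox)
ex:measures a rdf:Property ; rdfs:domain ex:Sensor .

ex:tempSensor1 a ex:Sensor ; ex:measures ex:Temperature .  # population (ABox)
"""

CQ_AS_ASK = """
PREFIX ex: <http://example.org/vocab#>
ASK { ?s a ex:Sensor ; ex:measures ex:Temperature . }
"""

g = Graph()
g.parse(data=ONTOLOGY, format="turtle")
result = g.query(CQ_AS_ASK)
print(result.askAnswer)  # True: the competency question is answered
```

If such a query returns a negative answer, the stakeholders can loop back to the modeling or population activity, which is exactly the incremental round-trip suggested by Figure 4.1.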

4.1.1 Important Roles

Ontologies represent a shared conceptualization of several people, i.e., a common understanding of a domain of interest. Usually, ontology development projects involve stakeholders with various levels of expertise and interests. Traditional methodologies recognize three main distinctive roles: knowledge engineers, ontology engineers and domain experts [10]. Knowledge engineers are responsible for coordinating the activities for extracting the knowledge of the domain by defining the requirements, constraints and relevant questions. The role of domain experts is to provide their knowledge about the domain, such as identifying important concepts, their attributes and the relationships between them. Moreover, domain experts can validate whether changes are correctly modeled via issue reporting mechanisms, or even perform changes themselves in some organizations. As a result, a conceptual model is created which describes the elicited knowledge of the domain of interest. Next, ontology engineers transfer the conceptual model into a machine-understandable format using a formal representation language. This is realized by finding the right axioms, which determine the level of expressiveness.


Table 4.1: Roles and permissions. Various active and passive roles and their permissions that are involved in the collaborative ontology development process.

                         Contributors                        End Users
Permission type    Ontology engineers  Domain experts    Users    Machines
Propose changes            +                 +
Perform changes            +
Report issues              +                 +              +
Consume                    +                 +              +         +

In collaborative ontology development projects, generally two roles are distinguished: ontology engineers and domain experts [5]. Domain experts, or ontology contributors, actively participate in the development process by providing feedback and proposing changes to the current concepts. Ontology engineers are responsible for encoding the knowledge of the domain using a specific formal language as well as for ensuring the quality of the ontology being developed. They can perform changes in the ontology according to new requirements or proposals that come from stakeholders. Normally, they can also report issues or even develop a parallel version of the ontology according to different needs and scenarios, which in the end can be merged with the base version. Another role occasionally involved in ontology development is that of the end users, who are mostly interested in the outcome in order to use the ontology for their particular use case. In addition, since ontologies are created with the objective of being comprehended by machines, intelligent agents are another type of user. They are particularly affected by changes in the ontology, so it is very important that each version is validated for syntax and consistency before it is finally published. Table 4.1 lists common roles that actively contribute to the construction process or simply consume the ontology being developed, as well as their respective permissions.

4.1.2 Analysis of Widely used Ontologies

With the objective of investigating common modeling practices, we compiled a list of the 20 most widely used ontologies, as represented in Table 4.2. The selection is based on the following criteria. Initially, a usage rate of more than 5% in all datasets of the Linked Data Cloud [78] is considered, resulting in 13 ontologies being chosen. Next, we look for recognized ontologies that embody best practices regarding documentation and dereferenceability and are used by independent data providers1, which results in three further ontologies being chosen. Finally, Linked Open Vocabularies (LOV)2 is inspected for the most reused ontologies, yielding four additional ontologies for the list. These ontologies have been revised and used for many years, and the community recognizes that they are built following good practices [79]. For that reason, we believe that studying them provides a better understanding of the common features and best practices of current ontology development. In this regard, we aim to understand important aspects of ontology creation, such as reuse, internationalization, documentation and naming, as well as the implicit structure of these ontologies (e.g., the use of logical axioms and property domain/range definitions). With respect to Reuse, 80% of the ontologies use elements defined elsewhere and 57% of them reuse elements from at least two external ontologies. This shows a considerable presence of the reuse aspect in the selected cases.

1 http://www.w3.org/wiki/Good_Ontologies 2 http://lov.okfn.org/dataset/lov/


Table 4.2: Widely used ontologies. The list of 20 ontologies, including name, prefix and domain, that are frequently used in different domains.

• foaf (Friend Of A Friend, http://xmlns.com/foaf/0.1/): Terms related to persons (e.g., Agent, Document, and Organization).
• dcterms (Dublin Core ontology Terms, http://purl.org/dc/terms/): General metadata terms (e.g., Title, Creator, Date, and Subject).
• geo (WGS84 Geo Positioning, http://www.w3.org/2003/01/geo/wgs84_pos#): Represents longitude and altitude information in the WGS84 geodetic reference datum.
• sioc (Socially Interconnected Online Communities ontology, http://rdfs.org/sioc/ns#): Aspects of online community sites (e.g., Users, Posts, and Forums).
• skos (Simple Knowledge Organization System Namespace, http://www.w3.org/2004/02/skos/core#): Data model for sharing and linking knowledge organization systems.
• void (Vocabulary of Interlinked Datasets, http://rdfs.org/ns/void#): Metadata about RDF datasets (e.g., Dataset, and Linkset).
• bio (Biographical information, http://vocab.org/bio/0.1/): Biographical information about people, both living and dead.
• qb (Data Cube Vocabulary, http://purl.org/linked-data/cube#): Statistic data (e.g., Dimensions, Attributes, and Measures).
• rss (Vocabulary for Rich Site Summary, http://purl.org/rss/1.0/): Models the declaration for Rich Site Summary (RSS) 1.0.
• w3con (Vocabulary for modeling abstract things for people, http://www.w3.org/2000/10/swap/pim/contact#): General concepts about people's everyday life (e.g., Address, and Phone).
• doap (Description of a Project, http://usefulinc.com/ns/doap#): Terms for Open Source projects (e.g., Version, and Repository).
• bibo (Bibliographic Ontology, http://purl.org/ontology/bibo/): Citations and bibliographic references (e.g., quotes, books, and articles).
• dcat (Data Catalog Vocabulary, http://www.w3.org/ns/dcat#): Facilitates interoperability between data catalogs published on the Web.
• schema (Schema.org, http://schema.org): Broad schema of concepts (e.g., Events, Organization, and Person).
• gr (GoodRelations, http://purl.org/goodrelations/v1): E-Commerce related terms (e.g., Products, Services, and Locations).
• mo (Music Ontology, http://purl.org/ontology/mo/): Terms related to music (e.g., Artists, Albums, and Tracks).
• cc (Creative Commons schema, http://creativecommons.org/ns): Describes copyright licenses (e.g., License Properties, and Work Properties).
• gn (GeoNames ontology, http://www.geonames.org/ontology): Geospatial semantic information (e.g., Population, and PostalCode).
• marinetlo (MarineTLO ontology, http://www.ics.forth.gr/isl/ontology/MarineTLO/): Marine domain (e.g., Species, and Marine Animal).
• event (Event Ontology, http://purl.org/NET/c4dm/event.owl): Describes reified events (e.g., event, location, and time).

An important aspect of Internationalization (I18n) is the support for multilinguality. This can be implemented by providing textual values for properties such as rdfs:label and rdfs:comment in different languages, i.e., associating dedicated language tags with RDF string literals. In 70% of the ontologies, we encounter literals explicitly tagged as English, i.e., @en. In 15% of the cases, there exist translations of the terms into other languages, and in the remaining 15%, no explicit language tags are used at all. Consequently, despite the fact that I18n is important for existing ontologies, we discover that the most common practice is to support only the English language.
Documentation refers to human-readable labels and descriptions (e.g., using the rdfs:label and rdfs:comment properties) added to the ontology elements, i.e., classes, properties and individuals. We observe that rdfs:label or rdfs:comment are present in 86% of the cases. It is worth noting that the combination of the two above-mentioned elements with rdfs:isDefinedBy is used with a frequency of 57%. Only in one case (i.e., 5%) is there no form of documentation at all. This shows that documentation (i.e., rdfs:label and rdfs:comment for commenting, and rdfs:isDefinedBy for linking definitions) is widely used by the existing ontologies. Another important practice in ontology creation is the convention for naming elements. The CamelCase notation is, with 60% of the cases, the most used one. In all other cases (i.e., 40%), no homogeneous naming convention could be identified; a combination of CamelCase notation, underscores or dashes is commonly encountered instead.

We also perform a statistical analysis regarding the inclusion of domain and range axioms for properties. Using the Shapiro-Wilk test over the observations of the object properties and the domain and range axioms, we find that the data do not follow a normal distribution for these variables. Our hypothesis is that there is a correlation in the obtained data between the amount of object properties and the amount of domain and range axioms. Therefore, the Spearman rank coefficient is computed to check for any existing correlation. For the amount of object properties versus domain axioms and versus range axioms, values of 0.91 and 0.95 are obtained, respectively. This indicates a strong correlation between object properties and domain and range axioms. The results for the various ontologies are illustrated in Figure 4.2(a) and Figure 4.2(b), respectively; the y-axis is transformed to a logarithmic scale for better comprehension. We perform the same analysis for datatype properties with respect to domain and range axioms, obtaining a value of 0.93 in both cases. These observations favor the conclusion that object properties and datatype properties should contain domain and range axioms. The percentages for inverse properties (60%) and class disjointness (50%) are calculated as well. These data indicate that the above-mentioned axioms should be analyzed more carefully with regard to the domain, but are still important when building an ontology.

Figure 4.2: Usage statistics of domain and range. a) Relation of the amount of object properties and domain axioms; and b) Relation of the amount of object properties and range axioms.
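The following minimal sketch (our own illustration, assuming Python with rdflib and a hypothetical ex: namespace) combines the practices observed above, i.e., CamelCase naming, multilingual labels, documentation properties, and explicit domain and range axioms:

```python
# Minimal sketch of a vocabulary term following the observed practices.
# Assumptions: Python with rdflib; ex: is a hypothetical namespace.
from rdflib import Graph

g = Graph()
g.parse(data="""
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix ex:   <http://example.org/vocab#> .

ex:ChargingStation a owl:Class ;                    # CamelCase naming
    rdfs:label "Charging station"@en, "Ladestation"@de ;    # I18n labels
    rdfs:comment "A station supplying energy to vehicles."@en ;
    rdfs:isDefinedBy <http://example.org/vocab#> .          # documentation

ex:operatedBy a owl:ObjectProperty ;
    rdfs:domain ex:ChargingStation ;                # explicit domain axiom
    rdfs:range  ex:Operator ;                       # explicit range axiom
    rdfs:label "operated by"@en .
""", format="turtle")

print(f"{len(g)} triples parsed")
```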

4.2 Requirements

In order to provide methodological and technical support for the round-trip development of ontologies, corresponding requirements have to be identified and addressed accordingly. Therefore, we identified requirements that are crucial for a successful adoption of Git for ontology development. These requirements were gathered by aggregating insights from the state of the art, the analysis of the round-trip development, the outcome of the study of widely used ontologies,

and our own experiences with developing a number of ontologies, such as MobiVoc, SCORVoc or STO, using Git as a version control system.3 These requirements are divided into two main groups based on their scope: 1) methodological, dedicated to the governing and operational aspects of the development process; and 2) technical, related to the functionalities to be provided by an integrated environment that aims to support full-featured ontology development. The technical requirements are further split into four categories: 1) Change Management; 2) Quality Assurance; 3) User Experience; and 4) Ontology Deployment. Next, we describe the requirements of both groups in detail.

4.2.1 Methodological Requirements

In the following, we describe requirements that ease the collaboration and governance aspects of the distributed development process.

M1 Communication: The collaborative development of ontologies is about finding consensus among different stakeholders. It is essential that they share ideas, make agreements, and discuss issues during the entire development life-cycle [44, 80], w.r.t. introducing new elements or extending or modifying the subsumption hierarchy [81]. Effective communication has a significant impact on the quality of the collaboration and its outcome.
M2 Information Provenance: Each change in the ontology reflects the understanding of the domain by the stakeholder who made it. In case of disagreements, it is necessary to know which change has been made by whom, at which time and for what reason. Hence, managing and documenting the provenance of information and design decisions is needed during the entire development process.
M3 Workflow Independence: The overall field of methodologies and workflows for collaborative ontology development is continuously changing [80]. Tools supporting collaboration should be generic and able to adapt to highly dynamic contexts. Therefore, it is essential to facilitate scenarios with high flexibility that follow different methodologies and workflows.
M4 Parallel Development: The community should not be limited to a strict and linear path of the development process; rather, it should be able to work simultaneously, considering its diverse interests and requirements. As a result, parallel versions might co-exist for purposes of testing new features, bug fixing or other issues. Therefore, support for parallel development is crucial to boost the productivity of the team in such scenarios [28].
M5 Role Definition: Stakeholders with different backgrounds and levels of expertise are involved in ontology development. Consequently, a clear definition of roles and permissions is an important requirement that must be taken into consideration [5, 80].
M6 Modularity: Modularization is recognized as an important step in collaborative ontology building [82]. Reusability, decreased complexity, ownership and customization are some of the benefits of ontology modularization. Although there is no universal way to perform modularization [83], guidelines based on the clustering or number of concepts are necessary to reduce the cost of maintenance and increase the applicability of ontologies.
M7 Version Labeling: Release versions of ontologies should be appropriately named using self-descriptive labels. This ensures that users, whether humans or machines, always have the possibility to use a specific version according to their needs, and not only the latest one.

3 See https://github.com/vocol/mobivoc, https://github.com/vocol/scor and https://github.com/i40-Tools/StandardOntology.


M8 Modeling Practices: During the modeling phase, a number of practices should be persistently followed by all team members. These practices, which comprise naming conventions, the reuse of existing ontologies, metadata definition and the utilization of particular properties, help to increase the quality of the ontology and avoid later inconsistencies.
M9 Multilinguality: In order to achieve a wide range of applicability across different cultures and communities, the ontology terms must be translated into various languages [84]. The localization (and internationalization) process of the ontology should be continuously maintained and curated by dedicated team members.

4.2.2 Technical Requirements

The following requirements are crucial for supporting the development process from a technical perspective. These requirements are divided into four categories: 1) Change Management; 2) Quality Assurance; 3) User Experience; and 4) Ontology Deployment.

Change Management

This category comprises requirements related to the management of meta-information about changes as well as coping with heterogeneous ontology editors.

P1 Conflict Detection: Several users may simultaneously perform changes according to their needs or make corrections to existing terms in their local copies. Such changes must be tracked and conflicts must be detected during the synchronization process.
P2 Evolution Tracking: Collaborative development of ontologies should respond to the evolution of the knowledge domain [5]. Therefore, support for identifying and documenting the semantic differences between versions is necessary to enable developers to understand this evolution. This includes modifications, additions of new elements (i.e., classes, properties) as well as removals of existing ones.
P3 Editor Agnostic: In contrast to software code, ontologies are abstract artifacts which can be serialized with different techniques without losing the encoded semantics. Since contributors may use different editors which style the syntax in different ways, the support for collaboration must be editor-agnostic and syntax-independent.

Quality Assurance

This category comprises requirements for the systematic checking of quality criteria that should be fulfilled by the ontology being developed.

P4 Continuous Validation: Syntactic and semantic correctness as well as the application of best practices in designing ontologies are relevant quality aspects. Providing tool support for these aspects is essential to help contributors make fewer errors and ultimately increase the quality of the ontology.
P5 Testing: Competency Questions, i.e., questions that the ontology must be able to answer, can be translated into queries and used as test cases in later phases [68]. An integrated ontology development environment should provide mechanisms that allow users to execute such queries efficiently.

User Experience

This category groups requirements for enabling contributors to effectively achieve their objectives in a user-centered manner.
P6 Client Performance: The development of ontologies is driven by contributors who might be affected by occasional network interruptions. Therefore, the contributors should be


able to work offline on their local machines as well as synchronize and distribute the ontology changes once they are connected again [85].
P7 Documentation: Domain experts are often team members with little technical expertise in knowledge representation and engineering tools. In order to enable them to contribute to the development process, presenting the current state of the ontology in a human-friendly way is vital. Therefore, an automatic documentation generation feature is necessary.
P8 Visualization: Visualization is known to have a positive impact on the modeling, exploration, verification, and sense-making of ontologies [86]. It is particularly helpful for domain experts, but can also provide useful insights for knowledge engineers.

Ontology Deployment

Finally, there are requirements concerning the deployment of the developed ontology that also need to be taken into account by an integrated environment.

P9 Machine Accessibility: An important requirement towards realizing the vision of the Web as a global information space is to provide details about the ontology terms in various representation formats [87]. This enables machines to access and comprehend the ontology according to their specific use cases.
P10 Querying: In order to check whether the developed ontology is suitable for a certain use case, the environment should provide appropriate interfaces for the execution of user-defined queries. Furthermore, the possibility to export the query results in various formats can help to achieve a better comprehension of the ontology.

4.3 Summary

In this chapter, we investigated the collaborative ontology development process in distributed and heterogeneous scenarios. This process involves a consortium of stakeholders which holds periodic meetings to communicate specific needs and tries to find consensus on topics such as ontology terms, their definitions, and the reuse of external ontologies. Next, we introduced the round-trip development model and explained its three fundamental steps, which may be performed iteratively and in an incremental fashion. We then continued with analyzing several ontologies that are widely reused and presented the results of this analysis regarding their modeling practices and design decisions. We observed a number of commonalities in terms of key aspects, such as reuse, documentation, multilinguality, naming, validation and authoring practices. The main outcome of the above analyses is an exhaustive list of requirements that should be considered from the methodological perspective. Since software and ontologies are not the same, we studied their differences by identifying the requirements that should be considered when the construction of ontologies is centered around version control systems. Finally, from the technical perspective, we collected a number of requirements to be addressed by an integrated platform with the objective of supporting ontology construction in distributed scenarios.


CHAPTER 5

A Lightweight Methodology for Developing Ontologies in Distributed Environments

The dynamic World Wide Web demonstrates that interoperability is also possible, to some extent, with a minimalistic set of standards and flexible de facto standards. This is mainly enabled by focused applications and well-documented specification pages. In some cases these de facto standards become real standards. However, the main idea is that they are not created in a top-down approach as in traditional standardization activities. Concretely, the implementation is not based on a predefined standard, but the standard is based on the adoption of and the experience with existing implementations. This makes them more practical and avoids overly engineered standards like the Common Object Request Broker Architecture (CORBA) or Customised Applications for Mobile networks Enhanced Logic (CAMEL). We consider this a bottom-up approach for defining a standard. Even within the Web context, the danger of overly engineered standards is real. The vision of the Semantic Web, for example, caused the enthusiastic creation of standards like the Web Ontology Language and the Rule Interchange Format to represent knowledge and rules. However, these standards will be broadly adopted only if they are practical enough to be used in various information systems. In contrast, positive examples like Schema.org clearly demonstrate that a practice-oriented approach is very effective. The definition, implementation and usage of these ontologies is integrated pragmatically and not organized sequentially. Our approach towards practice-oriented ontology construction is to support collaborative ontology development with a lightweight methodology for distributed version control in a domain-agnostic way. In this regard, we have chosen Git for the following two reasons: on the one hand, Git is a mature version control system supported by sophisticated tools and broadly used in software development projects. More than 85 million repositories were hosted by the GitHub service in April 2018.1 On the other hand, well-known vocabularies and ontologies like Schema.org, Description of a Project (DOAP), or the Music Ontology2 publish their efforts on GitHub to leverage the contribution of the community. This indicates that the ontology development community is already familiar with Git. In Chapter 4, a number of requirements from the methodological and technical points of view were presented. This chapter addresses the requirements from Subsection 4.2.1, which are focused on the methodological aspects, with the objective of easing the collaboration during development

1 https://github.com/ten 2 https://github.com/schemaorg, https://github.com/edumbill/doap, https://github.com/motools/musicontology

processes. We investigate the fundamental features of Git and assess their potential exploitation for ontology construction. As a result, a number of guidelines covering information management, ontology structure, parallel development and deployment aspects are proposed. Moreover, we analyze the applicability of the Convention over Configuration paradigm, which is well known and broadly adopted in software engineering, to ontology development. It aims at reducing the number of decisions that developers need to make, so that they can focus mainly on development. Inspired by this paradigm and the broad adoption of lightweight ontologies like Schema.org, we propose a set of practices for facilitating ontology construction. We derive these practices from the study of widely used ontologies presented in Subsection 4.1.2 as well as from our own experience in the ontology development process. The bottom-up and pragmatically best-practice oriented approach which we applied is described in detail.

In this chapter, we address the following research question:

RQ1: Which best practices facilitate collaborative ontology development in distributed and heterogeneous scenarios?

Contributions of this chapter are summarized as follows:

• A set of guidelines to cover governing aspects of distributed ontology development;
• A number of best practices to be used for modeling and deployment purposes;
• An evaluation of the defined methodology with a concrete use case;
• A survey with ontology engineers to assess their opinion regarding the defined practices.

Some parts of this chapter are based on the following publications:

• Lavdim Halilaj, Irlán Grangel-González, Gökhan Coskun, Steffen Lohmann, Sören Auer. Git4Voc: Collaborative Vocabulary Development Based on Git. In International Journal of Semantic Computing (IJSC), 1-24, World Scientific. This article is a joint work with Irlán Grangel-González, a PhD student at the University of Bonn. In this article, my contributions are related to collecting the requirements for collaborative ontology development, the description of the approach, the revision of the related work, and the presentation of the use case evaluation;

• Lavdim Halilaj, Irlán Grangel-González, Gökhan Coskun, Sören Auer. Git4Voc: Git-based Versioning for Collaborative Vocabulary Development. In IEEE Tenth International Conference on Semantic Computing (ICSC) 2016 Proceedings, 285-292, IEEE. This article is a joint work with Irlán Grangel-González, a PhD student at the University of Bonn. In this article, I contributed to collecting the requirements for collaborative ontology development, the definition of the approach, the analysis of the related work, and the presentation of the use cases in real-world scenarios;

• Irlán Grangel-González, Lavdim Halilaj, Gökhan Coskun, Sören Auer. Towards Vocabulary Development by Convention. In 7th Knowledge Engineering and Ontology Development (KEOD) 2015 Proceedings, 334-343, SciTePress. This article is a joint work with Irlán Grangel-González, a PhD student at the University of Bonn. My contributions focused on the review of state-of-the-art approaches; devising, conducting and analyzing the user study; and the presentation of the outcomes.


Figure 5.1: The Git4Voc methodology. Governing and Operational levels of Git4Voc, covering a number of aspects and practices of the ontology development process, respectively.

This chapter starts by describing the approach followed to conceive Git4Voc in Section 5.1. In Subsection 5.1.1, we provide guidelines on how to manage the generated information, strategies for parallel development, structuring the ontology according to its size, and labeling possibilities for release versions. We then provide a number of practices derived from the analysis of widely used ontologies in Subsection 4.1.2. The presented approach is evaluated with a particular use case in Subsection 5.2.1, and a survey with ontology engineers is conducted in Subsection 5.2.2. Section 5.3 concludes the work of this chapter.

5.1 The Git4Voc Approach

Approaching a task can be done in two different ways: a top-down approach starts from the abstract and elaborates the concrete, whereas a bottom-up approach starts from the concrete level and continues towards the abstract. From a logical perspective, the former corresponds to deductive reasoning: it starts with known facts that are considered as premises and derives conclusions. The latter, on the contrary, starts with a given set of statements and looks for the premises that caused them. In the context of defining a methodology for ontology development, a top-down approach starts with the facts that are known about the expected outcome, namely the ontology. From its various characteristics, a possible creation process is derived. In the next steps, a list of roles is established and a set of tools is developed or selected, which can be used in the various steps of the overall process. In fact, most ontology engineering methodologies have been created by applying this approach. On the contrary, a bottom-up approach with the objective of deriving guidelines and best practices starts from the current state of the art and looks for evidence that explains why people are doing what they are doing. The most common activities of the successful efforts are then compiled into a set of best practices. This is the method we applied in this work. We advocate that there is no need for yet another comprehensive methodology designed in detail in a top-down manner. Rather, we claim that by now there are sufficiently many good examples to be analyzed and learned from. For that reason, we empirically analyzed a number of popular ontology development efforts.

In typical scenarios, such as cross-organizational or enterprise settings, decision-making bodies at various hierarchy levels are involved in the entire ontology life-cycle. First, the steering committee is a board of people dealing with strategic decisions, such as which new ontologies should be created for what purpose, who will use them, and who will be involved in the development process. Second, the stakeholder committee is responsible for governing aspects


related to the organization of the work, release versions, the number of branches, and the types of roles including their permissions. Moreover, a development team, which can be part of the stakeholder committee, is responsible for concrete modeling aspects, including transferring the domain knowledge into a formal language by choosing the right axioms, and for handling technical issues. In the following, we present Git4Voc, devised according to a bottom-up approach to cover the aspects of the stakeholder committee. On the one hand, we show how the requirements listed in Section 4.2 can be addressed by exploiting the fundamental features of Git as a distributed version control system. As a result, we identified the following governing aspects as critical for the development process: 1) management of generated information; 2) branching strategies; 3) rights management; 4) ontology modularization; and 5) labeling of release versions. On the other hand, from our analysis in Subsection 4.1.2 and an examination of the state of the art, we defined a set of practices related to reuse, naming conventions, and multilinguality. Figure 5.1 illustrates the governing aspects and development practices covered by the Git4Voc approach. The order in which these aspects are addressed is not prescribed: stakeholders can agree on governing aspects and operational practices before development starts, or they can start constructing the ontology and decide on aspects and practices following a learn-by-doing approach. However, a natural working process would follow the first option, since it helps to reduce the number of corrections and misunderstandings in later phases.

5.1.1 Governing Aspects

In the following, we show in detail how the requirements defined in Subsection 4.2.1 are addressed by the governing aspects of our approach.

Management of Generated Information

During the development process, a large amount of information is generated by different contributors. The capability to manage this information throughout the entire project life-cycle is essential. In fact, value-added services like GitHub, GitLab or Bitbucket enrich the Git functionality with powerful information management features. For instance, issues are a convenient way of tracking communications, reporting problems including bug fixes, and announcing version releases. Communities like Schema.org manage their discussions using GitHub. The aforementioned means support requirement M1. Based on this fact, we propose that the activities gathered in Table 5.1 should be documented. If possible, the names of issues should correspond to the names of the activities. Furthermore, labeling, a built-in feature of the repository hosting platforms, should be used to organize the issues into different categories. In addition, all issues should contain the ontology version to which they belong. Later on, this helps users to easily identify the ontology version affected by a particular issue. Another important requirement in collaborative ontology development is the ability to view the history of changes, according to the traceability principles of software engineering; this conforms to requirement M2. Using the commands git log and git diff, a user can explore the entire history of commits as well as the differences between them. Each commit should be realized based on the Best Commit Practices3.

3 http://www.git-tower.com/learn/git/ebook/command-line/appendix/best-practices


Table 5.1: Common activities in collaborative ontology development. A number of activities performed by different contributors during ontology construction.

ACT1 (Simple Addition/Deletion): Adding new or deleting existing elements like classes and properties. Example: adding a class at the last level of the taxonomy.
ACT2 (Complex Addition/Deletion): Adding new elements to be interconnected within the existing class or property taxonomy. Example: adding an object property as a super property of two existing properties.
ACT3 (Modification): Modifying existing elements. Example: modifying the domain and range of an existing object property.
ACT4 (Reusing): Reusing elements of the Linked Data Cloud. Example: defining new local concepts by using external resources.
ACT5 (Alignment): Aligning existing elements with equivalents in the cloud. Example: aligning classes and instances with owl:equivalentClass and owl:sameAs.
ACT6 (Refactoring): Changing the name and metadata of a specific element and its connections. Example: renaming a class which is connected in many domain and range relations of properties and needs to be renamed everywhere.
ACT7 (Common Metadata): Adding/removing/modifying predefined RDFS metadata. Example: adding metadata to a class with rdfs:label, rdfs:comment.
ACT8 (External Metadata): Adding/removing/modifying external metadata. Example: adding metadata to a class with skos:prefLabel, dc:title.
ACT9 (Translating): Adding/removing/modifying translations of the terms. Example: using rdfs:label to translate elements into different languages.
ACT10 (Modularization): Adding new modules to the existing ontology. Example: creating and integrating new modules due to new requirements.
ACT11 (Partitioning): Partitioning the existing elements into different modules. Example: the ontology has grown in size and semantic complexity and is partitioned into different modules.

Branching Strategy

Git is a very flexible tool, allowing teams to organize their work in different types of workflows4, addressing requirement M3. Branching strategies affect the quality of collaborative software development [88, 89]. Ontology development is widely regarded as a specific type of software development; therefore, the branching strategy can be assumed to impact the quality of ontologies as well. Well-known ontology initiatives, such as Schema.org, use branches to organize their work. In order to design a branching model, it is important to understand the possible activities that a team can perform. Thus, we collected common activities of collaborative ontology development, which are listed in Table 5.1. Aiming to produce a high-quality ontology, the entire team should be aware of these activities and how to handle them in the development process. Based on their impact on the overall ontology, we classified these activities into three categories: 1) basic activities (ACT1, ACT7, ACT9); 2) semantic issues (ACT2, ACT3, ACT4, ACT5, ACT6, ACT8); and 3) structural issues (ACT10, ACT11). This led us to the branching model depicted in Figure 5.2. We identified different branches according to the mentioned categories. Basic activities are performed in the Develop branch. For the second category, we propose a dedicated branch called Semantic Issues. For the third category, a branch named Structural Issues is used.

4 https://www.atlassian.com/git/tutorials/comparing-workflows/forking-workflow


Figure 5.2: Branching model for ontology development. Parallel development can be facilitated and maintained using a number of dedicated branches for specific purposes, i.e., master, semantic issues, develop, and structural issues.

It is important to bear in mind that we are not restricting the flexibility of Git regarding branches. On the contrary, other branches can be used to complement this model. Nevertheless, our branching model helps developers because the proposed branches are connected to specific activities in collaborative ontology development, addressing requirement M4. The model is built on top of the best practices for branching in software development.5

Rights Management

The definition of roles is of paramount importance in any initiative for building ontologies in a collaborative fashion. Table 5.2 shows common roles and their permissions with respect to the defined categories of activities. Standalone solutions, such as GitLab6 and Gitolite7, as well as third-party services like Bitbucket8 and GitHub9, offer basic options for the management of user rights, like reading, writing, posting, adding new team members, and the definition of new tags. However, even with these solutions, a finer level of user management, e.g., restricting the editing of a specified number or type of classes, properties or instances, cannot be achieved with Git alone. In order to address requirement M5, we explore a combination of branching and hooks, with which a more fine-grained access management can be achieved. Concretely, by using server-side hooks, the administration of rights on top of user roles is possible. For instance, an implementation of a pre-receive hook can check the user's role and permissions and reject the push if the necessary rights for the respective branch are missing.

5 http://nvie.com/posts/a-successful-git-branching-model 6 https://gitlab.com/gitlab-org/gitlab-ce/blob/master/doc/permissions/permissions.md 7 https://github.com/sitaramc/gitolite 8 https://confluence.atlassian.com/display/BITBUCKET/Add+Users,+Set+Permissions,+and+Review+Account+Plans 9 https://help.github.com/articles/permission-levels-for-an-organization-repository


Table 5.2: Roles and their primary activities. The stakeholder team may comprise roles such as ontology engineers, domain experts, translators, and common users, which can work on different types of issues.

Roles                Basic Activities   Semantic Issues   Structural Issues
Ontology Engineers   +                  +                 +
Domain Experts       +                  -                 -
Translators          +                  -                 -
Common Users         -                  -                 -

Smart commits allow repository contributors to control the execution of specific actions, such as commenting on issues or recording timestamp information against issues.10 These actions are encoded into the commit message based on the following pattern:

<ignored message text> <ISSUE_KEY> <ignored message text> #<COMMAND> <optional COMMAND_ARGUMENTS>

Following this pattern, we integrated the idea of smart commits into Git4Voc. This gives users the flexibility to choose only specific tasks to be performed, and it facilitates obtaining and monitoring information for later contribution analysis, according to requirement M2. The following pattern shows how a command is encoded in the commit message:

<commit message> #<COMMAND> #<optional ARGUMENTS>

where:

• <commit message> is the message given by the user, expressed in a free text format;
• <COMMAND> is the action to be performed, expressed with predefined keywords or a particular range of numbers;
• <ARGUMENTS> are optional variables a user can add to invoke specific actions.

The available commands are as follows:

• <All> or <1>: perform all tasks listed in the following (default behavior);
• <SyntaxValidation> or <2>: check the ontology for syntactic errors;
• <CheckBadPractices> or <3>: check the ontology for bad modeling practices;
• <Normalization> or <4>: normalize the structure of the ontology;
• <GenerateDocumentation> or <5>: generate a documentation of the ontology;
• <SemanticDiffs> or <6>: list semantic differences between ontology versions.

For example, to perform a default syntax validation (without specific arguments and no other additional task), the user can append the number 2 or the keyword SyntaxValidation to the commit message:

“Error correction#SyntaxValidation” or “Error correction#2”

The message is parsed according to the above pattern, and the specific action is performed based on the attached command.

10 https://confluence.atlassian.com/fisheye/using-smart-commits-298976812.html


Figure 5.3: The structure of MobiVoc. The MobiVoc vocabulary comprises several modules divided according to their specific subdomain coverage.

Ontology Structure

The basic functionalities of Git do not support modularizing code or ontologies. Therefore, in order to address requirement M6, we define several guidelines for organizing the ontology in files, where each file represents a module. Considering the fact that each line should encode a triple, and based on the insights of [90], we propose that files should not contain more than 300 triples. We highlight three possible forms of organizing the files, all of which are based on a single Git repository for storing the ontology files.

1. The complete ontology is contained in one single file. When the ontology is small (e.g., contains fewer than 300 lines of code) and represents a domain which cannot be divided into subdomains, it should be saved within one single file. If the number of contributors is relatively small and the domain of the ontology is very focused, organizing it into one single file might be possible even if it exceeds 300 lines of code. However, if comprehensibility suffers, splitting it into different files should be considered.

2. The ontology is split into multiple files. If the ontology contains more than 300 lines of code or covers a complex domain, it should be organized into different subdomains or modules. In this regard, we map subdomains to modules. When the subdomains themselves are small enough, they should be represented by different files within the parent folder. There exist patterns for ontology modularization [91]. We developed MobiVoc based on the pattern of n modules importing one module; in this case, the one module was the core ontology itself, and the n modules, such as Aircraft and Fuel, were saved in separate files. Each file represents a specific subdomain. By following this approach, domain experts can contribute independently to ontology development according to their field of expertise. Figure 5.3 depicts the structure of MobiVoc and its modules. A minimal sketch of this import pattern is shown below.

3. Ontology modules are stored in files and folders. For huge ontologies that comprise complex domains, splitting them into files is not sufficient, as this would lead to a large number of files within a single folder. Therefore, if the subdomains are too large to be kept in single files, they should be represented by folders. Each folder contains files which correspond to modules. In this case, the folder and file structure should reflect the complex hierarchy of the overall domain.

Splitting the ontology into files for specific purposes may also facilitate the requirements for role definition (M5) and multilinguality (M9): associating roles, such as Translators, with specific files facilitates the translation of the ontology terms into the required languages.
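To illustrate the import pattern of the second form, the following minimal Turtle sketch shows a module file declaring itself as an ontology and importing the core vocabulary via owl:imports. The module IRI and file name are illustrative assumptions, not the actual MobiVoc IRIs:

@prefix owl: <http://www.w3.org/2002/07/owl#> .

# fuel.ttl: the Fuel module imports the core vocabulary
# (the pattern of n modules importing one module)
<http://www.mobivoc.org/fuel> a owl:Ontology ;
    owl:imports <http://www.mobivoc.org/> .

Each module file can then be edited and versioned independently in Git, while editors and reasoners resolve the imports to obtain the complete vocabulary.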

Labeling of Release Versions

Based on requirement M7, proper labeling of release versions is vital, as it facilitates reusability. One common way to realize this is to deploy each release version in a dedicated file. However, this can lead to the following problems, as identified in [92]: 1) the number of files can increase rapidly; 2) choosing versions creates confusion; 3) maintenance needs additional resources; and 4) synchronizing dependent applications with the latest version requires additional effort. To avoid these problems, we propose to keep all versions of an ontology in the same file. The versions are distinguished using Git's built-in tagging functionality and are saved in the master branch, which is part of the branching model illustrated in Figure 5.2. Tags can be created and filtered at any time. Moreover, users can obtain a specific version of the ontology by simply giving the tag name. Therefore, each released version of an ontology must have a version number. Based on the scheme from [93] and the aforementioned categories of activities in Table 5.1, we propose tagging the different versions according to the following pattern: v[StI.SeI.BaA], where StI stands for Structural Issues, SeI for Semantic Issues and BaA for Basic Activities. Each category is associated with a number at the respective position. Changes in the ontology belonging to a category are reflected by incrementing the respective number. For instance, the difference between the releases v[1.0.0] and v[2.0.0] indicates changes in the Structural Issues category (StI).

5.1.2 Development Practices

In this subsection, we provide a comprehensive list of practices for ontology development, addressing requirements M8 and M9. We derive this list from our own experience in creating ontologies, like SCORVoc and MobiVoc11, in combination with the results of the analysis in Subsection 4.1.2 as well as current state-of-the-art practices. These practices serve as guidelines focusing on the most important aspects of the ontology creation process. Moreover, they are independent of the concrete environment or tool used for development purposes.

Reuse

In the field of ontology construction, the reuse of existing terms is of paramount importance [94, 95]. The main idea is to avoid creating new terms and instead utilize those present in existing ontologies. Apart from saving time and investment costs, ontology reuse is expected to ensure a certain level of quality: the longer an ontology exists and is reused, the more review processes it has gone through. Additionally, according to [87],

11 http://purl.org/eis/vocab/scor/, http://www.mobivoc.org/


reuse is considered a best practice in ontology construction. Therefore, we discuss important practices regarding reuse in the following.

P-R1 Reuse of widely used ontologies. We define authoritative ontologies as ontologies (cf. Table 4.2) which are: 1) published by renowned standardization organizations; 2) widely used in a large number of other ontologies; and 3) defined in a domain-independent fashion, addressing more general concerns. Reusing authoritative ontologies increases the probability that data can be consumed by a wide range of applications [79].

P-R2 Reuse of non-authoritative ontologies. Use ontology registries like LOV and LODStats12, or more domain-specific ontology registries such as BioPortal13, to find terms for reuse. For instance, by searching in LOV for a specific term, the following information can be derived: 1) the number of datasets that use it; 2) the number of occurrences of the term in all datasets; and 3) the reuse frequency of the ontology to which the term belongs [95]. Also, the semantic description and definition of the term should be checked to verify whether it fits the intended use. This information supports the decision on which terms are better candidates for reuse.

P-R3 Avoid semantic clashes. If a term has a strong semantic meaning for the domain, different from the existing ones, then a new element should be created.

P-R4 Reuse of individuals. Especially elements from authoritative ontologies should be reused as individual ontology elements. For non-authoritative ontologies, reusing individual resources is less recommendable; instead, the creation of own ontology elements with a possible alignment (cf. P-R6) or the reuse of larger modules (cf. P-R5) should be considered.

P-R5 Reuse of ontology modules (opposite of P-R4). Ontologies often require certain basic structures, such as addresses, persons, or organizations, which are already defined in non-authoritative ontologies. Such structures usually comprise the definition of one or several classes and a number of properties. If the conceptualization matches, the reuse of a whole module should be considered.

P-R6 Soft alignments with existing ontologies. Instead of the strong semantic commitment of reusing identifiers from non-authoritative ontologies, additional alignments using axioms such as owl:sameAs, owl:equivalentClass, owl:equivalentProperty, rdfs:subClassOf, and rdfs:subPropertyOf can be established (see the sketch below).
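As a minimal illustration of P-R6, the following Turtle sketch aligns locally defined terms with external ones instead of reusing the external identifiers directly; the ex: namespace and its terms are illustrative assumptions:

@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <http://example.org/vocab#> .

# Local class, softly aligned with the FOAF class
ex:Person a owl:Class ;
    owl:equivalentClass foaf:Person .

# Local property, declared as a specialization of the FOAF property
ex:fullName a owl:DatatypeProperty ;
    rdfs:subPropertyOf foaf:name .

The local ontology remains in full control of its identifiers, while consumers can still discover the correspondence to the external vocabulary.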

Utilization of the SKOS Vocabulary

The Simple Knowledge Organization System (SKOS) is a W3C recommendation for modeling taxonomies on the Web. SKOS is currently used by at least 478 ontologies [96]. The utilization of certain SKOS constructs is considered a best practice to declare and document the indexing terms (i.e., skos:prefLabel) and alternative terms (i.e., skos:altLabel) [97, 98]. Both of the aforementioned properties are subproperties of rdfs:label. SKOS provides a more detailed notion of the labeling concept, which can be useful for better descriptions of the ontology terms.

P-R7 Use skos:prefLabel to complement the labeling of concepts. Add skos:prefLabel in combination with rdfs:label to provide a complementary semantic definition for each element. For instance, skos:prefLabel might be used to provide a shorter definition of a concept compared to rdfs:label.

12 http://lov.okfn.org/dataset/lov/, http://lodstats.aksw.org/ 13 https://bioportal.bioontology.org/


P-R8 Use skos:altLabel to describe variations of elements. Add complementary descriptions of the elements, such as synonyms, acronyms, abbreviations, spelling variants, and irregular plural/singular forms, by using skos:altLabel (see the sketch below).
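A small Turtle sketch combining P-R7 and P-R8; the ex: namespace and the labels are illustrative:

@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/vocab#> .

ex:ChargingPoint a owl:Class ;
    rdfs:label "Electric Vehicle Charging Point"@en ;   # main, descriptive name
    skos:prefLabel "Charging Point"@en ;                # shorter preferred term
    skos:altLabel "EV Charging Station"@en ,            # synonyms and variants
                  "Charge Point"@en .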

Authoring

In Subsection 4.1.2, we analyzed common practices followed by ontology engineers (i.e., the creation of object properties and their associated domain and range axioms). These practices are always domain-dependent, but they can still serve as general guidelines to be followed in the process of designing ontologies.

P-A1 Domain and range definitions for properties. When creating a property, consider providing the associated domain and range definitions (see the sketch below). This also means that, in the case of object properties, the corresponding classes should be defined; in the case of datatype properties, the range should be a suitable datatype.

P-A2 Avoid inverse properties. Create inverse properties only if it is strictly necessary to have bidirectional relations (e.g., invalidates and isInvalidatedBy). Inverse properties affect the size as well as the complexity of the ontology.

P-A3 Use of class disjointness. Use class disjointness to logically avoid overlapping classes. Even though disjointness has been used in authoritative ontologies, it should be carefully examined, since it can easily lead to semantic inconsistencies.
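The following Turtle sketch illustrates P-A1 and P-A3; all ex: terms are illustrative:

@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/vocab#> .

# P-A1: object property with explicit domain and range classes
ex:operatedBy a owl:ObjectProperty ;
    rdfs:domain ex:ChargingPoint ;
    rdfs:range  ex:Organization .

# P-A1: datatype property with a suitable datatype as range
ex:numberOfPlugs a owl:DatatypeProperty ;
    rdfs:domain ex:ChargingPoint ;
    rdfs:range  xsd:integer .

# P-A3: disjointness, to be used with care (cf. P-V3)
ex:Person owl:disjointWith ex:Organization .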

Documentation

Providing a user-friendly view of ontologies for non-experts is crucial for integrating the Semantic Web with the everyday Web [99]. It facilitates the contribution of domain experts during the development process. In addition, it helps other interested parties to adopt the ontology more easily in later phases. Documentation generation tools generally require the following information to be present for each ontology resource in order to enable a user-friendly representation.

P-Do1 Use of rdfs:label and rdfs:comment. Add an rdfs:label to every element to define the main name of the concept being represented, and an rdfs:comment to describe the context for which the element is created.

P-Do2 Generate human-readable documentation. Easy-to-use documentation is critical for the wide adoption of the ontology. Use appropriate tools for documentation generation based on the types of URIs.

Naming Conventions

Following proper naming conventions has a high impact on ontology development [100]. Naming conventions help to avoid lexical inaccuracies and increase robustness and exportability, particularly in cases where ontologies should be interlinked and aligned with each other [79]. Providing meaningful names increases the power of context-based text mining for automatic term recognition and facilitates the manual and automated integration of terminological artifacts (i.e., comparison, checking, alignment and mapping) [100, 101]. Considering the literature on this topic [79, 102] and the results of Subsection 4.1.2, we define several practices to be followed in the process of naming ontology elements. For ontology construction, the use of the CamelCase notation is considered a best practice [103]. Our


study also indicates the presence of this notation in 62% of the cases. Therefore, we propose the following notation forms to be used during ontology construction.

P-N1 Concepts as single nouns. Name all ontology concepts as single nouns using CamelCase notation (e.g., ChargingPoint).

P-N2 Properties as verb senses. Name all properties as verb senses, likewise in CamelCase fashion. The name of a property should not be a plain noun phrase, in order to clearly distinguish it from class names (e.g., hasProperty or isPropertyOf).

P-N3 Short names. Provide short and concise names for elements. When natural names contain more than three nouns, use the rdfs:label property for encoding the long name while using a short name for the element. For instance, for ManageSupplyChainBusinessRules, use BusinessRules and set the full name in the label. In order to convey the context (i.e., Supply Chain), complement this label with skos:altLabel. (See the sketch after this list.)

P-N4 Logical and short prefixes for namespaces. Assign logical and short prefixes to namespaces, preferably no more than five letters (e.g., foaf:XXX, skos:XXX).

P-N5 Regular spaces as word delimiters when labeling elements. For example, rdfs:label "A Process that contains..".

P-N6 Avoid the use of conjunctions and words with ambiguous meanings. Avoid names with “And”, “Or”, “Other”, “Part”, “Type”, “Category”, “Entity” and those related to datatypes like “Date” or “String”.

P-N7 Use positive names. Avoid negations. For instance, instead of NoParkingAllowed, use ParkingForbidden.

P-N8 Respect the names of registered products and companies. In those cases, the use of CamelCase notation is not recommended. Instead, the name of the company or of a particular product should be used in its original form (e.g., SAP, Daimler AG).
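A minimal Turtle sketch of P-N3, with illustrative ex: terms:

@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/vocab#> .

# Short element name; the full natural name goes into the label,
# and the context is conveyed via skos:altLabel
ex:BusinessRules a owl:Class ;
    rdfs:label "Manage Supply Chain Business Rules"@en ;
    skos:altLabel "Supply Chain Business Rules"@en .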

Dereferenceability

One of the four Linked Data rules to be followed during ontology development is naming things with HTTP URIs.14 Adopting HTTP URIs for identifying things is appropriate for the following reasons: 1) it is simple to create globally unique keys in a decentralized fashion; and 2) the generated key serves not only as a name but also as a means to retrieve information about the resource. By combining dereferenceability with content negotiation15, it is possible to provide adequate content for a resource based on the type of request. There are three different strategies for making the URIs of resources dereferenceable: 1) slash URIs; 2) hash URIs; and 3) a combination of both.16

P-D1 Use slash URIs. If a client requests a resource from a server by specifying its URI, the server responds with 303 See Other, redirecting the client to the appropriate representation. Slash URIs should be used when dealing with large datasets, enabling the server to respond with only the requested resource. For example, the ChargingPoint resource is identified as http://purl.org/net/mobivoc/ChargingPoint. The URI of the Turtle representation of this resource is http://purl.org/net/mobivoc/ChargingPoint.ttl, and the URI of the HTML representation is http://purl.org/net/mobivoc/ChargingPoint.html. To get information about ChargingPoint, the client supplies its URI and specifies the request type; in turn, the server responds with 303 See Other, redirecting to the appropriate representation format.

14 http://www.w3.org/DesignIssues/LinkedData.html 15 http://www.w3.org/Protocols/rfc2616/rfc2616-sec12.html 16 http://www.w3.org/TR/cooluris


P-D2 Use hash URIs. This solution is formed by appending a fragment to the URI, in the format URI#resource. Use hash URIs when dealing with small datasets to reduce the number of HTTP round trips. For instance, the URI of the SCORVoc ontology is http://purl.org/eis/vocab/scor, whereas the URI of the Process resource is http://purl.org/eis/vocab/scor#Process.

P-D3 Use a combination of slash and hash URIs. This allows a large dataset to be split into multiple fractions. It is appropriate for large datasets where it is not practical to serve all resources in a single document (e.g., http://purl.org/eis/vocab/scor/Process#this).

Multilinguality

Providing multilingual ontologies is desirable, but not a straightforward issue [84]. According to our empirical analysis in Subsection 4.1.2, and with the aim of keeping things simple, we propose the following practices for multilinguality support.

P-M1 Use English as the main language. Use English for every element and explicitly encode the resource definition using the @en language tag.

P-M2 Encoding other languages. In order to add a new language, add another triple following the same format for every element. The following example illustrates this practice with translations for the SupplyChain class:

scor:SupplyChain rdf:type owl:Class ;
    rdfs:label "SupplyChain"@en ;
    rdfs:comment "A Supply Chain is a ..."@en ;
    rdfs:label "Lieferkette"@de ;
    rdfs:comment "Eine Lieferkette ist ..."@de .

This approach should be applied to all elements, starting from the most basic properties like rdfs:label and rdfs:comment, as well as to external annotation properties (e.g., skos:prefLabel).

Validation

Validation is an important aspect of the ontology development process [50]. It analyzes whether the ontology correctly represents the knowledge domain in accordance with the user requirements and the best practices for ontology modeling [10, 104]. Criteria used for the validation activity are: 1) correctness; 2) completeness; and 3) consistency [39]. With the purpose of addressing these criteria, we propose the following practices:

P-V1 Syntax validation. When collaborating directly on the ontology source code, syntax checking is of paramount importance. Ideally, syntax checking is directly integrated into the editor, and committing code with errors is not possible. For example, tools like Rapper17 or Web-based services such as the RDF Validation Service or the OWL2 Validator18 can be used for finding common typos and syntax errors.

P-V2 Code-smell checking. Code smells are symptoms in the software source code that possibly indicate deeper problems. Similarly, tools such as OOPS can be used for ontology smell checking. OOPS is a Web-based tool for detecting common ontology pitfalls, such as: 1) missing relationships; 2) incorrect usage of ontology elements; and 3) missing domain and range definitions of properties. The complete list of pitfalls detected by OOPS is presented in [50].

17 http://librdf.org/raptor 18 http://www.w3.org/RDF/Validator/, http://mowl-power.cs.man.ac.uk:8080/validator/


P-V3 Consistency checking. Since we deal with lightweight ontologies, axioms that produce semantic inconsistencies are not very likely. Nevertheless, our analysis in Subsection 4.1.2 showed that even in authoritative ontologies there are cases that lead to semantic inconsistencies (e.g., class disjointness; see the example below). Handling inconsistencies impacts the quality of ontologies [105]. Tools like Pellet or FaCT++19 should be used for consistency checking.

P-V4 Linked Data validation. Tools such as Vapour verify whether data are correctly published according to the Linked Data principles and the best practices for publishing vocabularies.20
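As a minimal illustration of the kind of problem P-V3 targets, the following Turtle fragment (all ex: terms are illustrative) becomes inconsistent once a reasoner takes the disjointness axiom into account:

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix ex:  <http://example.org/vocab#> .

ex:Person a owl:Class .
ex:Organization a owl:Class .
ex:Person owl:disjointWith ex:Organization .

# Typing the same individual with both disjoint classes
# yields a semantic inconsistency detectable by a reasoner
ex:ACME a ex:Person , ex:Organization .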

5.2 Evaluation

The main goal of the evaluation is to assess the applicability of the defined governing aspects to a well-known vocabulary. In addition, a survey with ontology engineers is conducted to analyze their opinion regarding the presented operational practices for ontology modeling.

5.2.1 Schema.org Use Case

We evaluated our approach against Schema.org, one of the most well-known vocabularies at the moment. It is a collaborative initiative for creating, promoting and maintaining schemas for structured data on the Internet, e.g., in web pages and email messages. Among the companies involved in this initiative are Google, Microsoft, Yahoo and Yandex, which contribute to the vocabulary development process and apply the defined concepts to their web resources. According to the latest version in its repository21, Schema.org comprises 767 classes, 1190 properties and 273 instances, and these numbers are continuously increasing. The vocabulary is hosted and managed through Git and the GitHub platform. Basic Git functionalities, such as versioning, issue tracking, and branching, are used to organize the development process. The publication of new versions and the human-friendly representation are powered by a software application which is based on the Google App Engine and written in the Python programming language. Since the development process is organized through Git and GitHub, we identified Schema.org as an ideal case for analyzing how the application of our approach could improve the development and deployment process. In the following, we detail our analysis and show our proposed solutions according to the governing aspects defined in the previous section.

Management of Generated Information

Using the issue tracker mechanism of GitHub, the community of Schema.org actively contributes to the development process by proposing new terms or modifications to existing ones according to their needs. A total of 1370 issues have been created so far, of which 591 are already closed and 779 remain open. In order to group issues clearly by their intention, 21 different labels have been created, such as: 1) schema.org vocab; 2) type:exact proposal; 3) site tools + python code; and 4) type:bug. This categorization helps users to easily find information related to vocabulary changes or dedicated to the publishing software. However, 665 issues do not have any associated label, which makes it difficult to retrieve and categorize them for further analysis. Furthermore, bugs of the Schema.org software are announced using this mechanism as well. Issues related to a specific version can be found only in the release

19 http://clarkparsia.com/pellet, http://hermit-reasoner.com/ 20 http://validator.linkeddata.org/vapour, http://www.w3.org/TR/swbp-vocab-pub/ 21 https://github.com/schemaorg/schemaorg

notes on the Schema.org website22, since no dedicated label exists for this purpose. Therefore, in order to make information related to the specific version an issue pertains to easier to find, we propose creating additional labels with specific purposes, such as: 1) version dedicated, representing issues according to the version number; 2) module dedicated, grouping issues based on the module; and 3) extensions dedicated, grouping issues related to extensions of the current concepts.

Repository Organization Structure

The Schema.org repository contains both the vocabulary and the publishing software. According to software development best practices, a project (which is comparable to a repository) should contain either software or data, but not both at the same time. Thus, we propose creating two separate repositories, one for the vocabulary and one for the publishing software. By doing so, contributors and users of the vocabulary, which are our main target groups, can easily participate in the development process, and any potential confusion is avoided. Exploring the repository further, we see that the data folder comprises other folders and files related to different versions of the vocabulary. Within this folder, the main file is schema.rdfa, which contains a complete representation of the vocabulary in RDFa format. In order to simplify contributions and avoid forcing users to open the whole vocabulary each time, we propose splitting it into multiple files and folders. We highlight three possible forms of organizing the Schema.org vocabulary:

1. Types of the concepts. The vocabulary may be split based on the types of the concepts, such as classes, object properties, datatype properties and instances. As a result, the repository will contain four different files, where each file represents one of these concept types. If a new type of concept is introduced during the development process, a dedicated file should be created for it.

2. The number of lines/concepts per file. According to the above vocabulary statistics, the current version of Schema.org comprises a total of 2230 concepts. Since this number is already relatively high for a vocabulary and will increase over time, the vocabulary can be split based on the number of concepts. As a result, the repository could, for instance, consist of six files, where each file contains slightly more than 300 concepts.

3. Hierarchy of concepts. Another way of splitting Schema.org is based on the hierarchy of the concepts. Key concepts in the vocabulary hierarchy are: Action, Creative Work, Event, Intangible, Medical Entity, Organization, Person, Place and Product. These concepts can be saved in separate files. If the number of sub-concepts is high, the vocabulary can additionally be split into folders, where each folder represents one of the above concepts and the files within it represent the sub-concepts. In addition to the main concepts, Schema.org contains the concept Data Type, which should be saved in a separate file. Figure 5.4 depicts a potential modularization structure of Schema.org and its concepts.

Branching Strategy and Release of the Versions The development process on Schema.org is organized using different branches. The repository contains 114 branches named according to various styles: 1) from the combination of sdo prefix

22 http://schema.org/docs/releases.html

Figure 5.4: Proposed modularization structure of Schema.org. The structure of Schema.org can be organized into different sub-modules based on the key concepts as well as the Data Type concept.

These styles include: 1) a combination of the sdo prefix and one of the Ghostbusters character names23; 2) the name of the contributor; 3) the number of the issue that has been fixed; and 4) the name of the release version. In addition, several branches are named after software features or new tools that have been experimented with. To align with our branching strategy and the activities presented in Table 5.1, we propose organizing the development process according to the following branches: 1) Basic Activities; 2) Semantic Issues; 3) Structural Issues; and 4) Release. This simplifies the involvement of contributors with different roles, such as: 1) ontology engineers, who contribute to major fixes and structural issues; and 2) domain experts, who contribute to adding, modifying or deleting vocabulary concepts and to fixing minor issues. Furthermore, temporary branches dedicated to testing specific versions of the vocabulary can be created. Periodically, all unnecessary branches should be removed from the repository. As a result, the development process will have a clear path, and the confusion created by the high number of branches and the different conventions used for their names will be avoided. Following our approach for defining release versions results in a very convenient numbering pattern: each change made on one of the branches increases the number at the respective position of the pattern v[StI.SeI.BaA]. For example, if the current version of the vocabulary is v[1.2.4] and changes are made on the Basic Activities branch, then the BaA position is incremented and the next version will be v[1.2.5].

5.2.2 Survey and Result Discussion

With the goal of validating the defined practices, we performed a survey with ontology engineers.24 The experience of the selected group is as follows: 58% have up to two years of experience, and 41% between two and five years. The Likert scale [106] was used to collect the opinions. Figure 5.5 depicts the results of the survey. Generally, all practices received good evaluations according to the opinions of the experts. The authoring aspect was the most controversial one. Practice P-A2 received some negative scores due to the ongoing debate regarding the use of inverse properties.25

23 https://en.wikipedia.org/wiki/Ghostbusters#Cast 24 https://goo.gl/X8otxe 25 https://lists.w3.org/Archives/Public/public-vocabs/2014Apr/0200.html


Figure 5.5: Results of the survey with ontology engineers. The horizontal axis lists the proposed practices, whereas the vertical axis shows the participants' opinions for each particular practice (Strongly Agree, Somewhat Agree, Neither Agree nor Disagree, Somewhat Disagree, Strongly Disagree).

The results for P-A4 and P-A5 show that even SKOS, a generally accepted standard, is still not well received by a certain group of ontology engineers.

5.3 Summary

In this chapter, we presented Git4Voc, a lightweight methodology for supporting ontology development in distributed environments. The overall goal is to pave the way for a new paradigm of ontology development similar to Software Development by Convention, which aims at reducing the number of decisions that developers need to make during the modeling process. We started by investigating the applicability of Git for collaborative ontology development. As a result, Git4Voc was conceived following a bottom-up approach, exploiting fundamental features of Git in combination with state-of-the-art practices. Several governing aspects are defined to facilitate ontology construction using version control, namely: 1) management of generated information; 2) rights management; 3) branching and merging; 4) ontology modularization; and 5) labeling of release versions. Moreover, from our analysis in Subsection 4.1.2 and the examination of the state of the art, we defined a set of practices related to reuse, naming conventions, documentation, authoring, validation and multilinguality. We evaluated our governing aspects against the Schema.org vocabulary, a community-driven initiative for creating, maintaining, and promoting structured data on the Web. As a result of applying our approach, the repository of Schema.org would have a cleaner structure, the number of branches would be reduced, and issues would be easily retrievable thanks to a clear categorization. Considering the vocabulary structure, Schema.org would be divided into several submodules according to its basic concepts. In addition, the repository would contain only the vocabulary, whereas the tools that power it would be moved to a separate, dedicated repository. Finally, to assess the relevance of the defined practices, we conducted a survey with experts. According to the opinions of the ontology engineers, the majority of the practices received good evaluations. Certain practices, such as P-A2, P-A4 and P-A5, received contradictory opinions due to the existing debate on the use of inverse properties and SKOS labeling axioms.


CHAPTER 6

An Integrated Environment for Collaborative Ontology Development

This chapter presents VoCol, an integrated environment that supports the development of ontologies centered around version control systems. We designed VoCol as a comprehensive and flexible approach for realizing a fully-featured development platform. VoCol supports a fundamental round-trip model of ontology development, consisting of the three core activities modeling, population, and testing. In the spirit of test-driven software engineering, VoCol allows formulating queries which represent competency questions for testing the expressivity and applicability of an ontology a priori. For a posteriori testing, it supports the automatic detection of “bad smells” in the ontology design by employing SPARQL patterns. For modeling, VoCol integrates a number of techniques facilitating the conceptual work, such as the automatic generation of documentation and visualizations, thus providing different views on the ontology, as well as an evolution timeline supporting traceability. For population, VoCol supports the integration of mappings between data sources and the ontology (e.g., R2RML mappings to relational databases). The governance of distributed ontology development is supported by the access control as well as the branching and merging mechanisms of the underlying VCS. VoCol thus bridges between the conceptual development of ontologies and their operational execution in a concrete IT landscape. It has been implemented following the principles of extensibility, interoperability and customisability. The VoCol platform is based on loose coupling, leveraging the webhook mechanism provided by many VCS hosting platforms, with tools and techniques addressing particular aspects of ontology development. The modular architecture allows the platform to be extended by adding or exchanging components or services: new built-in components can be implemented, or off-the-shelf components can be plugged into an orchestration layer. Users are able to customize VoCol by (de)activating a subset of components or services based on their specific needs or scenario. By bundling all tools and encapsulating dependencies in virtual environments, such as Vagrant and Docker containers, VoCol is an interoperable platform that is easy to deploy or to use as-a-service in conjunction with arbitrary VCS installations. Furthermore, an incorporated component ensures a unique serialization of the ontology, so that changes made using heterogeneous editors do not result in false-positive conflicts. We demonstrate the applicability of VoCol with a real-world example from the industry domain and report on a user study that confirms its usability and usefulness.


In this chapter, we address the following research question:

RQ2: How can ontology development workflows be mapped onto and supported by distributed version control methods?

Contributions of this chapter are summarized as follows:

• An integrated and semi-automatic environment based on a modular architecture to enable ontology construction in distributed scenarios;
• An implementation of the defined architecture, composed of several services and components to address specific requirements;
• An evaluation of the approach with a use case from an industry partner, where the development process is driven using VoCol;
• A user study with ontology engineers to assess their opinion regarding the usefulness and usability of the integrated services.

This chapter is based on the following publication:

• Lavdim Halilaj, Niklas Petersen, Irlán Grangel-González, Christoph Lange, Sören Auer, Gökhan Coskun, Steffen Lohmann. VoCol: An Integrated Environment to Support Version-Controlled Vocabulary Development. In 20th International Conference on Knowledge Engineering and Knowledge Management (EKAW) 2016 Proceedings, 303-319, Springer. This article is a joint work with Niklas Petersen and Irlán Grangel-González, PhD students at the University of Bonn. In this article, my contributions comprise the problem description, the definition and implementation of the conceptual architecture, the revision of the state-of-the-art approaches, the presentation of the use cases, and the realization of the user study evaluation.

The remainder of this chapter is structured as follows: Section 6.1 describes the approach and a conceptual architecture composed of several services and components. In Section 6.2, we provide an implementation of the defined architecture centered around Git as the core component. We then demonstrate the practical application of VoCol in a real-world scenario in Subsection 6.3.1. We conduct a survey with ontology engineers in Subsection 6.3.2 to assess the usefulness of the provided services. Finally, the work is summarized in Section 6.4.

6.1 The VoCol Approach

In order to implement VoCol as an integrated environment, we initially designed the system architecture, as illustrated in Figure 6.1. It follows the principles of Component Based Software Development (CBSD) [107], which promotes the reuse of components to develop large-scale systems. In other words, it advocates selecting the appropriate off-the-shelf components and assembling them into a well-defined software architecture. Following this idea, we compose VoCol from a set of smaller components according to the functionalities they provide. Each of these components is exchangeable and can be replaced by alternatives. In the following, these components and their characteristics are described in detail.

Version Control System

A VCS component is required for the management of ontology changes, covering change capturing and change propagation. According to [29], change manipulation includes supplementary methods and strategies that support the management of changes in distributed environments.


Figure 6.1: The VoCol architecture. The main services of VoCol: the Configuration Service, Validation Service, Monitoring Service and Orchestration Service with their respective components, as well as the interactions between services and components.

By capturing and storing the changes, various revisions of the same ontology are created. Viewing the syntactic differences is realized by computing the changesets between different revisions of the ontology file. During the change synchronization process, several conflicts may be detected as a result of syntactic differences, forcing the user to resolve them before submitting her local changes, thus addressing requirement P1. To ensure a consistent development process, any change that is made has to be propagated to all other contributors. A distributed VCS enables contributors to work collaboratively without needing to share a common network or to always be online, addressing requirement P6. In addition, conflicts inevitably arise in environments where multiple contributors work simultaneously and change ontology terms. The VCS detects conflicting changes by employing built-in algorithms and ensures conflict resolution: it allows conflicting changes to be integrated in an effective and easy way, according to requirement P1. Since the VCS is the first component that is aware of changes, we declared it to be the core component of the overall VoCol system. Each additional component that is necessary to support ontology development is triggered by the VCS. We also integrated a repository hosting platform into the VoCol environment (cf. Figure 6.1), as it provides low-threshold access for the involved stakeholders. It acts as the repository storage where the ontology files are saved and accessed.


Its Access Control feature authenticates users and grants or denies access according to the set of permissions. Furthermore, using the Issue Tracker of the repository hosting platform, contributors are able to discuss the ontology by proposing new terms or alternatives for existing ones. In cases where sensitive information must be transmitted, the repository hosting platform can deliver email notifications to private user accounts.

Integrated Validation Service With the objective of addressing requirement P4, namely that any change submitted to the repository is syntactically correct, consistent, and does not violate pre-defined constraints, a dedicated service is integrated into VoCol. This service comprises the following components:

Syntax Validation To ensure that the latest revision of the ontology in the remote repository is always syntactically correct, VoCol integrates a syntax validation component. In principle, syntax validation could be executed at different stages of the overall workflow. However, with the aim of keeping the requirements on the client side at a minimum level, we integrated the syntax validation as a service in the backend. It rejects syntactically incorrect commits and provides a detailed error report in those cases.

Unique Serialization In a distributed environment, contributors use different editors during the development process, which may produce various structures of the ontology files. To avoid this problem, a component integrated into VoCol creates a unique serialization of the ontology terms before the changes are pushed to the remote repository. Thus, the VCS is prevented from indicating false-positive conflicts, making it editor agnostic as required in P3.

Inconsistency and Constraint Checking After the changes have been pushed to the remote repository, validations for semantic inconsistencies and constraint violations are performed. As a result, two reports with detailed information on the respective findings are generated and can be used for corrections, addressing requirement P5.

Monitoring Service Repository hosting platforms typically expose most of their functionality via web service APIs, so that they can be controlled programmatically. Any change to the repository is delivered as a payload event to this service, which is listening on the VoCol server. As a consequence, the other services performing specific tasks are automatically invoked.
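For illustration, a minimal sketch of such a listener, assuming Node.js with Express and a GitHub-style push payload, could look as follows; the endpoint path, port, and the orchestration call are hypothetical:

const express = require('express');
const app = express();
app.use(express.json()); // parse the JSON payload delivered by the hosting platform

// Hypothetical stand-in for the Orchestration Service described below.
const orchestrationService = {
  run: files => console.log('Rebuilding artifacts for', files)
};

app.post('/webhook', (req, res) => {
  // React only to push events; other event types are ignored.
  if (req.get('X-GitHub-Event') === 'push') {
    const changedFiles = (req.body.commits || [])
      .flatMap(c => [...c.added, ...c.modified])
      .filter(f => f.endsWith('.ttl'));
    orchestrationService.run(changedFiles); // trigger validation, docs, etc.
  }
  res.sendStatus(202); // acknowledge receipt; processing continues asynchronously
});

app.listen(3000);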

Orchestration Service This service is responsible for coordinating the execution of particular components according to a pre-defined workflow. The number of components to be invoked may vary depending on their activation status. The service is triggered by the Monitoring Service after the occurrence of a push event. The following components are an integral part of this service:

Documentation Generation With the objective of addressing requirement P7, a documentation generation component creates an HTML representation of the ontology. This permits contributors to easily navigate through the ontology by providing a human-friendly overview of it. Moreover, contributors are able to explore annotations of the modeled concepts in a number of different languages.

Visualization Generation The integrated visualization component depicts the ontology terms and their connections in a graphical way, and allows an interactive navigation over the modeled concepts. It complements the generated documentation by particularly representing the structure, distribution, and relationships within the ontology. By graphically


depicting terms, such as classes, properties and their connections, a better view of the general structure is provided. Consequently, users are empowered with a coherent view of the ontology, according to requirement P8.

Evolution Tracking The VCS takes care of maintaining the revision history of the files. To detect semantic differences between ontology versions, an evolution tracking component is integrated into VoCol. It shows which classes and properties have been added, removed, or modified, enabling users to see the evolution history over time, as required in P2.

Querying Component VoCol integrates a SPARQL endpoint synchronized with the latest version of the ontology to address requirement P10. Queries derived from competency questions can be used to verify whether the ontology meets the domain requirements. These queries are stored in the repository and are pre-loaded in the query interface.

Machine Accessibility Using content negotiation and dereferenceable URIs, VoCol delivers various machine-comprehensible representations. By specifying the content type in the HTTP header along with the resource URI, the ontology can be accessed by different software agents compliant with the Linked Data principles1, addressing requirement P9.

Configuration Service This service provides a graphical user interface to facilitate the configuration of VoCol. The system administrator can choose from various tools for syntax validation and documentation generation. Furthermore, the other components can be activated or deactivated simply by selecting the corresponding checkboxes.

6.1.1 Contributor Workflow

The interaction of a contributor with the VoCol environment starts with cloning the repository to her local machine. Next, the user is able to add, modify or delete ontology concepts according to her assignments. To store her work, she commits the changes to the local repository. During the commit phase, the changes are validated for syntax and consistency errors as well as for violations of pre-defined constraints. If the validation passes, a unique serialization of the ontology is generated and the changes are successfully committed. To make her changes available, the user continues with the synchronization of the local ontology with the Remote Hosting Platform. After invoking the push command, the hosting platform delivers a notification to the Monitoring Service. This activates the Validation Service to perform the server-side checking of the recent changes. Next, the Orchestration Service is triggered and a number of specific tasks, such as documentation generation and visualization, are initiated for execution. The generated artifacts are served to the user through the Presentation Service. Figure 6.2 illustrates the interaction of a contributor with VoCol and the sequence of steps that are performed.

6.2 Implementation

We implemented the VoCol backend using Node.js2, a cross-platform JavaScript runtime environment, along with Express.js3, a flexible web application framework for Node.js. It manages any request that comes from a user or software application and, based on that, specific actions are performed. The VoCol backend comprises a set of four main services: Configuration Service, Monitoring Service, Orchestration Service and Validation Service. These services are responsible for providing functionalities for syntax validation, visualization, documentation generation, evolution depiction, and querying.

1 https://www.w3.org/DesignIssues/LinkedData.html 2 https://nodejs.org/ 3 https://expressjs.com/


Figure 6.2: Contributor Workflow. Sequence of steps and events occurring in a typical development workflow of a contributor during the interaction with the VoCol environment.

6.2.1 Configuration

This service is developed to allow the utilization of VoCol for different application scenarios. It enables the system administrator to configure VoCol by entering the details of the ontology repository (e.g., repository URL and user credentials) in a graphical user interface (GUI) (cf. Figure 6.3, General Info and Repository Info sections). The administrator defines the main branch of the repository by entering the value in the Branch Name field. For this branch, all selected services will be provided by VoCol. Via checkboxes within the Additional Services section, services for visualization, evolution report, and querying can be selected for automatic execution by VoCol. Furthermore, by checking the Turtle Editor option, a tool that allows online editing of Turtle files and has direct synchronization with the GitHub repository [108] is added to the VoCol environment. The option Predefined Queries indicates that queries defined in files with the extension .rq are automatically loaded into the SPARQL interface. Next, VoCol can be configured to run in public mode, where everyone who has the link can access and explore the provided content, or it can be restricted to private mode, which allows access only to people with the given credentials. Finally, all serialization formats that VoCol should deliver via content negotiation can be selected.
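For illustration, the settings collected by this GUI could be captured in a configuration object along the following lines; this is a sketch, and the field names are hypothetical rather than VoCol's actual schema:

// Hypothetical configuration object mirroring the fields of Figure 6.3.
const config = {
  general: { title: 'Example Ontology', homepageDescription: 'An example project.' },
  repository: {
    url: 'https://github.com/example/ontology', // placeholder repository
    branch: 'master',
    user: 'vocol-admin',
    password: process.env.REPO_PASSWORD
  },
  additionalServices: {
    visualization: true, evolutionReport: true, querying: true,
    turtleEditor: true, predefinedQueries: true
  },
  accessMode: 'public', // or 'private', restricting access to given credentials
  serializationFormats: ['turtle', 'rdfxml', 'jsonld']
};

module.exports = config;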


Figure 6.3: The VoCol Configuration Page. The configuration page contains several sections: 1) General Info; 2) Repository Info; 3) Private Mode Access; 4) Additional Services; and 5) Serialization Formats. Details about the ontology project can be provided in the Homepage Description section. This page allows for the administration of the VoCol platform for different scenarios.

VoCol is able to recognize the used hosting platform (e.g., GitHub, Bitbucket) based on the URL entered for the ontology repository, and automatically access the respective API to create a webhook. This hook contains the address of the VoCol server to which the repository hosting platform will henceforth send information about any push event.
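As an illustration, creating such a hook for a GitHub repository can be sketched with the documented REST call POST /repos/{owner}/{repo}/hooks; the repository name and the VoCol URL below are placeholders, and Node.js 18+ with built-in fetch is assumed:

// Register a webhook so that GitHub notifies the VoCol server on every push.
const response = await fetch('https://api.github.com/repos/example/ontology/hooks', {
  method: 'POST',
  headers: {
    'Authorization': `token ${process.env.GITHUB_TOKEN}`,
    'Accept': 'application/vnd.github+json'
  },
  body: JSON.stringify({
    name: 'web',
    active: true,
    events: ['push'],                  // only push events are of interest
    config: { url: 'https://vocol.example.org/webhook', content_type: 'json' }
  })
});
console.log('Hook created:', response.status); // 201 on success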

6.2.2 Client-side Tasks

Client-side tasks refer to the tasks performed on the user machine before the changes reach the remote repository, including a number of specific tasks realized upon invocation of the commit and push commands.

The Integrated Validation Service is responsible for validating the syntax, checking the fulfillment of requirements via SPARQL queries, and generating a unique serialization. To reduce the effort needed for subsequent corrections, VoCol validates the syntax before the changed files are pushed to the repository. An adapted pre-commit hook (cf. Listing 6.1) posts the ontology files that have been changed with tools like Protégé or TopBraid Composer4 from the user machine to the VoCol server. First, the server validates the ontology files for syntactic errors. If the validation fails, the user receives a detailed error description, including the file name, the affected lines in the files, and the type of the error. If the syntax validation succeeds, a unique serialization of the ontology files is created using the SerVCS component, which we implemented on top of RDF serialization tools like Rdf-toolkit or Rapper5. As a result, the ontology elements are serialized in alphabetical order, which reduces the number of false-positive conflicts indicated by the VCS during the merging process. Additionally, the integrated TurtleEditor

4 http://protege.stanford.edu , http://www.topquadrant.com/composer/. 5 https://github.com/edmcouncil/rdf-toolkit, http://librdf.org/raptor/

can be used to edit the ontology files directly on the repository hosting platform. Following the idea of a just-in-time debugger, this editor implements an instant validator that immediately reports all found syntax errors. Furthermore, it provides auto-completion of ontology terms according to the declared namespaces.

#!/bin/sh
# Pre-commit hook: send changed Turtle files to the VoCol server for validation.
files=$(git diff --cached --name-only --diff-filter=ACM | grep ".ttl$")
if [ "$files" = "" ]; then
    exit 0
fi

pass=true
for file in ${files}; do
    # Prepend the file name so the server can report errors per file.
    fileContent=`cat ${file}`
    fileHeader="${file}StringToSplit"
    fileContent="${fileHeader}${fileContent}"
    echo "${fileContent}" > tempFileCurl.ttl

    res=$(curl -X POST --data-binary @tempFileCurl.ttl http://vocol/client)
    code="$?"
    if [ "$code" = "6" ] || [ "$code" = "7" ]; then
        echo "Connection refused or service not available"; pass=false
    elif [ "$code" = "0" ]; then
        if echo $res | grep -q "Service Temporarily Unavailable"; then
            echo "Connection refused or service not available"; pass=false
        elif echo $res | grep -q "Undefined error"; then
            message="See the above messages!"; pass=false
        elif echo $res | grep -q "Error"; then
            echo "Validation Failed! ${file}"
            echo $res; pass=false
        elif echo $res | grep -q "Syntax Validation"; then
            echo "Validation Passed! ${file}"
        elif echo $res | grep -q "prefix"; then
            # The server returned a uniquely serialized file; overwrite the local copy.
            echo "Validation passed and the ${file} file reformatted"
            echo "$res" > ${file}
        fi
    else
        echo "Undefined error. Contact the administrator!"; pass=false
    fi

    rm -f tempFileCurl.ttl
done
git update-index --again

echo "Validation complete"

if ! $pass; then
    echo "COMMIT FAILED: Syntactic errors found."
    echo ${file}
    exit 1
else
    echo "COMMIT SUCCEEDED"
fi

Listing 6.1: Pre-commit hook. A customized pre-commit hook to submit the local changes to the VoCol server for syntax validation, checking for bad modeling practices, and generating a unified serialization.


6.2.3 Server-side Tasks

Server-side tasks refer to the tasks related to the validation and publication of artifacts in human- and machine-comprehensible formats, performed after the occurrence of each Git push event.

Triggering Changes on the Repository Using the PubSubHubbub protocol6, after each push event, the repository hosting platform delivers a payload with information about the last commit(s) to a server subscribed to it. The Monitoring Service, implemented in VoCol, receives the payload and pulls the ontology from the remote repository.

Validation and Error Reporting Next, the Integrated Validation Service (IVS) is triggered to validate each file for syntax errors using tools like Rapper or Jena Riot7. This task is rerun on the server side to avoid further processing of ontologies with syntax errors, which can happen if users do not validate the syntax during the commit phase. If the validation fails, an HTML document is created with detailed information about the found errors.

Publishing the Artifacts for Humans and Machines If the syntax validation process is passed successfully, all ontology files are merged into a single file. After that, the following tasks are performed automatically to generate updated artifacts for the evolution report, documentation, and visualization.

Documentation Generation: A dedicated component is responsible for generating human-friendly documentation of the ontology. It comprises rich features to facilitate the exploration and understanding of the ontology, its taxonomy, and the details of each concept. Users can quickly reach any concept by typing its name in the search bar, which causes the tree to be updated with a list of concepts containing the given letters. A number of different filtering capabilities enable users to show concepts that belong to a specific file and to filter them according to the type, e.g., classes, properties, or individuals, as depicted in Figure 6.4(a). After selecting a particular concept, all associated metadata, object and datatype properties, as well as its instances are displayed in a tabular format of key-value pairs. Figure 6.4(b) illustrates various representation formats, such as RDF/XML, Turtle, N3 and JSON, which are provided in the Source tab of the selected concept. In addition, users may view its graphical depiction, as shown in Figure 6.4(c), where the selected concept is the root node surrounded by many other nodes connected through different properties. The single page per concept feature, as illustrated in Figure 6.4(d), enables dereferenceability for each term defined in the ontology. Furthermore, following the principle of the RDF Molecule Template [109], defined as a set of triples sharing the same subject, a comprehensive view of the selected concept, including its metadata, relationships with other concepts, and its instances, is provided. Statistics of the ontology, comprising the number of classes, properties, and individuals, can be illustrated using different graphical charts, as shown in Figure 6.5(c).

Visualization Generation: The visual depiction of the entire ontology is realized using the WebVOWL [110] tool. WebVOWL is integrated as a component to allow an interactive visualization of ontologies. It implements the Visual Notation for OWL Ontologies (VOWL) to graphically represent the ontology concepts and their relations in a dynamic layout according to a force-directed graph fashion. An excerpt of a generated visualization is shown in Figure 6.5(a). WebVOWL has many built-in features, enabling users to search for concepts and to filter based on the concept type, e.g., class disjointness, or object or datatype properties. The layout can be exported in various formats, such as JSON or SVG.

6 https://pubsubhubbub.appspot.com 7 http://librdf.org/raptor/, https://jena.apache.org/documentation/io/



Figure 6.4: Human-friendly Documentation. Information about an ontology concept represented using various views: Tree View, Source View, Graphical Depiction, and Single Page per Concept View.

Evolution Tracking: When semantic differences between ontology versions exist, an evolution report is generated using the Owl2vcs [111] tool. It utilizes algorithms for structural diffs and three-way merge tools along with OWL 2 direct semantics. The application of direct semantics eliminates problems with blank nodes and allows for comparing ontologies axiom by axiom. This report lists each point in time, i.e., in a timeline fashion, when a new ontology revision has been pushed, as shown in Figure 6.5(d). After selecting a node in the timeline window, details related to semantic changes, like the addition, removal, or modification of elements, are represented in a tabular format. Users are able to filter the tabular entries based on the change type, the person who submitted, the submission time, or the commit message.

Machine Accessibility: Various machine-comprehensible formats of the ontology being developed, such as Turtle, JSON and RDF/XML, are delivered to an agent through the content negotiation mechanism. This mechanism, which is an integral part of the VoCol backend, is implemented based on the best practices for publishing vocabularies8. Following the principles of server-driven negotiation9, an agent specifies the Accept parameter in the HTTP header request, whereas the server delivers the requested format. As a result, agents are provided with the latest version of the entire ontology at any time. Furthermore, the combination of content negotiation with dereferenceability enables agents to perform more granular requests, such as asking only for a particular concept in different representation formats.
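As an illustration of this mechanism, a minimal Express handler, assuming pre-generated serialization files at hypothetical paths, could negotiate the representation as follows:

const express = require('express');
const app = express();

app.get('/ontology', (req, res) => {
  // res.format() inspects the Accept header and invokes the matching handler.
  res.format({
    'text/turtle':         () => res.sendFile('/data/ontology.ttl'),
    'application/rdf+xml': () => res.sendFile('/data/ontology.rdf'),
    'application/ld+json': () => res.sendFile('/data/ontology.jsonld'),
    default:               () => res.status(406).send('Not Acceptable')
  });
});

app.listen(3000);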

8 http://www.w3.org/TR/swbp-vocab-pub/. Date Accessed: 25 April 2018. 9 https://www.w3.org/blog/2006/02/content-negotiation. Date Accessed: 25 April 2018.


Figure 6.5: Additional VoCol Views. Other views, such as Visualization, Querying, Statistics and Evolution, provide possibilities for exploring and better understanding the ontology being developed.

Querying Component: An integrated SPARQL endpoint component using Jena Fuseki allows users to perform queries and to export the results in different formats. Users can test whether the ontology being developed meets their requirements. Additionally, this component checks for the existence of files within the repository that have the extension .rq, commonly dedicated to SPARQL queries. All found files are loaded into this component, taking the file name as the query name and the content of the file as the query. Table 6.1 lists some examples of the predefined queries, which can easily be changed or extended. A constraint violation is indicated when the value returned after executing the corresponding SPARQL query does not match the value in the "Expected Value" column. The results of the validation process are reported in HTML format. Listing 6.2 depicts the SPARQL query that checks for missing rdfs:label and rdfs:comment in the English language.
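For illustration, collecting the predefined queries from the repository can be sketched as follows; the sketch uses synchronous file access, and the directory layout is illustrative:

const fs = require('fs');
const path = require('path');

// Each .rq file becomes one predefined query: the file name is used as the
// query name and the file content as the query text.
function loadPredefinedQueries(repoDir) {
  return fs.readdirSync(repoDir)
    .filter(f => f.endsWith('.rq'))
    .map(f => ({
      name: path.basename(f, '.rq'),
      query: fs.readFileSync(path.join(repoDir, f), 'utf8')
    }));
}

console.log(loadPredefinedQueries('./repository'));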


Table 6.1: Predefined Queries. Examples of the predefined queries for constraint checking, such as finding duplicate labels or comments, checking for forbidden words, etc.

Query | Expected Value | Required
At least one owl:Ontology needs to be defined | isNotEmpty | Mandatory
Two resources should not have the same rdfs:label | isEmpty | Mandatory
Two resources should not have the same rdfs:comment | isEmpty | Mandatory
All resources should have rdfs:label and rdfs:comment in English | isEmpty | Optional
All resources must not have literals with "foo bar", "lorem" or "ipsum" | isEmpty | Mandatory
All resources should have rdfs:label different from rdfs:comment | isEmpty | Optional
All resources should have rdfs:comment in different languages | isEmpty | Optional
All skos:ConceptSchemes should have a skos:definition in English | isEmpty | Mandatory
All skos:Concepts should be skos:inScheme | isEmpty | Mandatory
All skos:Concepts should have a skos:broader statement | isEmpty | Optional

SELECT DISTINCT ?r WHERE {
  ?r rdf:type ?type .
  MINUS { ?r rdf:type skos:Concept . }
  MINUS { ?r rdf:type skos:ConceptScheme . }
  OPTIONAL { ?r rdfs:label ?label .
             FILTER((STRLEN(?label) > 0) && langMatches(lang(?label), 'en')) }
  OPTIONAL { ?r rdfs:comment ?comment .
             FILTER((STRLEN(?comment) > 0) && langMatches(lang(?comment), 'en')) }
  FILTER(!bound(?label) || !bound(?comment))
} ORDER BY ?r

Listing 6.2: Check for documentation properties. A SPARQL query to check whether resources have at least one English rdfs:label or rdfs:comment, which are used for documentation purposes.

6.2.4 Deployment

With the objective of facilitating the installation and configuration, VoCol is deployed in virtual environments, such as VirtualBox and Docker. As a result, the process of setting up such an image is reproducible and documented at the same time. The VoCol environment thus works as an isolated platform without affecting the rest of the physical machine. This ensures high portability, allowing the administrator to easily start, stop, move, or share it, and makes VoCol an interoperable, cross-platform solution applicable on various operating systems. With a few additional steps, the VoCol environment can also be installed and configured on a clean web server. All implementation details are available in the VoCol repository10.

6.3 Evaluation

We show the applicability of VoCol in an industry use case to evaluate its usefulness and effectiveness in a real-world setting. Furthermore, we conduct a qualitative user study to get additional insights into the usefulness and usability of VoCol.

10 https://github.com/vocol/vocol


6.3.1 Industry Application

In this scenario, VoCol is applied in a specific industry use case to develop ontologies for formally describing the assets of an enterprise, including how they relate to each other. All of these ontologies are the intellectual property of the industry partner. We are restricted in the information we can provide here, but we share at least some experiences and insights that we gathered during the project implementation. A group of seven people contributed in parallel to the development of the ontologies. While the ontology engineers conducted most of the formalization, the domain experts participated by creating issues. Overall, 46 issues of different types were created, ranging from proposals to add, modify, or remove ontology concepts. Additionally, other issues, such as bug fixes and feature requests, were registered in various phases of the project. In total, the developed ontologies comprise 151 classes, 93 object properties, 225 datatype properties, and 79 instances. The loose coupling characteristic of VoCol allowed us to integrate a new component for defining and establishing R2RML mappings between the developed ontologies and the legacy systems of the industry partner. By doing so, users were able to execute queries against these systems and receive the results in various representation formats, such as tabular, pie, and bar charts. According to the informal feedback of the involved stakeholders, VoCol provides very useful and effective support in this use case. In particular, the different views are considered to be very helpful in getting a better understanding and exploring the current status of the ontologies. The easy and comfortable access to all services via one integrated web interface was essential for stakeholders to optimize their efforts with respect to the contribution and use of the ontologies. Despite the above-mentioned benefits of VoCol for this use case scenario, one of the drawbacks that we experienced is the lack of simple form-based editing of ontology terms. This prevented domain experts from contributing their ideas directly to the development process, enforcing the continuous involvement of ontology engineers.

6.3.2 User Study

We conducted a qualitative user study of VoCol under controlled conditions using the Concurrent Think Aloud (CTA) method: participants are observed and asked to verbalize their thoughts while performing the given tasks [112]. At the beginning of each session, the interviewer gave a general introduction to VoCol. The interaction with the system as well as comments and suggestions were recorded for later analysis. After completing the given tasks, participants had a discussion with the interviewer about their experiences and any difficulties they faced while performing the experiment. To measure the usability and ease of use, participants were asked to fill in a questionnaire at the end of the interview.

Participants To ensure that the participants represent as closely as possible the targeted group of ontology engineers of the VoCol system, we chose twelve users with different levels of expertise, ranging from basic ontology modeling experience to more advanced expertise in knowledge conceptualization and representation.

Tasks and Questionnaire We designed a set of tasks that comprise all activities of the round-trip development described in Section 4.1: starting from the modeling activity, the first task was to define several classes with various numbers of properties. The next task was concerned with the population of the ontology, in which users had to create instances based on the defined classes. The tasks are first performed on the user machine by committing all changes to the local repository, and later pushing those changes to the remote repository. In addition, to test the functionality of the TurtleEditor, users were asked to perform the same tasks using it.



Figure 6.6: Scores given for VoCol services. The given scores for the usefulness of the VoCol services, such as Syntax Validation, SPARQL Endpoint, Documentation Generation, Visualization and Evolution Report according to the participants of the study.

The SPARQL endpoint was used to execute test queries verifying whether the developed ontology met certain criteria. All functionalities provided by VoCol, including the syntax validation before commit and after push events to the remote repository, documentation generation, and visualization, are covered in the user study. In addition, we asked the participants to fill in an electronic post-study questionnaire composed of two main sections. The first section contains the USE Questionnaire11, which uses five-point Likert scales for rating, ranging from 1 (strongly disagree) to 5 (strongly agree). We evaluated four usability dimensions: 1) usefulness; 2) ease of use; 3) ease of learning; and 4) satisfaction. To get more insights into specific areas, we defined three additional questions in the second section of the questionnaire. With these questions, we aim to get the participants' opinions about: 1) the importance of the individual services integrated into VoCol; 2) negative and positive aspects of the system, through an open response question; and 3) potential services to be integrated in the future. The evaluation material is available online12.

Results We obtained the evaluation results by observation, discussions at the end of each session, and the post-study questionnaires. The following are some of the findings that we derived from the analysis of the observation notes and discussions:

• Participants with prior knowledge about VCS, especially with Git, find VoCol very easy to learn and use;

• A few participants expect to see the provenance metadata in the browser, i.e., the date and author for each term added to the ontology;

• The instant syntax-checking and auto-completion feature of the TurtleEditor is considered very helpful by the majority of the participants.

11 http://hcibib.org/perlman/question.cgi?form=USE 12 https://figshare.com/articles/VoCol_Evaluation_Material/3438371


The results from the USE questionnaire show that the respondents rated their experience with VoCol very high. The average scores received for each dimension are as follows: usefulness = 4.34, ease of use = 3.97, ease of learning = 4.35, and satisfaction = 4.31. These scores indicate a high usability of VoCol (nearly all scores are > 4) and correlate with the oral feedback of the participants that VoCol is "easy to learn and use", as well as the informal feedback of the stakeholders from the industry use case that VoCol provides "very useful and effective support". Figure 6.6 shows that each of the services provided by VoCol is of high relevance to the study participants. For instance, 10 of the 12 participants consider syntax validation a very important service, while the scores for the other services are only slightly lower due to the different levels of expertise of the participants; for the more experienced ones, the provided services have less impact. Some interesting suggestions made by the participants are: 1) creating a possibility for dynamically adding and removing tools from the user interface; and 2) automatic recommendation of similar ontologies (e.g., using the LOV API13).

6.4 Summary

This chapter presents VoCol, an integrated environment for the distributed development of ontologies based on version control systems. We argue that the development of an effective and efficient environment for distributed collaboration is the main challenge in the context of collaborative ontology development in distributed environments. A conceptual architecture is introduced with the objective of addressing the requirements identified in Subsection 4.2.2. The fundamental principle of this approach is to enable the development of VoCol according to the given architecture by extending the functionality of plain version control systems with additional modules that cover specific aspects of ontology construction. VoCol is implemented on the basis of the widely used Git version control system, leveraging the webhook mechanism of the repository hosting platforms. Tasks such as content negotiation, documentation and visualization generation, as well as evolution tracking are performed in a fully automated way. In addition, a querying service, synchronized with the latest version of the ontology, enables users to execute customized SPARQL queries. The VoCol environment is easily expandable with other tools that provide additional functionalities. The current implementation of VoCol is tailored to small and medium size ontologies. However, it can be adopted and extended to support various scenarios by replacing its components with adequate alternatives. The applicability of VoCol is demonstrated with a real-world example for an industry partner. Additionally, a user study to assess its usability and usefulness was conducted with a group of twelve participants having different levels of experience. The results from the practical application and the user study provide evidence that VoCol effectively supports the entire ontology development life-cycle centered around version control systems, along with a set of useful and usable services covering specific modeling aspects.

13 http://lov.okfn.org/dataset/lov/vocabs


Part III

Quality Assurance for Ontology Development

In Part II, we described a number of collected requirements along with the defined methodology and the platform to support collaborative ontology construction in distributed scenarios. This part focuses on the synchronization of changes and on quality aspects of the development process. In particular, an approach for preventing false-positive conflicts during the synchronization of parallel changes performed by heterogeneous authoring editors is presented in Chapter 7. As a result, changes to the ontology are serialized according to a unique criterion, thus drastically reducing the number of conflicts detected by generic version control systems. We then continue with Chapter 8, which describes an approach to efficiently ensure the development of quality ontologies conforming to pre-defined domain requirements. A suite of test cases, built after the requirement definition phase, is used to prevent any non-intended modification of the ontology. To efficiently evaluate this suite, the test cases are organized in a dependency graph, which is traversed using the breadth-first search algorithm.


CHAPTER 7

Serialization Agnostic Ontology Development in Distributed Settings

This chapter presents SerVCS, an approach for enabling VCSs to deal with various serializations of the same ontology in a multi-editor scenario, in order to address the requirement P3 defined in Subsection 4.2.2. The presented approach paves the way to build a dedicated component, which is then integrated into the VoCol platform. It facilitates the development process, which requires significant efforts and knowledge as well as the participation of different stakeholders who are geographically distributed [29]. In this process, it is crucial to track, propagate and synchronize ontology changes to all contributors. Thus, supporting change management is indispensable for a successful ontology development in distributed settings. A Version Control System (VCS) assists users in collaboratively working on shared artifacts, and helps to prevent them from overwriting each other's changes. Basically, mechanisms to avoid change overwriting can be classified into pessimistic and optimistic approaches [26]. The former are based on the lock-modify-unlock paradigm, which implies that modifications to an artifact are permitted only for one user at a time. The latter follow the copy-modify-merge paradigm, where users work on personal copies, each reflecting the remote repository at a certain time. After completing the work, the local changes are merged into the remote repository by an update command, comprising the phases of comparison, conflict detection, conflict resolution, and merge. Different techniques, such as line-, tree-, and graph-based ones, can be employed to compare two versions of the same artifact [27]. The line-based technique, which has achieved wide applicability, compares artifacts line by line, where each line is treated as a single unit. This technique is also known as textual or line-based comparison [26]. Examples of VCSs that use the line-based approach are Subversion, CVS, Mercurial, and Git. Line-based comparisons are applicable to any kind of text artifact, as they do not consider syntactical information [27]. Accordingly, line-based techniques also neglect the syntactical information of ontologies, which is commonly represented in some text-based RDF serialization. Challenges arise when two ontology engineers modify the same artifacts in parallel on their personal working copies. These changes might contradict each other; for instance, engineers may both edit the name of an ontology concept simultaneously. Such parallel and controversial modifications can result in conflicts during the merging of two ontology versions. In general, a conflict is defined as "a set of contradicting changes where at least one operation applied by the first developer does not commute with at least one operation applied by the second developer" [27]. Conflicts can be detected by identifying changed units (i.e., added, updated,

deleted) that have been performed in parallel. Conflict resolution can be done automatically or may require user intervention to manually resolve the conflicting changes. From the ontology development point of view, the situation is exacerbated when various ontology editors are used during the development process. This is due to the fact that these editors often produce different serializations of the same ontology, i.e., the ontology concepts are grouped and sorted differently in the files generated by the editors.1 As a result, the ability of VCSs to detect the actual changes in ontologies is lowered, since they find a number of conflicts that are not actually given but are a result of the different serializations of the ontology files. In order to increase the accuracy of conflict detection in VCSs, the problem of different groupings and orderings must be tackled. We present SerVCS with the objective of enhancing VCSs to cope with different serializations of the same ontology, following the principle of prevention is better than cure. SerVCS relies on unique ontology serializations and minimizes the number of false-positive conflicts. It is implemented on top of Git, utilizing tools such as Rapper and RDF-toolkit for syntax validation and unique serialization, respectively. We conducted an empirical evaluation to determine the conflict detection accuracy of SerVCS whenever simultaneous changes to an ontology are performed using different ontology editors. Experimental results suggest that SerVCS enables VCSs to conduct more effective synchronization processes by preventing false-positive conflicts. In this chapter, we address the following research question:

RQ3: How can concurrent changes from heterogeneous ontology authoring editors be effectively synchronized?

Contributions of this chapter are summarized as follows:

• A formal definition of an approach for enabling ontology development in distributed environments using heterogeneous editors;
• An implementation of the defined approach on top of Git and VoCol to prevent contributors from introducing false-positive conflicts;
• An empirical evaluation to assess the accuracy of SerVCS with regard to conflict detection in two different scenarios, by changing: 1) the ontology size; and 2) the sorting criteria.

This chapter is based on the following publications:

• Lavdim Halilaj, Irlan Grangel-González, Maria-Esther Vidal, Steffen Lohmann, Sören Auer. SerVCS: Serialization Agnostic Ontology Development in Distributed Settings. In Communications in Computer and Information Science (CCIS) 914 - Revised Selected Papers from 8th International Joint Conference, IC3K 2016, Porto, Portugal, 213-232, Springer. This article is a joint work with Irlán Grangel-González, a PhD student at the University of Bonn. In this article, I conducted the formalization of the problem, implementation of the approach, the revision of the state of the art approaches, the presentation of the use cases, as well as the analysis of the results;

• Lavdim Halilaj, Irlan Grangel-González, Maria-Esther Vidal, Steffen Lohmann, Sören Auer. Proactive Prevention of False-Positive Conflicts in Distributed Ontology Development.

1 With “different serializations”, we refer to two different ontology files that represent the same ontology using the same syntax (e.g., RDF/XML, Turtle, Manchester) but use a different structure to list and group the ontology concepts.


In 8th International Conference on Knowledge Engineering and Ontology Development Proceedings (KEOD), 43-51, SciTePress. This article is a joint work with Irlán Grangel-González, a PhD student at the University of Bonn. In this article, I led the formalization of the problem, the implementation of the approach, the revision of the related work, the presentation of the use cases, as well as the analysis of the results.

The remainder of this chapter is organized as follows: It starts with an overview of the motivating scenario, which is presented in Section 7.1. The problem is defined in Section 7.2, and in Section 7.3 we describe the SerVCS approach. The implementation is described in Section 7.4, whereas the approach is evaluated against a number of different scenarios in Section 7.5, before the chapter is summarized in Section 7.6.

7.1 Motivating Example

As a motivating example, we consider two users working together on developing an ontology for a specific domain. In order to ease the collaboration and maintain the different versions of the developed ontology that result from changes, they decide to use Git. They proceed by setting up the working environment and creating an initial ontology repository which contains several files. Together, the users define the ontology structure with the most fundamental concepts and upload the ontology file F to the remote repository. After that, they decide to proceed with their tasks by working separately on their local machines. The users start by synchronizing their local working copies with the remote repository, as illustrated in Scene 1 of Figure 7.1. Scene 2 depicts simultaneous changes performed on different copies of the same ontology file, such as adding new concepts, modifying existing ones, or deleting concepts. For realizing this task, different ontology editors are used. In our case, User 1 works with Desktop Protégé2, whereas User 2 prefers to edit the ontology with TopBraid Composer3. After finishing the task, User 1 uploads her personal working copy (F*) to the remote repository, as shown in Scene 3. Next, User 2 completes his task and starts uploading the changes he made on his local copy to the remote repository. While trying to trigger this action, he receives a rejection message from the VCS, listing all changes which result in conflicts, as depicted in Scene 4. These conflicts need to be resolved in order for the VCS to allow the user to successfully upload his version (F**) to the remote repository. User 2 starts resolving the conflicts manually by comparing his version of the ontology with the one of User 1 that has already been uploaded to the remote repository. Since the users are working with different ontology editors, each of which uses its own serialization when saving the ontology file, the files are differently organized. For instance, while the concepts in one of the files are grouped into categories, such as Classes and Properties, they are ordered alphabetically in the other case, without any grouping. Consequently, the actual changes, i.e., the concrete changes on the ontology performed by each user, can no longer be detected by the line-based comparison of the VCS; instead, a large number of conflicts result from the different organization of the ontology files. This prevents User 2 from merging his changes, and his version of the ontology cannot be uploaded to the remote repository. This scenario illustrates that, despite the various benefits provided by a VCS for collaborative ontology development, it has not been possible so far to effectively use a VCS in cases where different editors and ontology serializations are used.

2 http://protege.stanford.edu 3 http://www.topquadrant.com/composer/



Figure 7.1: Motivating Example. A distributed environment illustrating an ontology development process. Different ontology editors, e.g., Editors X and Y, are used for defining ontology F by Users 1 and 2, respectively. F* and F** represent local versions of F. If F* is uploaded first, changes in F* can be synchronized. Changes in F** cannot be merged whenever F* and F** serializations are different.

7.2 Problem Definition

This section provides basic terminology and the formal definition of the SerVCS approach. An ontology is represented as an RDF document A, which is formally defined as A ⊂ (I ∪ B) × I × (I ∪ B ∪ L), where I, B, and L correspond to the sets of IRIs, blank nodes, and literals (typed and untyped), respectively [113].

Definition 1 (Changeset): Given two RDF documents A and A*, a changeset of A* with respect to A is defined as follows:

ChangeSet(A*/A) = (δ+(A*/A), δ−(A*/A), <), where

• δ+(A*/A) = {t | t ∈ A* ∧ t ∉ A},
• δ−(A*/A) = {t | t ∈ A ∧ t ∉ A*}, and
• < is a partial order between the RDF triples in δ+(A*/A) ∪ δ−(A*/A).

Example 1: Consider two RDF documents A = {t1, t2, t3} and A* = {t1, t2, t4} such that A* is a new version of A where the RDF triple t4 was added and the triple t3 was deleted. Then, the changeset of A* with respect to A, ChangeSet(A*/A), is as follows:

• δ+(A*/A) = {t4},
• δ−(A*/A) = {t3}, and
• < = {(t4, t3)}.
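To make the definition concrete, the following sketch (purely illustrative, with triples encoded as canonical strings) computes δ+ and δ− for Example 1:

// Changeset of A* with respect to A over two sets of triples.
function changeSet(a, aStar) {
  const added   = [...aStar].filter(t => !a.has(t)); // δ+(A*/A)
  const deleted = [...a].filter(t => !aStar.has(t)); // δ−(A*/A)
  return { added, deleted };
}

const A     = new Set(['t1', 't2', 't3']);
const AStar = new Set(['t1', 't2', 't4']);
console.log(changeSet(A, AStar)); // { added: [ 't4' ], deleted: [ 't3' ] }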

Definition 2 (Syntactic Conflicts): Given two RDF documents A and A*, and the changeset of A* with respect to A, ChangeSet(A*/A) = (δ+(A*/A), δ−(A*/A), <), there is a syntactic conflict between A and A* iff there are RDF triples ti and tj such that:

• ti ∈ δ−(A*/A),
• tj ∈ δ+(A*/A),
• (ti, tj) ∈ <, and
• ti = (s, p, oi), tj = (s, p, oj), and oi ≠ oj.

Example 2: Consider two RDF documents A and A* with triples t3 = (:Train, rdfs:label, "Trai"@en) and t4 = (:Train, rdfs:label, "Trainn"@en). Since the object value of the property rdfs:label of the subject :Train has been changed, there is a syntactic conflict between the RDF documents A and A*.

Definition 3 (RDF Document Serialization): Given an RDF document A and an ordering criterion η, a serialization of A according to η, Γ(A, η), corresponds to an ordering of the triples in A according to η: Γ(A, η) = <t1, t2, ..., tn>

Example 3: Suppose three RDF triples t1, t2, and t3 are defined as follows in an RDF document A: t1 = (:Car, rdfs:label, "Car"@en), t2 = (:Truck, rdfs:label, "Truck"@en), and t3 = (:Bus, rdfs:label, "Bus"@en). A serialization Γ(A, η) of A listing the triples by their labels in alphabetical order η would be:

Γ(A, η) = <t3, t1, t2>
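An ordering criterion η can be realized as a comparator. The following sketch (illustrative, not SerVCS's actual code) sorts triples alphabetically by subject, then predicate, then object, which corresponds to the criterion SerVCS uses for its unique serialization (cf. Section 7.3):

// Γ(A, η): a stable, total ordering of the triples of an RDF document.
function serialize(triples) {
  return [...triples].sort((a, b) =>
    a.s.localeCompare(b.s) ||
    a.p.localeCompare(b.p) ||
    a.o.localeCompare(b.o));
}

const A = [
  { s: ':Car',   p: 'rdfs:label', o: '"Car"@en' },
  { s: ':Truck', p: 'rdfs:label', o: '"Truck"@en' },
  { s: ':Bus',   p: 'rdfs:label', o: '"Bus"@en' }
];
console.log(serialize(A).map(t => t.s)); // [ ':Bus', ':Car', ':Truck' ]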

Definition 4 (False-Positive Conflicts): Given two RDF documents A and A* such that F1 and F2 are serializations of A and A* according to some ordering criteria η1 and η2, respectively. There is a false-positive conflict between F1 and F2 iff there exists an ordering criterion η such that:

Γ(A, η) = Γ(A*, η) and F1 ≠ F2

Example 4: Consider serializations F1 = <t1, t3, t2> and F2 = <t2, t1, t3>, both representing two identical RDF documents A = A*, such that A = {t1, t2, t3}. Then, there are three false-positive conflicts between F1 and F2, because there exists an ordering criterion η with Γ(A, η) = Γ(A*, η).

7.3 The SerVCS Approach

With the objective of enabling ontology development in distributed environments, where sets of changes are performed (cf. Definition 1) using different editors, the indication of False-Positive Conflicts (cf. Definition 4) by the VCS must be avoided. For this reason, ontologies should have a unique serialization (see Definition 3). In order to realize that, we developed SerVCS, which generates a unique serialization of ontologies regardless of the used editing tool. The modeled concepts (triples) are ordered alphabetically in this unique serialization, first according to the subject name, then by property name. That way, ontologies (represented as text-based RDF documents) always have a consistent serialization in the remote repository. As a result, a high accuracy of conflict detection can be achieved and the identified conflicts are reduced to those caused by overlapping changes, i.e., Syntactic Conflicts (cf. Definition 2). This enables a VCS to automatically resolve most conflicts using its built-in algorithms. In the worst case, a user is confronted with conflicting changes and has to manually resolve them by providing a valid and consistent ontology. Since all ontologies have a unified serialization in the remote repository, the user is able to see the differences between any two versions of the ontology. Figure 7.2 illustrates the SerVCS approach, which consists of five main steps: 1) input: RDF documents serialized by different sorting criteria; 2) generation of a unique serialization; 3) output: RDF documents sorted with the same criteria; 4) the synchronization process from the point of view of the VCS; and 5) the final outcome: a synchronized RDF document.

Figure 7.2: The SerVCS Approach. SerVCS receives RDF documents serialized by different sorting criteria (Step 1), and generates a synchronized RDF document (Step 5). In Step 2, a unique serialization is produced. RDF documents are sorted with the same criteria (Step 3). Finally, a VCS synchronization process is performed (Step 4), i.e., comparison, conflict detection, conflict resolution, and merge.

Figure 7.3 depicts the ontology development workflow using the SerVCS approach. After the personal working copies are synchronized with the remote repository (cf. Scene 1 of Figure 7.1), users start performing their tasks with various ontology editors. When making any changes, such as adding, removing, or modifying existing concepts, the updated ontology is saved locally on the machine of the user, as illustrated in Scene 2 of Figure 7.3 (which is still identical to Scene 2 of Figure 7.1). Next, these changes are uploaded to the remote repository. Scene 3 shows that a unique serialization of the ontology is created as an intermediate step. As a result, the concepts are organized using a common ordering criterion. In Scene 4, User 1 uploads her changes successfully to the remote repository. Lastly, as illustrated in Scene 5, User 2 starts uploading his changes to the remote repository. Since the ontology has a unified serialization, the VCS can merge both versions. In case of overlapping changes, the VCS shows exactly the lines which resulted in conflicts.

Figure 7.3: The SerVCS Development Workflow. A distributed environment illustrating an ontology development process using SerVCS. Different ontology editors, e.g., Editors X and Y, are used for defining ontology F by Users 1 and 2, respectively. F* and F** represent local versions of F. Before synchronization with the remote version, a unique serialization is created for F* and F**. F* is uploaded first. Next, changes in F** are successfully synchronized with F* since they have a unique serialization, and any possible conflict is easy to detect and resolve.

Formally, a list of conflicts LC identified by SerVCS is defined as follows:

Definition 5 (List of Conflicts): Given two RDF documents A and A* such that F1 and F2 are serializations of A and A* according to ordering criteria η1 and η2, a list LC = <c1, ..., cn> of conflicts between F1 and F2, identified by SerVCS, comprises triples ci = (i, entryi1, entryi2):

• i ∈ [1, MIN(size(F1), size(F2))],
• entryi1 = (si1, pi1, oi1) and entryi2 = (si2, pi2, oi2) are the RDF triples at position i in F1 and F2, respectively,
• entryi1 and entryi2 are different, i.e., si1 ≠ si2 or pi1 ≠ pi2 or oi1 ≠ oi2.
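For illustration, the positional comparison underlying this definition can be sketched as follows (triples as {s, p, o} objects; not SerVCS's actual implementation):

// List of conflicts LC between two serializations F1 and F2.
function conflictList(f1, f2) {
  const lc = [];
  const n = Math.min(f1.length, f2.length);
  for (let i = 0; i < n; i++) {
    const [e1, e2] = [f1[i], f2[i]];
    if (e1.s !== e2.s || e1.p !== e2.p || e1.o !== e2.o) {
      lc.push({ i, entry1: e1, entry2: e2 }); // conflicting pair at position i
    }
  }
  return lc;
}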

Theorem 1: Given serializations F1 and F2 according to an ordering criterion η of RDF documents A and A*, respectively, consider LC = <c1, ..., cn> the list of conflicts between F1 and F2 identified by SerVCS. If there are only syntactic conflicts between A and A*4, then for all ci = (i, entryi1, entryi2) ∈ LC:

• entryi1 = (s, p, oi1) and entryi2 = (s, p, oi2), and
• oi1 ≠ oi2.

Proof 1: We proceed with a proof by contradiction. Assume that there are only syntactic conflicts between A and A*, and there is a conflict ci in LC such that ci = (i, (si1, pi1, oi1), (si2, pi2, oi2)), and si1 ≠ si2 or pi1 ≠ pi2. Since F1 and F2 are serializations according to the same ordering criterion η, entryi1 ∈ δ−(A*/A) and entryi2 ∈ δ+(A*/A). However, the statement si1 ≠ si2 or pi1 ≠ pi2 contradicts the fact that only syntactic conflicts exist between A and A*.

7.4 Implementation

The architecture depicted in Figure 7.4 has been implemented to empower VCSs to prevent wrongly indicated conflicts. The architecture consists of three main components: 1) a VCS, which handles different ontology versions via changesets; 2) a UniSer component, which generates unique serializations for the RDF documents; and 3) a repository hosting platform, which stores the RDF documents and propagates the changes.

7.4.1 Version Control System

Git is used as the Version Control System (VCS), i.e., Git is responsible for managing the different versions of the ontologies. Furthermore, the Git hook mechanism is utilized to automate the process of generating the unique serialization of the ontologies before they are pushed to the remote repository. Once the modification of the ontology is finished, it is added to the Git staging area. The next step proceeds with committing the current state to the personal working copy.

4 Given two RDF documents A and A*, and ChangeSet(A*/A) = (δ+(A*/A), δ−(A*/A), <), there are only syntactic conflicts between A and A* iff size(A*) = size(A), and for each pair of RDF triples ti and tj with ti ∈ δ−(A*/A) and tj ∈ δ+(A*/A), there is a pair (ti, tj) ∈ <, and ti = (s, p, oi), tj = (s, p, oj), and oi ≠ oj.



Figure 7.4: The SerVCS Architecture. Users interact with ontology editors, e.g., Protégé: 1) a VCS handles different ontology versions via changesets, e.g., Git; 2) The UniSer component performs syntax validation and generates unique serializations, e.g., Rapper or RDF-Toolkit; and 3) A Repository Hosting Platform stores the ontologies and propagates the changes (GitHub).

The initialization of the commit event triggers a hook named pre-commit. This hook is adapted with a new workflow to handle the process of automatically generating a unique serialization, apart from the default one provided by Git. SerVCS uses Curl5 as a command-line HTTP client to send the modified files to the UniSer service. In case the ontologies fail to pass the validation process, the commit is aborted and a corresponding error message is shown to the user. Otherwise, the files are organized according to the unique serialization. Subsequently, the newly generated content overwrites the current content of the files, replacing the old serialization created by the ontology editor with the new unique serialization created by UniSer. When no error occurs during the entire process, the pre-commit hook event is completed and the commit is applied successfully. As a result, a new revision of the modified ontologies is created and the user is able to proceed with successfully pushing her version to the remote repository. In addition, GitHub is used as the hosting platform for the repository to ease the collaborative development among several contributors.

7.4.2 UniSer

We implemented UniSer as a stand-alone service using the cross-platform JavaScript runtime environment Node.js. Other tools are integrated to realize the tasks required for this service, e.g., syntax validation and unique serialization. The service accepts the ontology files as input through an HTTP interface and returns to the client either the error message from the validation process or the unique serialization of the file. Once the input is received, UniSer validates the ontology, since a prerequisite for the unique serialization process is that ontology files are free of syntactic errors. The syntax validation is performed by Rapper. In case of errors, a detailed report comprising the file name, error type, and error line is returned to the client. Otherwise, the process continues with creating a unique serialization using RDF-toolkit6 or Rapper, according to the user preference for the sorting criteria to be used. During this task, a unified serialization of the ontology file is created by 1) grouping elements into categories, such as classes, properties, and instances, and 2) alphabetically ordering elements within the categories. The unique serialization of the ontology is sent back to the client as the final outcome, and the respective file is updated accordingly.

5 https://curl.haxx.se
6 https://github.com/edmcouncil/rdf-toolkit

In the following, we present several examples of a simple ontology serialized in Turtle format, comprising three main elements: the classes Train and UrbanTrain of type owl:Class, and an instance UrbanTrain01 of the UrbanTrain class.
______
@prefix :     <http://example.com/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xs:   <http://www.w3.org/2001/XMLSchema#> .

:Train rdf:type owl:Class ;
    rdfs:comment "Train Concept"^^xs:string ;
    rdfs:label "Train"^^xs:string ;
    rdfs:subClassOf :Vehicle .

:UrbanTrain rdf:type owl:Class ;
    rdfs:comment "Train Concept"@en ;
    rdfs:label "Train"@en ;
    rdfs:subClassOf :Train .

:UrbanTrain01 rdf:type :UrbanTrain ;
    rdfs:comment "UrbanTrain01 operates in zone B"@en ;
    rdfs:label "UrbanTrain01"@en .
______

The same ontology serialized with the Protégé tool is shown as follows:

______
@prefix :     <http://example.com/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xs:   <http://www.w3.org/2001/XMLSchema#> .

############################################
#    Classes
############################################

###  http://example.com/Train
:Train rdf:type owl:Class ;
    rdfs:comment "Train Concept"^^xs:string ;
    rdfs:label "Train"^^xs:string ;
    rdfs:subClassOf :Vehicle .

###  http://example.com/UrbanTrain
:UrbanTrain rdf:type owl:Class ;
    rdfs:comment "Train Concept"@en ;
    rdfs:label "Train"@en ;
    rdfs:subClassOf :Train .

#############################################
#    Individuals
#############################################

###  http://example.com/UrbanTrain01
:UrbanTrain01 rdf:type :UrbanTrain ;
    rdfs:comment "UrbanTrain01 operates in zone B"@en ;
    rdfs:label "UrbanTrain01"@en .
______

The same ontology serialized with the TopBraid Composer is as follows:

______
@prefix :     <http://example.com/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xs:   <http://www.w3.org/2001/XMLSchema#> .

:Train a owl:Class ;
    rdfs:comment "Train Concept"^^xs:string ;
    rdfs:label "Train"^^xs:string ;
    rdfs:subClassOf :Vehicle .

:UrbanTrain a owl:Class ;
    rdfs:comment "Train Concept"@en ;
    rdfs:label "Train"@en ;
    rdfs:subClassOf :Train .

:UrbanTrain01 a :UrbanTrain ;
    rdfs:comment "UrbanTrain01 operates in zone B"@en ;
    rdfs:label "UrbanTrain01"@en .
______

Using the UniSer service, the excerpt of the ontology is regenerated according to a unique serialization. The following listing depicts the result of the serialization by UniSer (which is nearly identical to the TopBraid Composer serialization).
______
@prefix :     <http://example.com/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xs:   <http://www.w3.org/2001/XMLSchema#> .

:Train rdf:type owl:Class ;
    rdfs:comment "Train Concept"^^xs:string ;
    rdfs:label "Train"^^xs:string ;
    rdfs:subClassOf :Vehicle .

:UrbanTrain rdf:type owl:Class ;
    rdfs:comment "Train Concept"@en ;
    rdfs:label "Train"@en ;
    rdfs:subClassOf :Train .

:UrbanTrain01 rdf:type :UrbanTrain ;
    rdfs:comment "UrbanTrain01 operates in zone B"@en ;
    rdfs:label "UrbanTrain01"@en .
______
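The grouping and ordering step can be illustrated with the following simplified Node.js sketch, which operates on blank-line-separated subject blocks of a Turtle file; the actual tools (RDF-toolkit, Rapper) work on the parsed RDF model rather than on raw text, so this only approximates the idea.
______
// Simplified illustration of a unique serialization: group subject blocks
// into classes, properties, and the rest, then sort alphabetically within
// each group. Assumes one blank-line-separated block per subject.
function uniqueSerialization(turtle) {
  const blocks = turtle.trim().split(/\n\s*\n/);
  const prefixes = blocks.filter(b => b.startsWith('@prefix'));
  const subjects = blocks.filter(b => !b.startsWith('@prefix'));

  const category = b => {
    if (/owl:Class/.test(b)) return 1;                                // classes
    if (/owl:(Object|Datatype|Annotation)Property/.test(b)) return 2; // properties
    return 3;                                                         // instances
  };

  // Group by category, then order alphabetically within each category.
  subjects.sort((a, b) => category(a) - category(b) || a.localeCompare(b));
  return prefixes.concat(subjects).join('\n\n') + '\n';
}
______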


Table 7.1: Ontology Description. Ontologies of different sizes, described in terms of number of triples, subjects, properties, and objects.

Ontology             # triples    # subjects   # properties   # objects
Synthetic ontology          16             6              4          10
DBpedia Ontology        30,793         3,986             23      16,807
Gene Ontology        1,540,109       266,919             49     473,227

7.5 Empirical Evaluation

In this section, we present the results of an experimental study investigating the effectiveness of the SerVCS approach. The goal of the experiment is to analyze the impact of the ontology size, type of ontology changes, and sorting criteria on the behavior of SerVCS. For this purpose, we assess the following research questions:

• RQ1 Does the size of the ontology have an impact on the behavior of SerVCS?
• RQ2 Is the effectiveness of SerVCS affected by the different sorting criteria?

The experimental configuration to evaluate these research questions is as follows:

Ontologies: We compare the behavior of SerVCS using three ontologies of different sizes. Table 7.1 describes these ontologies w.r.t. number of triples, subjects, properties, and objects.

• Synthetic ontology: Synthetically generated small-size ontology composed of 16 RDF triples with six different subjects, four properties, and ten objects.
• DBpedia Ontology7: Medium-size ontology composed of 30,793 RDF triples. This ontology is used to describe Wikipedia infoboxes in DBpedia.
• Gene Ontology (GO)8: Large-size ontology composed of 1,540,109 RDF triples. GO describes molecular activities and relationships among genes.

Ontology Change Generation: The number of ontology changes is randomly generated following a Poisson distribution, i.e., we simulate ontology changes performed by users assuming that these changes obey a Poisson distribution. The parameter λ indicates the average number of ontology changes per time interval, i.e., λ = 2 simulates that on average two ontology changes are performed per hour. Figure 7.5 illustrates the number and types of ontology changes performed by two users during eight hours. To ensure that our evaluation represents a real scenario as closely as possible, a list of basic changes typically performed in ontology development is utilized (cf. Table 7.2). These changes are randomly chosen following a uniform distribution with replacement. We consider the change type Modification to be a combination of Deletion and Addition.
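The following Node.js sketch illustrates the sampling logic described above, using Knuth's method to draw Poisson-distributed change counts; the actual generator used in the experiments is implemented in RStudio (cf. the Implementation paragraph below).
______
// Knuth's method for sampling from a Poisson distribution with mean lambda.
function samplePoisson(lambda) {
  const L = Math.exp(-lambda);
  let k = 0, p = 1;
  do { k += 1; p *= Math.random(); } while (p > L);
  return k - 1;
}

const CHANGE_TYPES = ['CH1 Addition', 'CH2 Modification', 'CH3 Deletion'];

// Simulate one user's changes over eight hours with lambda = 2 changes per
// hour; change types are drawn uniformly with replacement (cf. Table 7.2).
for (let hour = 1; hour <= 8; hour++) {
  const n = samplePoisson(2);
  const changes = Array.from({ length: n },
    () => CHANGE_TYPES[Math.floor(Math.random() * CHANGE_TYPES.length)]);
  console.log(`hour ${hour}: ${changes.join(', ')}`);
}
______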

Metrics: We report on the number of conflicting lines (NCL). It is computed as the number of conflicts indicated by Git during the merge process of two versions of the ontology after each hour, and corresponds to the cardinality of the list of conflicts LC (cf. Definition 5).
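For illustration, consider a hypothetical merge in which both users modified the label of :Train in differently serialized files; Git marks the overlapping region with conflict markers, and the lines within such regions contribute to the NCL:
______
<<<<<<< HEAD
:Train rdfs:label "Train"@en .
=======
:Train rdfs:label "Passenger Train"@en .
>>>>>>> editor-b-version
______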

Gold Standard: We compute the gold standard by summing up the number of conflicting lines (NCL), which corresponds to the cardinality of overlapping changes made by users in a particular hour (cf. Definition 2).

7 http://wiki.dbpedia.org/services-resources/ontology/
8 http://www.geneontology.org/


Table 7.2: Ontology Changes. Basic changes of type Addition, Modification and Deletion performed during the ontology development process, as well as their respective examples.

ID    Change Type    Description                    Example
CH1   Addition       Adding new elements like       Add a new class, e.g., the class Train with
                     classes and properties         properties rdfs:label and rdfs:comment
CH2   Modification   Modifying existing elements    Modify a property value, e.g., rdfs:label of
                                                    the UrbanTrain class
CH3   Deletion       Deleting existing elements     Delete an instance, e.g., the UrbanTrain01
                                                    instance if it exists


Figure 7.5: Ontology Change Distribution. Number and types of ontology changes (CH) per user in an interval of 8 hours. A Poisson distribution with λ = 2 models an average of two changes per hour. A uniform distribution with replacement is followed to sample the type of ontology changes (CH).

Implementation: Experiments were run on a Linux Ubuntu 14.04 machine with a 4th Gen Intel Core i5-4300U CPU, 3MB cache, 2.90GHz, and 8GB RAM 1333MHz DDR3. SerVCS is implemented using Node.js version 4.4.5. The syntax validation is realized using Rapper version 2.0.15, whereas the unique serialization is performed using RDF-toolkit version 1.4.0.1 and Rapper, respectively. Git version 1.9.1 is used. The ontology change generator is implemented using RStudio version 0.99.9029.

Method: In order to answer the research questions, ontology changes of two users are simulated; two different ontology editors are assumed. User 1 works with TopBraid, whereas User 2 works with Protégé. Two scenarios are evaluated: 1) users work purely with the functionalities of Git; 2) SerVCS is used along with Git as VCS. A log of ontology changes is kept during the experiment. In total, users make 30 changes: 11 additions, 9 modifications, and 10 deletions. The distribution of ontology changes per user is simulated with the Poisson distribution shown in Figure 7.5. A log of ontology changes is available on GitHub10.

9 https://www.rstudio.com/products/RStudio/



Figure 7.6: Impact of Ontology Size on SerVCS. Number of conflicting lines (NCL) detected by Git and SerVCS compared to the Gold Standard, based on the Ontology Change Distribution in Figure 7.5. (a) SerVCS detects the same NCLs as the Gold Standard in five instances of time for the Synthetic Ontology; (b) SerVCS indicates up to three orders of magnitude fewer NCLs than Git for the DBpedia Ontology (log scale, base 10); (c) SerVCS indicates up to four orders of magnitude fewer NCLs than Git for the Gene Ontology (log scale, base 1,000). SerVCS is not equally affected as Git.

7.5.1 Impact of the Ontology Size

For answering research question RQ1, we follow the above described method to evaluate the behavior of plain Git and SerVCS with three different ontologies (cf. Table 7.1). Figure 7.6 shows the number of conflicting lines (NCL) in the Gold Standard, as well as the ones detected by Git and SerVCS. In the three ontologies, the NCL values are significantly lower in SerVCS than in Git. Moreover, in the Synthetic Ontology (small-size ontology), the NCL values are the same in SerVCS and the Gold Standard for five time instances. In the DBpedia Ontology (medium-size ontology), SerVCS indicates up to three orders of magnitude fewer NCLs than Git. Finally, SerVCS reports up to four orders of magnitude fewer NCLs than Git in the Gene Ontology (large-size ontology). Git utilizes a line-based algorithm for the comparison of ontology changes conducted by users. As the size of an ontology increases and modified ontologies are sorted differently, the number of compared ontology lines also increases. Therefore, Git's performance deteriorates, as shown in Figure 7.6. On the other hand, SerVCS also compares pairs of changed ontologies line-wise, but both documents are sorted using the same criteria and the space of potentially conflicting lines in SerVCS is smaller. Thus, as shown in Figure 7.6, SerVCS is not affected by different ontology sizes to the same extent as Git. However, SerVCS may also wrongly identify conflicts, i.e., the NCL values are not the same in SerVCS and the Gold Standard. This behavior of SerVCS happens when users concurrently modify the subject or predicate of an RDF triple, i.e., a non-syntactical conflict is generated and Theorem 1 is not satisfied.

7.5.2 Impact of the Sorting Criteria

With the goal of answering research question RQ2, the experimental method is also followed when SerVCS utilizes different sorting criteria, i.e., RDF-toolkit and Rapper are used to generate unique serializations.

10 https://github.com/lavdim/unistruct



Figure 7.7: Impact of Sorting Criteria. Number of conflicting lines (NCL) detected by Git and SerVCS compared to the Gold Standard. The Synthetic Ontology is modified according to the Ontology Change Distribution in Figure 7.5. SerVCS follows two different sorting criteria produced by RDF-toolkit and Rapper. SerVCS exhibits similar behavior for both sorting criteria.

RDF-toolkit sorts triples based on ontological concepts: RDF triples of classes, properties, and instances are categorized, and then the RDF triples are ordered alphabetically within each category. Rapper simply sorts RDF triples in alphabetical order. The results in Figure 7.7 show that SerVCS exhibits similar behavior for both sorting criteria and is able to identify the same NCL values as the Gold Standard in several instances of time. Moreover, SerVCS outperforms Git independently of the sorting criteria.

7.6 Summary

In this chapter, we present SerVCS, a generic approach for the realization of optimistic and tool-independent ontology development on the basis of Version Control Systems. As a result, VCSs become editor agnostic, i.e., capable of detecting actual changes and automatically resolving conflicts using the built-in merging algorithms. We implemented and applied the SerVCS approach on the basis of the widely used Git version control system. In addition, we developed a middleware service to generate a unique serialization of ontologies before they are pushed to the remote repository. The unique serialization ensures that ontologies always have the same serialization in the remote repository, regardless of the ontology editor used. Thus, we avoid incompatibility problems with regard to wrongly detected conflicts resulting from the use of different ontology editors, and assist ontology developers in collaborating more efficiently in distributed environments. To study the effectiveness of SerVCS compared to Git, we performed an empirical evaluation. The results suggest that SerVCS reduces the number of false-positive conflicts when different ontology editors are utilized concurrently during the development process. Additional experiments were conducted to evaluate the impact of the ontology size as well as of different sorting criteria. Based on the achieved results, we conclude that the effectiveness of SerVCS is less impacted compared to Git whenever the size of the ontologies increases or different sorting criteria are used to generate unique serializations.

CHAPTER 8

A Dependency-aware Approach for Test-driven Ontology Development

In this chapter, we present EffTE, an approach for efficient test-driven ontology development. This approach is designed to address requirement P5, defined in Subsection 4.2.2, to efficiently ensure that ontology changes performed by different stakeholders conform to domain requirements and do not have any non-intended consequences. The development of domain-specific ontologies requires joint efforts among different groups of stakeholders, such as ontology engineers and domain experts. Functional requirements can be expressed through Competency Questions, which are questions that the underlying ontology should be able to answer [4]. However, the concurrent definition of ontology concepts often results in a violation of the defined requirements, or creates design issues like duplicate entries or missing documentation. For example, in the large and monolithic DBpedia ontology, version 2016-041, only 556 of a total of 2,849 properties have an associated label description or a comment [114]. To ensure that any ontology modification has only the expected effects, a set of test cases can be defined based on Competency Questions. This is similar to the principles of test-driven software development, where test cases (which represent requirements) are defined before the code is actually written [23]. In contrast to code, however, the ontology as the main artifact is subject to change for a number of reasons, such as 1) the evolution of the domain; 2) changes in the user perception of the domain; or 3) fixing design issues [115, 116]. Since much effort is invested in creating and extending ontologies, it is crucial to make ontology development and maintenance cost-effective [117]. With a naive approach, however, test cases are exhaustively evaluated to identify any non-intended modification or design issue introduced during the concurrent development of the ontology. Since the number of test cases can be large and their evaluation time may be high, the ontology development process can be impacted negatively. The evaluation should be user-based, such that test cases associated with a specific user or group of users are instantiated only for them. Moreover, the evaluation of test cases has to be file- or module-specific, considering the fact that an ontology may comprise several files or modules. We present EffTE, an approach for efficient test-driven ontology development. EffTE relies on a dependency graph defined by the stakeholders, i.e., ontology engineers, according to their needs; it enables prioritization and selection of test cases to be evaluated. Traversing the dependency graph is performed using breadth-first search; tabu test cases are tracked and ignored for further

1 http://wiki.dbpedia.org/services-resources/ontology/

evaluation because of faulty parents. As a result, the number of test cases that are evaluated is minimized, thus reducing the time required for ontology validation after each modification. We conducted an empirical evaluation to determine the efficiency of our approach under different conditions, such as changing the number of predefined test cases, ontology sizes, and dependency relationships of test cases. Experimental results suggest that our approach is more efficient than a naive one, in particular with an increasing ontology size and number of test cases. In this chapter, we address the following research question:

RQ4: How can the quality and efficiency in distributed and heterogeneous ontology development be ensured?

Contributions of this chapter are summarized as follows:

• A formal definition of an approach to efficiently evaluate a set of test cases derived from the domain requirements;
• An implementation of the EffTE approach on top of Git and VoCol to prevent contributors from submitting non-intended changes;
• An empirical evaluation to assess the efficiency of EffTE with regard to the time needed and the number of executed test cases in three different scenarios, by changing: 1) the number of test cases; 2) the ontology size; and 3) the dependency between test cases.

This chapter is based on the following publications:

• Lavdim Halilaj, Irlán Grangel-González, Steffen Lohmann, Maria-Esther Vidal, Sören Auer. EffTE: A Dependency-aware Approach for Test-Driven Ontology Development. In 33rd ACM/SIGAPP Symposium On Applied Computing (ACM SAC) 2018 Proceedings, ACM. This article is a joint work with Irlán Grangel-González, a PhD student at the University of Bonn. In this article, I devised the formalization of the problem, led the definition and implementation of the proposed approach, reviewed related work, and prepared the experiments and the analysis of the obtained results;

• Lavdim Halilaj, Irlán Grangel-González, Maria-Esther Vidal, Steffen Lohmann, Sören Auer. DemoEffTE: A Demonstrator of Dependency-aware Evaluation of Test Cases over Ontology. In 13th International Conference on Semantic Systems (Semantics) - Posters and Demo Track, 2017. This demonstration article is a joint work with Irlán Grangel-González, a PhD student at the University of Bonn. In this article, I carried out the description of the architecture, the implementation, and the demonstration of the prototype.

The remainder of this chapter is organized as follows: Initially, the motivating example is presented in Section 8.1. The problem is described in Section 8.2, whereas Section 8.3 defines the EffTE approach. In Section 8.4, we outline the implementation of EffTE. The approach is evaluated in different scenarios in Section 8.5, and Section 8.6 summarizes the work.

8.1 Motivating Example

Suppose a team composed of ontology engineers and domain experts works together to develop an ontology for a specific domain. After the requirements gathering phase, they define questions that the ontology should be able to answer.



Figure 8.1: Motivating Example. a) Test cases (TCs) to check if an ontology satisfies the design requirements following an entailment regime; b) faulty test cases (orange) over time after each ontology modification; and c) number of evaluated test cases per instance of time. At every instance of time, the set of TCs is completely evaluated.

Examples of such questions are: List the attribute types of a particular concept where a condition is fulfilled; or There should be at least 20 classes that represent the most basic concepts of the domain, and individually enumerate them. Each class should have at least two subclasses, providing further details about the intended domain. Furthermore, to achieve a basic quality of the ontology, they define several guidelines to be followed by each team member. These guidelines include generic constraints that should not be violated during ontology development, e.g., "all concepts must have at least one rdfs:label in English" or "all concepts must not share the same English rdfs:label". As an initial step, the ontology engineers define a set of 15 test cases, as depicted in Figure 8.1(a). This set ensures that each change made to the ontology is in line with the requirements and has only the intended effects according to a given entailment regime. Moreover, it prevents team members from violating the already defined constraints. Figure 8.1(b) shows several faulty test cases after each modification of the ontology at different instances of time ti. For example, in time instance t1: tc2, tc4, tc9, tc12; in t2: tc4, tc5; and in tn: tc3, tc8, tc10 have failed, respectively. Since there is no specified dependency relationship between the test cases, all of them are evaluated at every instance of time ti. This process is time-consuming, in particular when test cases have a natural dependency relationship. For instance, checking for duplicate labels should not be performed if none of the concepts defined so far has a label associated with it; likewise, checking for the existence of subclasses should not be performed before the respective classes are defined. Additionally, there can be other cases where the dependency between test cases is more project-oriented, such as when one test case checks for a particular concept or resource and, in case of satisfaction, another test case verifies its properties or relations to another concept or resource. Moreover, a number of test cases can be valid for only one contributor or a group of contributors; therefore, the evaluation process should be associated with them. In addition, since an ontology may comprise several files, it should be possible to assign test cases to be evaluated only for particular files.
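To connect such guidelines to the test cases formalized below: a constraint like "all concepts must have at least one rdfs:label in English" can be operationalized as a SPARQL query. The following is a hypothetical sketch (not taken from the thesis) embedded in JavaScript; each solution of the query is a violating class. Note that the formalization in Section 8.2 treats an empty query result as a failing test case, so in practice the query would be phrased to match that convention.
______
// Hypothetical violation check: classes without an English rdfs:label.
const missingEnglishLabel = `
  PREFIX owl:  <http://www.w3.org/2002/07/owl#>
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  SELECT ?concept WHERE {
    ?concept a owl:Class .
    FILTER NOT EXISTS {
      ?concept rdfs:label ?label .
      FILTER(lang(?label) = "en")
    }
  }`;
______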

8.2 Problem Definition

Test Suite (TS): A test suite TS represents a set of test requirements {r1, . . . , rn} that need to be satisfied during the development of an ontology O according to a given entailment regime. A test-driven development approach allows for an efficient validation of the test cases that comprise a test suite TS. Our test-driven development approach EffTE is able to identify the minimal number of test cases required to be evaluated in order to determine whether an ontology does not satisfy a test suite TS based on the given entailment regime. The following definitions state the core concepts required to formulate the problem tackled by EffTE; Table 8.1 summarizes the symbols used for representing EffTE notations and definitions.


Table 8.1: EffTE Notations. Symbols and their respective definitions used to describe the test-driven development approach EffTE.

Symbols                   Description
TS                        Test suite with a set of requirements {r1, . . . , rn}
O                         Ontology
tc                        A test case represented as a SPARQL query
φ                         Entailment regime
σ(ri | STC, O, φ)         Set of test cases in STC associated with a requirement ri over O using φ
[[tc]]_O^φ                Set of mappings resulting from evaluating tc over O following φ
Fault(TS | STC, O, φ)     Requirements in TS not achieved in O using φ because of faulty test cases in STC
ℱ(STC′ | STC, O, φ)       Test cases in STC − STC′ that fail as a consequence of failures of test cases in STC′
T(STC | O, φ)             Faulty test cases in STC over O following φ
F(STC′ | STC, O, φ)       Fault detection effectiveness of STC′ given O using φ and STC; a higher-is-better measure that takes values in [0.0, 1.0]
TCG_O^φ = (STC, E)        A dependency graph of the test cases in STC over O using φ; dependencies between test cases are represented in the set of edges E

Test Cases (TCs): A test case corresponds to the formal specification of checks required to validate whether the requirements in a TS are satisfied in O according to an entailment regime. Test cases can be represented using any rule language for ontologies, such as SWRL; however, we focus on test cases expressed as SPARQL queries over concepts defined in O, respecting the test suite entailment regime. Given a test suite TS = {r1, . . . , rn} over an ontology O, an entailment regime φ, and a set of test cases STC, the function σ(ri | STC, O, φ) represents the subset of test cases in STC that need to be evaluated against O to achieve the requirement ri following the entailment regime φ. The evaluation of a test case tc in STC is denoted by [[tc]]_O^φ. The ontology O meets the test case tc following φ, iff [[tc]]_O^φ is different from the empty mapping µ∅ [19]. Otherwise, tc is a faulty test case in O w.r.t. the requirement ri, such that tc ∈ σ(ri | STC, O, φ) is not achieved by O following φ.

Faulty Test Case Identification: Given a set STC of test cases implementing a test suite TS, Fault(TS | STC, O, φ) is the set of requirements in TS associated with at least one test case tc whose evaluation fails in an ontology O following an entailment regime φ:

Fault(TS | STC, O, φ) = { ri | ri ∈ TS ∧ tc ∈ σ(ri | STC, O, φ) ∧ [[tc]]_O^φ = µ∅ }    (8.1)

112 8.3 The EffTE Approach

Determining the faulty test cases in STC requires a considerable amount of time, since all the test cases need to be evaluated over the ontology O according to φ. Nevertheless, dependencies between test cases as well as subsumption relationships can be exploited to identify a subset of test cases whose failure implies the failure of further test cases in STC. Given an ontology O, an entailment regime φ, and a set STC′ of test cases in STC, the fault detection effectiveness of STC′ given O, φ, and STC, denoted as F(STC′ | STC, O, φ), corresponds to the fraction of the number of faulty test cases detected from STC′ over the number of faulty test cases in STC:

F(STC′ | STC, O, φ) = |ℱ(STC′ | STC, O, φ)| / (|T(STC | O, φ)| + 1) , where    (8.2)

• ℱ(STC′ | STC, O, φ) is the set of faulty test cases in STC not included in STC′ that fail as a consequence of failures of test cases in STC′ (grey nodes in Figure 8.2(b)).

• T(STC | O, φ) is the set of faulty test cases in STC over O following φ (orange and grey nodes in Figure 8.2(b)).

• F(STC′ | STC, O, φ) is a "higher is better" measure that takes values in [0.0, 1.0]. A value close to 0.0 indicates that only a small number of faulty test cases in STC can be detected from STC′, whilst a value close to 1.0 expresses that the numbers of faulty test cases in STC′ and STC are almost the same.
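As a worked example, reading the counts off Figure 8.2(b)-(c) for the time instance t1 (a reconstruction from the figure, not an additional experiment): seven of the 15 test cases are evaluated, three of them (tc2, tc4, tc12) fail, and the remaining eight are skipped as dependents of faulty ones, so

F(STC′ | STC, O, φ) = |ℱ(STC′ | STC, O, φ)| / (|T(STC | O, φ)| + 1) = 8 / ((3 + 8) + 1) = 8/12 ≈ 0.66,

which matches the F value reported for t1 in Figure 8.2(c).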

Test Case Prioritization Problem: The EffTE approach selects, from a set of test cases STC that implements a test suite TS for an ontology O using an entailment regime φ, a subset STC′ with maximal fault detection effectiveness F(STC′ | STC, O, φ). The problem of prioritizing test cases in STC over O and φ is defined as follows:

argmax_{STC′ ⊆ STC} F(STC′ | STC, O, φ) = { tc | tc ∈ STC′ ∧ ∀ STC″ ⊆ STC : F(STC″ | STC, O, φ) ≤ F(STC′ | STC, O, φ) }    (8.3)

The set cover problem [118] can be reduced to the problem of prioritizing test cases, thus demonstrating the NP-completeness of the test case prioritization problem. EffTE implements an approach that relies on information about the dependencies between test cases to provide an approximate solution to the problem of prioritizing test cases.

8.3 The EffTE Approach

To solve the problem of prioritizing test cases in STC over an ontology O given an entailment regime φ, the EffTE approach tracks information about the dependencies between test cases in STC. The EffTE approach implements a graph traversal algorithm that explores the test cases STC′ whose failure implies the failure of the test cases that depend on them. Furthermore, each test case contains information about the user or group of users and the ontology file for which it is to be evaluated. Thus, the fault detection effectiveness F(STC′ | STC, O, φ) can be maximized.



Figure 8.2: The EffTE Approach. a) A test case dependency graph TCG_O^φ with 15 test cases (TCs); b) faulty test cases over time (orange nodes are faulty test cases STC′; gray nodes represent TCs ignored for evaluation, ℱ(STC′ | STC, O, φ)); and c) number of evaluated TCs and F(STC′ | STC, O, φ) values per instance of time. Dependencies make it possible to ignore test cases that depend on faulty ones.

A dependency graph of test cases: Given an ontology O following φ and a set of test cases STC that implement a test suite TS for the ontology O, a test case dependency graph TCG_O^φ = (STC, E) is a directed acyclic graph (DAG), where:

• E ⊆ STC × STC is a set of directed edges representing dependencies between test cases. If (tci, tcj) ∈ E, then the test case tcj depends on the test case tci. Furthermore, tcj fails whenever the evaluation results of tci or tcj over O following φ return the empty mapping µ∅, i.e., tcj fails whenever [[tci]]_O^φ = µ∅ or [[tcj]]_O^φ = µ∅.

The approach: Figure 8.2 illustrates the behavior of the EffTE approach, contrasting it with Figure 8.1. EffTE relies on modeling the relationships between test cases as a dependency graph TCG_O^φ = (STC, E) to solve the problem of prioritizing test cases. Thus, EffTE identifies a faulty subset STC′ of STC and removes the subjacent graph from further consideration. The user can define different TCG_O^φ according to a given test suite TS. Figure 8.2(a) depicts an exemplary TCG_O^φ where tc5 and tc9 depend on the outcome of tc2. If tc2 is identified as faulty, such that [[tc2]]_O^φ is equal to the empty mapping µ∅, the evaluation of the test cases tc5 and tc9 will not be performed. As depicted in Figure 8.2(b), in the time instance t1, the faulty test cases are tc2, tc4, and tc12. This enables pruning the graph traversal and excluding all test cases that depend on the faulty ones from further evaluation. Similarly, in subsequent time instances t2, . . . , tn, for any faulty tc, the dependent test cases will not be considered for evaluation. As a result, the number of evaluated test cases is reduced. Figure 8.2(c) shows the number of test cases that are evaluated per invocation in the given example, e.g., seven test cases are evaluated in t1, only three test cases in t2, while in tn, ten of them are evaluated.

Traversing a test case dependency graph: We use the Breadth-First Search (BFS) algorithm for traversing a TCG_O^φ to efficiently find the nodes in the TCG_O^φ that are closer to the root and correspond to faulty test cases. Given a set of root test cases S and their dependencies in E, the TCG_O^φ is systematically explored by the BFS algorithm to evaluate every reachable tc starting from S. As shown in Algorithm 1, after the initial phase of defining S, the procedure continues by recursively iterating over each tc of the TCG_O^φ. In each iteration, a tc is retrieved from the queue, defined as a FIFO list, and is set as current_tc. Three preconditions must be met before evaluating current_tc: (1) current_tc must not have been already visited; (2) all of its parents must have already been visited; and (3) none of its parents may already have been classified as a tabu test case.


Algorithm 1: Breadth-First Search with tabu mechanism for traversing a TCG_O^φ

create lists: visitedTCs, nextTCs, tabuTCs, pendingTCs, currentTCs;
add rootTCs to currentTCs;
while currentTCs is not empty do
    current_tc = currentTCs.dequeue();
    if current_tc not in visitedTCs or current_tc.Parents not in tabuTCs or current_tc.Parents are in visitedTCs then
        if [[current_tc]]_O^φ ≠ µ∅ then
            foreach child in current_tc.Children do
                if child is already defined as parent then
                    STOP the procedure to avoid a possible cyclic graph;
                end
                add child to nextTCs;
            end
        else
            add current_tc to tabuTCs; continue;
        end
        if current_tc.FileName = OntologyFileName and current_tc.User = UserName then
            evaluate current_tc;
            add current_tc to visitedTCs;
    else if current_tc.Parents in tabuTCs then
        add current_tc to tabuTCs;
    else if current_tc.Parents not in visitedTCs then
        add current_tc to pendingTCs;
    if currentTCs.dequeue() is null and (nextTCs is not empty or pendingTCs is not empty) then
        add nextTCs to currentTCs;
        add pendingTCs to currentTCs;
        nextTCs.clear(); pendingTCs.clear();
    end
end

If [[current_tc]]_O^φ is equal to the empty mapping µ∅, current_tc is added to the tabuTCs list, which represents the set of faulty test cases. Additionally, it is checked whether the user who is invoking the evaluation and the ontology file being committed are associated with current_tc. If these preconditions are met, the algorithm continues with the evaluation of current_tc. To avoid any possibility of entering an endless cycle, the children of current_tc are checked as to whether they have previously been classified as parents, which causes an immediate stop of the procedure. Otherwise, its children are added to the nextTCs list to be considered in the next iterations. The iteration over the currentTCs list continues until there is no tc left. Next, all nodes that exist in the nextTCs or pendingTCs lists are added to the currentTCs list for future evaluation. The execution of the algorithm terminates when there is no tc left in currentTCs, nextTCs, or pendingTCs.
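A compact Node.js sketch of this traversal is given below; it collapses the nextTCs/pendingTCs bookkeeping of Algorithm 1 into a single queue and omits the user and file checks, and the evaluate() callback stands in for the SPARQL evaluation [[tc]]_O^φ (all names are illustrative, not the thesis API).
______
// Dependency-aware evaluation: skip every test case below a faulty one.
function runTestCases(graph, rootIds) {
  const visited = new Set();  // evaluated test cases
  const tabu = new Set();     // faulty test cases and their dependents
  const queue = [...rootIds]; // FIFO queue drives the breadth-first traversal

  while (queue.length > 0) {
    const id = queue.shift();
    if (visited.has(id) || tabu.has(id)) continue;
    const tc = graph[id];

    if (tc.parents.some(p => tabu.has(p))) {       // a parent failed:
      tabu.add(id);                                // prune this subtree
      queue.push(...tc.children);
      continue;
    }
    if (!tc.parents.every(p => visited.has(p))) {  // a parent is still pending:
      queue.push(id);                              // defer to a later round
      continue;
    }

    visited.add(id);
    if (tc.evaluate()) queue.push(...tc.children);     // passed: descend further
    else { tabu.add(id); queue.push(...tc.children); } // faulty: mark subtree tabu
  }
  return { evaluated: [...visited], skipped: [...tabu] };
}

// Example mirroring Figure 8.2(a): tc5 and tc9 depend on tc2, which fails.
const graph = {
  tc1: { parents: [],      children: ['tc3'],        evaluate: () => true  },
  tc2: { parents: [],      children: ['tc5', 'tc9'], evaluate: () => false },
  tc3: { parents: ['tc1'], children: [],             evaluate: () => true  },
  tc5: { parents: ['tc2'], children: [],             evaluate: () => true  },
  tc9: { parents: ['tc2'], children: [],             evaluate: () => true  },
};
console.log(runTestCases(graph, ['tc1', 'tc2']));
// -> { evaluated: ['tc1', 'tc2', 'tc3'], skipped: ['tc2', 'tc5', 'tc9'] }
______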



Figure 8.3: Implementation of EffTE. A locally modified ontology file is received as input, together with a set of test cases and their dependencies; the result of the validation process is returned as output. An Integrated Validation Service validates syntactic errors in an ontology using a Syntax Validation component. A Test Case Validation component checks the ontology against a set of test cases. The ontology is synchronized with a Remote Repository for the distribution of changes.

8.4 Implementation

The EffTE architecture depicted in Figure 8.3 consists of three main components: 1) the Version Control System (VCS), which manages ontology versions via change logs; 2) the Integrated Validation Service, which performs a syntax validation of the ontology and a validation of the ontology against the STC; and 3) the Repository Hosting Platform, which stores and propagates ontology changes to other users. EffTE is part of the VoCol architecture described in Figure 6.1.

8.4.1 Version Control System

Managing versions of the ontology files is realized using Git as the Version Control System (VCS). The Git hook mechanism is utilized to prevent users from committing changes to an ontology file without successfully passing the complete set of test cases STC. During the committing phase, the pre-commit hook is triggered, which transfers the modified ontology files to the validation service. The commit is aborted if the modified ontology files are unable to pass the validation process, and the user is notified with an error message comprising the failure details. On the other hand, the commit is successfully completed if no error occurs during the validation process. As a result, a new revision of the modified ontology files is created and the user is able to proceed with pushing the new version to the remote repository.

Remote hosting platform: GitHub is used as remote hosting platform for saving different versions of the ontology as well as tracking and propagating the changes among users. This facilitates the synchronization of various replicas of the ontology files.

8.4.2 Integrated Validation Service

We implemented EffTE as a component within the Integrated Validation Service (IVS). The IVS accepts the ontology files as input through an HTTP interface and returns to the client either an error message from the validation process or a message indicating successful passing. Next, the EffTE component validates the ontology file against a set of test cases. The TCG_O^φ is defined by the user as an adjacency list collection and stored in the Test Cases DB module. The Test Case Selection module is responsible for the prioritization and selection of the test cases to be evaluated. Each test case is evaluated in the Test Case Evaluation module for a given entailment regime, considering the information whether the test case is associated with the user who is conducting the changes or with the file on which the changes are performed. In case of failure, a flag is added to the response message, which prevents users from realizing the commit. Additionally, users receive details regarding the faulty test cases. Otherwise, a validation message notifying the user of the correctness of the recent changes is provided and the commit is successfully applied.
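An illustrative adjacency-list entry as it might be stored in the Test Cases DB is shown below; all field names are assumptions for the sketch, not the actual EffTE schema.
______
// Each test case records its SPARQL query, the user(s) and ontology file it
// applies to, and the test cases depending on it (its children).
const testCaseGraph = {
  tc1: { query: 'queries/tc1.rq', user: '*',     file: 'ontology.ttl', children: ['tc3', 'tc4'] },
  tc2: { query: 'queries/tc2.rq', user: 'alice', file: 'ontology.ttl', children: ['tc5', 'tc9'] },
  tc3: { query: 'queries/tc3.rq', user: '*',     file: 'trains.ttl',   children: [] }
};
______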


Table 8.2: Ontology Description. Different ontology sizes in terms of number of triples, subjects, properties, and objects.

Ontology     # triples   # subjects   # properties   # objects
FOAF               631           86             15         192
Schema.org       8,103        1,569             13       3,545
DBpedia         30,793        3,986             23      16,807

8.5 Empirical Evaluation

We present the results of an experimental study realized with the objective of investigating the efficiency of the EffTE approach. The goal of the experiment is to analyze the impact of: 1) the ontology size; 2) various dependency graphs TCG_O^φ; and 3) the number of test cases on the efficiency of EffTE. We assess the following research questions:

• RQ1 How does the ontology size impact the efficiency of EffTE?
• RQ2 How do different dependency graphs TCG_O^φ affect the efficiency of EffTE?
• RQ3 How does the number of test cases affect the efficiency of EffTE?

The experimental configuration is defined as follows:

Ontologies: We compare the behavior of EffTE using three ontologies of different sizes. Table 8.2 describes these ontologies with regard to the number of triples, subjects, properties, and objects.

• FOAF2: used to describe persons, activities, and relationships between them and other objects. Further, social networks can be described even without a central database.
• Schema.org3: describes entities, actions, and their relations with the objective of promoting the usage of structured data on the Web.
• DBpedia4: a manually created cross-domain ontology which allows for the description of the information extracted from Wikipedia infoboxes.

PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT * WHERE {
  ?s ?p ?o .
  FILTER(?o = <concept>)
}

Listing 8.1: A simple Test Case. A SPARQL SELECT query, where <concept> is replaced to generate a faulty test case according to the configuration settings.

2 http://xmlns.com/foaf/spec/
3 http://schema.org/
4 https://wiki.dbpedia.org/services-resources/ontology


Test case definition: With the objective of achieving a fair comparison and avoiding any impact on the evaluation time arising from the complexity of the query itself, we use the query depicted in Listing 8.1. It represents a simple SPARQL SELECT query, where the <concept> placeholder is replaced with a particular non-existing concept or an instance of some concept to artificially simulate its failure for experimental purposes.

Dependency graph generation: Different dependency graphs TCG_O^φ are randomly generated using the randomDAG function of the RStudio platform. The parameter prop denotes the probability of connecting a test case to another test case with higher topological ordering, i.e., prop = 0.2 generates a TCG_O^φ where test cases have an average of two parent test cases.

Test case failure generation: We randomly generate the number of faulty test cases following a Poisson distribution, i.e., test cases that fail after each change realized by users, presuming that these failures occur according to a Poisson distribution. The parameter λ indicates the average number of faulty test cases after each ontology change, i.e., λ = 2 simulates an average of two faulty test cases per change. Following a uniform distribution with replacement, various test cases are randomly chosen to fail.

Metrics: We report on the following metrics: 1) the evaluation time (ET), which corresponds to the time required for the evaluation of a set STC; and 2) the number of evaluated test cases (ETC).

Baseline: The baseline represents: 1) the time required for the evaluation of a set STC by a naive approach; and 2) the maximum number of evaluated test cases, which corresponds to the number of test cases evaluated by a naive approach. The naive approach corresponds to the exhaustive evaluation of all the test cases, where neither order nor dependency information is considered.

Implementation: Experiments are run on a Linux Ubuntu 16.04 machine with a 6th Gen Intel Core i7-6820HQ, 2.70GHz, and 16GB RAM 2133MHz DDR4. EffTE is implemented using Node.js version 4.4.5. The Syntax Validation is implemented using Rapper version 2.0.15, whereas the Test Case Validation component uses the Java Apache Jena libraries version 3.0.1. Git version 2.7.4 is used as the VCS. The generation of faulty test cases for the different scenarios is realized with functions for the Poisson distribution and the uniform distribution in RStudio version 1.0.1365.

Method: Three different scenarios are defined to answer the above-mentioned research questions by changing: 1) the size of the ontology; 2) the topology of the TCG_O^φ; and 3) the number of test cases. In each scenario, experiments are run twice in each time instance (TI), using the naive approach and EffTE. In the time instances TI-1:4, different test cases are randomly chosen to fail following a uniform distribution with replacement. We enforce zero faulty test cases in TI-5 to compare the efficiency of the naive approach and EffTE under equal conditions. Table 8.3 illustrates the configuration of each scenario. For the first scenario, the dependency graph TCG_O^φ depicted in Figure 8.4(a) is used. For the second scenario, three different dependency graphs TCG_O^φ with 10 test cases each are generated, as depicted in Figure 8.4. Finally, for the third scenario, three different dependency graphs TCG_O^φ with a) 10, b) 20, and c) 30 test cases, respectively, are generated (cf. Figure 8.5). The RDFS entailment regime is used in all the scenarios.

5 https://www.rstudio.com/products/RStudio/


Table 8.3: Experimental Set-Up. Three scenarios varying: ontology sizes, dependency graph topology, and number of test cases (TCs). Different faulty TCs are generated for the first four instances of time (TI), whereas in TI-5, zero faulty TCs are enforced.

                                              Faulty TCs per TI
Scen.  Ontology                   No. of TCs  TI-1        TI-2              TI-3              TI-4        TI-5
1      FOAF, Schema.org, DBpedia  10          3, 4, 7     3                 1, 5, 7, 9        2, 7, 5, 8  -
2      DBpedia                    10          3, 4, 7     3                 1, 5, 7, 9        2, 7, 5, 8  -
3      DBpedia                    10          3, 4, 7     3                 1, 5, 7, 9        2, 7, 5, 8  -
                                  20          2, 4, 8, 9  1, 6, 12, 16, 19  2, 3, 9, 14, 20   1, 4, 19    -
                                  30          3, 18       7, 9, 13, 22, 28  6, 7, 10, 14, 23  2, 8, 18    -

Figure 8.4: Dependency Graph Topologies. Different topologies of dependency graphs between test cases, each of them comprising ten test cases: (a) TCG_O^φ No. 1; (b) TCG_O^φ No. 2; (c) TCG_O^φ No. 3.

8.5.1 Impact of the Ontology Size

To answer RQ1, we follow the above described method for assessing the efficiency of the naive approach and EffTE with three different ontologies (cf. Table 8.2). Figure 8.6(a) shows the evaluation time for a set STC required by the naive approach and EffTE. For all three ontologies, the ET values are lower for EffTE compared to the naive approach in the time instances TI-1,2,3,4. We observe that the ET values of EffTE and the naive approach do not differ in the case of the FOAF ontology. The reason for this is that FOAF is a small-size ontology and the query execution time is relatively low. In the case of the Schema.org ontology, which is a medium-size ontology, an increased difference between the ET values of the two approaches is observed. Finally, EffTE requires lower ET values in the first four instances of time compared to the naive approach for the DBpedia Ontology, which comprises more than 30K triples.

8.5.2 Impact of the Topology of TCG_O^φ

With the aim of answering research question RQ2, various TCG_O^φ are defined, i.e., the test cases have different dependencies between each other. As shown in Table 8.3, Scenario 2, three different TCG_O^φ with 10 test cases each are generated. For every TCG_O^φ, a particular number of test cases is chosen to fail. The results in Figure 8.6(b) show that EffTE exhibits better performance regarding the time needed to evaluate all the dependency graphs in four instances of time.


Figure 8.5: Different Number of Test Cases. Dependency graphs having various numbers of test cases: a) TCG_O^φ with 10; b) TCG_O^φ with 20; and c) TCG_O^φ with 30 test cases, respectively.


Figure 8.6: Evaluation Time (ET): Naive Approach vs. EffTE. Impact of different scenarios on the evaluation time (ET): a) ontology size; b) dependency graph topology; and c) number of test cases. The results suggest that EffTE exhibits better ET values in the four instances of time TI-1,2,3,4, whereas in TI-5 it performs worse, since there is no faulty test case.

EffTE requires additional time in TI-5 because there is no faulty tc. As shown in Figure 8.7(a), the ETC values are lower for EffTE compared to the naive approach in the first four instances of time.

8.5.3 Impact of the Number of Test Cases

φ We define Scenario 3 (cf. Table 8.3), to answer RQ3. This scenario comprises three TCGO φ composed of 10, 20, and 30 test cases, respectively. Similar to other scenarios, for each TCGO, a number of test cases are chosen to fail. From the Figure 8.6(c), we can observe that with the increase number of test cases, EffTE exhibits better performance on the time needed for the φ evaluation of a TCGO in time instances TI-1,2,3,4. As expected, EffTE requires a bit more time in TI-5 than the naive approach in case of absence of faulty test cases. Additionally, Figure 8.7(b) depicts the ETC values of two approaches for different number of predefined test cases. We note that ETC values are considerably lower in EffTE, in first four instances of time where faulty test cases exists.

8.5.4 Discussion

As we can observe, in the first four instances of time of each scenario, EffTE performs better than the naive approach. However, as shown in TI-5, EffTE may also need more time for the evaluation of test cases, i.e., the ET values are higher for EffTE. This behavior occurs when there are no faulty test cases and the additional workload is spent solely on traversing a TCG_O^φ. Although the test case to be evaluated is intentionally chosen to be the "simplest" one, the reported times from the experiments show an improvement of hundreds of milliseconds.



Figure 8.7: Number of Evaluated Test Cases (ETC): Naive Approach vs. EffTE. Impact of different scenarios on the ETC: a) dependency graph topology; and b) number of test cases. The results suggest that EffTE exhibits better ETC values in the four instances of time TI-1,2,3,4, whereas in TI-5 it has the same ETC values as the naive approach, since there is no faulty test case.

This improvement becomes more evident in cases where the size of the ontology is not small and increases progressively over time, as well as in cases where the ontology is defined by a team composed of several members who contribute in parallel during the development process. Moreover, if the complexity of the test cases increases, e.g., checking for duplicate labels in ontologies with a large number of concepts, excluding such test cases from further evaluation if none of the defined concepts has an associated label becomes a crucial aspect of the development process. Figure 8.7 shows that EffTE reports lower ETC values, i.e., a smaller number of evaluated test cases, in the time instances TI-1,2,3,4, whereas in TI-5 it has the same ETC values as the naive approach. In conclusion, whenever there exists a faulty test case on whose outcome other test cases depend, the overall number of evaluated test cases is considerably lower using the EffTE approach. In the worst case, the number of evaluated test cases in EffTE is equal to the number of evaluated test cases in a naive approach.

8.6 Summary

This chapter presents EffTE, an approach for efficient test-driven development of ontologies. EffTE relies on a dependency graph TCG_O^φ modeled by users according to their requirements for a given entailment regime; dependency graphs enable the prioritization of test cases. Traversing a TCG_O^φ is realized by a BFS algorithm along with an additional mechanism that stores tabu test cases, i.e., test cases to be ignored for further evaluation because of faulty parents. As a result, the number of evaluated test cases is minimized; thus, the fault detection effectiveness F(STC′ | STC, O, φ) is increased and the test case validation time is reduced. We performed an empirical evaluation to study the efficiency of EffTE compared to a naive approach in various scenarios with different ontology sizes, dependency graph topologies, and numbers of test cases. The results suggest that in all cases where a faulty test case exists, the efficiency of EffTE is higher compared to a naive approach. Furthermore, additional parameters are considered, such as the users who perform changes and the modified ontology files (if an ontology is developed in multiple files). Thus, a user- and file-based selection of test cases is achieved, which enhances the ontology development process in collaborative environments.


Part IV

Applications and Conclusions

In this part, we start by presenting the applicability of our holistic approach for building a number of ontologies for different domains. Chapter 9 describes two approaches which aim to support semantic interoperability in the context of Industry 4.0. Next, in Chapter 10, an approach based on ontologies for allowing data integration across heterogeneous data sources is presented. Chapter 11 provides general information about 19 ontologies from various domains which are built using our integrated development environment. Further, this chapter describes in more detail the methodological and technical aspects of nine ontologies currently being developed and managed on top of the VoCol platform. This thesis is concluded in Chapter 12 by revisiting the research questions. Finally, we look into future extensions of this work from both research and technical perspectives.


CHAPTER 9

Establishing Semantic Interoperability between Industry 4.0 Models

This chapter presents two approaches for solving interoperability issues in their respective domains. They are based on ontologies as the main artifacts used to model the domain knowledge as well as to enable interlinking between concepts at different granularity levels. The ontologies are developed using the Git4Voc methodology, in particular its practices for naming conventions and the reuse of external ontologies. In addition, VoCol is used as an integrated environment to support the development and deployment of these ontologies. We initially start with the engineering and manufacturing domain, where there is currently an atmosphere of departure towards a new era of digitized production. In different regions, ongoing initiatives are known under various names, such as industrie du futur in France, industrial internet in the US, or Industrie 4.0 in Germany. Within the German Industry 4.0 initiative, the concept of an Administrative Shell was devised to respond to the requirements of this new era. The Administrative Shell is planned to provide a digital representation of all information being available about and from an object, which can be a hardware system or a software platform. Therefore, we present an approach to develop such a digital representation based on semantic knowledge representation formalisms, such as RDF, RDF Schema and OWL. Our concept of a Semantic I4.0 Component addresses the communication and comprehension challenges in Industry 4.0 scenarios using semantic technologies. The approach is illustrated with a concrete example, showing its benefits in a real-world use case. We continue with addressing interoperability from the perspective of standards interlinking. A number of different standards and reference architectures have been proposed to empower interoperability in Smart Factories. Standards allow for the description of components, systems, and processes, as well as interactions among them. Reference architectures classify, align, and integrate industry standards according to their purposes and features. To tackle the problem of interoperability between standards, we survey the landscape of Industry 4.0 standards from a semantic perspective and develop STO, an ontology for describing standards and their relations. STO allows for describing characteristics of I4.0 standards and exploiting them for further classification from various perspectives according to the reference architectures. Moreover, the semantics encoded in STO enable the discovery of relations between I4.0 standards, as well as mapping them across reference architectures proposed by different industry communities.


Contributions of this chapter are summarized as follows:

• An approach to digitally represent the Administrative Shell concept based on semantic knowledge representation formalisms, such as RDF, RDF Schema and OWL;

• An ontology to describe Industry 4.0 standards and their detailed characteristics as well as their interrelations.

This chapter is based on the following publications:

• Irlán Grangel-González, Lavdim Halilaj, Sören Auer, Steffen Lohmann, Christoph Lange, Diego Collarana. An RDF-based Approach for Implementing Industry 4.0 Components with Administration Shells. In IEEE Emerging Technologies and Factory Automation (ETFA) 2016 Proceedings, 1-8, IEEE. This article is a joint work with Irlán Grangel-González, a PhD student at the University of Bonn. My contributions to this article are related to the review of state-of-the-art approaches, the implementation of the proposed approach, the presentation of the use cases, as well as the analysis of the results;

• Irlán Grangel-González, Paul Baptista, Lavdim Halilaj, Steffen Lohmann, Maria-Esther Vidal, Christian Mader, Sören Auer. The Industry 4.0 Standards Landscape from a Semantic Integration Perspective. In 22nd IEEE International Conference on Emerging Technologies And Factory Automation (ETFA) 2017 Proceedings. This article is a joint work with Irlán Grangel-González, a PhD student at the University of Bonn, and Paul Baptista, a student assistant at Fraunhofer IAIS. In this article, I contributed to the implementation of the proposed approach, the review of related work, and the analysis of the obtained results.

The remainder of this chapter is organized as follows: First, Section 9.1 presents an approach for the semantification of Administrative Shells, used as an intermediate layer for Industry 4.0 components. Second, an ontology for describing standards, their metadata and interrelations, as well as for allowing interlinking with external data sources, is presented in Section 9.2. Finally, Section 9.3 summarizes the ideas and contributions presented in this chapter.

9.1 A Semantic Administrative Shell for Industry 4.0 Components

The dynamics of today's world impose new challenges on enterprises. Globalization, the ubiquitous presence of the Internet and the development of hardware systems are some of the technological developments that provoke changes everywhere. Industry 4.0 (I4.0) is a term coined in Germany to refer to the fourth industrial revolution. It is understood as the application of concepts such as the Internet of Things (IoT), Cyber-Physical Systems (CPS), the Internet of Services (IoS) and data-driven architectures in real industry. With a similar meaning, the term Industrial Internet has been coined in North America. This term is very close to I4.0, but its application is broader than industrial production; other areas are included as well, for instance, smart electrical grids [119]. For realizing the Industry 4.0 vision, CPS are of paramount importance. CPS integrate physical and software processes [120]. In order to do so, they use various types of available data, digital communication facilities, and services [121]. While the vision of digitizing production and manufacturing has gained much traction lately, it is still relatively unclear how this vision can actually be implemented with concrete standards

and technologies. The physical network connection problem is meanwhile largely solved using technologies such as Profibus/Profinet [122] and OPC-UA [123]. The more challenging problem, however, is to make smart industrial devices able to communicate with and understand each other as a prerequisite for cooperation scenarios. To address this problem, we need techniques and standards for representing and exchanging information, data and knowledge between devices participating in manufacturing and production processes. Such standards must be flexible to accommodate new features and usage scenarios, cover multiple domains and device categories, and bridge organizational boundaries. Most importantly, they must be able to evolve seamlessly over time to facilitate the swift realization of new features and scenarios as they become apparent. Within the Industry 4.0 initiative, the concept of an Administrative Shell was devised to respond to these requirements. The Administrative Shell is planned to provide a digital representation of all information (and services) being available about and from a physical manufacturing component. We present an approach to develop such a digital representation based on semantic knowledge representation formalisms, such as RDF, RDF Schema and OWL. The advantages of such an RDF-based approach are:

• Identification. The use of URIs/IRIs provides a decentralized, holistic, extensible global identification scheme for all relevant entities, either physical or abstract;

• Integration. The simple yet expressive statement-centric RDF data model enables the representation of facts, raw data, schema, provenance and metadata information in a unified manner;

• Coherence. Existing and new taxonomies, vocabularies, and ontologies can be mixed and meshed to represent information about various domains, application scenarios, or organizations in a natural way (illustrated in the sketch below).
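To make these advantages concrete, the following minimal Turtle sketch (all ex: names are hypothetical and chosen for illustration only) identifies a component by an HTTP URI and freely mixes terms from independent vocabularies:

@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/plant1#> .

# Identification: the component is named by a globally unique HTTP URI.
ex:WeldingRobot42 a ex:I40Component ;
    # Integration: raw data and provenance coexist as plain triples.
    ex:operatingTemperature "73.5" ;
    prov:wasAttributedTo ex:TemperatureSensor7 ;
    # Coherence: terms from independent vocabularies are mixed freely.
    rdfs:label "Welding robot 42"@en .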

9.1.1 Background

The Reference Architecture Model for Industry 4.0 (RAMI 4.0) encompasses the core aspects of Industry 4.0 in a three-dimensional model [124, 125]. It illustrates the connection between IT, manufacturers/plants and the product life-cycle in a three-dimensional space. Each dimension shows a particular part of these domains divided into different layers, as depicted in Figure 9.1(a). The model extends the hierarchy levels defined in IEC 62264/61512 by adding the concepts Product at the lowest level and Connected World at the top level, which goes beyond the boundaries of an individual factory. The vertical axis on the left-hand side of Figure 9.1(a) represents the IT perspective, comprising layers ranging from the physical device (asset) to complex functions, as they are available in ERP systems (functional). These layers correspond to the IT way of thinking, where complex projects are decomposed into smaller manageable parts. The horizontal axis on the left-hand side indicates the product life-cycle, where Type and Instance are distinguished as two main concepts. The RAMI 4.0 model enables the representation of data gathered during the entire life-cycle. The horizontal axis on the right-hand side organizes the locations of the functionalities and responsibilities in a hierarchical structure.

Industry 4.0 Component A component is a core concept in the Industry 4.0 context. As defined in [124], an I4.0 component constitutes a specific case of a Cyber-Physical System (CPS). It is used as a model to represent the properties of a CPS, for instance, real objects in a production environment connected with virtual objects and processes. An I4.0 component can be a production system, an individual machine, or an assembly inside a machine. It is comprised of two foundational elements: an object and its Administration Shell. Every object or entity that is wrapped by an Administration Shell becomes an I4.0 component, as illustrated in Figure 9.1(b). In the following, the different parts of I4.0 components are presented in more detail.

Figure 9.1: Industry 4.0 Concepts. a) Reference Architecture Model for Industry 4.0 (RAMI 4.0), comprising the three dimensions: layers, life-cycle and hierarchy levels (source [124]); b) An Industry 4.0 component object wrapped by an Administration Shell (adapted from [124]).

Object In [124], the term object is used to refer to an individual physical or non-physical entity. An object can be an entire machine, an automation component, or a software platform; it can be a legacy system or a new system. The industry should be able to integrate and benefit from these objects in I4.0 contexts, independently of their type and age.

Administration Shell The Administration Shell is used to store all important data of an object. Its goal is to create benefits for each participant in networked manufacturing [124], including:

Data Management The Administration Shell provides mechanisms to manage large amounts of data and information generated by manufacturers and other stakeholders. This includes storing and administrating information about configuration, maintenance, and connectivity.

Functions Different functions, such as operations, maintenance tasks, or complex algorithms implementing business logic, can be provided by the Administration Shell. These functions facilitate the interaction between I4.0 components and other actors.

Services Although the information of a component is stored only once, it can be used beyond the boundaries of the component, within enterprise networks or in a cloud. The information can be made available to different users and can be accessed in various use cases.

Integration The Administration Shell, in combination with communication protocols, offers the possibility of an easy integration of I4.0 components.

Modularity Each specific part of an object should be able to store information in the Administration Shell. This ensures that all information is saved and ready to be used for subsequent analysis or various application scenarios.


9.1.2 Challenges

Developing the vision of Industry 4.0 raises new challenges, which become even harder when participants in networked manufacturing apply different policies. Some of the critical challenges of I4.0 in general, and of the I4.0 component in particular, are presented in the following:

Interoperability (Ch1) On the one hand, the I4.0 vision includes new ways of managing data, machines and components. On the other hand, enterprises need to maintain a huge number of legacy systems with their corresponding existing data. Commonly, this data comes in different formats (e.g., plain text, DBMS, and XML). The new data and formats have to coexist with the old ones.

Global unique identification (Ch2) Enabling intercommunication among I4.0 components over the Internet is a big challenge. In addition, there should be a linking mechanism between the I4.0 components and the generated information [126]. Therefore, addressing this challenge is of paramount importance in order to realize the vision of I4.0.

Data availability (Ch3) Another challenge is the availability of the data beyond the boundaries of the manufacturers and across different hierarchy levels. This challenge becomes even harder when various policy rules from manufacturers are applied. I4.0 components will communicate with each other and interact with the environment by exchanging the data generated from different sensors, and will react to events by triggering actions with the aim of controlling the physical world [127]. Therefore, sharing the generated data between participants [128] is a key factor in Industry 4.0.

Standardization compliance (Ch4) The standardization process is an important step toward the realization of I4.0. Several standards dealing with different layers of the enterprise exist nowadays; AutomationML [129], Profibus [130] and OPC-UA [123, 131] are just some examples. The core idea of this effort is to provide a detailed description of the components in the manufacturing process. The production process constantly generates different components, and the standards need to reflect this dynamically. As a result, the standards grow in size and number, making interoperability between them a problem to solve.

Integration (Ch5) A highly dynamic environment is one of the key obstacles to the establishment of the I4.0 vision. The complexity of the horizontal and vertical integration of I4.0 components increases drastically with the fluctuating number of participants. Self-sense, self-configuration and self-integration are some concepts used to describe the autonomous interaction of a component with its environment in networked manufacturing. Following the principles of Reconfigurable Manufacturing Systems (RMS), adding, removing, replacing or rearranging components must not affect the production process [132]. Thus, developing a consistent data model is a crucial factor facilitating the integration of I4.0 components in changing environments.

Multilinguality (Ch6) In order to achieve a wide range of applicability to different cultures and communities [133], I4.0 systems should be able to support localization (and internationalization) of the generated information. This will decrease the learning curve and allow easier and faster adoption of Industry 4.0 in real production environments.

9.1.3 An RDF-based Approach for Semantifying I4.0 Components

Semantic technologies play a crucial role with regard to the description and management of things, devices, and services [134, 135]. Moreover, it has been recognized that I4.0 components and their content should follow a common semantic model [124]. Therefore, we propose an RDF-based approach to pave the way towards a common semantic model for Industry 4.0 components, as illustrated in Figure 9.2. For representing the hierarchy levels of the RAMI 4.0 model, we created an RDF-based vocabulary conforming to the IEC 62264:2013 standard.

Figure 9.2: The RDF-based Pipeline. The pipeline for semantifying I4.0 components, comprising RDF vocabularies of relevant standards to represent information about a wide range of components.

Interoperability To meet the interoperability demand, RDF and Linked Data have proven to be successful for integrating various types of data [136–138]. Embedded in the semantics of the Administrative Shell, we propose RDF as a middle layer to support the interoperability between the data of legacy systems and the data generated by the I4.0 component. We aim to establish RDF as a lingua franca for data interoperability in the I4.0 landscape.

Global unique identification Identifying each I4.0 component using globally unique identifiers ensures entity disambiguation and retrievability [139]. According to the Linked Data principles [140], HTTP URIs should be used for naming things. Following this principle, we propose that each I4.0 component be identified by an HTTP URI. By doing so, a decentralized, holistic and extensible globally unique identification scheme for I4.0 components is established. As a consequence, I4.0 components become dereferenceable and are able to self-locate and communicate with each other.

Data availability The benefits of employing RDF as the standard for data representation are twofold. Firstly, various data serialization formats can easily be generated and transmitted over the network. Secondly, using SPARQL as a query language and protocol, it is possible to make data available through a standard interface. The RDF representation of the data can be created on the fly, even if the data is stored in relational databases or other data formats [141]. By doing so, our approach also enables data sharing between legacy systems and other participants in networked manufacturing.

Standardization compliance Following the idea of employing RDF as a lingua franca for data integration, we propose to translate existing standards into RDF vocabularies and SKOS thesauri. The interoperability between standards can thus be managed through the integration of the respective vocabularies. In addition, these vocabularies are also connected with the Administrative Shell data (cf. Figure 9.2). As an example, we created an RDF vocabulary for the IEC 61360 - Common Data Dictionary (IEC CDD)1. IEC CDD is a common repository of concepts for all electrotechnical domains based on the methodology and the information model of IEC 61360. It provides a widely accepted terminology and definitions based on sources such as IEC standards as well as other industry standards. It contains four major concepts: Component, Material, Feature and Geometry. Component describes an industrial product which serves a specific function. Moreover, in a given context it should not be decomposable or physically divisible, and it is intended for use in a higher-order assembled product. A Component is represented by an Object in the RAMI model and, when it is surrounded by the Administrative Shell, forms an I4.0 component.

1 http://std.iec.ch/cdd/iec61360/iec61360.nsf/

Integration Running on a completely unified and consistent data model facilitates the integration of I4.0 components. Newly added components need a shorter time for the integration process. Other peers become aware of a new peer, and of how to communicate with it, by simply synchronizing with the latest version of the vocabulary, which contains all necessary information for interaction and data exchange between peers in networked manufacturing.

Multilinguality Since various communities across the world will interact with I4.0 components, it is very important that they receive terms in their own languages. Semantic web technologies enable the implementation of multilinguality in a very straightforward manner, as the following snippet illustrates. This remains valid even for newly introduced languages or concepts.

The RAMI Vocabulary

Our approach defines a semantic vocabulary for the Administration Shell concept by providing an ontological formalization of the elements that describe I4.0 components. Since the Administration Shell is a key concept of the RAMI 4.0 model, we also decided to use the namespace rami for the vocabulary. The core classes of the vocabulary are rami:BasicData, rami:AdminShell and rami:Object. The class rami:AdminShell represents the Administration Shell concept and its properties. The objects in the RAMI 4.0 model are described by the rami:Object class. In addition, properties like rami:name, rami:isPartOf and rami:description are created to represent the characteristics and features of the object. The basic data associated with the object are represented by the rami:BasicData class. Different types of data, such as sensor, mechanical, electrical, or physical data, can be added as subclasses of rami:BasicData. Furthermore, the vocabulary permits the incorporation of existing models, such as the Object Memory Model (OMM), which supports the creation of digital memories for manufacturing objects [142]. OMM provides blocks for grouping data on a certain aspect of an object. These blocks contain metadata describing the object, for instance, its ID, description, or format. In the RAMI vocabulary, we included some of the OMM concepts, such as identification and description type, and organized them under the rami:BasicData class. In this way, different types of data associated with the object, for instance rami:EngineeringData, inherit attributes that have been defined for rami:BasicData, in addition to attributes specifically defined for rami:EngineeringData, in this case standard name and version. The rami:AdminShell class is used to connect the object with its basic data. Figure 9.3 depicts the main classes and properties of the RAMI vocabulary.

In manufacturing processes, provenance is a crucial aspect [143]. For instance, authenticating a specific product w.r.t. its manufacturer, or the date when it was manufactured, is critical information to record within the manufacturing context. For this reason, we reused the W3C Provenance Ontology2 to track the creator and contributors of an object in the RAMI vocabulary. With the goal of aligning the RAMI 4.0 model with the IEC 62264 hierarchy levels, we define the class rami:RAMIHierarchyLevel. Instances of this class represent the RAMI 4.0 hierarchy levels (e.g., rami:Station and rami:WorkCenter). This allows linking concepts such as the IEC 62264 Storage Unit, which is a type of Work Center, as shown in Listing 9.1.

2 https://www.w3.org/TR/prov-o/
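The following Turtle sketch indicates how the core classes and properties of the RAMI vocabulary described above might be declared; the namespace URI is a placeholder, since the exact URI of the vocabulary is not reproduced here:

@prefix rami: <http://example.org/rami#> .   # placeholder namespace for illustration
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Core classes of the RAMI vocabulary
rami:AdminShell a owl:Class .
rami:Object     a owl:Class .
rami:BasicData  a owl:Class .

# Specific data types are modeled as subclasses of rami:BasicData
rami:EngineeringData rdfs:subClassOf rami:BasicData .

# Properties representing the characteristics and features of an object
rami:name        a owl:DatatypeProperty ; rdfs:domain rami:Object .
rami:description a owl:DatatypeProperty ; rdfs:domain rami:Object .
rami:isPartOf    a owl:ObjectProperty   ; rdfs:domain rami:Object .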


@prefix iec62264: <...> .
@prefix rami: <...> .

iec62264:StorageUnit rami:hasRAMIHierarchyLevel rami:WorkCenter .

Listing 9.1: Alignment of RAMI 4.0 and IEC 62264 concepts. RDF representation of alignments between concepts of the RAMI vocabulary and those of IEC 62264.

Units of measurement are of paramount importance in manufacturing environments to ensure the correct functioning and coordination of processes. Units are required for the specification of products as well as for representing the data produced by measuring devices (e.g., sensors). Often, units are represented as simple strings, e.g., °C, mm, and kg. This has the drawback that the semantics of the units are not machine-readable and sometimes unknown or ambiguous. For example, both "18 in" and "45.72 cm" refer to the same length. For properly representing units, we aligned the RAMI vocabulary with the Ontology of Units of Measure (OM) [144]. This ontology provides global identifiers and definitions for units of measurement, including quantities, measurements, and dimensions. By using the in and cm concepts from the OM ontology, the semantics of the units can be understood by a machine, because their formal definitions can be looked up in the ontology via the URIs of the concepts as well as processed and interpreted by a software agent. For example, "centimetre" is defined as a unit in the dimension of length, amounting to 1/100 of the "metre" unit. Listing 9.2 illustrates how data values can be represented using the OM ontology.

@prefix om:   <...> .
@prefix rami: <...> .
@prefix eco:  <...> .
@prefix ex:   <...> .

ex:object1 eco:P_BAA018001 ex:lengthOfObj1 .
ex:lengthOfObj1 om:numerical_value "45.72" .
ex:lengthOfObj1 om:units_of_measure_or_measurement_scale om:centimetre .
ex:object2 eco:P_BAA018001 ex:lengthOfObj2 .
ex:lengthOfObj2 om:numerical_value "18" .
ex:lengthOfObj2 om:units_of_measure_or_measurement_scale om:inch-international .

Listing 9.2: Reuse of the OM ontology. Representing length units using the OM ontology.

We also consider the alignment with eCl@ss [145]. eCl@ss is a cross-industry classification system to describe products and services using unique identifiers. In the context of Industry 4.0, eCl@ss performs a crucial function by providing common definitions of a vast number of products and services. eCl@ss is available as an RDF-based vocabulary3 and can therefore be easily reused and aligned with our approach. To describe units of measurement, the eCl@ss vocabulary incorporates the GoodRelations vocabulary4. Since the OM ontology contains more specialized and richer descriptions for units, we propose to use both (i.e., the eCl@ss and GoodRelations vocabularies) jointly. Based on this, we recommend aligning the eCl@ss concepts to our definitions in the RAMI vocabulary. Figure 9.3 illustrates the core classes of the RAMI vocabulary and their relationships.

3 http://www.heppnetz.de/projects/eclassowl/ 4 http://www.heppnetz.de/projects/goodrelations/


Figure 9.3: RAMI Vocabulary. Overview of the core classes and relationships of the RAMI vocabulary.

IEC 62264 Vocabulary

The RAMI 4.0 model is centered around the IEC 62264 standard to define hierarchy levels for the manufacturing domain. Next to the hierarchy levels, this standard specifies core concepts used in manufacturing companies, such as work centers, production lines, and storage zones. Based on these definitions, we developed an RDF-based vocabulary that models the structure as well as the concrete semantics of these concepts. Our RDF-based approach allows aligning the information models of different companies with the proposed standard. For example, the term Plant is commonly used in the manufacturing world; its meaning is equivalent to the term Site according to the standard. Following the same idea, other cases, e.g., expressing that one concept is broader or narrower than another, can be addressed by reusing specialized vocabularies, such as the Simple Knowledge Organization System (SKOS) vocabulary.
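A minimal sketch of such a SKOS-based alignment, assuming hypothetical namespaces for the company vocabulary and for the IEC 62264 vocabulary:

@prefix skos:     <http://www.w3.org/2004/02/skos/core#> .
@prefix iec62264: <http://example.org/iec62264#> .   # placeholder namespace
@prefix co:       <http://example.org/company#> .    # hypothetical company vocabulary

# The company term "Plant" is equivalent to the standard term "Site".
co:Plant skos:exactMatch iec62264:Site .

# A narrower company concept points to its broader standard concept.
co:ProductionCell skos:broader iec62264:WorkCenter .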

9.1.4 Use Case

The objective of the Industry 4.0 vision is to enable decentralized production using smart objects that are autonomous with respect to decision-making. To accomplish this goal, object metadata, data, and relations with other objects need to be semantically described within the Administrative Shell. By doing so, the information provided by one object can be understood and exploited by other smart objects in the production chain. To illustrate the applicability of our approach, a use case using the Semantic Administrative Shell to describe an I4.0 component and some of its basic relations is presented in the following. Listing 9.3 shows the semantic representation of the Administrative Shell for the Motor controller CMMP-AS-C2-3A-M3 object (a product of Festo AG5). For brevity, we describe here the most relevant data of the resources. This example contains four instances of the respective types. AdminShell1 surrounds Object1 and connects it with the majority of the concepts in the domain, such as Platform1 in this case. Also, Object1 has its technical data defined in the resource TechnicalData1 (cf. Figure 9.4).

One of the main advantages of the Semantic Administrative Shell is the uniform data representation according to the RDF model, which enables efficient integration and querying of the data comprised in the shell. In order to illustrate the data retrieval, we have designed simple SPARQL queries. For example, it is relevant to know the technical characteristics of a component in its various phases (e.g., Single-phase, Three-phase). In the following query (cf. Listing 9.4), we construct an RDF graph that contains a description of the technical features of an object with the phase value Single-phase.

5 https://www.festo.com/cat/en-gb_gb/products_CMMP_AS



Figure 9.4: An example of Industry 4.0 graph. Several concepts of Industry 4.0 interlinked with each other as well as their respective attributes.

@prefix i40c: <...> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

i40c:AdminShell1 a i40c:AdministrativeShell ;
    rdfs:label "AdminShell1"^^xsd:string ;
    i40c:surround i40c:Object1 ;
    i40c:hasTechnicalFunctionality i40c:Platform1 .

i40c:Object1 a i40c:Object ;
    rdfs:label "Motor control..."@en ;
    i40c:hasId "1501325"^^xsd:string ;
    i40c:hasPhase "Single-phase"@en ;
    i40c:hasTechnicalData i40c:TechnicalData1 ;
    i40c:image <...> .

i40c:TechnicalData1 a i40c:TechnicalData ;
    rdfs:label "TechnicalData1"@en ;
    i40c:brakingResistance "60 Ohm" ;
    i40c:outputFrequency "0...1,000 Hz" ;
    i40c:display "Seven-segment display"@en .

i40c:Platform1 a i40c:Platform ;
    rdfs:label "Function blocks ..."@en ;
    i40c:functionBlockUrl "https://.../Siemens_V3_0.zip" ;
    i40c:hasDate "2015-11-02"^^xsd:date ;
    i40c:hasVersion "3.0"^^xsd:string .

Listing 9.3: The RDF representation of an Administrative Shell. The Semantic Administrative Shell for the servo motor controller CMMP-AS-C2-3A-M3.


PREFIX i40c: <...>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

CONSTRUCT {
  ?object rdfs:label ?name .
  ?technicalData i40c:brakingResistance ?resistance .
  ?technicalData i40c:outputFrequency ?frequency .
}
WHERE {
  ?object i40c:hasPhase "Single-phase"@en .
  ?adminShell i40c:surround ?object .
  ?object rdfs:label ?name .
  ?object i40c:hasTechnicalData ?technicalData .
  ?technicalData i40c:brakingResistance ?resistance .
  ?technicalData i40c:outputFrequency ?frequency .
}

Listing 9.4: Obtaining information about a particular phase. SPARQL query to retrieve the electrical data where the phase variable has the value Single-phase.

Another example is retrieving the metadata of the I4.0 component platform during the maintenance cycle. The platform entities refer to functional library elements, which are specific to a certain automation system. The query modeled in Listing 9.5 obtains the details of the platform, such as the name, version and the URL of the software that supports the I4.0 object.

PREFIX i40c: <...>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

CONSTRUCT {
  ?object i40c:hasId ?objectId .
  ?platform rdfs:label ?name .
  ?platform i40c:hasVersion ?version .
  ?platform i40c:hasDate ?date .
  ?platform i40c:functionBlockUrl ?url .
}
WHERE {
  ?adminShell i40c:surround ?object .
  ?adminShell i40c:hasTechnicalFunctionality ?platform .
  ?object i40c:hasId ?objectId .
  ?platform i40c:hasVersion ?version .
  ?platform i40c:hasDate ?date .
  ?platform i40c:functionBlockUrl ?url .
  ?platform rdfs:label ?name .
}

Listing 9.5: Obtaining metadata about the platform. SPARQL query to retrieve the version, date and URL of the software of the component's platform.

The above use case shows how the Semantic Administrative Shell provides a flexible data model. This semantic representation helps to overcome the challenges that I4.0 is facing.


9.2 A Semantic Integration Perspective for Industry 4.0 Standards

For the realization of the I4.0 vision, many organizations and stakeholders need to collaborate and interchange data and information [146]. Standardization efforts enable the adoption of production and manufacturing technologies across different organizations and stakeholders. Standards define terms, components, procedures, dimensions, and materials, along with many other aspects of relevance for production and manufacturing. Standards are of paramount importance in the manufacturing and automation domain and for the realization of the I4.0 vision, where different enterprises and stakeholders interact in an interoperable manner [147].

Although standards are regionally extended or adapted, they represent a common understanding of terms for a wide range of practitioners from industry. There exist standards that describe the enterprise as a whole, while others have been created to deal with specific problems, such as PLC programming or automation design. Examples relevant to I4.0 are AutomationML for mechatronic data modeling and OPC UA for machine-to-machine communication. Different organizations have developed reference architectures to align standards in the context of I4.0. In Germany, the Deutsches Institut für Normung (DIN), along with other organizations, published the "Reference Architecture Model for Industry 4.0 (RAMI 4.0)"6. In the United States, the National Institute of Standards and Technology (NIST) published a "Standards Landscape for Smart Manufacturing Systems"7. In China, the Ministry of Industry and Information Technology (MIIT) and the Standardization Administration of China (SAC) published the "National Smart Manufacturing Standards Architecture Construction Guidance". All these reference architectures pursue the common objective of providing a roadmap for the use of standards for smart factories. Emphasis is put on the interoperability of the standards and their alignment with the processes in the factories.

Despite the classification of the existing standards, there is still no structured and systematic approach to describe these standards and the relationships among them. In addition, the knowledge about the standards, the domains they cover, and the overlap that might exist among them is still not formalized. In this work, we devise a semantic-based landscape for I4.0 and present the STO ontology for describing I4.0 standards. The semantics encoded in STO are exploited for discovering implicit relations between standards. An evaluation allows for tracking known properties of existing I4.0 standards and for uncovering relations that were not initially modeled, e.g., a relation between AutomationML (AML) and IEC 61499 [148]. Thus, this landscape provides a building block for the implementation of a knowledge graph for Smart Factory standards, as well as for their mapping and semantic integration.

In particular, we make the following contributions:

• The Standards Ontology (STO) to describe characteristics of standards for Smart Factories and their relations;

• A description of more than 60 standards relevant to I4.0 and 20 standardization organizations using STO, and the classification of standards with regard to their RAMI layers and dimensions and NISTIR criteria, such as Product Life-cycle Management (PLM);

• A real-world use case showing the applicability and benefits of the standardization landscape following a semantic-based approach.

6 https://bit.ly/2M3G7od/ 7 https://www.nist.gov/publications/current-standards-landscape-smart-manufacturing-systems


9.2.1 Background

In the following, relevant architectures of standardization landscapes for Smart Manufacturing, in addition to the RAMI model (cf. Subsection 9.1.1), are presented. It is important to note that some standards are classified at different layers in these architectures.

Standards Landscape of the National Institute of Standards and Technology

The National Institute of Standards and Technology (NIST) has defined a standards landscape focusing on Smart Manufacturing Systems [149]. The standards landscape classifies standards with respect to their functions. First, standards are aligned with regard to the levels of the manufacturing pyramid of the ISA 95 standard, i.e., from the device to the enterprise level. Second, standards are organized taking into account different phases of the product life-cycle, such as Modeling Practice, Product Model and Data Exchange, Manufacturing Model Data, Product Category Data, and Product Life-cycle Data Management. Finally, standards are classified with regard to the life-cycle of production systems. In this case, they include categories such as Production System Model Data, Production System Engineering, and Production System Operation and Maintenance. Similarly, ISA 95 classifies standards into various layers. However, ISA 95 is not a multi-dimensional model, and the classification of standards is more general than in RAMI. For instance, OPC UA is classified in the SCADA layer, providing a uniform information model framework. In this classification, however, OPC UA covers only the SCADA systems and not the use of the data, e.g., data management and analytic processes.

National Smart Manufacturing Standards Architecture

The Ministry of Industry and Information Technology of China, in a joint effort with the Standardization Administration, created a report defining a National Smart Manufacturing Standards Architecture [150, 151]. The architecture comprises three dimensions: 1) smart functions: from resource elements, system integration, interconnection, and information convergence to new business models; 2) life-cycle: from design, production, logistics, marketing and sales to service; and 3) hierarchy levels: from device, control, plant, and enterprise to inter-enterprise collaboration. The objective is to classify standards into different architectural levels according to their functions.

9.2.2 Methodology

Here, we present the methodology followed during the creation of the semantic-based landscape for I4.0 standards. We first looked into the different initiatives related to standards classification, i.e., the RAMI model, the NIST standards landscape, and the Chinese approach. These initiatives focus on I4.0 and the related Smart Factory concept. The RAMI model comprises the Administration Shell concept. It describes how standards are linked to certain submodels, e.g., identification, communication, or engineering [152]. For example, the identification submodel is aligned with ISO 29005, whereas the communication submodel is aligned with the IEC 61784 Fieldbus Profiles. The aim of the current specification of the RAMI model is not to provide a complete description of the standards. However, it provides a starting point to further investigate their relevance for the Administration Shell concept. Next, we considered the submodels, e.g., communication and engineering, and found other standards particularly relevant for them. For instance, the engineering domain is aligned with standards such as IEC 61360, IEC 61987, and eCl@ss. In this regard, we extended our classification to include the Automation Markup Language (AutomationML, AML) [153], also known as IEC 62714, since it is recognized as an important standard for engineering in the I4.0 realization [154–156].


Figure 9.5: The Semantic Standard Landscape Pipeline. I4.0 standards are received as input and the output is a graph representing relations between standards. STO and existing vocabularies are utilized to describe known relations among standards. A reasoning process exploits the semantics encoded in STO to infer new relations between standards.

The RAMI model also classifies standards according to its layers. We analyzed the IT layers of RAMI and classified standards for further mappings. For this, we considered all layers, from the Asset to the Business layer. Further, we investigated specific synergies of standards which are commonly described in scientific or white papers. For example, the relationship between OPC UA and AML is relevant for the realization of the I4.0 vision [154, 155]. As a result, these two standards have been combined into a new standard: DIN SPEC 16592. Provenance and other characteristics of I4.0 standards, e.g., relations between them, correspond to the properties modeled in the I4.0 landscape. For instance, the following relations are described in the literature: OPC UA with IEC 61499 [157], and OPC UA with IEC 61850 [158]. Further, we extended our view on current standards and their use in the Smart Manufacturing domain by analyzing related initiatives, e.g., NISTIR 8107 [149]. In this work, standards are classified according to different areas of importance for Smart Factories, e.g., PLM. We considered PLM-related areas, such as Product Life-cycle Data Management and Product Catalog Data, and categorized standards regarding their functions in these areas of PLM, e.g., ISO 10303 and ISO 13584, respectively. Figure 9.5 depicts the pipeline that implements the methodology we followed during the creation of the I4.0 Standards Landscape.

9.2.3 An RDF-based Approach for the I4.0 Standards Landscape

The Standards Ontology (STO) is designed to semantically describe standards related to I4.0 as well as their relations. We used a knowledge-driven approach to model the most relevant concepts in order to represent all metadata of the I4.0 standards. The STO development follows best practices for ontology building provided in the Git4Voc methodology, e.g., reusing existing concepts of well-known ontologies, such as MUTO for tagging [159], FOAF8 for representing and linking documents and agents (e.g., persons, organizations), and DCTERMS9 for document metadata, including licenses, as well as the RAMI vocabulary for linking standards with RAMI concepts. VoCol has been used as an integrated environment providing various views of the ontology, such as human-friendly documentation, graphical visualization, and statistical charts. As a result, apart from ontology engineers, other groups of users, in particular domain experts, are able to actively contribute to the further development of STO. Following best practices for ontology publishing10, the STO ontology is available via a W3ID permanent URL and is registered in the LOV service11.

8 http://purl.org/muto/core#, http://xmlns.com/foaf/spec/ 9 http://dublincore.org/documents/dcmi-terms/


Figure 9.6: The Standards Ontology (STO). STO classes and properties describe the meaning and relations of I4.0 standards. The RAMI vocabulary models the layer at which a standard is classified in the RAMI architecture. Datatype and object properties are represented by green and blue squares, respectively; red arrows represent inverse functional properties.

Class description In the following, the main STO classes are described.

sto:Standard: describes the concept of a standard. Since standards are usually published as documents, this class is a specialization of the foaf:Document class to represent the publication format of the standards.

sto:SDO: models organizations that develop standards, such as ISO and IEC. This class specializes foaf:Organization, allowing to link its instances to FOAF-defined resources (see the sketch below).

sto:Domain: specifies domains relevant to the standards, e.g., Manufacturing Operation Management, Functional Safety, Industry Automation.

sto:ISA95Level: describes levels of the ISA95 standard. It enables linking standards to the NIST initiative along with linking them to the ISA95 levels.

sto:ProductionSystem: represents standards for the modeling of complex systems and automation engineering, as well as operation and maintenance perspectives of production systems.
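A minimal Turtle sketch of these specializations; the STO namespace URI is assumed from the W3ID URL mentioned above:

@prefix sto:  <https://w3id.org/i40/sto#> .   # assumed namespace, cf. footnote 11
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Standards are published as documents; SDOs are organizations.
sto:Standard rdfs:subClassOf foaf:Document .
sto:SDO      rdfs:subClassOf foaf:Organization .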

Furthermore, the STO ontology reuses other classes from the MUTO and RAMI vocabularies, such as muto:Tag, rami:RAMIHierarchyLevel, and rami:RAMIITLayer. Figure 9.6 illustrates the class diagram of STO concepts and their connections.

Properties description We describe some of the core properties of STO as follows:

sto:license: links a standard with its corresponding license document. This information shows whether the standard is free to use or has to be purchased first.

sto:isPartOf: links a standard document with the specific parts of the standard.

sto:publisher: links a standard with the organization that published the standard.

rami:hasRAMIHierarchyLevel and rami:hasRAMIITLayer: align standards with their hierarchy level and IT layer in the RAMI model, respectively.

10 https://www.w3.org/TR/swbp-vocab-pub/ 11 https://w3id.org/i40/sto, http://lov.okfn.org


sto:relatedTo: represents links between I4.0 standards. This property is symmetric and transitive. An inference model based on sto:relatedTo allows for uncovering new relations between standards. For example, the standard ISO 13849 is directly related to IEC 61511. Likewise, IEC 61511 is directly related to IEC 61508. By exploiting the transitivity, a relation of ISO 13849 to IEC 61508 can be inferred (see the sketch below).

sto:scope: describes the scope of a standard by defining its specific goals and limitations.

sto:hasOntology: refers to the ontology of a standard, if one has already been defined. For instance, the ISO 15926 standard, used in the integration of life-cycle data for process plants, is available as an ontology12 that can be linked accordingly.
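A minimal sketch of how the sto:relatedTo axioms and the resulting inference might look; the namespace URI and the instance names follow the conventions of Listing 9.6 and are assumptions:

@prefix sto: <https://w3id.org/i40/sto#> .   # assumed namespace, cf. footnote 11
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# Declared symmetric and transitive, so a reasoner can close the relation.
sto:relatedTo a owl:ObjectProperty ,
                owl:SymmetricProperty ,
                owl:TransitiveProperty .

# Asserted relations:
sto:ISO_13849 sto:relatedTo sto:IEC_61511 .
sto:IEC_61511 sto:relatedTo sto:IEC_61508 .

# Inferred by a reasoner, among others:
#   sto:ISO_13849 sto:relatedTo sto:IEC_61508 .   (transitivity)
#   sto:IEC_61511 sto:relatedTo sto:ISO_13849 .   (symmetry)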

Additionally, standards can be linked with other external resources using the following properties: sto:hasWikipediaArticle, sto:hasWikidataEntity, sto:hasDBpediaResource, and sto:hasOfficialResource.

Population

Based on the STO ontology, we created the STO dataset. It contains the descriptions of a set of existing standards, along with their metadata, e.g., licenses and relations. It also includes descriptions of the organizations that published the standards. The dataset can be expanded with further information by the community, through directly accessing it on GitHub13. Further, a public VoCol instance14 is provided, where users can easily view and explore the described standards and their metadata, enabling a better comprehension for non-ontology engineers. In the following, we exemplify some of the most important concepts as instances described in STO. Listing 9.6 depicts the semantic description of the OPC UA standard defined by the OPC Foundation. The overall dataset currently comprises more than 60 standards and more than 20 standardization organizations. Moreover, we document more than 20 direct relations between the standards, with the possibility of inferring more than 50 new ones.

@prefix sto:  <...> .
@prefix rami: <...> .
@prefix dc:   <...> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

sto:IEC_62541 rdf:type sto:Standard ;
    dc:description "OPC Unified Architecture (OPC UA) is an ..."@en ;
    rami:hasRAMIHierarchyLevel rami:ControlDevice ;
    rami:hasRAMIITLayer rami:Communication ;
    sto:hasISA95Level sto:SCADALevel ;
    sto:developer sto:OPC_Foundation ;
    sto:hasDBpediaResource <...> ;
    sto:hasOfficialResource <...> ;
    sto:hasTag "OPCUA"@en ;
    sto:hasWikidataEntity <...> ;
    sto:hasWikipediaArticle <...> ;
    sto:norm "62541" ;
    sto:publisher sto:IEC ;
    sto:relatedTo sto:IEC_62714 , sto:IEC_61499 ;
    sto:scope sto:Industrial_Automation .

Listing 9.6: The RDF representation of an Industry 4.0 standard. Semantic description of the OPC UA standard with the STO ontology and its metadata.
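Given such descriptions, even a one-line SPARQL query surfaces the explicitly asserted neighbors of a standard; the query below is a sketch under the same namespace assumption as above:

PREFIX sto: <https://w3id.org/i40/sto#>

# All standards directly related to OPC UA (IEC 62541)
SELECT ?standard
WHERE {
  sto:IEC_62541 sto:relatedTo ?standard .
}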

12 https://www.posccaesar.org/wiki/ISO15926inOWL 13 https://github.com/i40-Tools/StandardsVocabulary 14 http://vocol.iais.fraunhofer.de/sto/



Figure 9.7: I4.0 Standards related to AML. Relations between I4.0 standards are visualized using graphs; continuous and dashed directed arrows represent explicit and inferred relations, respectively. The inference model relies on the transitive and symmetric properties of sto:relatedTo. (a) Known relations between AML and I4.0 standards are explicitly described using the STO object property sto:relatedTo. (b) Relations between I4.0 standards connected to AML, shown with dashed directed arrows in a different color, are inferred. The relation between AML (IEC 62714) and the I4.0 standard named Measurement and Control Devices (IEC 61499) has been validated [148].

PREFIX rami: <...>
PREFIX sto: <...>

SELECT DISTINCT ?name ?RAMIITLayer ?ISA95Level ?PLM ?AdminShellSubmodel ?license ?publisher
WHERE {
  ?std a sto:Standard ;
       sto:norm ?norm ;
       sto:publisher/sto:name ?publisher ;
       sto:publisher/sto:abbreviation ?publisherAbbrv ;
       rami:hasRAMIITLayer ?RAMIITLayer .
  FILTER(LANGMATCHES(LANG(?publisher), "en"))
  OPTIONAL { ?std sto:hasTag ?tag . }
  OPTIONAL { ?std sto:license ?license ;
                  sto:hasOfficialResource ?officialResource . }
  OPTIONAL { ?std rami:hasAdminShellSubmodel ?AdminShellSubmodel . }
  OPTIONAL { ?std sto:hasISA95Level ?ISA95Level . }
  OPTIONAL { ?std sto:isUtilizedIn ?PLM . }
  BIND(CONCAT(?publisherAbbrv, " ", ?norm, " - ", ?tag) AS ?name)
}

Listing 9.7: SPARQL query for retrieving the metadata of the encoded standards.

9.2.4 Use Case

One of the main motivations for creating the standards dataset is to describe I4.0 standards using STO, as well as to study existing relations among them. Figure 9.7(a) depicts explicit relations, which are currently annotated in the dataset; Figure 9.7(b) shows inferred relations, which are obtained after executing the aforementioned query and running the inference process based on the symmetric and transitive properties of sto:relatedTo. These queries can also be evaluated on the STO VoCol instance in the Analytics menu.



Figure 9.8: Relations between I4.0 Standards. Relations between I4.0 standards are visualized using graphs; continuous and dashed lines represent explicit and inferred relations, respectively. The inference model relies on the transitive and symmetric properties of sto:relatedTo. For readability, only symmetric relations are represented using undirected lines. (a) Known relations between I4.0 standards are explicitly described using the STO object property sto:relatedTo. (b) Relations between I4.0 standards are inferred; the graph comprises 74 edges: 50 are inferred while 24 are explicit.

To this end, the ForceGraph option should be chosen as the graph type, and the Direct Standard Relations or Direct and Indirect Standard Relations options as the queries to be executed. This feature allows for retrieving not only relations that have been explicitly defined in the STO dataset, but also those that are inferred. For instance, the relations of AML (IEC 62714) and OPC UA (IEC 62541), as well as of OPC UA (IEC 62541) and IEC 61499, were explicitly defined in the dataset.

Table 9.1: Exemplar Descriptions of I4.0 Standards in the I4.0 Standard Landscape. An I4.0 standard is described in terms of its classification level in the reference architectures, e.g., RAMI and ISA95, as well as basic properties like license and publisher. The same I4.0 standard can be classified by two reference architectures, e.g., IEC 62264.

Standard    RAMIITLayer    ISA95Level   PLM                        Submodel     License      Publisher
IEC 62714   Information    -            ProductModelDataExchange   Engineering  Open         IEC
ISO 19439   Business       Enterprise   -                          -            Proprietary  ISO
IEC 62264   Functional     Enterprise   -                          -            Proprietary  IEC
SE ModBus   Integration    SCADA        ProductionEngineering      -            Proprietary  SE
IEC 61360   Asset          -            ManufacturingModelData     Engineering  Proprietary  IEC
IEC 62541   Communication  SCADA        -                          -            Open         IEC

Based on this fact, the relation between AML and IEC 61499 (cf. Figure 9.7) is inferred. We validated that this relation exists and is important according to the literature [148]. Furthermore, all relations between the I4.0 standards can be retrieved at once (cf. Figure 9.8). The result of this query reports 24 relations between the standards in the dataset (cf. Figure 9.8(a)), and 50 new relations inferred when the symmetric and transitive properties are considered by the inference process (cf. Figure 9.8(b)).

Finally, the query in Listing 9.7 is executed over the STO dataset to retrieve descriptions of standards. The results particularly help to understand how standards are linked with important I4.0 concepts, such as the Administration Shell submodel, the RAMI IT layer, as well as with PLM. Additional metadata used to describe I4.0 standards in the I4.0 standards landscape, e.g., name and tag, the license, and the publishing organization, are obtained (cf. Table 9.1).

9.3 Summary

This chapter presents two RDF-based approaches where ontologies are modeled to support interoperability when exchanging information in different scenarios. The Git4Voc methodology, along with the VoCol platform, is used to manage and facilitate the collaborative and distributed development among different stakeholders.

Administrative Shell Vocabulary

First, we describe an approach for semantically representing information about smart I4.0 devices with an Administrative Shell. The approach is based on structuring the information using an extensible and lightweight vocabulary aiming to capture all relevant information. Compared to prior approaches, the RDF-based Semantic Administrative Shell has a number of advantages. The URI/IRI-based identification scheme provides a unified way to identify all types of relevant entities, from physical objects and abstract concepts to properties and concrete raw and derived data. Existing standards (e.g., eCl@ss, IEC device characteristics or AutomationML) are more easily integrated and referenced. Information about and from different objects can be combined, since a basic integration can be achieved by merging sets of triples. Unified access to the information is established by using SPARQL as a standard protocol and query language.

Standards Ontology

Second, we develop both a landscape of Industry 4.0-related standards and the Standards Ontology (STO) for the semantic description of standards and their relations, as well as of the respective organizations responsible for their publication and maintenance. Further, we investigate existing reference models, such as RAMI and NIST, and populate STO with descriptions of more than 60 standards, more than 20 standardization organizations, and 74 relations between the standards. Finally, we illustrate the benefits of our semantic-based approach with a concrete use case.

We consider this work as a first step in a larger research and development agenda aiming at equipping manufacturing devices with semantics-based means to ease communication and data exchange. In the medium to long term, we aim to bring more intelligence to the edge of production facilities thus promoting self-organization and resilience of these devices. Furthermore, with the ontology-based representation of standards and their relations, our objective is to facilitate the structuring, discovery, selection, and integration of standards on a conceptual as well as operational and implementation level.


CHAPTER 10

Establishing Semantic Interoperability between Industry 4.0 Data

This chapter focuses on describing an information model realized for a global manufacturing company to describe its assets and information sources. In addition, the model enables the semantic integration and exchange of data beyond manufacturer boundaries. Although the vision of digitizing production and manufacturing has gained much traction lately (viz. Industry 4.0), it is still not clear how it can actually be implemented in an interoperable way using concrete standards and technologies [160]. A key challenge is to enable devices to communicate and to understand each other as a prerequisite for cooperation scenarios [161]. Various standards, such as those from the ISO and IEC series, are used to describe information about manufacturing, security, identification and communication, among other areas. For instance, machines produce assets following the descriptions given by work orders. Work orders drive the production process for a given asset, and this information is managed in different software platforms. Further information important for decision making, such as the energy consumption for each asset produced by a particular work order, is not easily managed in the current setting. Such issues reduce the efficiency of the production process and therefore hamper the realization of the Industry 4.0 vision in actual working environments.

Integrating all relevant information and automating as many production steps as possible is the central goal of the Industry 4.0 vision [162]. Instead of envisioning one monolithic system or database, we pursue a decentralized semantic integration, i.e., the formal description and linking of all relevant assets and data sources based on an aligned set of RDF vocabularies – the information model. This avoids unnecessary data redundancy and allows structured querying and analyses across individual assets and data sources. The information model serves as a crystallization point and reference for data structures and semantics emerging from the data sources and value chains. Further, it is aligned with important industry standards, such as RAMI [124] and IEC 62264 [163], to additionally foster data exchange and semantic interoperability.

The information model is centered around machine data and describes all relevant assets, key terms and relations in a structured way, making use of existing as well as newly developed RDF vocabularies. In addition, it comprises numerous RML mappings that link the different data sources required for integrated data access and querying via SPARQL (sketched below). The technical infrastructure and methodology used to develop and maintain the information model is based on a Git repository and utilizes our VoCol platform as the development environment, as well as the Ontop framework for Ontology-Based Data Access.
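To give an impression of such mappings, the following RML sketch lifts machine sensor readings into the information model; the source file, column names, and the ex: vocabulary terms are invented for illustration and do not reproduce the actual mappings of the project:

@prefix rr:  <http://www.w3.org/ns/r2rml#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql:  <http://semweb.mmlab.be/ns/ql#> .
@prefix ex:  <http://example.org/im#> .

<#SensorMapping> a rr:TriplesMap ;
  rml:logicalSource [
    rml:source "sensor_data.csv" ;          # hypothetical sensor export
    rml:referenceFormulation ql:CSV
  ] ;
  rr:subjectMap [
    rr:template "http://example.org/machine/{machine_id}/measurement/{id}" ;
    rr:class ex:EnergyMeasurement
  ] ;
  rr:predicateObjectMap [
    rr:predicate ex:energyConsumption ;
    rr:objectMap [ rml:reference "energy_kwh" ]
  ] .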


The contributions of this chapter are summarized as follows:
• A methodology for the development, governance and interlinking of vocabularies for the domain of interest;
• A technical infrastructure implementing the information model for establishing data integration across heterogeneous data sources;
• The application of the defined approach to concrete use cases, demonstrating the benefits and opportunities provided by the information model;
• A survey with stakeholders to assess their opinion on the benefits of the information model and of semantic technologies in general, as well as a report on lessons learned during the project implementation.

This chapter is based on the following publication:

• Niklas Petersen, Lavdim Halilaj, Irlán Grangel-González, Steffen Lohmann, Christoph Lange, Sören Auer. Realizing an RDF-based Information Model for a Manufacturing Company – A Case Study. In 16th International Semantic Web Conference (ISWC) 2017 Proceedings, 350-366, Springer. This article is a joint work with Niklas Petersen and Irlán Grangel-González, PhD students at the University of Bonn. My contributions focused on the implementation of the proposed approach, the preparation and presentation of the use cases, and analysis of different strategies for data integration for the given scenario.

This chapter is organized as follows: In Section 10.1, we describe the motivation of this work. The core contributions, namely the information model, the modeling approach and its implementation, are presented in Section 10.2 and Section 10.3, respectively. In Section 10.4, we apply the information model to two use case scenarios, demonstrating its benefits and opportunities. Section 10.5 reports findings and lessons learned derived from the stakeholder survey. Section 10.6 summarizes the work that has been realized in this chapter.

10.1 Motivating Example

The information modeling project involved employees from different departments and hierarchical levels of the manufacturing company, external consultants and a third-party IT provider. The company itself had realized that its IT infrastructure has reached a level of complexity that makes it difficult to manage and effectively use the existing systems and data. While adding new sensors to production lines is straightforward, using the sensor data effectively to improve the production process and decision-making can be cumbersome. The need to share production data with clients led the company to evaluate the fitness of semantic technologies. For example, the production of bearing tools is strongly quality-driven, depending on the customer specifications. Sharing the production details in a more processable format (compared to non-machine-comprehensible formats) aroused interest. Further goals were to gain a bigger picture of the company's assets (physical and non-physical) and to capture as much expert knowledge as possible. The concrete use cases are based upon a machine newly introduced into the production lines of the company, a so-called machine tool. It is a machine that requires the mounting of tools to assemble specific metal or rigid products. Compared to older generations, the new machine is heavily equipped with embedded sensors that monitor the production process.

Tool management Possible tools to be mounted into the machine are cutters, drillers or polishers. A tool usually consists of multiple parts.

The number of parts depends on the manufacturer of the tool, which is not necessarily the same as the manufacturer of the machine. Mounting tools into a machine is a time-consuming task for the machine operator. Uncertain variables of the tools, such as location, availability and utilization rate, play a major role in the efficiency of a work shift and of a machine in particular. The production of certain goods may wear a tool out quickly, thus decreasing its overall lifetime and forcing the machine operator to stop the machine and replace the worn tool with a new one. Reducing the idle time for remounting the machine by clearly describing its configuration, location and wear was therefore one concrete goal to be addressed by the information model.

Energy consumption Producing goods with the machine tool is an energy-intensive process. Before we started the information modeling project, only the energy costs per factory were known. Sensors were added to track the energy consumption per machine and processed work order. For the cost calculation, data from the added sensors and the work orders, which reside in different data sources, need to be linked and jointly queried. Therefore, integrating this data to be able to retrieve the information at run-time was another concrete goal to be achieved.

Three types of data were of particular interest in the project: i) Sensor Data (SD), ii) the Bill of Materials (BOM), and iii) data from the Manufacturing Execution System (MES). The SD comprises sensor measurements of the machine tool. These measurements record parameters needed for the continuous monitoring of the machine, such as energy, power, temperature, force, and vibration. The MES contains information about work orders, shifts, material numbers, etc. The machine produces assets based on the work order details, which provide the necessary information for the production of a given asset. The BOM contains information about the general structure of the company, such as work centers, work units and associated production processes, as well as information related to the work orders and the materials needed for a specific production.

10.2 Realizing an RDF-based Information Model

The information model aims at a holistic description of the company, its assets and information sources. The core of the model is based on a factory ontology we developed in a previous project [164], which describes real-world objects from the factory domain, including factories, employees, machines, their locations and their relations to each other. In addition, the information model comprises the mappings between the ontologies that represent the data sources (i.e., SD, BOM, MES) and their corresponding schemas.

10.2.1 Development Methodology

Our development methodology was based on the approach proposed by Uschold et al. [165]. We first defined the purpose and scope of the information model; then, we captured the domain knowledge, conceptualized and formalized the ontologies and aligned them with existing ontologies, vocabularies and various standards. Finally, we created the mappings between the data sources and the ontological entities. In line with best practices, we followed the iterative and incremental round-trip model based on the Git4Voc methodology, i.e., with an increased understanding of the domain, the information model was continuously improved. All artifacts were hosted and maintained by VoCol, which we adapted with additional features to allow querying and retrieving data from heterogeneous sources. VoCol supports the requirements of the stakeholders: i) version control of the ontology; ii) online and offline editing; and iii) support for different ontology editors (by generating a unique serialization before changes are merged to avoid false-positive conflicts).


In addition, it offers different web-based views on the ontology, including a human-readable documentation, a visualization and graphical charts generated from customized queries applied to the ontology and instance data. These views are designed to ease the collaboration of domain experts in the development process, i.e., enabling them to participate via a single access point and without having to set up and maintain a proper infrastructure themselves.

Purpose and scope The information model comprises i) a formal description of the physical assets of the company, ii) mappings to database schemas of existing production systems, and iii) a formalization of domain-related knowledge of experienced employees about certain tasks and processes within the company. The heart of the information model is the aforementioned machine tool, including its sensor data, usage processes and human interaction. Therefore, the majority of concepts are defined by their relation to this machine. The scope is set by the motivating examples, energy consumption and tool management, introduced in Section 10.1. The objective of the management is to gain a clearer picture of all assets of the company, for example: What local knowledge exists in the factory? What kind of data exists for which machine? Where is that data? Who has access to it? Discussions on fully automated, order-driven production sites are ongoing. The management hopes for this to be supported by the information model, and we aim to provide the basis towards that goal.

Capturing domain knowledge The domain knowledge was captured in different ways:

1. The company provided descriptive materials of the domain, including maps of the factories, descriptions of machines and work orders, information about processes, sensor data and tool knowledge. The types of input material ranged from formatted and unformatted text documents to spreadsheets and SQL dumps.
2. An on-site demonstration of the machine within the factory was given during the project kick-off, including a discussion of further contextual information missing in the material. In subsequent meetings, open questions were clarified and concrete use cases for the information model were discussed.
3. We reviewed relevant existing ontologies and industry standards, intending to build on available domain conceptualizations and formalizations.
4. We created customized document templates to enable easy participation of domain experts by collecting input on the ontology classes and properties in a structured way. We collected the names and descriptions of all properties having a given class as their domain in one table, with one row per property; additional details about the domains and ranges of properties were collected in a separate table. These documents were continuously handed over to the domain experts to be reviewed and completed.
5. We trained IT-affine employees of the company to model ontologies with editors, such as Protégé, TopBraid Composer and the Turtle editor integrated into the VoCol environment.1

Conceptualizing and formalizing Since Machine(s) are the main assets of the manufacturing company, they have been used as a starting point for creating the ontology. Each machine has a geo-location (a property with range Geometry) and is part of a certain Section, which is in a certain Hall. Certain tools can be mounted directly onto the machine holder. Tools are the parts that wear out over time and need to be replaced. The geo-location of machines enables showing their position on a map.

1 http://protege.stanford.edu, http://www.topquadrant.com/tools/modeling-topbraid-composer-standard-edition

The tool ontology part reflects the configuration options for tools of multiple tool manufacturers. Specialized classes and properties are created to detail the core concepts and their relationships. In total, the developed ontologies comprise 148 classes, 4662 instances, 89 object properties and 207 datatype properties.
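To make this structure concrete, the following minimal Turtle sketch illustrates the modeling style described above; the namespace and the exact class and property names are illustrative assumptions, not the verbatim terms of the factory ontology:

@prefix im:   <http://iais.fraunhofer.de/infomodel/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ngeo: <http://geovocab.org/geometry#> .

# Core classes of the factory domain (names illustrative)
im:Machine a owl:Class .
im:Section a owl:Class .
im:Hall    a owl:Class .
im:Tool    a owl:Class .

# Each machine has a geo-location with range Geometry
im:hasGeometry a owl:ObjectProperty ;
    rdfs:domain im:Machine ;
    rdfs:range  ngeo:Geometry .

# A machine is part of a certain section
im:isPartOfSection a owl:ObjectProperty ;
    rdfs:domain im:Machine ;
    rdfs:range  im:Section .

# Tools are mounted directly onto the machine holder
im:isMountedOn a owl:ObjectProperty ;
    rdfs:domain im:Tool ;
    rdfs:range  im:Machine .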

Aligning with existing ontologies & standards The developed information model consists of concepts from existing vocabularies and industry standards formalized during the project. In particular, concepts from the VIVO (vivo:Building), NeoGeo (ngeo:Geometry), FOAF (foaf:Person) and Semantic Sensor Network (ssn:Sensor) ontologies are reused.2 We aligned the ontologies of the information model to the RDF vocabularies of industry standards, such as RAMI (cf. Subsection 9.1.3) and IEC 62264 [163]. RAMI is used to structure the interrelations between IT, manufacturing and product life-cycles, whereas IEC 62264 defines the hierarchical levels within a company. RAMI includes the IEC concepts and adds the “connected world” as an additional level, which aligns with the basic idea and motivation of this work.
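A hedged sketch of how such reuse can look in Turtle; the subclass axioms below are illustrative assumptions, not the exact alignment axioms of the information model:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix im:   <http://iais.fraunhofer.de/infomodel/> .
@prefix vivo: <http://vivoweb.org/ontology/core#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# Local concepts aligned to reused external vocabularies (axioms illustrative)
im:Hall     rdfs:subClassOf vivo:Building .
im:Operator rdfs:subClassOf foaf:Person .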

10.2.2 Information Model Governance

Introducing new technologies is often a challenge for companies. A new technology has to be well-aligned with the organizational structure of the company to balance the added value the information model produces for the business against the maintenance costs of the technology. Thus, in parallel with the introduction of the information model, we defined a procedure to support the governance of information, ensuring the maintenance of the model and uniform decision-making processes. Since the core of the information model is a network of ontologies and vocabularies with a clear hierarchical and modular structure, boards of experts are assigned to each part and are responsible for its maintenance. Design decisions cover, for instance, including new terms or removing existing ones, reusing and aligning with external vocabularies, and the continuous alignment with Industry 4.0 standards, e.g., RAMI, IEC, ISO. Additionally, we provided concrete guidelines for maintaining the information model along with the use of VoCol, for example3 (a brief sketch follows the list):
• detailed documentation of all terms defined in the vocabulary by skos:prefLabel, skos:altLabel, and skos:definition;
• multilingual definition of labels, i.e., in English and German;
• definition of rdfs:domain and rdfs:range for all properties;
• inclusion of provenance metadata, licenses, attributions, etc.
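As a brief sketch, a property documented according to these guidelines could look as follows in Turtle; the German label and the definition text are illustrative assumptions, while im:targetAmount and im:WorkOrder appear in the mappings of Listing 10.1:

@prefix im:   <http://iais.fraunhofer.de/infomodel/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# A documented term with multilingual labels, definition, domain and range
im:targetAmount a owl:DatatypeProperty ;
    skos:prefLabel "target amount"@en, "Sollmenge"@de ;   # German label illustrative
    skos:altLabel  "planned production quantity"@en ;
    skos:definition "The number of assets to be produced for a work order."@en ;
    rdfs:domain im:WorkOrder ;
    rdfs:range  xsd:integer .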

10.3 Architecture and Implementation

With the objective of providing a uniform interface for accessing heterogeneous and distributed data sources, we designed and implemented the architecture illustrated in Figure 10.1. It is extensible and able to accommodate additional components for accessing other types of data sources as well as for supporting federated query engines. The architecture distinguishes the following five main layers, some of which are located orthogonally across different components.

2 http://vivoweb.org, http://geovocab.org, http://xmlns.com/foaf/spec, https://www.w3.org/TR/vocab-ssn 3 These are based on the W3C “Data on the Web Best Practices” Recommendation, https://www.w3.org/TR/dwbp/


Figure 10.1: The implemented architecture. It comprises several layers hosting various components: a) ontology layer; b) data access layer; c) mapping layer; d) data source layer; and e) application layer.

The ontology layer consists of several ontologies that have been created to conceptualize a unified view of the data. Wache et al. [166] distinguish three main approaches of using ontologies to explicitly describe data sources: i) the Global Ontology Approach, where all data sources are described in an integrated, global ontology; ii) the Multiple Ontology Approach, where separate local ontologies represent the respective data sources and mappings between them are established; and iii) the Hybrid Ontology Approach, a combination of the two previous approaches that overcomes the drawbacks of maintaining a global shared ontology and mappings between local ontologies. We followed the third approach, which enables new data sources to be easily added, avoiding the need to modify the mappings or the shared ontology. Accordingly, our ontologies are organized in two groups: i) a shared ontology representing the highest level of abstraction of concepts and the mappings with external ontologies; and ii) local ontologies representing the schemas of the respective data sources. This makes our architecture flexible with respect to the addition of diverse types of data sources [167].

The data access layer consists of various wrappers acting as bridges between client applications and heterogeneous data sources. It receives user requests in the form of SPARQL queries, translates them into the query languages of the respective data sources, and returns the results after query execution. Relational databases are accessed via the Ontology-Based Data Access (OBDA) paradigm, where ontologies serve as a conceptualization of the unified view of the data and mappings connect the defined ontology with the data sources [168]. In particular, the Ontop framework [169] is used to access the relational data sources, i.e., the BOM, MES, and SD data, and exposes them as virtual RDF graphs, thus eliminating the requirement to materialize data into RDF triples. Jena Fuseki is used as an in-memory triple store for loading GeoDS, since the information about the geo-locations of the machines comprises fewer than 20K triples.

The mapping layer deals with the mappings between the data stored in the data sources and the local ontologies. For the definition of the mappings, we used R2RML4, the W3C standard RDB-to-RDF mapping language.

4 http://www.w3.org/TR/r2rml/


As a result, it is possible to view and access the existing relational databases in the RDF data model.

The data source layer comprises the external data sources, i.e., databases and RDF datasets. Due to the high dynamicity and the huge amount of incoming data, the data sources are replicated and synchronized periodically. As a result, any performance and safety degradation of the production systems is avoided. Additional types of data sources can easily be integrated into the overall architecture by defining local ontologies and mappings with the global ontology and the data sources, as well as by choosing an appropriate wrapper.

The application layer contains client applications that benefit from the unified access interface to the heterogeneous data sources. These applications can be machine agents or human interface applications able to query, explore and produce human-friendly presentations of the data.

10.4 Use Cases

We applied the developed information model to the use cases introduced in Section 10.1 to demonstrate the possibilities resulting from semantically integrated data access.

10.4.1 Tool Management

Figure 10.2 displays different views on the assets of the company. On a world map (cf. Figure 10.2(a)), the sites of the company are highlighted based on their geo-locations given in the information model. By zooming in, the different locations can be investigated w.r.t. their functionality, address, or on-site buildings, down to the level of machines. By clicking on the objects on the map, static production data is displayed. As an example, Figure 10.2(b) shows all tools stored in a certain paternoster system, grouped by drawer. Figure 10.2(c) provides an example of a machine with its properties: production name, current status, self-visualization, mounted basic holder, and tool with its diameter. Further, it contains links to existing external analytical web pages. A “Determine Tool Availability” function is offered for locating the tools to be assembled in the paternoster storage system closest to the location of the machine. Each time a certain view is opened, a SPARQL query retrieves the required data from the information model. Geo-locations are drawn on the world map view using the Leaflet JavaScript library5.
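A hedged sketch of such a view query, e.g., for listing the tools stored in a paternoster system grouped by drawer; the property names im:isStoredIn and im:isPartOf are illustrative assumptions, not the exact terms of the information model:

PREFIX im: <http://iais.fraunhofer.de/infomodel/>
SELECT ?tool ?paternoster ?drawer
WHERE {
  ?tool a im:Tool ;               # every tool known to the information model
        im:isStoredIn ?drawer .   # the drawer in which the tool is stored
  ?drawer im:isPartOf ?paternoster .
}
ORDER BY ?paternoster ?drawer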

10.4.2 Energy Consumption

Information about energy, power or temperature is critical for the company to forecast the production process, expenses and maintenance. In the second use case, we asked the following question: what is the energy consumption of a given machine for a given day for a particular work order? To answer this question, data from the three sources (SD, MES, BOM) is taken into account. Since the SD lacks the work order definitions, we used time intervals to access the required energy stream data. Next, we linked the work order IDs in the BOM and MES databases. Among other details, a work order includes its material number, total execution time and target production amount. As a part of the information model, we created R2RML mappings for all data sources: 1) mappings from the work orders table to the ontology (cf. Listing 10.1); and 2) mappings from the energy data to the ontology (cf. Listing 10.2). Based on the defined mappings, we created several queries for retrieving information about work orders and energy consumption.

5 http://leafletjs.com/


Figure 10.2: Various views of the tool management application. a) Geographical depiction of the places where the company is located; b) List of tools stored in a paternoster system; and c) Detailed information about a machine located within a factory.

For example, Listing 10.3 illustrates a specific query used to retrieve the energy consumption values for a work order in a specific time interval.

@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix im: <http://iais.fraunhofer.de/infomodel/> .

<#WorkOrdersMapping> a rr:TriplesMap ;
    rr:logicalTable [ rr:tableName "WorkOrders" ] ;
    rr:subjectMap [
        rr:template "http://.../infomodel/WorkOrder/{WorkOrderId}" ;
        rr:class im:WorkOrder ] ;
    rr:predicateObjectMap
        [ rr:predicate im:workOrderId ;   rr:objectMap [ rr:column "WorkOrderId" ] ],
        [ rr:predicate im:targetAmount ;  rr:objectMap [ rr:column "TargetAmount" ] ],
        [ rr:predicate im:totalExecTime ; rr:objectMap [ rr:column "TotalExecTime" ] ],
        [ rr:predicate im:matDesc ;       rr:objectMap [ rr:column "MatDesc" ] ] .
...

Listing 10.1: Work orders mappings. R2RML mappings between the local ontology and the work orders data in the MES database.


@prefix rami: <...> .   # prefix IRI abbreviated in the original

<#EnergyConsumptionMapping> a rr:TriplesMap ;
    rr:logicalTable [ rr:tableName "View_GetEnergyConsumption" ] ;
    rr:subjectMap [
        rr:template "http://iais.fraunhofer.de/infomodel/{DateOfMeasure}/{HourOfMeasure}" ;
        rr:class rami:SensorMeasurementData ] ;
    rr:predicateObjectMap
        [ rr:predicate im:dateOfMeasure ;    rr:objectMap [ rr:column "DateOfMeasure" ] ],
        [ rr:predicate im:hourOfMeasure ;    rr:objectMap [ rr:column "HourOfMeasure" ] ],
        [ rr:predicate im:measurementValue ; rr:objectMap [ rr:column "Value" ] ] .
...

Listing 10.2: Energy consumption mappings. R2RML mappings between the local ontology and the energy consumption data in the SD database.

PREFIX im: <http://iais.fraunhofer.de/infomodel/>
SELECT ?hour ((?latest - ?earliest) AS ?measurementByHour)
WHERE {
  { SELECT ?hour (MIN(?measurement) AS ?earliest)
    WHERE { ?machineSensor im:energyTime ?time ;
                           im:energyValue ?measurement . }
    GROUP BY (HOURS(?time) AS ?hour) }
  { SELECT ?hour (MAX(?measurement) AS ?latest)
    WHERE { ?machineSensor im:energyTime ?time ;
                           im:energyValue ?measurement . }
    GROUP BY (HOURS(?time) AS ?hour) }
  FILTER (?hour >= timeFrom && ?hour <= timeTo)
}
ORDER BY ?hour

Listing 10.3: Retrieving energy consumption. A SPARQL query to retrieve the energy consumption per machine per hour; timeFrom and timeTo are placeholders for the requested interval bounds.

This allowed us to integrate the information from the three data sources using the information model and SPARQL queries. Figure 10.3(a) depicts the work order information for a given machine, and Figure 10.3(b) shows the energy consumption per hour for that machine on a given day. Overall, the performance of the implemented approach was satisfactory, i.e., the time to retrieve the energy consumption of a particular work order was less than five seconds.

10.5 Evaluation and Lessons Learned

To gain feedback from the stakeholders involved in the information modeling project, we designed a questionnaire and sent it to them, asking for anonymous answers. Table 10.1 lists the questions and the results of the questionnaire. We were interested in how the stakeholders evaluate the developed information model and semantic technologies in general, based on the experience they gained in the project.

10.5.1 Stakeholder Feedback

Five employees of the manufacturing company (three IT experts, one analyst, and one consultant) who were actively involved in the project answered the questionnaire. The results varied across the stakeholders: While some regarded the information model and the future potential of semantic technologies as promising, others remained skeptical about their impact within the company.



Figure 10.3: Various views of the analytics application. a) Work order data for a given machine in a time interval of one day; and b) Energy consumption of a given work order within a day.

Table 10.1: Survey questions and given scores. A Likert scale of 1 to 5 (1 = not at all, 5 = very much) is used to categorize the scores given by the stakeholders. M represents the mean value and SD the standard deviation.

Question                                                                                    M     SD
1. Did the developed RDF-based information model meet your expectations?                   2.4   0.9
2. Do you think investing in semantic technologies can result in a fast ROI?               3.0   1.4
3. Do you consider semantic technologies fit for usage in the manufacturing domain?        3.6   0.9
4. Are you satisfied with the software for semantic technologies available on the market?  2.8   1.3
5. Is it easy to hire personnel with knowledge in semantic technologies?                   1.8   0.4
Free-text questions:
6. What do you expect from semantic technologies in manufacturing contexts?
7. What is the biggest bottleneck in using semantic technologies in manufacturing contexts?

Question 6, asking for the expectations towards semantic technologies (cf. Table 10.1), was answered by nearly all as an “enabler for autonomous systems” and by one as a “potential technology to reduce the number of interfaces”. One stakeholder praised the “integration and adaption” capabilities of semantic technologies. Question 7, asking for the biggest bottleneck, yielded the following subjective answers: “lack of standardized upper ontologies”, “lack of field-proven commercial products”, “lack of support for M2M communication standards”, “skepticism of the existing IT personnel”. While the stakeholders find the advantages of semantic technologies appealing and suited to solve the underlying essential problems of interoperability and integration, a successful application requires the provision of tools suited for different user groups.

10.5.2 Lessons Learned

In the following, we reflect on the lessons learned during the implementation of this work, including opinions from the company staff about the current state of semantic technologies.

Technology awareness within the company Overall, the majority of the stakeholders were enthusiastic about and committed to developing an integrated information model and applications on top of it. Nevertheless, reservations about the fitness of the technology and methodology existed from the start. A few stakeholders preferred a bottom-up approach, first gathering and generating internally an overview of the existing schemas and models before involving external parties

(such as our research institute). However, the management preferred an outside view and put a focus on quick results. Instead of spending time on finding an agreement on how to proceed, speed was the major driving force. Thus, they preferred to try out a technology and methodology that was “new” for them, although it does not yet have the reputation of strong industry maturity.

Perceived maturity of semantic technologies While semantic technologies are already widely used in some domains (e.g., life sciences, e-commerce or cultural heritage), there is a lack of success stories, technology readiness and showcase applications in most industrial areas. With regard to smaller and innovative products, the penetration of semantic technologies is still relatively small. A typical question when pitching semantic technologies within companies is “Who else in our domain is using them already?”. Therefore, it is important to point to successful business projects, even if details on them are usually rare.

Lack of semantic web professionals on the job market Enabling the employees of the manufacturer to extend the information model by themselves is crucial for the success of the project. Consequently, it is necessary to teach selected stakeholders the relevant concepts and semantic technologies. Hiring new staff experienced with semantic technologies is not necessarily an easy alternative. Compared to relational data management and XML technologies, there is still a gap between the supply of skilled semantic technology staff and the demand of the market.6

Importance of information model governance Of major importance for the company is a clear governance concept around the information model, answering questions such as who or which department is allowed to access, modify and delete parts of the information model. An RDF-based information model has advantages in this regard: i) it enables people across all sites of the company to obtain a holistic view of company data; ii) current data source schemas are enriched with further semantic information, enabling the creation of mappings between similar concepts; and iii) developers can follow a defined and documented process for further evolving and maintaining the information model.

Building on top of existing systems Accessing data from the existing infrastructure as a virtual RDF graph was a crucial requirement of the manufacturer. It avoids the costs of materializing the data into RDF triples and the redundant maintenance in a triple store, and at the same time, benefits from mature mechanisms for querying, transaction processing, and security of the relational database systems. Three different data access strategies were considered:

DB in Dumps Relational data to be analyzed is dumped in an isolated place away from the production systems, so as not to affect their safety and performance. This strategy is used in cases where the amount of data is small and most likely static or updated very rarely.

DB in Replication All data is replicated, allowing direct access from both the production systems and the new analytic platforms. This approach was considered for cases where data changes frequently and the amount of data is relatively high. It requires the allocation of additional resources to achieve a “real-time” synchronization and to avoid performance degradation of the systems in production. We used this strategy to implement our solution.

6 For the related field of data science, the European Data Science Academy has conducted extensive studies highlighting such a skill/demand gap all over Europe; cf. Deliverables D1.2 and D1.4 (“Study Evaluation Report 1/2”) downloadable from http://edsa-project.eu.


It allows accessing the data sources as a virtual RDF graph while benefiting from the maturity of relational database systems.

DB in Production This strategy accesses data in the real-time systems directly and thus does not require allocating additional resources, such as investments in new hardware or software. However, since it exposes the real-time systems to a high risk of performance degradation, while the sensitive information they provide requires high availability and not providing it on time can have hazardous consequences, we did not apply it in our scenario.

10.6 Summary

This chapter presents a case study on realizing an RDF-based information model for a global manufacturing company using semantic technologies. The information model is centered around machine data and describes all relevant assets, concepts and relations in a structured way, making use of existing as well as specifically developed ontologies. The objective of the information model is to describe business units, their processes and assets for the entire organization. Furthermore, it contains a set of R2RML mappings that link the different data sources required for integrated data access and SPARQL querying. Finally, it is aligned to relevant industry standards, such as RAMI [124] and IEC 62264 [163], to additionally foster data exchange and semantic interoperability. We described the methodology used to develop the information model and its technical implementation, and reported on the gained results and feedback. Additionally, we reflected on the lessons learned from the case study. The guidelines and practices defined in the Git4Voc methodology were used to organize the development process and facilitate design decisions regarding naming conventions, reuse and multilinguality. Moreover, all artifacts generated over time are hosted and managed in the VoCol platform.

The use of data-centric approaches in engineering, manufacturing and production is currently a widely discussed topic (cf. the initiatives related to Industry 4.0 or Smart Manufacturing). The challenges and complexity of data integration are perceived as a major bottleneck for comprehensive digitization and automation in these domains. A key issue is to efficiently and effectively integrate data from different sources to ease the management of individual factories and production processes up to complete companies. The presented information model is envisioned to serve as a crystallization point and semantic reference in this context.

CHAPTER 11

Collaborative Development of Ontologies in Real-world Scenarios

In this chapter, we present VoColReg, a registry to support the development and exploration of ontologies for various domains at the same time. Additionally, statistics and useful insights for a number of ontologies hosted in VoColReg are described in detail. The number of ontologies used for different purposes, such as data integration, information retrieval or search optimization, is constantly increasing. An ontology comprises classes, properties and instances to represent the intended domain as accurately as possible. If an ontology is no longer accessible or was never published, humans and dependent machines that were potentially using it before lose the possibility to explore and use its defined concepts. This works against semantic interoperability, one of the fundamental aspects for enabling the Semantic Web vision [170]. Furthermore, it negatively impacts the reusability of concepts by imposing extra effort on capturing and modeling the knowledge of a domain, which might cause duplication and inconsistencies across terminologies and their semantic descriptions.

Several platforms and mechanisms have been developed to organize ontologies into repositories. They facilitate searching and maintenance, which are important factors for a better comprehension and reuse of ontologies [51]. For instance, BioPortal [171] supports a number of technical requirements related to the assessment of biomedical ontologies via Web browsers or other Web services. In addition, it allows the community to evaluate the ontology content and contribute to its evolution. Linked Open Vocabularies (LOV) [170] offers mechanisms for searching over vocabularies using different criteria, such as metadata and ranking values, and allows accessing them via APIs and a SPARQL endpoint. The LOV initiative collects a number of indicators, including incoming and outgoing links of the vocabularies, the history of versions and other metadata. OntoHub [51] supports the development of modular and distributed ontologies along with the definition of logical relations between them. Ontologies can be aligned and translated into other languages as well as offered as linked data.

We created VoColReg, a registry of ontologies built on top of the VoCol platform. It operates as an integrated ecosystem where the community can browse, discuss, reuse and improve ontologies in a collaborative fashion. In addition to helping the community discover ontologies, it provides a suite of tools and services, including ontology documentation, term search and ontology evolution. The incorporation of sophisticated tools for visual representation facilitates cognitive exploration and navigation over the defined concepts. Currently, the VoColReg registry hosts 19 ontologies from various domains, such as manufacturing, health care and education.



Figure 11.1: The VoColReg architecture. It is composed of several layers: a) persistence layer; b) web interface layer; and c) container host layer. Each layer consists of a number of different components.

In this chapter, we describe in detail nine ontologies that are currently being developed. Various information, including the number of classes, properties and instances, as well as details related to the development process, such as the number of commits, contributors, branches, etc., is collected. We then analyze these statistics and present the findings along with a discussion of potential mechanisms to achieve a better quality of ontologies and ease the collaboration process. The contributions of this chapter are summarized as follows:

• An architecture and implementation of VoColReg to enable the search for and development of ontologies in distributed environments;
• A detailed description of the ontologies hosted in VoColReg, including metrics and statistics related to development aspects;
• An analysis of findings and a discussion of potential mechanisms to improve the development process according to the practices and guidelines of the Git4Voc methodology.

This chapter is organized as follows: It starts with an overview of the VoColReg architecture in Section 11.1, including its layers and their main characteristics. In Section 11.2, we present details of the ontologies currently hosted and being developed in VoColReg. Section 11.3 reports findings and insights from a deeper analysis of these ontologies. The work that has been realized in this chapter is summarized in Section 11.4.

11.1 Architecture

VoColReg has an extensible and flexible architecture able to accommodate many instances of VoCol as well as additional components for accessing various types of data sources.

It consists of three main layers (cf. Figure 11.1): a) the Persistence Layer; b) the Web Interface Layer; and c) the Container Host Layer. The details of each layer are described as follows:

Persistence Layer: stores and maintains ontologies and their versions. By managing the atomicity and durability of the changes, it is possible to easily back up and restore a particular ontology version. This layer is composed of two components: 1) the Remote Hosting Platforms, which allow for change distribution and synchronization as well as for administering the role permissions of the stakeholders involved in the development process; and 2) the Local Working Replicas, where team members contribute directly to the ontology construction.

Web Interface Layer: is responsible for two main tasks: 1) it accepts notifications from the Remote Hosting Platforms after each push event via the PubSubHubbub protocol, according to the publish-subscribe principle for distributed communication; and 2) it presents various views of the ontology by exposing them via a web-based graphical interface, helping users to explore and understand the ontology.

Container Host Layer: consists of a Docker Engine responsible for managing a number of Docker containers. These containers act as isolated worker nodes, where each node has a VoCol installation inside, connected with a repository hosted in the Persistence Layer. Therefore, nodes can be restarted, stopped or even moved to another server without affecting the work that is being realized in the Persistence Layer.

VoColReg provides an infrastructure to perform federated queries over external data sources using SPARQL endpoints. Another important feature is the provision of metadata and other information that enable users to gain a better overview of the hosted ontologies. With the objective of facilitating the quality assessment of ontologies, the following information is provided:

Ontology Metrics: are provided for each ontology, including the number of axioms, classes, object properties, datatype properties and individuals, as well as the expressivity profile.

Validation Report: shows the results of the validation process. It consists of two parts: 1) syntax checking, which shows detailed information about syntactic errors, if any; and 2) consistency checking, listing details of any inconsistency that has been found.

Evolution Details: are generated after comparing each new version pushed to the Remote Repository with the previous one. Information such as the affected axioms, the user who pushed the changes, and the date and time can easily be navigated via the user interface.
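The federated queries mentioned above can be posed against the SPARQL endpoints of the hosted ontologies; a minimal sketch using the SERVICE keyword, with an illustrative endpoint URL:

PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?class
WHERE {
  # Delegate the pattern to the remote endpoint of a hosted ontology
  SERVICE <http://example.org/vocolreg/sto/sparql> {
    ?class a owl:Class .
  }
}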

Ontology accessibility A customized sharing mechanism implemented in VoColReg allows administrators to manage access permissions to their ontology. They can decide on the visibility of the ontology: whether it will be public, shared with a link, or private. Choosing the public option means that the ontology link is added to the VoColReg homepage, so everyone can see and access it. If the option shared with a link is chosen, then only users who know the link can access the ontology. Finally, by selecting the private option, administrators have to specify credentials which users later have to provide in order to be able to access the ontology.

Ontology availability Users can browse and consume the content of the ontology in various forms. Apart from the graphical user interface, ontologies can be accessed through two additional mechanisms. First, using a SPARQL endpoint, users or applications can post specific queries and retrieve customized results to be used for further analyses. Second, the content negotiation mechanism enables software agents to receive the appropriate format, simply by specifying the content type in the HTTP request.
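For instance, a consumer could retrieve all classes of a hosted ontology together with their English labels via its SPARQL endpoint; a minimal sketch:

PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?class ?label
WHERE {
  ?class a owl:Class ;
         rdfs:label ?label .
  FILTER (lang(?label) = "en")   # keep only English labels
}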


Table 11.1: Ontologies being developed in VoColReg. Overview of the ontologies representing various domains in terms of accessibility, hosting platform, expressivity and the number of triples.

Ontology   Domain         Access      Hosting Platform   Expressivity   # triples
STO        Industry 4.0   /sto/       GitHub             SHOI(D)        2135
MobiVoc    Mobility       /mobivoc/   GitHub             ALUO(D)        12134
SCORVoc    Supply Chain   /scorvoc/   GitHub             ALCHO(D)       8496
IDS        Industry 4.0   /ids/       GitHub             ALH(D)         1727
MatOnto    Supply Chain   /matonto/   GitHub             ALCHO(D)       359
IASIS      Health         /iasis/     GitHub             AL(D)          727
SARO       Education      /saro/      GitHub             AL(D)          338
Onto. #1   Accounting     private     Bitbucket          ALUO(D)        7515
Onto. #2   Health         private     Bitbucket          SRIF(D)        3210

Ontology exploration Relationships and other important information about concepts can be modeled using a number of different properties and attributes. For instance, the taxonomy between concepts is represented using properties such as rdfs:subClassOf from the RDFS vocabulary or skos:narrower and/or skos:broader from the SKOS vocabulary. Users can browse the generated class hierarchy in the Documentation view, along with further details related to other important attributes. A comprehensive picture of the relations among concepts is provided in the Visualization view, where concepts are depicted in a node-link interactive diagram.
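Both styles of hierarchy modeling can be sketched in a few Turtle triples; the concept names are illustrative:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/vocab/> .

# RDFS style: a class taxonomy
ex:MillingMachine rdfs:subClassOf ex:Machine .

# SKOS style: broader/narrower relations between concepts
ex:MillingMachine skos:broader  ex:Machine .
ex:Machine        skos:narrower ex:MillingMachine .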

11.2 Ontologies in VoColReg

A total of 19 ontologies of various sizes is currently hosted in VoColReg. Overall, these ontologies comprise 191K triples, with 18K distinct subjects, 839 properties and 47K objects. The hosted ontologies represent different domains: 1) Industry 4.0, with four ontologies; 2) Education, with three ontologies; 3) Supply Chain, Health Care and Web Interoperability, with two ontologies each; and 4) Accounting, Law, Research, Manufacturing and IT, each with one ontology. Regarding the accessibility of the ontologies, 11 of them are publicly reachable and listed on the VoColReg homepage, so people can choose from the list for further exploration; four of them are accessible to people who know the link; and the last four can only be accessed via given credentials. The repositories including the version history of these ontologies are managed using hosting platforms as follows: 13 are on GitHub, four on Bitbucket and two are stored on GitLab. In the following, we focus our analysis on nine ontologies that are currently being developed by various teams. Table 11.1 presents an overview of these ontologies regarding name, domain, accessibility, hosting platform, level of expressiveness and number of triples. More details about the ontologies in terms of the number of classes and the different types of properties, such as object properties, datatype properties and annotation properties from the OWL schema and properties from the RDF schema, are given in Table 11.2. Table 11.3 provides statistics about the development process, such as the number of commits, contributors, issues, branches, modules and languages used to define concepts for each of the selected ontologies. The numbers provided in the tables change frequently, considering that the development process of these ontologies is currently ongoing.


Table 11.2: Ontology statistics. Metrics of the ontologies currently being developed in terms of the number of classes, types of properties (object properties, datatype properties and annotation properties from the OWL schema, and properties from the RDF schema), and individuals.

Ontology   # classes   # total prop.   # object prop.   # datatype prop.   # ann. prop.   # rdf prop.   # individ.
STO        11          34              16               17                 1              0             239
MobiVoc    27          147             22               51                 1              73            80
SCORVoc    279         256             1                249                2              4             224
IDS        104         135             86               49                 0              0             95
MatOnto    11          35              7                28                 0              0             12
IASIS      90          120             58               62                 0              0             2
SARO       29          47              0                0                  0              47            0
Onto. #1   92          100             15               11                 74             0             1075
Onto. #2   72          96              63               32                 1              0             27

Table 11.3: Development statistics. Overview of the ontologies currently being developed in terms of the number of commits, contributors, issues, branches, modules and languages.

Ontology   # commits   # contributors   # issues   # branches   # modules   # languages
STO        395         9                37         3            1           1
MobiVoc    319         13               40         4            4           2
SCORVoc    165         5                27         1            1           1
IDS        125         3                11         7            22          2
MatOnto    76          4                8          1            1           1
IASIS      349         3                0          1            1           0
SARO       31          2                0          1            1           1
Onto. #1   116         4                6          1            4           1
Onto. #2   208         2                5          1            1           7

11.3 Analysis and Discussions

By analyzing the ontologies currently being developed, we can observe some interesting facts about them. These ontologies contain a total of 715 classes, 970 properties and 1754 individuals. Splitting the properties up in more detail, 124 are properties from the RDF schema and the rest are from the OWL schema, of which 268 are object properties, 499 are datatype properties and 79 are annotation properties. Figure 11.2(a) depicts the average number of classes, properties and individuals per ontology. Overall, the ontologies have an average of 79 classes, 30 object properties, 55 datatype properties, 9 annotation properties, 14 RDF properties, and 195 individuals. Figure 11.2(b) illustrates the average number of commits, contributors, issues, branches, modules and languages per ontology, respectively.

Ontology modularity As we can see from Figure 11.2(b), the average number of modules per ontology is four. The Industrial Data Space (IDS) ontology has the highest modularity rate, with a total of 22 modules representing various functionalities of the IDS reference architecture. If we omit the IDS ontology, the average number of modules per ontology decreases to two. Figure 11.2(d) presents more details related to the number of classes and properties per module for each ontology.



Figure 11.2: Various statistics. Four different statistics about the nine ontologies: a) average number of classes, properties and individuals per ontology; b) average number of commits, contributors, issues, branches, modules, and languages per ontology; c) average number of commits per user per ontology; and d) average number of classes and properties per module per ontology.

We can observe that the majority of the ontologies do not follow any pattern for modularization as presented in Subsection 5.1.1, such as a maximum number of triples or concepts per module. They comprise only one module, thus causing potential reusability difficulties or maintenance problems. To increase the modularity of an ontology, teams can create rules for defining new modules. Furthermore, the applicability of these rules can be ensured using SPARQL queries that check whether the ontology or a module has reached a particular number of concepts or triples; a sketch of such a check follows. As a result, the team is notified of the necessity to split off a new module in order to be able to continue with the development process.
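A minimal sketch of such a check, assuming each module is stored as a named graph; the graph IRI and the threshold of 100 classes are illustrative:

PREFIX owl: <http://www.w3.org/2002/07/owl#>
ASK {
  # Count the classes of one module, kept as a named graph
  { SELECT (COUNT(DISTINCT ?class) AS ?numClasses)
    WHERE { GRAPH <http://example.org/modules/core> { ?class a owl:Class . } } }
  # The query returns true when the module should be split
  FILTER (?numClasses > 100)
}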

Ontology multilinguality Out of the nine ontologies, eight have labels or comments in English, three of them are additionally translated into German, and one has translations in seven different languages: French, Italian, Japanese, Polish and Spanish in addition to English and German. One ontology does not use any language tag at all, although the meaning of its concepts is explained using rdfs:comment. We can observe that the majority of the ontologies are translated into only one language, which hinders the objective of achieving a wide range of reusability. There might be several reasons for not providing translations in languages other than English, such as: 1) the ontology is developed for general purposes; 2) it is applicable in only a particular country; or 3) there is a lack of local experts who can translate the terms into their own language. A suite of test cases, along with their dependency relationships, can be evaluated using the EffTE approach, enforcing the definition of each concept in at least one language.

Ontology expressivity The ontologies in VoColReg have different levels of expressivity: AL(D), ALUO(D) and ALCHO(D), each with two ontologies, whereas SHOI(D), ALH(D) and SRIF(D) each describe one ontology. We can observe that the ontologies are mainly expressed using the base language AL, or attributive language, which allows for the definition of atomic concepts, conjunctions, value restrictions and limited existential quantification. Two ontologies use an additional S, which is the abbreviation for ALC, the attributive language with complement, extended with transitive roles; ALC adds full negation and disjunction of concepts (i.e., the boolean operators and, or, not) as well as restrictions on role values. The meaning of the remaining letters is explained below [172]:

• U Concept union;
• C Complex concept negation;
• H Role hierarchy with rdfs:subPropertyOf ;
• F Functional properties, a particular case of uniqueness quantification;
• R Axioms for complex role inclusion and disjointness; reflexivity and irreflexivity;
• O Nominals, such as enumerated classes of object value restrictions: owl:oneOf, owl:hasValue;
• I Inverse properties;
• (D) Usage of datatype properties or data values.
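For illustration, an ALC-style axiom combines conjunction, existential quantification and concept inclusion; the concept and role names below are illustrative:

Machine ⊓ ∃mountsTool.Cutter ⊑ CuttingMachine

i.e., every machine on which some cutter is mounted is a cutting machine.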

11.4 Summary

In this chapter, we describe VoColReg, a registry which offers searching and publishing capabilities for ontologies. The incorporation of a number of rich interfaces allows VoColReg to perform as a generic infrastructure, thus enabling community-based ontology development through a peer-review and collaborative process. Users can easily browse the hosted ontologies, explore and assess them for potential reuse while navigating over the defined concepts and understanding the relationships between them. Moreover, using a standard tree view, the hierarchy of concepts modeled either through the RDFS or SKOS vocabularies can be displayed. In addition to the user interface, a variety of access mechanisms are provided with the objective of facilitating the reuse of ontologies by humans and software agents. These mechanisms include dedicated SPARQL endpoints per ontology, content negotiation and dereferenceability. We gave an overview of the ontologies hosted in VoColReg with a focus on those that are still in the development process. Details about the number of triples, classes, properties and instances, as well as the number of commits, branches and contributors, are presented. Finally, we describe the outcome of the analysis of these ontologies. From our findings, we observe that some of them do not follow any strategy for branching, so the entire development process is driven using only one branch. Few ontologies follow a modularization approach, splitting concepts according to specific subdomains. Regarding multilinguality, the majority of them define rdfs:label and rdfs:comment in only one language, namely English. One ontology does not associate any language tag with rdfs:label and rdfs:comment to define the meaning of its concepts. In addition to these aspects, we discuss several features to be used or implemented in order to enforce that stakeholders follow specific guidelines and best practices and avoid constraint violations.


CHAPTER 12

Conclusions and Future Direction

In this thesis, we study the problem of supporting collaborative ontology development in distributed and heterogeneous environments. We devise a lightweight methodology and a platform to enable stakeholders to work together in such environments. We discuss the problem and the challenges to be addressed in Chapter 1. The background with fundamental terminology essential for this thesis and an overview of the related work are presented in Chapter 2 and Chapter 3, respectively. In Part II, we collect a number of requirements that guide the approach followed in this thesis. The subsequent chapters describe the defined approach from the methodological and technical perspectives. Part III dives into more detail on the problem of ensuring the quality of ontologies in terms of effective synchronization and efficient evaluation in distributed scenarios. Part IV initially describes several approaches to support semantic interoperability and data integration using ontologies. Furthermore, we provide details about the application of our approach in concrete use cases. Finally, in this chapter, we conclude the thesis by assessing the results with respect to the research questions. Moreover, Section 12.2 presents future work that can expand the work of this thesis in both directions: research and technical.

12.1 Revisiting the Research Questions

As we stated in the introduction, the main goal of this thesis is to advance the field of collaborative ontology development by providing methodological and technical support. In this regard, we split the core problem of collaborative ontology development in distributed environments into four research questions. The first research question relates to the methodological support for facilitating stakeholders in organizing and managing their collaborative work, along with an analysis of best practices for ontology development. The second research question aims at transferring ontology development workflows to version control methods. The third research question deals with the problem of synchronizing changes performed with various ontology authoring editors. Finally, the objective of the fourth research question is to investigate the possibilities for efficiently ensuring the quality of ontologies with respect to predefined domain requirements. In order to answer these research questions, we initially investigated collaborative ontology development as the process of identifying the main terms for a domain of interest by finding a consensus between the involved stakeholders. Next, we analyzed the fundamental steps of this process: modeling, population and testing, usually performed in an iterative and incremental fashion. Moreover, a number of widely reused ontologies were studied for their commonalities in terms of important development aspects, such as reuse, documentation, multilinguality, naming,

165 Chapter 12 Conclusions and Future Direction

validation and authoring. Based on the findings, we collected a number of crucial requirements for the process of developing ontologies in collaborative fashion in Chapter4. In this section, we go through the defined research questions and summarize the achieved contributions, respectively.

RQ1: Which guidelines and best practices facilitate collaborative ontology development in distributed and heterogeneous scenarios? While answering this research question, we have observed that applying guidelines and best practices fosters ontology development in collaborative scenarios. As a result, activities and design decisions taken over time are clearly organized and managed by the responsible stakeholders. Several existing methodologies outline a predefined number of activities to be performed in a systematic way, while other methodologies describe best practices for an agile-oriented ontology construction. In the context of this research question, Chapter 5 presents Git4Voc, a lightweight methodology comprising an exhaustive list of guidelines and best practices for collaboratively developing ontologies. Git4Voc is conceived following a bottom-up approach by exploiting fundamental features of Git applicable to ontologies, along with state-of-the-art practices. Its governing guidelines cover aspects such as: 1) administration of generated information; 2) rights management; 3) branching and merging; 4) ontology modularization; and 5) labeling of release versions, to facilitate ontology construction using a version control system. Moreover, from our analysis of widely used ontologies and examination of the state of the art, we defined a set of best practices related to reuse, naming conventions, documentation, authoring, validation and multilinguality. The presented methodology is evaluated against a concrete use case as well as with a survey in which ontology experts were asked for their opinion regarding the defined practices. By applying Git4Voc to the Schema.org use case, a clear organization of the ontology structure, communication issues and special branches is achieved, helping team members to easily collaborate and contribute in various development phases. Finally, from the survey with ontology experts, we observed that the majority of the defined practices received good evaluation results. Therefore, we can conclude that Git4Voc is indeed beneficial and provides comprehensive support for collaboratively developing ontologies in distributed and heterogeneous environments.

RQ2: How can ontology development workflows be mapped on and supported by distributed version control methods? By investigating the current trend of building ontologies in collaborative scenarios, which implies: 1) organizing the entire development process around version control systems; 2) involving a large community with diverse views and different requirements; and 3) reusing existing ontologies as much as possible instead of constructing them from scratch, we perceived a high demand for a flexible and easily adaptable solution usable in various cases. Therefore, the ability to map development workflows onto distributed version control methods is a crucial factor that allows ontology modeling in such scenarios. Although current state-of-the-art approaches cover a wide range of development activities, they lack support for dynamically changing and heterogeneous environments. In this thesis, concretely in Chapter 6, we present an approach and its realization based on the technical requirements collected in Chapter 4 to answer RQ2. VoCol is an integrated environment for the distributed development of ontologies based on version control systems. It has a modular architecture where an orchestration service is able to invoke different components in an automatic fashion. VoCol has been implemented following the principles of extensibility, interoperability and customizability. The modularity enables the platform to be extended by adding new components or replacing existing ones. The principle of interoperability is addressed in two directions. First, VoCol is an interoperable platform that can be deployed on different operating systems. Second, it enables the modeling of ontologies with heterogeneous editors. Users have the possibility to customize the services or components to be automatically invoked according to their use case. The VoCol platform has a rich interface to assist the reusability of the ontologies being developed, and it offers various access mechanisms for external users and machines. Furthermore, Chapter 11 presents VoColReg, a registry of ontologies developed on top of the VoCol platform. Currently, it hosts 19 ontologies from various domains that are supported during the entire development process. Considering the evidence from its applicability in a number of concrete use cases, we argue that VoCol enables the accommodation of diverse ontology development workflows with distributed version control methods while providing essential support for constructing ontologies in collaborative scenarios.

RQ3: How can concurrent changes from heterogeneous ontology authoring editors be effectively synchronized? By analyzing scenarios where multiple stakeholders simultaneously perform changes on ontology replicas, we observed that employing mechanisms to effectively realize the synchronization process avoids a potential loss of changes and the introduction of syntactic and semantic errors. Distributed version control systems follow optimistic approaches to allow parallel modification of ontology artifacts. These systems usually utilize line-by-line comparison techniques to detect conflicts between different versions of the same file. Since the position of a triple within an RDF file is semantically irrelevant, ontologies can be serialized using different ordering criteria. Consequently, a large number of false-positive conflicts, i.e., conflicts that do not result from ontology changes but from the fact that two ontology versions are serialized differently, can be detected. To answer research question RQ3, we describe an approach that tackles the problem of synchronizing changes performed with heterogeneous ontology authoring editors. Following the principle that prevention is better than cure, Chapter 7 presents SerVCS, an approach to enhance version control systems to cope with different serializations of the same ontology. As a result, version control systems become editor agnostic, i.e., capable of detecting actual changes and resolving syntactical conflicts using built-in merging algorithms. Thus, incompatibility problems with regard to wrongly detected conflicts caused by the use of different authoring editors are avoided. From empirical results, we can conclude that SerVCS is significantly more effective than plain Git at reducing the number of false-positive conflicts, including scenarios where the ontology size or the sorting criterion is changed.
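The following minimal sketch illustrates the underlying idea, not the actual SerVCS implementation: before two versions are compared, both are normalized into a canonical, order-independent serialization so that purely serialization-induced differences disappear (blank nodes, which require proper graph canonicalization, are ignored here for brevity).

from rdflib import Graph  # assumes rdflib >= 6, where serialize() returns str

def canonical_ntriples(turtle_text: str) -> str:
    graph = Graph()
    graph.parse(data=turtle_text, format="turtle")
    # N-Triples is line-based, so sorting the lines yields a deterministic order.
    lines = graph.serialize(format="nt").splitlines()
    return "\n".join(sorted(line for line in lines if line.strip()))

v1 = "@prefix ex: <http://example.org/> . ex:A a ex:Class . ex:B a ex:Class ."
v2 = "@prefix ex: <http://example.org/> . ex:B a ex:Class . ex:A a ex:Class ."

# Same triples, different serialization order: no false-positive conflict.
assert canonical_ntriples(v1) == canonical_ntriples(v2)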

RQ4: How can the quality and efficiency in distributed and heterogeneous ontology development be ensured? While answering this research question, we observed that ensuring compliance with domain requirements is one of the critical aspects of achieving high-quality ontologies. A set of test cases derived from domain requirements is used to prevent unintended ontology changes and constraint violations. Moreover, we perceived that having methods to efficiently evaluate a given set of test cases has a positive impact on quality. As the state-of-the-art approaches allow only for an exhaustive evaluation of the set of test cases, in Chapter 8 we present EffTE, an approach for ensuring quality and efficiency during ontology construction based on the principles of the test-driven development technique. EffTE relies on a dependency graph between test cases modeled by users to enable prioritized evaluation. The breadth-first search algorithm is used to traverse the given dependency graph. An additional mechanism stores test cases to be excluded from further evaluation due to faulty parent tests. This leads to a minimal number of evaluated test cases, increasing fault detection effectiveness and decreasing test case validation time. Based on an empirical evaluation, the efficiency of EffTE is significantly higher compared to a naive approach. This holds in various scenarios with different ontology sizes, dependency graph topologies and numbers of test cases, whenever a faulty test case exists. EffTE is able to cope with additional parameters, such as the users who perform changes and the modified ontology files (if an ontology comprises multiple files). Therefore, the selection of test cases to be evaluated is user- and file-specific. By utilizing EffTE as an integrated service in the VoCol platform to cover client-side and server-side activities performed before and after occurrences of push events, respectively, we argue that stakeholders are supported in efficiently modeling high-quality ontologies according to the given domain requirements.

The main outcome of this thesis is a holistic approach bridging the methodological and technical dimensions of the core problem of collaborative ontology development in distributed and heterogeneous environments. We conclude that this approach provides comprehensive support for a wide range of aspects and activities, such as design decisions, change synchronization, quality assurance, maintainability, publication and exploration. Hence, stakeholders can effectively and efficiently build ontologies in collaborative scenarios, whereas third-party users can exploit and consume the results via a number of integrated components and services.
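Returning to RQ4's dependency-aware evaluation, a rough sketch of the traversal idea (not the actual EffTE implementation) is the following: test cases are visited breadth-first, and the subtree below a faulty test case is excluded from evaluation; run_test stands in for the actual SPARQL-based test execution.

from collections import deque

def evaluate(children, roots, run_test):
    """children maps a test case id to the ids of the tests depending on it."""
    results, skipped = {}, set()
    queue = deque(roots)
    while queue:
        test = queue.popleft()
        if test in results or test in skipped:
            continue
        if run_test(test):
            results[test] = True
            queue.extend(children.get(test, []))  # evaluate dependents next
        else:
            results[test] = False
            stack = list(children.get(test, []))
            while stack:  # exclude the entire subtree below the faulty test
                node = stack.pop()
                if node not in skipped:
                    skipped.add(node)
                    stack.extend(children.get(node, []))
    return results, skipped

# t3 fails, so t4 is never evaluated; t1 and t2 pass.
graph = {"t1": ["t2", "t3"], "t3": ["t4"]}
print(evaluate(graph, ["t1"], lambda t: t != "t3"))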

12.2 Future Work

There exist various directions for future work as extensions of this thesis, as well as for addressing its current limitations. In this section, we explore a number of potential research and technical directions, while keeping the same motivation and objectives presented in the thesis.

12.2.1 Research Perspective

In the following, we discuss several possible work streams that can be realized from the research perspective, including new areas complementary to the objectives of the thesis.

Improving ontologies via crowdsourcing Our approach focuses on supporting stakeholders in collaboratively developing ontologies. However, ontologies are dynamic artifacts that evolve over time to represent the knowledge of a domain of interest. Although much effort is invested in the development process, there is always the possibility that ontologies suffer from missing concepts or relationships between them. Facilitating the process of continuous ontology evolution can be done via crowdsourcing, i.e., involving a wider and unidentified crowd in the process. Question Answering (QA) systems are a recent technique for answering natural language questions in an automatic fashion by exploiting facts encoded in a Knowledge Graph (KG) [173]. Ontologies can be the backbone of these QA systems, and are commonly open to the crowd. Thus, any question that cannot be answered by the system is a potential indication of missing concepts or relationships. Our methodology and platform can be extended to accommodate the integration of other systems, such as QA systems, to foster the evolution of ontologies. The automatically recommended changes are later assessed by the team, resulting in ontology enrichment or correction, if necessary. The new version is applied to the QA system, so that previously unanswered questions can be answered.


Machine-supported ontology modularization Modularization of large monolithic ontologies is crucial for avoiding reusability, scalability and maintenance problems [83]. Although in Git4Voc and VoCol we provide the respective guidelines and mechanisms on how to perform modularization, human intervention is still required to accomplish this task. Combining network analysis and unsupervised classification methods to automatically generate standalone modules would lead to a significant improvement in dealing with the aforementioned problems. The decomposition can be realized based on the hierarchy, the structure or the existence of subdomains within a large ontology. Creating domain-specific ontologies by composing a number of standalone modules is another interesting topic in this regard. Leveraging a conceptual clustering paradigm along with similarity measure techniques can lead to an effective way towards machine-aided ontology construction.
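As a hedged sketch of this direction (assuming rdflib and networkx; the input file name is hypothetical), one could project the subclass hierarchy into a graph and let modularity-based community detection propose candidate modules; real modularization would additionally have to respect logical dependencies.

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities
from rdflib import Graph, RDFS

# Hypothetical input file; any RDFS/OWL ontology serialization would do.
ontology = Graph().parse("ontology.ttl", format="turtle")

# Build an undirected class graph from the subclass hierarchy.
class_graph = nx.Graph()
for subclass, _, superclass in ontology.triples((None, RDFS.subClassOf, None)):
    class_graph.add_edge(subclass, superclass)

# Each detected community is a candidate standalone module.
for i, community in enumerate(greedy_modularity_communities(class_graph)):
    print(f"candidate module {i}: {len(community)} classes")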

Detecting and resolving semantic conflicts during change synchronization The presented approach for change synchronization is currently limited to syntactic conflicts. In the future, we envision developing new techniques for automatically detecting and resolving semantic conflicts. The incorporation of probabilistic models and machine learning mechanisms would empower SerVCS to achieve more accurate conflict detection and resolution. A semantic layer can be added to prevent inconsistencies caused by merging two different versions of the same ontology. Another crucial factor during the synchronization process is the amount of time needed for conflict detection and resolution. Efficient techniques should be investigated and developed to avoid additional overhead during ontology modeling. As a result, the effectiveness and efficiency of VoCol in resolving both syntactic and semantic conflicts will be improved.
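A minimal sketch of what such a semantic layer could check after a syntactically successful merge: individuals typed with classes that the merged ontology declares disjoint. The data below is illustrative; a complete solution would employ a reasoner.

from rdflib import Graph

version_a = """@prefix ex: <http://example.org/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
ex:Sensor owl:disjointWith ex:Actuator .
ex:dev1 a ex:Sensor ."""

version_b = """@prefix ex: <http://example.org/> .
ex:dev1 a ex:Actuator ."""

merged = Graph()
merged.parse(data=version_a, format="turtle")
merged.parse(data=version_b, format="turtle")

# Report individuals whose types are declared disjoint in the merged graph.
conflicts = merged.query("""
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    SELECT ?individual ?c1 ?c2
    WHERE { ?individual a ?c1 , ?c2 . ?c1 owl:disjointWith ?c2 . }""")
for row in conflicts:
    print(f"semantic conflict: {row.individual} is both {row.c1} and {row.c2}")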

Smart and continuous test case evaluation To ensure high-quality ontologies, users currently have to manually create test cases and the dependency relationships between them. This might require large efforts, in particular for teams which lack ontology engineers. Adding features for the automatic generation of test cases based on domain requirements, as well as the creation of their relationships in a dependency graph, may lead to a significant improvement of ontology quality. Moreover, the approach can be advanced with mechanisms for dynamic prioritization and selection of test cases that limit their evaluation to the recent ontology modifications. Thus, a new version can be assessed with regard to only the modified parts, and a full ontology check is avoided. The challenge to be considered here is that in many cases changes may affect multiple concepts and relationships in different ways.
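A hedged sketch of such change-aware selection, under the assumption that every test case declares the concepts it validates (file names and test identifiers are hypothetical): diff the two versions at the triple level and evaluate only test cases whose target concepts occur in the changed triples.

from rdflib import Graph, URIRef

old, new = Graph(), Graph()
old.parse("ontology_v1.ttl", format="turtle")  # hypothetical files
new.parse("ontology_v2.ttl", format="turtle")

# Collect every term that occurs in an added or removed triple.
changed_terms = set()
for triple in (set(new) - set(old)) | (set(old) - set(new)):
    changed_terms.update(triple)

# Hypothetical mapping from test cases to the concepts they validate.
test_targets = {
    "labels_in_english": {URIRef("http://example.org/Sensor")},
    "ranges_defined": {URIRef("http://example.org/hasPart")},
}

selected = [t for t, targets in test_targets.items() if targets & changed_terms]
print("test cases to evaluate:", selected)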

12.2.2 Technical Perspective

The technical dimension is the other area in which our work can be extended and improved. In the following, we describe in detail three important aspects for increasing the flexibility and lowering the barriers of using our holistic approach in different scenarios.

VoCol as a Service In order to use VoCol, a team has to install it either on its own server or in the cloud and configure it to monitor the repository where the ontology is hosted. Providing VoCol as a Service (VaaS), where users can simply subscribe their repositories and benefit from all of its functionalities, would enable a wide range of usage in various domains. The envisioned architecture of VaaS is illustrated in Figure 12.1. It is composed of five layers: 1) persistence - enables modeling and storing ontologies; 2) orchestration - invokes and coordinates the automatic execution of a number of components; 3) validation and synchronization - responsible for quality checking as well as detecting and resolving conflicts; 4) federation - allows federated querying over multiple internal and external ontologies; and 5) presentation - offers rich forms for facilitating the exploration and navigation of ontologies. VaaS will act as a hosting registry for many ontologies over which a number of different analyses and studies can be performed.

[Figure 12.1 depicts five tiers: 1) Contributors - ontology engineers and domain experts; 2) Local Ontologies - private working versions of ontologies; 3) Shared Repositories - synchronized versions of ontologies; 4) Ontology Registry - applications and services to support ontology development, publishing and exploration (orchestration, documentation, visualization, evolution, content negotiation, querying, federation and harmonization, administration and configuration); 5) Consumers - humans and machines.]
Figure 12.1: The architecture of VoCol-as-a-Service from a technical perspective. It is composed of five layers containing a number of different components and services: 1) persistence; 2) orchestration; 3) validation and synchronization; 4) federation; and 5) presentation.

Plugin-based architecture and workflow VoCol has a modular architecture where different components can easily be exchanged. However, to realize that, a small portion of code is required to allow the orchestration service to invoke them and present the outcome in an appropriate fashion. This hinders the ability of users to employ other components necessary for dealing with specific tasks during ontology modeling. The implementation of a plugin-based mechanism, along with a metalanguage to describe the functionality of components, their invocation interface and the way to present their results, would facilitate the adoption of VoCol for various use cases.
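A hedged sketch of what such a descriptor could look like; all field names and the example validator command are illustrative assumptions rather than an existing VoCol format.

import subprocess

# Illustrative plugin descriptor: how to invoke the component, when, and how
# to render its output. None of these fields is an existing VoCol format.
PLUGIN_DESCRIPTOR = {
    "name": "turtle-validator",
    "invoke": {"command": "rapper", "args": ["--input", "turtle", "{file}"]},
    "triggers": ["push"],
    "output": {"format": "text", "render_as": "error-list"},
}

def invoke_plugin(descriptor, file_path):
    """Generic invocation the orchestration service could perform."""
    args = [arg.format(file=file_path) for arg in descriptor["invoke"]["args"]]
    return subprocess.run([descriptor["invoke"]["command"], *args],
                          capture_output=True, text=True)

result = invoke_plugin(PLUGIN_DESCRIPTOR, "ontology.ttl")
print(result.returncode, result.stderr[:200])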

Synchronous development The organization of work in VoCol is inherited from Git, which uses the optimistic approach, i.e., clone-modify-merge. After cloning the repository, stakeholders can asynchronously perform ontology modifications on their local copies and later merge them with the changes of others. However, further optimization of the platform for domain experts requires providing a web interface based on the pessimistic approach, i.e., lock-modify-unlock. This would enable synchronous development in scenarios where stakeholders are aware of actual changes in real time and are prevented from overwriting each other's modifications.
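A minimal sketch of a lock-modify-unlock protocol as such a web interface could realize it; the lock file name and the per-ontology granularity are illustrative assumptions.

import json, os, time

LOCK_FILE = "ontology.ttl.lock"  # hypothetical per-ontology lock

def acquire(user: str) -> bool:
    try:
        # O_CREAT | O_EXCL makes lock creation atomic: only one editor wins.
        fd = os.open(LOCK_FILE, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # someone else is currently editing
    with os.fdopen(fd, "w") as handle:
        json.dump({"user": user, "since": time.time()}, handle)
    return True

def release(user: str) -> None:
    with open(LOCK_FILE) as handle:
        owner = json.load(handle)["user"]
    if owner == user:  # only the lock holder may unlock
        os.remove(LOCK_FILE)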

Bibliography

[1] D. L. Rubin, N. Shah and N. F. Noy, Biomedical ontologies: a functional perspective, Briefings in Bioinformatics 9 (2008) 75, url: https://doi.org/10.1093/bib/bbm059 (cit. on p.3). [2] Vocabularies, W3C Report, World Wide Web Consortium (W3C), 2015, url: https://www.w3.org/standards/semanticweb/ontology (visited on 12/03/2018) (cit. on p.3). [3] J. M. Juran, Juran’s Quality Control Handbook, 4th, Mcgraw-Hill (Tx), 1974, isbn: 0070331766, url: https://www.amazon.com/gp/product/0070331766 (cit. on p.3). [4] M. Grüninger and M. S. Fox, “Methodology for the design and evaluation of ontologies”, IJCAI95 Workshop on Basic Ontological Issues in Knowledge Sharing, 1995 (cit. on pp.4, 51, 109). [5] E. Simperl and M. Luczak-Rösch, Collaborative ontology engineering: a survey, Knowledge Engineering Review 29 (2014) 101, url: https://doi.org/10.1017/S0269888913000192 (cit. on pp.6, 25, 31, 52, 55, 56). [6] R. Angles and C. Gutiérrez, Survey of graph database models, ACM Computing Surveys (CSUR) 40 (2008) 1, url: http://doi.acm.org/10.1145/1322432.1322433 (cit. on p. 15). [7] S. M. Cohen, “Aristotle’s Metaphysics”, The Stanford Encyclopedia of Philosophy, Winter, Metaphysics Research Lab, Stanford University, 2016 (cit. on p. 15). [8] T. R. Gruber, Toward Principles for the Design of Ontologies Used for Knowledge Sharing, International Journal of Human-Computer Studies 43 (1995) 907, issn: 1071-5819, url: http://dx.doi.org/10.1006/ijhc.1995.1081 (cit. on p. 15). [9] R. Studer, V. R. Benjamins and D. Fensel, Knowledge Engineering: Principles and Methods, Data & Knowledge Engineering 25 (1998) 161, url: https://doi.org/10.1016/S0169-023X(97)00056-6 (cit. on p. 15). [10] A. Gómez-Pérez, M. Fernández-López and Ó. Corcho, Ontological Engineering: With Examples from the Areas of Knowledge Management, e-Commerce and the Semantic Web, Advanced Information and Knowledge Processing, Springer, 2004, isbn: 978-1-85233-551-9, url: https://doi.org/10.1007/b97353 (cit. on pp. 15, 23, 51, 71). [11] M. R. Genesereth and N. J. Nilsson, Logical foundations of artificial intelligence, Morgan Kaufmann, 1988, isbn: 978-0-934613-31-6 (cit. on p. 15).


[12] F. Manola, E. Miller and B. McBride, RDF Primer, W3C Recommendation, World Wide Web Consortium (W3C), 2014, url: https://www.w3.org/TR/rdf-primer/ (visited on 12/02/2018) (cit. on p. 16). [13] M. Arenas, C. Gutiérrez and J. Pérez, “Foundations of RDF Databases”, Reasoning Web. Semantic Technologies for Information Systems, 5th International Summer School, Brixen-Bressanone, Italy, Tutorial Lectures, 2009 158, url: https://doi.org/10.1007/978-3-642-03754-2_4 (cit. on p. 16). [14] D. Brickley and R. Guha, RDF Schema 1.1, W3C Recommendation, World Wide Web Consortium (W3C), 2014, url: https://www.w3.org/TR/rdf-schema/ (visited on 26/02/2018) (cit. on p. 20). [15] S. Staab and R. Studer, eds., Handbook on Ontologies, International Handbooks on Information Systems, Springer, 2004, isbn: 3-540-40834-7 (cit. on p. 20). [16] M. K. Smith, C. Welty and D. L. McGuinness, OWL Web Ontology Language Guide, W3C Recommendation, World Wide Web Consortium (W3C), 2018, url: http://www.w3.org/TR/owl-guide/ (visited on 05/04/2018) (cit. on p. 20). [17] B. Motik, B. Cuenca Grau, I. Horrocks, Z. Wu, A. Fokoue and C. Lutz, OWL 2 Web Ontology Language Profiles, W3C Recommendation, World Wide Web Consortium (W3C), 2018, url: https://www.w3.org/TR/owl-profiles/ (visited on 05/04/2018) (cit. on p. 21). [18] S. Harris and A. Seaborne, SPARQL 1.1 Query Language, W3C Recommendation, World Wide Web Consortium (W3C), 2013, url: http://www.w3.org/TR/sparql11-query/ (visited on 26/02/2018) (cit. on p. 21). [19] J. Pérez, M. Arenas and C. Gutiérrez, Semantics and complexity of SPARQL, ACM Trans. Database Syst. 34 (2009) 16:1, url: http://doi.acm.org/10.1145/1567274.1567278 (cit. on pp. 22, 112). [20] T. Berners-Lee, J. Hendler and O. Lassila, The Semantic Web, Scientific American 284 (2001) 34 (cit. on p. 22). [21] M. Fernández-López, A. Gómez-Pérez and N. Juristo, “METHONTOLOGY: From Ontological Art Towards Ontological Engineering”, Proceedings of the Ontological Engineering AAAI-97 Spring Symposium Series, Ontology Engineering Group (OEG), American Association for Artificial Intelligence, 1997 (cit. on pp. 23, 38). [22] H. S. A. N. P. Pinto and J. P. Martins, Ontologies: How can They be Built?, Knowledge and Information Systems 6 (2004) 441 (cit. on p. 24). [23] S. Fraser, K. L. Beck, B. Caputo, T. Mackinnon, J. Newkirk and C. Poole, “Test Driven Development (TDD)”, Extreme Programming and Agile Processes in Software Engineering, 4th International Conference, XP, Genova, Italy. Proceedings, 2003 459, url: https://doi.org/10.1007/3-540-44870-5_84 (cit. on pp. 25, 109). [24] S. Peroni, “A Simplified Agile Methodology for Ontology Development”, OWL: - Experiences and Directions - Reasoner Evaluation - 13th International Workshop, (OWLED), and 5th International Workshop, ORE, Bologna, Italy, Revised Selected Papers, 2016 55 (cit. on pp. 25, 26).

[25] B. de Alwis and J. Sillito, “Why are software projects moving from centralized to decentralized version control systems?”, Proceedings of the ICSE Workshop on Cooperative and Human Aspects on Software Engineering, (CHASE), Vancouver, BC, Canada, 2009 36, url: https://doi.org/10.1109/CHASE.2009.5071408 (cit. on p. 26). [26] T. Mens, A State-of-the-Art Survey on Software Merging, IEEE Transactions on Software Engineering 28 (2002) 449 (cit. on pp. 27, 28, 95). [27] K. Altmanninger, M. Seidl and M. Wimmer, A survey on model versioning approaches, International Journal of Web Information Systems 5 (2009) 271, url: https://doi.org/10.1108/17440080910983556 (cit. on pp. 28, 95). [28] B. Appleton, S. Berczuk, R. Cabrera and R. Orenstein, “Streamed lines: Branching patterns for parallel software development”, 5th Annual Conference on Pattern Languages of Program Design, (PLoP), Proceedings, vol. 98, 1998 (cit. on pp. 29, 55). [29] R. Palma, Ó. Corcho, A. Gómez-Pérez and P. Haase, A holistic approach to collaborative ontology development based on change management, Journal of Web Semantics 9 (2011) 299, url: https://doi.org/10.1016/j.websem.2011.06.007 (cit. on pp. 31, 44, 78, 95). [30] K. Kotis and G. A. Vouros, Human-centered ontology engineering: The HCOME methodology, Knowledge and Information Systems 10 (2006) 109, url: https://doi.org/10.1007/s10115-005-0227-4 (cit. on p. 32). [31] S. Braun, A. P. Schmidt, A. Walter, G. Nagypál and V. Zacharias, “Ontology Maturing: a Collaborative Web 2.0 Approach to Ontology Engineering”, Proceedings of the Workshop on Social and Collaborative Construction of Structured Knowledge (CKC) at the 16th International World Wide Web Conference (WWW) Banff, Canada, 2007 (cit. on p. 33). [32] H. S. Pinto, S. Staab and C. Tempich, “DILIGENT: Towards a fine-grained methodology for Distributed, Loosely-controlled and evolving Engineering of oNTologies”, Proceedings of the 16th European Conference on Artificial Intelligence, (ECAI), Valencia, Spain, 2004 393 (cit. on pp. 34, 38). [33] C. Tempich, H. S. Pinto and S. Staab, “Ontology Engineering Revisited: An Iterative Case Study”, 3rd European Semantic Web Conference, (ESWC), Research and Applications, Budva, Montenegro, Proceedings, 2006 110, url: https://doi.org/10.1007/11762256_11 (cit. on p. 35). [34] M. Annamalai and L. Sterling, “Guidelines for Constructing Reusable Domain Ontologies”, Ontologies in Agent Systems, Proceedings of the Workshop on Ontologies in Agent Systems (OAS) at the 2nd International Joint Conference on Autonomous Agents and Multi-Agent Systems, Melbourne, Australia, 2003 71 (cit. on pp. 35, 38). [35] S. Auer and H. Herre, “RapidOWL - An Agile Knowledge Engineering Methodology”, Perspectives of Systems Informatics, 6th International Andrei Ershov Memorial Conference, PSI, Novosibirsk, Russia. Revised Papers, 2006 424, url: https://doi.org/10.1007/978-3-540-70881-0_36 (cit. on pp. 36, 38).


[36] H. Knublauch, An agile development methodology for knowledge-based systems including a Java framework for knowledge modeling and appropriate tool support, PhD thesis: University of Ulm, Germany, 2002, url: http://vts.uni-ulm.de/docs/2002/2101/vts_2101. (cit. on p. 37). [37] P. D. Maio, “’Just enough’ ontology engineering”, Proceedings of the International Conference on Web Intelligence, Mining and Semantics, (WIMS), Sogndal, Norway, 2011 8, url: http://doi.acm.org/10.1145/1988688.1988698 (cit. on pp. 37, 38). [38] L. Dodds and I. Davis, Linked data patterns, Online: http://patterns.dataincubator.org/book (2011) (cit. on p. 38). [39] M. C. Suárez-Figueroa, NeOn methodology for building ontology networks: specification, scheduling and reuse, PhD thesis: Technical University of Madrid, 2012, isbn: 978-3-89838-338-7, url: http://d-nb.info/1029370028 (cit. on pp. 38, 71). [40] Y. Sure, S. Staab and R. Studer, “On-To-Knowledge Methodology (OTKM)”, Handbook on Ontologies, ed. by S. Staab and R. Studer, International Handbooks on Information Systems, Springer, 2004 117, isbn: 3-540-40834-7 (cit. on p. 38). [41] V. Zacharias and S. Braun, “SOBOLEO – Social Bookmarking and Lightweight Engineering of Ontologies”, Proceedings of the Workshop on Social and Collaborative Construction of Structured Knowledge (CKC) at the 16th International World Wide Web Conference (WWW) Banff, Canada, 2007 (cit. on p. 38). [42] C. Ghidini, M. Rospocher and L. Serafini, “MoKi: a Wiki-Based Conceptual Modeling Tool”, Proceedings of the International Semantic Web Conference, (ISWC), Posters & Demonstrations Track: Collected Abstracts, Shanghai, China, 2010 (cit. on p. 39). [43] T. Tudorache, C. Nyulas, N. F. Noy and M. A. Musen, WebProtégé: A collaborative ontology editor and knowledge acquisition tool for the Web., Semantic Web 4 (2013) 89, url: http://dblp.uni-trier.de/db/journals/semweb/semweb4.html#TudoracheNNM13 (cit. on pp. 39, 42). [44] N. F. Noy, A. Chugh, W. Liu and M. A. Musen, “A Framework for Ontology Evolution in Collaborative Environments”, 5th International Semantic Web Conference, (ISWC), Athens, GA, USA, Proceedings, 2006 544, url: https://doi.org/10.1007/11926078_39 (cit. on pp. 39, 44, 55). [45] A. Stellato, S. Rajbhandari, A. Turbati, M. Fiorelli, C. Caracciolo, T. Lorenzetti, J. Keizer and M. T. Pazienza, “VocBench: A Web Application for Collaborative Development of Multilingual Thesauri”, 12th European Semantic Web Conference, (ESWC), Latest Advances and New Domains, Portoroz, Slovenia. Proceedings, 2015 38 (cit. on pp. 39, 42). [46] T. Schandl and A. Blumauer, “PoolParty: SKOS Thesaurus Management Utilizing Linked Data”, 7th Extended Semantic Web Conference, (ESWC), Research and Applications, Heraklion, Crete, Greece, Proceedings, Part II, 2010 421 (cit. on pp. 40, 42).

[47] M. Luczak-Rösch, G. Coskun, A. Paschke, M. Rothe and R. Tolksdorf, “SVoNt - Version Control of OWL Ontologies on the Concept Level”, Informatik: Service Science - Neue Perspektiven für die Informatik, Beiträge der 40. Jahrestagung der Gesellschaft für Informatik e.V. (GI), Band 2, Leipzig, Deutschland, 2010 79 (cit. on p. 41). [48] A. Alobaid, D. Garijo, M. Poveda-Villalón, I. S. Pérez and Ó. Corcho, “OnToology, a tool for collaborative development of ontologies”, Proceedings of the International Conference on Biomedical Ontology, (ICBO), Lisbon, Portugal. 2015, url: http://ceur-ws.org/Vol-1515/demo3.pdf (cit. on pp. 41, 42). [49] D. Garijo, “WIDOCO: A Wizard for Documenting Ontologies”, 16th International Semantic Web Conference, (ISWC), Vienna, Austria, Proceedings, Part II, 2017 94, url: https://doi.org/10.1007/978-3-319-68204-4_9 (cit. on p. 41). [50] M. Poveda-Villalón, M. C. Suárez-Figueroa and A. Gómez-Pérez, “Validating Ontologies with OOPS!”, Knowledge Engineering and Knowledge Management - 18th International Conference, (EKAW), Galway City, Ireland. Proceedings, 2012 267 (cit. on pp. 41, 71). [51] M. Codescu, E. Kuksa, O. Kutz, T. Mossakowski and F. Neuhaus, Ontohub: A semantic repository engine for heterogeneous ontologies, Applied Ontology 12 (2017) 275, url: https://doi.org/10.3233/AO-170190 (cit. on pp. 41, 157). [52] D. Dig, K. Manzoor, R. E. Johnson and T. N. Nguyen, “Refactoring-Aware Configuration Management for Object-Oriented Programs”, 29th International Conference on Software Engineering, (ICSE), Minneapolis, MN, USA, 2007 427, url: https://doi.org/10.1109/ICSE.2007.71 (cit. on p. 43). [53] T. Ekman and U. Asklund, Refactoring-aware versioning in Eclipse, Electronic Notes in Theoretical Computer Science 107 (2004) 57, url: https://doi.org/10.1016/j.entcs.2004.02.048 (cit. on p. 43). [54] H. V. Nguyen, M. H. Nguyen, S. C. Dang, C. Kästner and T. N. Nguyen, “Detecting semantic merge conflicts with variability-aware execution”, Proceedings of the 10th Joint Meeting on Foundations of Software Engineering, (ESEC/FSE), Bergamo, Italy, 2015 926 (cit. on p. 43). [55] D. Asenov, B. Guenat, P. Müller and M. Otth, “Precise Version Control of Trees with Line-Based Version Control Systems”, Fundamental Approaches to Software Engineering - 20th International Conference, (FASE), Uppsala, Sweden, Proceedings, 2017 152 (cit. on p. 43). [56] J. Protzenko, S. Burckhardt, M. Moskal and J. McClurg, “Implementing real-time collaboration in TouchDevelop using AST merges”, Proceedings of the 3rd International Workshop on Mobile Development Lifecycle, (MobileDeLi), Pittsburgh, PA, USA, 2015 25 (cit. on p. 43). [57] Y. Brun, R. Holmes, M. D. Ernst and D. Notkin, “Proactive detection of collaboration conflicts”, 19th ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE) and 13th European Software Engineering Conference (ESEC), 2011 168 (cit. on p. 43).


[58] K. Altmanninger, “Models in Conflict - A Semantically Enhanced Version Control System for Models”, Proceedings of the Doctoral Symposium at the 10th International Conference on Model-Driven Engineering Languages and Systems, Nashville, USA, 2007 (cit. on p. 43). [59] K. Altmanninger, W. Schwinger and G. Kotsis, Semantics for Accurate Conflict Detection in SMoVer: Specification, Detection and Presentation by Example, IJEIS 6 (2010) 68, url: https://doi.org/10.4018/jeis.2010120206 (cit. on p. 43). [60] P. Brosch, “Improving conflict resolution in model versioning systems”, 31st International Conference on Software Engineering, (ICSE), Vancouver, Canada, Companion Volume, 2009 355, url: https://doi.org/10.1109/ICSE-COMPANION.2009.5071020 (cit. on p. 43). [61] A. Cicchetti, D. D. Ruscio and A. Pierantonio, “Managing Model Conflicts in Distributed Development”, 11th International Conference on Model Driven Engineering Languages and Systems, (MoDELS), 2008 311 (cit. on p. 43). [62] S. Krusche and B. Brügge, Model-based Real-time Synchronization, Softwaretechnik-Trends 34 (2014) (cit. on p. 43). [63] H. Chong, R. Zhang and Z. Qin, “Composite-based conflict resolution in merging versions of UML models”, 17th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, (SNPD), Shanghai, China, 2016 127, url: https://doi.org/10.1109/SNPD.2016.7515890 (cit. on p. 43). [64] T. Berners-Lee and D. Connolly, Delta: an ontology for the distribution of differences between RDF graphs, tech. rep., W3C, 2001 (cit. on p. 44). [65] M. Völkel and T. Groza, “SemVersion: An RDF-based Ontology Versioning System”, IADIS International Conference on WWW/Internet, IADIS, 2006 195 (cit. on p. 44). [66] S. Cassidy and J. Ballantine, “Version Control for RDF Triple Stores”, 2nd International Conference on Software and Data Technologies (ICSOFT), 2007 5 (cit. on p. 44). [67] W. K. Edwards, “Flexible Conflict Detection and Management in Collaborative Applications”, Proceedings of the 10th Annual ACM Symposium on User Interface Software and Technology, UIST, Banff, Alberta, Canada, 1997 139 (cit. on p. 44). [68] Y. Ren, A. Parvizi, C. Mellish, J. Z. Pan, K. van Deemter and R. Stevens, “Towards Competency Question-Driven Ontology Authoring”, 11th Extended Semantic Web Conference, (ESWC), 2014 752 (cit. on pp. 45, 51, 56). [69] C. M. Keet and A. Lawrynowicz, “Test-Driven Development of Ontologies”, 13th Extended Semantic Web Conference, (ESWC), 2016 642, url: http://dx.doi.org/10.1007/978-3-319-34129-3_39 (cit. on p. 45). [70] D. Vrandecic and A. Gangemi, “Unit Tests for Ontologies”, On the Move to Meaningful Internet Systems, OTM Workshops and Posters, 2006 1012, url: http://dx.doi.org/10.1007/11915072_2 (cit. on p. 45).

[71] H. Knublauch and D. Kontokostas, Shapes Constraint Language (SHACL), Accessed: 2017-05-08, 2017, url: https://www.w3.org/TR/shacl/ (visited on 08/05/2017) (cit. on p. 45). [72] I. Boneva, J. E. L. Gayo and E. G. Prud’hommeaux, “Semantics and Validation of Shapes Schemas for RDF”, 16th International Semantic Web Conference, (ISWC), Vienna, Austria, Proceedings, Part I, 2017 104, url: https://doi.org/10.1007/978-3-319-68288-4_7 (cit. on p. 45). [73] D. Kontokostas, P. Westphal, S. Auer, S. Hellmann, J. Lehmann, R. Cornelissen and A. Zaveri, “Test-driven evaluation of linked data quality”, 23rd International World Wide Web Conference, (WWW), 2014 747, url: http://doi.acm.org/10.1145/2566486.2568002 (cit. on p. 45). [74] S. García-Ramos, A. Otero and M. Fernández-López, “OntologyTest: A Tool to Evaluate Ontologies through Tests Defined by the User”, Distributed Computing, Artificial Intelligence, Bioinformatics, Soft Computing, and Ambient Assisted Living, 10th International Work-Conference on Artificial Neural Networks, (IWANN) Workshops, Salamanca, Spain. Proceedings, Part II, 2009 91, url: https://doi.org/10.1007/978-3-642-02481-8_13 (cit. on p. 45). [75] S. Yoo and M. Harman, Regression testing minimization, selection and prioritization: a survey, Software Testing, Verification and Reliability 22 (2012) 67, url: https://doi.org/10.1002/stv.430 (cit. on p. 45). [76] G. Rothermel and M. J. Harrold, “A Safe, Efficient Algorithm for Regression Test Selection”, Proceedings of the Conference on Software Maintenance, (ICSM), Montréal, Quebec, Canada, 1993 358, url: https://doi.org/10.1109/ICSM.1993.366926 (cit. on p. 45). [77] C. Giuliano and A. M. Gliozzo, “Instance-Based Ontology Population Exploiting Named-Entity Substitution”, Proceedings of the 22nd International Conference on Computational Linguistics, (COLING), Manchester, UK, 2008 265 (cit. on p. 51). [78] M. Schmachtenberg, C. Bizer and H. Paulheim, “Adoption of the Linked Data Best Practices in Different Topical Domains”, 13th International Semantic Web Conference, (ISWC), Riva del Garda, Italy, Proceedings, Part I, 2014 245 (cit. on p. 52). [79] D. Schober, B. Smith, S. E. Lewis, W. Kusnierczyk, J. Lomax, C. Mungall, C. F. Taylor, P. Rocca-Serra and S. Sansone, Survey-based naming conventions for use in OBO Foundry ontology development, BMC Bioinformatics 10 (2009) (cit. on pp. 52, 68, 69). [80] N. F. Noy and T. Tudorache, “Collaborative Ontology Development on the (Semantic) Web”, Symbiotic Relationships between Semantic Web and Knowledge Engineering, AAAI Spring Symposium, Technical Report SS-08-07, Stanford, California, USA, 2008 63 (cit. on p. 55).


[81] J. Seidenberg and A. L. Rector, “The State of Multi-User Ontology Engineering”, Proceedings of the 2nd International Workshop on Modular Ontologies, (WoMO), Whistler, Canada, 2007 (cit. on p. 55). [82] M. C. Suárez-Figueroa, A. Gómez-Pérez and M. Fernández-López, “The NeOn Methodology for Ontology Engineering”, Ontology Engineering in a Networked World, 2012 9, url: https://doi.org/10.1007/978-3-642-24794-1_2 (cit. on p. 55). [83] M. d’Aquin, A. Schlicht, H. Stuckenschmidt and M. Sabou, “Ontology Modularization for Knowledge Selection: Experiments and Evaluations”, Database and Expert Systems Applications, 18th International Conference, (DEXA), Regensburg, Germany, Proceedings, 2007 874 (cit. on pp. 55, 169). [84] J. Gracia, E. Montiel-Ponsoda, P. Cimiano, A. Gómez-Pérez, P. Buitelaar and J. P. McCrae, Challenges for the multilingual Web of Data, Journal of Web Semantics 11 (2012) 63, url: https://doi.org/10.1016/j.websem.2011.09.001 (cit. on pp. 56, 71). [85] T. Redmond, M. Smith, N. Drummond and T. Tudorache, “Managing Change: An Ontology Version Control System”, Proceedings of the Fifth OWLED Workshop on OWL: Experiences and Directions, collocated with the 7th International Semantic Web Conference (ISWC), Karlsruhe, Germany, 2008 (cit. on p. 57). [86] S. Lohmann, S. Negru, F. Haag and T. Ertl, Visualizing ontologies with VOWL, Semantic Web 7 (2016) 399, url: https://doi.org/10.3233/SW-150200 (cit. on p. 57). [87] T. Heath and C. Bizer, Linked Data: Evolving the Web into a Global Data Space, Synthesis Lectures on the Semantic Web, Morgan & Claypool Publishers, 2011 1, url: https://doi.org/10.2200/S00334ED1V01Y201102WBE001 (cit. on pp. 57, 67). [88] S. Phillips, J. Sillito and R. J. Walker, “Branching and merging: an investigation into current version control practices”, Proceedings of the 4th International Workshop on Cooperative and Human Aspects of Software Engineering, (CHASE), Waikiki, Honolulu, HI, USA, 2011 9, url: http://doi.acm.org/10.1145/1984642.1984645 (cit. on p. 63). [89] E. Shihab, C. Bird and T. Zimmermann, “The effect of branching strategies on software quality”, 2012 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement, (ESEM), Lund, Sweden, 2012 301, url: http://doi.acm.org/10.1145/2372251.2372305 (cit. on p. 63). [90] A. Schlicht and H. Stuckenschmidt, “Towards Structural Criteria for Ontology Modularization”, Proceedings of the 1st International Workshop on Modular Ontologies, (WoMO), co-located with the International Semantic Web Conference, (ISWC), Athens, Georgia, USA, 2006 (cit. on p. 66).

[91] S. B. Abbès, A. Scheuermann, T. Meilender and M. d’Aquin, “Characterizing Modular Ontologies”, Proceedings of the 6th International Workshop on Modular Ontologies, Graz, Austria, 2012 (cit. on p. 66). [92] P. Bedi and S. Marwaha, Versioning OWL Ontologies using Temporal Tags, International Journal of Computer, Electrical, Automation, Control and Information Engineering 1 (2007) 686 (cit. on p. 67). [93] D. I. Hillmann, G. Dunsire and J. Phipps, “Versioning Vocabularies in a Linked Data World”, World Library and Information Congress, (IFLA), Lyon, France, 2014 (cit. on p. 67). [94] M. Poveda-Villalón, “A Reuse-Based Lightweight Method for Developing Linked Data Ontologies and Vocabularies”, 9th Extended Semantic Web Conference, (ESWC), Heraklion, Crete, Greece. Proceedings, 2012 833, url: https://doi.org/10.1007/978-3-642-30284-8_66 (cit. on p. 67). [95] C. Pedrinaci, J. S. Cardoso and T. Leidig, “Linked USDL: A Vocabulary for Web-Scale Service Trading”, 11th International Conference - Trends and Challenges, (ESWC), Anissaras, Crete, Greece. Proceedings, 2014 68 (cit. on pp. 67, 68). [96] B. Haslhofer, F. Martins and J. Magalhães, “Using SKOS vocabularies for improving web search”, 22nd International World Wide Web Conference, (WWW), Rio de Janeiro, Brazil, Companion Volume, 2013 1253, url: http://doi.acm.org/10.1145/2487788.2488159 (cit. on p. 68). [97] N. A. A. Manaf, S. Bechhofer and R. Stevens, “The Current State of SKOS Vocabularies on the Web”, 9th Extended Semantic Web Conference, (ESWC), Research and Applications, Heraklion, Crete, Greece. Proceedings, 2012 270, url: https://doi.org/10.1007/978-3-642-30284-8_25 (cit. on p. 68). [98] T. Baker, S. Bechhofer, A. Isaac, A. Miles, G. Schreiber and E. Summers, Key choices in the design of Simple Knowledge Organization System (SKOS), Journal of Web Semantics 20 (2013) 35 (cit. on p. 68). [99] S. Peroni, D. M. Shotton and F. Vitali, Tools for the Automatic Generation of Ontology Documentation: A Task-Based Evaluation, International Journal on Semantic Web and Information Systems (IJSWIS) 9 (2013) 21, url: https://doi.org/10.4018/jswis.2013010102 (cit. on p. 69). [100] D. Schober, I. Tudose, V. Svátek and M. Boeker, OntoCheck: verifying ontology naming conventions and metadata completeness in Protégé 4, Journal of Biomedical Semantics 3 (2012) S4 (cit. on p. 69). [101] V. Svátek and O. Šváb-Zamazal, “Entity naming in semantic web ontologies: Design patterns and empirical observations”, 2010 1 (cit. on p. 69). [102] E. Montiel-Ponsoda, D. Vila-Suero, B. Villazón-Terrazas, G. Dunsire, E. E. Rodriguez and A. Gómez-Pérez, “Style Guidelines for Naming and Labeling Ontologies in the Multilingual Web”, Proceedings of the International Conference on Dublin Core and Metadata Applications, DC, The Hague, The Netherlands, 2011 105 (cit. on p. 69).


[103] V. Svátek, O. Sváb-Zamazal and V. Presutti, “Ontology Naming Pattern Sauce for (Human and Computer) Gourmets”, Proceedings of the Workshop on Ontology Patterns (WOP), collocated with the 8th International Semantic Web Conference (ISWC), Washington D.C., USA, 2009 (cit. on p. 69). [104] M. Kezadri and M. Pantel, “First Steps Toward a Verification and Validation Ontology”, Proceedings of the International Conference on Knowledge Engineering and Ontology Development, (KEOD), Valencia, Spain, 2010 440 (cit. on p. 71). [105] S. Abburu, A survey on ontology reasoners and comparison, International Journal of Computer Applications 57 (2012) 33 (cit. on p. 72). [106] H. N. Boone and D. A. Boone, Analyzing likert data, Journal of Extension 50 (2012) 1 (cit. on p. 74). [107] A. Kaur and K. S. Mann, Component based software engineering, International Journal of Computer Applications 2 (2010) 105 (cit. on p. 78). [108] N. Petersen, G. Coskun and C. Lange, “TurtleEditor: An Ontology-Aware Web-Editor for Collaborative Ontology Development”, 10th International Conference on Semantic Computing, IEEE, 2016 183, url: http://dx.doi.org/10.1109/ICSC.2016.26 (cit. on p. 82). [109] K. M. Endris, M. Galkin, I. Lytra, M. N. Mami, M. Vidal and S. Auer, “MULDER: Querying the Linked Data Web by Bridging RDF Molecule Templates”, Database and Expert Systems Applications - 28th International Conference, (DEXA), Lyon, France, Proceedings, Part I, 2017 3 (cit. on p. 85). [110] S. Lohmann, V. Link, E. Marbach and S. Negru, “WebVOWL: Web-based Visualization of Ontologies”, Knowledge Engineering and Knowledge Management, (EKAW), Satellite Events, Sweden. Revised Selected Papers, 2014 154, url: https://doi.org/10.1007/978-3-319-17966-7_21 (cit. on p. 85). [111] I. Zaikin and A. Tuzovsky, “Owl2vcs: Tools for Distributed Ontology Development”, Proceedings of the 10th International Workshop on OWL: Experiences and Directions (OWLED) co-located with 10th Extended Semantic Web Conference (ESWC), Montpellier, France, 2013 (cit. on p. 86). [112] J. Russo, E. Johnson and D. Stephens, The validity of verbal protocols, Memory & Cognition 17 (1989) 759 (cit. on p. 89). [113] C. Gutiérrez, C. A. Hurtado, A. O. Mendelzon and J. Pérez, Foundations of Semantic Web databases, Journal of Computer and System Sciences 77 (2011) 520, url: https://doi.org/10.1016/j.jcss.2010.04.009 (cit. on p. 98). [114] N. Mihindukulasooriya, M. Poveda-Villalón, R. García-Castro and A. Gómez-Pérez, “Collaborative Ontology Evolution and Data Quality - An Empirical Analysis”, OWL: - Experiences and Directions - Reasoner Evaluation - 13th International Workshop, (OWLED), Bologna, Italy, Revised Selected Papers, 2016 95, url: https://doi.org/10.1007/978-3-319-54627-8_8 (cit. on p. 109). [115] N. F. Noy and M. C. A. Klein, Ontology Evolution: Not the Same as Schema Evolution, Journal Knowledge and Information Systems 6 (2004) 428 (cit. on p. 109).

[116] P. Plessers and O. D. Troyer, “Ontology Change Detection Using a Version Log”, 4th International Semantic Web Conference, (ISWC), 2005 578, url: https://doi.org/10.1007/11574620_42 (cit. on p. 109). [117] M. Mehrotra, “Ontology analysis for the Semantic Web”, Ontologies and the Semantic Web: AAAI Workshop, Technical Report WS-02-11, AAAI Press, 2002 (cit. on p. 109). [118] R. M. Karp, “Reducibility Among Combinatorial Problems”, 50 Years of Integer Programming 1958-2008 - From the Early Years to the State-of-the-Art, 2010 219, url: https://doi.org/10.1007/978-3-540-68279-0_8 (cit. on p. 113). [119] R. Drath and A. Horch, Industrie 4.0: Hit or Hype? [Industry Forum], Industrial Electronics Magazine, IEEE 8 (2014) 56, issn: 1932-4529 (cit. on p. 126). [120] E. A. Lee, “Cyber Physical Systems: Design Challenges”, 11th IEEE International Symposium on Object-Oriented Real-Time Distributed Computing, (ISORC), Orlando, Florida, USA, 2008 363, url: https://doi.org/10.1109/ISORC.2008.25 (cit. on p. 126). [121] M. Mikusz, Towards an Understanding of Cyber-physical Systems as Industrial Software-Product-Service Systems, Procedia CIRP 16 (2014) 385 (cit. on p. 126). [122] E. Tovar and F. Vasques, Real-time fieldbus communications using Profibus networks, IEEE Transactions Industrial Electronics 46 (1999) 1241, url: https://doi.org/10.1109/41.808018 (cit. on p. 127). [123] W. Mahnke, S.-H. Leitner and M. Damm, OPC unified architecture, Springer Science & Business Media, 2009, isbn: 978-3-540-68898-3 (cit. on pp. 127, 129). [124] P. Adolphs, H. Bedenbender, D. Dirzus, M. Ehlich, U. Epple, M. Hankel, R. Heidel, M. Hoffmeister, H. Huhle, B. Kärcher, H. Koziolek, R. Pichler, S. Pollmeier, F. Schewe, A. Walter, B. Waser and M. Wollschlaeger, Reference Architecture Model Industrie 4.0 (RAMI4.0), Status Report, ZVEI and VDI, 2015 (cit. on pp. 127, 128, 130, 145, 156). [125] P. Adolphs, S. Auer, H. Bedenbender, M. Billmann, M. Hankel, R. Heidel, M. Hoffmeister, H. Huhle, M. Jochem, M. Kiele, G. Koschnick, H. Koziolek, L. Linke, R. Pichler, F. Schewe, K. Schneider and B. Waser, Structure of the Administration Shell. Continuation of the Development of the Reference Model for the Industrie 4.0 Component, Status Report, ZVEI and VDI, 2016, url: http://www.plattform-i40.de/I40/Redaktion/EN/Downloads/Publikation/structure-of-the-administration-shell.html (cit. on p. 127). [126] K. Främling, M. Harrison, J. Brusey and J. Petrow, Requirements on unique identifiers for managing product lifecycle information: comparison of alternative approaches, International Journal of Computer Integrated Manufacturing 20 (2007) 715, url: https://doi.org/10.1080/09511920701567770 (cit. on p. 129). [127] O. Vermesan, P. Friess, P. Guillemin, S. Gusmeroli, H. Sundmaeker, A. Bassi, I. S. Jubert, M. Mazura, M. Harrison, M. Eisenhauer and P. Doody, Internet of things strategic research roadmap, Internet of Things-Global Technological and Societal Trends 1 (2011) 9 (cit. on p. 129).


[128] M. Abu-Elkheir, M. Hayajneh and N. A. Ali, Data Management for the Internet of Things: Design Primitives and Solution, Sensors 13 (2013) 15582, url: https://doi.org/10.3390/s131115582 (cit. on p. 129). [129] R. Drath, Datenaustausch in der Anlagenplanung mit AutomationML: Integration von CAEX, PLCopen XML und COLLADA, Springer-Verlag, 2010 (cit. on p. 129). [130] IEC, IEC 61918:2013 Industrial communication networks - Installation of communication networks in industrial premises, 2013, url: https://webstore.iec.ch/publication/6100 (cit. on p. 129). [131] U. Enste and W. Mahnke, OPC Unified Architecture, Automatisierungstechnik 59 (2011) 397 (cit. on p. 129). [132] E. Abele, A. Wörn, J. Fleischer, J. Wieser, P. Martin and R. Klöpper, Mechanical module interfaces for reconfigurable machine tools, Production Engineering 1 (2007) 421, url: https://doi.org/10.1007/s11740-007-0057-1 (cit. on p. 129). [133] M. Hillier, The role of cultural context in multilingual website usability, Electronic Commerce Research and Applications 2 (2003) 2, url: https://doi.org/10.1016/S1567-4223(03)00005-X (cit. on p. 129). [134] M. Thoma, T. Braun, C. Magerkurth and A. Antonescu, “Managing things and services with semantics: A survey”, IEEE Network Operations and Management Symposium, (NOMS), Krakow, Poland, 2014 1, url: https://doi.org/10.1109/NOMS.2014.6838366 (cit. on p. 129). [135] W. Wahlster, “Semantic Technologies for Mass Customization”, Towards the Internet of Services: The THESEUS Research Program, 2014 3, url: https://doi.org/10.1007/978-3-319-06755-1_1 (cit. on p. 129). [136] A. Langegger, W. Wöß and M. Blöchl, “A Semantic Web Middleware for Virtual Data Integration on the Web”, 5th European Semantic Web Conference, (ESWC), Tenerife, Spain, Proceedings, 2008 493, url: https://doi.org/10.1007/978-3-540-68234-9_37 (cit. on p. 130). [137] C. Bizer, C. Becker, P. N. Mendes, R. Isele, A. Matteini and A. Schultz, “LDIF–A Framework for Large-Scale Linked Data Integration”, 21st International World Wide Web Conference, (WWW), Developers Track, Lyon, France, 2012 (cit. on p. 130). [138] M. Graube, J. Pfeffer, J. Ziegler and L. Urbas, “Linked Data as Integrating Technology for Industrial Data”, The 14th International Conference on Network-Based Information Systems, (NBiS), Tirana, Albania, 2011 162, url: https://doi.org/10.1109/NBiS.2011.33 (cit. on p. 130). [139] D. Bandyopadhyay and J. Sen, Internet of Things: Applications and Challenges in Technology and Standardization, Wireless Personal Communications 58 (2011) 49, url: https://doi.org/10.1007/s11277-011-0288-5 (cit. on p. 130). [140] C. Bizer, T. Heath and T. Berners-Lee, Linked Data - The Story So Far, International Journal on Semantic Web and Information Systems 5 (2009) 1, url: http://dx.doi.org/10.4018/jswis.2009081901 (cit. on p. 130).

[141] B. Quilitz and U. Leser, “Querying Distributed RDF Data Sources with SPARQL”, 5th European Semantic Web Conference, (ESWC), Research and Applications, Tenerife, Spain, Proceedings, 2008 524 (cit. on p. 130). [142] J. Haupert, M. Seißler, B. Kiesel, B. Schennerlein, S. Horn, D. Schreiber and R. Barthel, Object Memory Modeling, Incubator Group Report, World Wide Web Consortium (W3C), 2011, url: https://www.w3.org/2005/Incubator/omm/XGR-omm-20111026/ (cit. on p. 131). [143] T. Aruväli, W. Maass and T. Otto, Digital Object Memory Based Monitoring Solutions in Manufacturing Processes, 24th DAAAM International Symposium on Intelligent Manufacturing and Automation 69 (2014) 449 (cit. on p. 131). [144] H. Rijgersberg, M. van Assem and J. L. Top, Ontology of units of measure and related concepts, Semantic Web 4 (2013) 3, url: https://doi.org/10.3233/SW-2012-0069 (cit. on p. 132). [145] eCl@ss, Standardized Material and Service Classification, 2016, url: http://www.eclass.eu/ (visited on 26/01/2016) (cit. on p. 132). [146] S. Weyer, M. Schmitt, M. Ohmer and D. Gorecky, Towards Industry 4.0-Standardization as the crucial challenge for highly modular, multi-vendor production systems, 15th IFAC Symposium on Information Control Problems in Manufacturing 48 (2015) 579 (cit. on p. 136). [147] V. Vyatkin, Software Engineering in Industrial Automation: State-of-the-Art Review, IEEE Transactions Industrial Informatics 9 (2013) 1234, url: https://doi.org/10.1109/TII.2013.2258165 (cit. on p. 136). [148] C. Pang and V. Vyatkin, “IEC 61499 function block implementation of Intelligent Mechatronic Component”, 8th IEEE International Conference on Industrial Informatics, IEEE, 2010 1124 (cit. on pp. 136, 141, 142). [149] Y. Lu, K. Morris and S. Frechette, Current standards landscape for smart manufacturing systems, National Institute of Standards and Technology, (NISTIR) 8107 (2016) (cit. on pp. 137, 138). [150] Ministry of Industry and Information Technology of China (MIIT) and Standardization Administration of China (SAC), National Intelligent Manufacturing Standard System Construction Guidelines, tech. rep., Standardization Administration of China, (SAC), 2015 (cit. on p. 137). [151] Q. Li, H. Jiang, Q. Tang, Y. Chen, J. Li and J. Zhou, “Smart Manufacturing Standardization: Reference Model and Standards Framework”, OTM Workshops - Confederated International Workshops: EI2N, FBM, ICSP, Meta4eS, and OTMA, Rhodes, Greece, Revised Selected Papers, 2016 16 (cit. on p. 137). [152] P. Adolphs, S. Auer, M. Billmann, M. Hankel, R. Heidel, M. Hoffmeister, H. Huhle, M. Jochem, M. Kiele, G. Koschnick, H. Koziolek, L. Linke, R. Pichler, F. Schewe, K. Schneider and B. Waser, Structure of the Administration Shell, Status Report, ZVEI and VDI, 2016 (cit. on p. 137).


[153] R. Drath, A. Lüder, J. Peschke and L. Hundt, “AutomationML - the glue for seamless automation engineering”, Proceedings of 13th IEEE International Conference on Emerging Technologies and Factory Automation, (ETFA), Hamburg, Germany, 2008 616 (cit. on p. 137). [154] M. Schleipen, M. Damm, R. Henßen, A. Lüder, N. Schmidt, O. Sauer and S. Hoppe, “OPC UA and AutomationML - collaboration partners for one common goal: Industry 4.0”, 3rd AutomationML User Conference, Blumberg, 2014 (cit. on p. 138). [155] R. Henßen and M. Schleipen, Interoperability between OPC UA and AutomationML, Procedia CIRP 25 (2014) 297, 8th International Conference on Digital Enterprise Technology (DET) - 2014 Disruptive Innovation in Manufacturing Engineering towards the 4th Industrial Revolution, issn: 2212-8271 (cit. on p. 138). [156] F. Himmler, “Function Based Engineering with AutomationML - Towards better standardization and seamless process integration in plant engineering”, Smart Enterprise Engineering: 12. Internationale Tagung Wirtschaftsinformatik, (WI), Osnabrück, Germany, 2015 16 (cit. on p. 138). [157] S. Kozar and P. Kadera, “Integration of IEC 61499 with OPC UA”, 21st IEEE International Conference on Emerging Technologies and Factory Automation, (ETFA), Berlin, Germany, 2016 1, url: https://doi.org/10.1109/ETFA.2016.7733538 (cit. on p. 138). [158] S. Cavalieri and A. Regalbuto, Integration of IEC 61850 SCL and OPC UA to improve interoperability in Smart Grid environment, Computer Standards & Interfaces 47 (2016) 77, url: https://doi.org/10.1016/j.csi.2015.10.005 (cit. on p. 138). [159] S. Lohmann, P. Díaz and I. Aedo, “MUTO: the modular unified tagging ontology”, Proceedings of the 7th International Conference on Semantic Systems, (I-SEMANTICS), Graz, Austria, 2011 95, url: http://doi.acm.org/10.1145/2063518.2063531 (cit. on p. 138). [160] M. Brettel, N. Friederichsen, M. Keller and M. Rosenberg, How virtualization, decentralization and network building change the manufacturing landscape: An Industry 4.0 perspective, International Journal of Mechanical, Aerospace, Industrial and Mechatronics Engineering 8 (2014) 37 (cit. on p. 145). [161] E. Kharlamov, B. C. Grau, E. Jiménez-Ruiz, S. Lamparter, G. Mehdi, M. Ringsquandl, Y. Nenov, S. Grimm, M. Roshchin and I. Horrocks, “Capturing Industrial Information Models with Ontologies and Constraints”, 15th International Semantic Web Conference, (ISWC), Kobe, Japan, Proceedings, Part II, 2016 325 (cit. on p. 145). [162] M. Hermann, T. Pentek and B. Otto, “Design Principles for Industrie 4.0 Scenarios”, 49th Hawaii International Conference on System Sciences, (HICSS), Koloa, HI, USA, 2016 3928, url: https://doi.org/10.1109/HICSS.2016.488 (cit. on p. 145). [163] IEC 62264-1: Enterprise-Control System Integration Part 1: Models and Terminology, Standard, International Electrotechnical Commission, 2013 (cit. on pp. 145, 149, 156).

[164] N. Petersen, M. Galkin, C. Lange, S. Lohmann and S. Auer, “Monitoring and Automating Factories Using Semantic Models”, Semantic Technology - 6th Joint International Conference (JIST), Singapore, Revised Selected Papers, 2016 315, url: https://doi.org/10.1007/978-3-319-50112-3_24 (cit. on p. 147).
[165] M. Uschold and M. Gruninger, Ontologies: principles, methods and applications, Knowledge Engineering Review 11 (1996) 93, url: http://dx.doi.org/10.1017/S0269888900007797 (cit. on p. 147).
[166] H. Wache, T. Vögele, U. Visser, H. Stuckenschmidt, G. Schuster, H. Neumann and S. Hübner, “Ontology-Based Integration of Information - A Survey of Existing Approaches”, Proceedings of the IJCAI Workshop on Ontologies and Information Sharing, Seattle, USA, 2001 (cit. on p. 149).
[167] L. Bellatreche and G. Pierra, “OntoAPI: An Ontology-based Data Integration Approach by an a Priori Articulation of Ontologies”, 18th International Workshop on Database and Expert Systems Applications (DEXA), Regensburg, Germany, 2007 799, url: https://doi.org/10.1109/DEXA.2007.152 (cit. on p. 150).
[168] A. Poggi, D. Lembo, D. Calvanese, G. De Giacomo, M. Lenzerini and R. Rosati, Linking Data to Ontologies, Journal on Data Semantics 10 (2008) 133, url: https://doi.org/10.1007/978-3-540-77688-8_5 (cit. on p. 150).
[169] D. Calvanese, B. Cogrel, S. Komla-Ebri, R. Kontchakov, D. Lanti, M. Rezk, M. Rodriguez-Muro and G. Xiao, Ontop: Answering SPARQL queries over relational databases, Semantic Web 8 (2017) 471 (cit. on p. 150).
[170] P. Vandenbussche, G. Atemezing, M. Poveda-Villalón and B. Vatant, Linked Open Vocabularies (LOV): A gateway to reusable semantic vocabularies on the Web, Semantic Web 8 (2017) 437, url: https://doi.org/10.3233/SW-160213 (cit. on p. 157).
[171] P. L. Whetzel, N. F. Noy, N. H. Shah, P. R. Alexander, C. Nyulas, T. Tudorache and M. A. Musen, BioPortal: enhanced functionality via new Web services from the National Center for Biomedical Ontology to access and use ontologies in software applications, Nucleic Acids Research 39 (2011) 541 (cit. on p. 157).
[172] F. Baader, “Description Logics”, Reasoning Web. Semantic Technologies for Information Systems, 5th International Summer School, Brixen-Bressanone, Italy, Tutorial Lectures, 2009 1, url: https://doi.org/10.1007/978-3-642-03754-2_1 (cit. on p. 163).
[173] D. Lukovnikov, A. Fischer, J. Lehmann and S. Auer, “Neural Network-based Question Answering over Knowledge Graphs on Word and Character Level”, Proceedings of the 26th International Conference on World Wide Web (WWW), Perth, Australia, 2017 1211, url: http://doi.acm.org/10.1145/3038912.3052675 (cit. on p. 168).


APPENDIX A

List of Publications

• Journal Articles and Book Chapters:

1. Lavdim Halilaj, Steffen Lohmann, Maria-Esther Vidal, Sören Auer. EcoVoc: An Ecosystem for Distributed and Version-Controlled Ontology Development. To be submitted to a relevant journal;

2. Lavdim Halilaj, Irlán Grangel-González, Maria-Esther Vidal, Steffen Lohmann, Sören Auer. SerVCS: Serialization Agnostic Ontology Development in Distributed Settings. In Communications in Computer and Information Science (CCIS) 914 - Revised Selected Papers from the 8th International Joint Conference, IC3K 2016, Porto, Portugal, 213-232, Springer;

3. Lavdim Halilaj, Irlán Grangel-González, Gökhan Coskun, Steffen Lohmann, Sören Auer. Git4Voc: Collaborative Vocabulary Development Based on Git. In International Journal of Semantic Computing (IJSC), 1-24, World Scientific.

• Conference Papers:

4. Irlán Grangel-González, Lavdim Halilaj, Maria-Esther Vidal, Omar Rana, Steffen Lohmann, Sören Auer, Andreas W. Müller. Knowledge Graphs for Semantically Integrating Cyber-Physical Systems. In 9th International Conference on Database and Expert Systems Applications (DEXA) 2018 Proceedings;

5. Fathoni A. Musyaffa, Lavdim Halilaj, Yakun Li, Fabrizio Orlandi, Hajira Jabeen, Sören Auer. OpenBudgets.eu: A Platform for Semantically Representing and Analyzing Open Fiscal Data. In 18th International Conference on Web Engineering (ICWE) 2018 Proceedings, Springer;

6. Lavdim Halilaj, Irlán Grangel-González, Steffen Lohmann, Maria-Esther Vidal, Sören Auer. EffTE: A Dependency-aware Approach for Test-Driven Ontology Development. In 33rd ACM/SIGAPP Symposium On Applied Computing (ACM SAC) 2018 Proceedings, ACM;

7. Niklas Petersen, Lavdim Halilaj, Irlán Grangel-González, Steffen Lohmann, Christoph Lange, Sören Auer. Realizing an RDF-based Information Model for a Manufacturing Company – A Case Study. (One of the two nominees for the Best In-Use Paper Award) In 16th International Semantic Web Conference (ISWC) 2017 Proceedings, 350-366, Springer;


8. Irlán Grangel-González, Paul Baptista, Lavdim Halilaj, Steffen Lohmann, Maria-Esther Vidal, Christian Mader, Sören Auer. The Industry 4.0 Standards Landscape from a Semantic Integration Perspective. In 22nd IEEE International Conference on Emerging Technologies And Factory Automation (ETFA) 2017 Proceedings;

9. Lavdim Halilaj, Irlán Grangel-González, Maria-Esther Vidal, Steffen Lohmann, Sören Auer. Proactive Prevention of False-Positive Conflicts in Distributed Ontology Development. (Best Paper Award) In 8th International Conference on Knowledge Engineering and Ontology Development (KEOD) Proceedings, 43-51, SciTePress;

10. Lavdim Halilaj, Niklas Petersen, Irlán Grangel-González, Christoph Lange, Sören Auer, Gökhan Coskun, Steffen Lohmann. VoCol: An Integrated Environment to Support Version-Controlled Vocabulary Development. (Paper of the Month - Fraunhofer IAIS) In 20th International Conference on Knowledge Engineering and Knowledge Management (EKAW) 2016 Proceedings, 303-319, Springer;

11. Irlán Grangel-González, Diego Collarana, Lavdim Halilaj, Steffen Lohmann, Christoph Lange, Maria-Esther Vidal, Sören Auer. Alligator: A Deductive Approach for the Integration of Industry 4.0 Standards. In 20th International Conference on Knowledge Engineering and Knowledge Management (EKAW) 2016 Proceedings, 272-287, Springer;

12. Irlán Grangel-González, Lavdim Halilaj, Sören Auer, Steffen Lohmann, Christoph Lange, Diego Collarana. An RDF-based Approach for Implementing Industry 4.0 Components with Administration Shells. In IEEE Emerging Technologies and Factory Automation (ETFA) 2016 Proceedings, 1-8, IEEE;

13. Fathoni A. Musyaffa, Lavdim Halilaj, Ronald Siebes, Fabrizio Orlandi, Sören Auer. Minimally Invasive Semantification of Lightweight Service Descriptions. In IEEE International Conference on Web Services (ICWS) 2016 Proceedings, 672-677, IEEE;

14. Irlán Grangel-González, Lavdim Halilaj, Gökhan Coskun, Sören Auer, Diego Collarana, Michael Hofmeister. Towards a Semantic Administrative Shell for Industry 4.0 Components. (Paper of the Month - Fraunhofer IAIS) In IEEE Tenth International Conference on Semantic Computing (ICSC) 2016 Proceedings, 230-237, IEEE;

15. Lavdim Halilaj, Irlán Grangel-González, Gökhan Coskun, Sören Auer. Git4Voc: Git-based Versioning for Collaborative Vocabulary Development. In IEEE Tenth International Conference on Semantic Computing (ICSC) 2016 Proceedings, 285-292, IEEE;

16. Andreas Poxrucker, Christian Mader, Peter Hevesi, Lavdim Halilaj, Steffen Lohmann, Paul Lukowicz, Sören Auer. Towards a Smart Data Repository for the SDIL. In 1st Smart Data Innovation Conference (SDIC) 2016 Proceedings, 23-29;

17. Irlán Grangel-González, Lavdim Halilaj, Gökhan Coskun, Sören Auer. Towards Vocabulary Development by Convention. In 7th Knowledge Engineering and Ontology Development (KEOD) 2015 Proceedings, 334-343, SciTePress.

• Workshops and Demos:

18. Irlán Grangel-González, Lavdim Halilaj, Maria-Esther Vidal, Steffen Lohmann, Sören Auer, Andreas W. Müller. Seamless Integration of Cyber-Physical Systems in Knowledge Graphs. In 33rd ACM/SIGAPP Symposium On Applied Computing (ACM SAC) 2018 Proceedings, ACM;

19. Lavdim Halilaj, Irlán Grangel-González, Maria-Esther Vidal, Steffen Lohmann, Sören Auer. DemoEffTE: A Demonstrator of Dependency-aware Evaluation of Test Cases over Ontology. In 13th International Conference on Semantic Systems (Semantics) - Posters and Demo Track, 2017;

20. Fathoni A. Musyaffa, Fabrizio Orlandi, Tiansi Dong, Lavdim Halilaj. OpenBudgets.eu: A Distributed Open-Platform for Managing Heterogeneous Budget Data. In 13th International Conference on Semantic Systems (Semantics) - Posters and Demo Track, 2017;

21. Lavdim Halilaj, Steffen Lohmann, Christian Mader, Sören Auer. Distributed Vocabulary Development with Version Control Systems. In W3C Smart Descriptions and Smarter Vocabularies (SDSVoc), 2016;

22. Lavdim Halilaj, Niklas Petersen, Irlán Grangel-González, Christoph Lange, Steffen Lohmann, Christian Mader, Sören Auer. Industrial Data Space: Semantic Integration of Enterprise Data with VoCol. In 12th International Conference on Semantic Systems (Semantics) - European Linked Data Contest, 2016.


List of Figures

1.1 Ontology development in distributed and heterogeneous environments. A distributed and heterogeneous environment typically consists of several layers. Different stakeholders, such as domain experts and ontology engineers, are located in the Contributors layer. The Local Ontologies layer contains working replicas of ontologies on local machines. The Shared Repositories layer allows the distribution and synchronization of changes among replicas. Applications located in the Shared Applications layer offer additional possibilities for exploration, visualization and analysis. Third-party users or services are located in the Consumers layer. The ontology can be developed and deployed according to various scenarios, such as Scenario 1: there is no connection between remote Shared Repositories and Shared Applications; Scenario 2: a unidirectional connection exists between Shared Repositories and Shared Applications; and Scenario 3: the ontology is not deployed or cannot be used in Shared Applications...... 4
1.2 Main Challenges and Research Questions. This thesis identifies four main challenges in supporting ontology development in distributed and heterogeneous environments. Four research questions are proposed with the objective of addressing these challenges. Each challenge and research question is located at the respective layer of the problem domain...... 7
1.3 Thesis Contributions. Four main contributions encapsulated in our holistic approach: a lightweight methodology, an integrated environment for distributed development of ontologies, a mechanism for producing unique serializations, and a method for efficiently ensuring the quality of ontologies based on the domain requirements. The outcome of this thesis has been used to support the modeling of ontologies in a number of different domains, such as manufacturing, health care and education...... 10

2.1 Example of an RDF graph. The T-Box represents the conceptual level of entities and their inter-relationships, whereas the A-Box represents concrete instantiations of the defined concepts...... 17
2.2 Activities during ontology development. Core activities typically performed in sequence: specification, conceptualization, formalization, implementation and maintenance. In addition, three activities are performed in parallel: knowledge acquisition, evaluation and documentation [21]...... 23
2.3 Life-cycle models. Comparison of the sequence of activities performed in different models, such as: a) Waterfall; b) Iterative; and c) Evolving [22]..... 24


2.4 SAMOD - A Simplified Agile Methodology for Ontology Development. SAMOD follows the TDD principles: 1) collecting the domain requirements; 2) merging models from two different iterations; and 3) refactoring based on the outcome of the previous step [24]...... 26
2.5 Version Control Systems: Centralized vs Distributed. Collaboration workflows: a) several users work together on the same central repository; and b) several users work on their local repositories and synchronize their changes with the others via a shared repository...... 27

3.1 The HCOME methodology. Overview of the main phases of the HCOME methodology, their goals and respective tasks. (S) denotes shared spaces, whereas (P) denotes private spaces [30]...... 32
3.2 The Ontology Maturing Process. The four phases of the ontology maturing process: 1) emergence of ideas; 2) consolidation in communities; 3) formalization; and 4) axiomatization [31]...... 33
3.3 The DILIGENT Life-cycle. The different stages of the DILIGENT methodology: Build, Local Adaptation, Analysis, Revision, Local Update [33]...... 35
3.4 Overview of the RapidOWL methodology. It consists of three building blocks: Values, Principles and Practices, each of them covering different aspects of ontology modeling [35]...... 36

4.1 Round-trip model. A round-trip model covering the fundamental activities of the ontology development life-cycle: modeling, population and evaluation..... 51
4.2 Usage statistics of domain and range. a) Relation between the number of object properties and domain axioms; and b) relation between the number of object properties and range axioms...... 54

5.1 The Git4Voc methodology. Governing and Operational levels of Git4Voc, covering a number of aspects and practices of the ontology development process, respectively...... 61
5.2 Branching model for ontology development. Parallel development can be facilitated and maintained using a number of dedicated branches for specific purposes, i.e., master, semantic issues, develop, and structural issues...... 64
5.3 The structure of MobiVoc. The MobiVoc vocabulary comprises several modules divided according to their specific subdomain coverage...... 66
5.4 Proposed modularization structure of Schema.org. The structure of Schema.org can be organized into different sub-modules based on the key concepts as well as the Data Type concept...... 74
5.5 Results of the survey with ontology engineers. The horizontal axis lists the proposed practices, whereas the vertical axis shows the participants' opinions on each particular practice...... 75

6.1 The VoCol Architecture. Main services of VoCol: Configuration Service, Validation Service, Monitoring Service and Orchestration Service with their respective components, as well as the interactions between services and components. 79
6.2 Contributor Workflow. Sequence of steps and events occurring in a typical development workflow of a contributor during the interaction with the VoCol environment...... 82


6.3 The VoCol Configuration Page. The configuration page contains several sections: 1) General Info; 2) Repository Info; 3) Private Mode Access; 4) Additional Services; and 5) Serialization Formats. Details about the ontology project can be provided in the Homepage Description section. This page allows administering the VoCol platform for different scenarios...... 83

6.4 Human-friendly Documentation. Information about an ontology concept represented using various views: Tree View, Source View, Graphical Depiction, and Single Page per Concept View...... 86

6.5 Additional VoCol Views. Other views, such as Visualization, Querying, Statistics and Evolution, provide possibilities for exploring and better understanding the ontology being developed...... 87

6.6 Scores given for VoCol services. The scores given for the usefulness of the VoCol services, such as Syntax Validation, SPARQL Endpoint, Documentation Generation, Visualization and Evolution Report, according to the participants of the study...... 90

7.1 Motivating Example. A distributed environment illustrating an ontology development process. Different ontology editors, e.g., Editors X and Y, are used for defining ontology F by Users 1 and 2, respectively. F* and F** represent local versions of F. If F* is uploaded first, changes in F* can be synchronized. Changes in F** cannot be merged when the serializations of F* and F** differ..... 98

7.2 The SerVCS Approach. SerVCS receives RDF documents serialized by different sorting criteria (Step 1), and generates a synchronized RDF document (Step 5). In Step 2, a unique serialization is produced. RDF documents are sorted with the same criteria (Step 3). Finally, a VCS synchronization process is performed (Step 4), i.e., comparison, conflict detection, conflict resolution, and merge...... 100

7.3 The SerVCS Development Workflow. A distributed environment illustrating an ontology development process using SerVCS. Different ontology editors, e.g., Editors X and Y, are used for defining ontology F by Users 1 and 2, respectively. F* and F** represent local versions of F. Before synchronization with the remote version, a unique serialization is created for F* and F**. F* is uploaded first. Next, changes in F** are successfully synchronized with F*; since both share a unique serialization, any possible conflict can easily be detected and resolved. 100

7.4 The SerVCS Architecture. Users interact with ontology editors, e.g., Protégé: 1) a VCS handles different ontology versions via changesets, e.g., Git; 2) the UniSer component performs syntax validation and generates unique serializations, e.g., Rapper or RDF-Toolkit; and 3) a Repository Hosting Platform stores the ontologies and propagates the changes (GitHub)...... 102

7.5 Ontology Change Distribution. Number and types of ontology changes (CH) per user in an interval of 8 hours. A Poisson distribution with λ = 2 models an average of two changes per hour. A uniform distribution with replacement is followed to sample the type of ontology changes (CH)...... 106


7.6 Impact of Ontology Size on SerVCS. Number of conflicting lines (NCL) detected by Git and SerVCS compared to the Gold Standard, based on the Ontology Change Distribution in Figure 7.5. (a) SerVCS detects the same NCLs as the Gold Standard at five instances of time in the Synthetic Ontology; (b) SerVCS indicates up to three orders of magnitude fewer NCLs than Git in the DBpedia Ontology; (c) SerVCS indicates up to four orders of magnitude fewer NCLs than Git in the Gene Ontology. SerVCS is not affected by ontology size to the same degree as Git... 107
7.7 Impact of Sorting Criteria. Number of conflicting lines (NCL) detected by Git and SerVCS compared to the Gold Standard. The Synthetic Ontology is modified according to the Ontology Change Distribution in Figure 7.5. SerVCS follows two different sorting criteria, produced by RDF-toolkit and Rapper. SerVCS exhibits similar behavior with both sorting criteria...... 108

8.1 Motivating Example. a) Test Cases (TCs) to check whether an ontology satisfies the design requirements following an entailment regime; b) faulty test cases (orange) over time after each ontology modification; and c) number of evaluated test cases per instance of time. At every instance of time, the set of TCs is completely evaluated...... 111
8.2 The EffTE Approach. a) A test case dependency graph TCG^φ_O with 15 test cases (TCs); b) faulty test cases over time (orange nodes are faulty test cases (STC′), gray nodes represent TCs ignored for evaluation F(STC′ | STC, O, φ)); and c) number of evaluated TCs and F(STC′ | STC, O, φ) values per instance of time. Dependencies make it possible to ignore faulty test cases...... 114
8.3 Implementation of EffTE. A locally modified ontology file is received as input, together with a set of test cases and their dependencies; the result of the validation process is returned as output. An Integrated Validation Service validates syntactic errors in an ontology using a Syntax Validation component. A Test Case Validation component checks the ontology against a set of test cases. The ontology is synchronized with a Remote Repository for the distribution of changes...... 116
8.4 Dependency Graph Topologies. Different topologies of dependency graphs, each of them comprising ten test cases...... 119
8.5 Different numbers of test cases. Dependency graphs with various numbers of test cases: a) 10; b) 20; and c) 30 test cases, respectively...... 120
8.6 Evaluation Time (ET): Naive Approach vs EffTE. Impact of different scenarios on Evaluation Time (ET): a) ontology size; b) dependency graph topology; and c) number of test cases. Results suggest that EffTE exhibits better ET values at four instances of time (TI-1,2,3,4), whereas at TI-5 it performs worse, since there is no faulty test case...... 120
8.7 Number of Evaluated Test Cases (ETC): Naive Approach vs EffTE. Impact of different scenarios on ETC: a) dependency graph topology; and b) number of test cases. Results suggest that EffTE exhibits better ETC values at four instances of time (TI-1,2,3,4), whereas at TI-5 it has the same ETC values as the naive approach, since there is no faulty test case...... 121

9.1 Industry 4.0 Concepts. a) Reference Architecture Model for Industry 4.0 (RAMI 4.0), comprising the three dimensions: layers, life-cycle and hierarchy levels (source [124]); b) An Industry 4.0 component object wrapped by an Administration Shell (adapted from [124])...... 128


9.2 The RDF-based Pipeline. The pipeline for semantifying I4.0 components, comprising RDF vocabularies of relevant standards to represent information about a wide range of components...... 130
9.3 RAMI Vocabulary. Overview of the core classes and relationships of the RAMI vocabulary...... 133
9.4 An example of an Industry 4.0 graph. Several Industry 4.0 concepts interlinked with each other, together with their respective attributes...... 134
9.5 The Semantic Standard Landscape Pipeline. I4.0 standards are received as input; the output is a graph representing relations between standards. STO and existing vocabularies are utilized to describe known relations among standards. A reasoning process exploits the semantics encoded in STO to infer new relations between standards...... 138
9.6 The Standards Ontology (STO). STO classes and properties describe the meaning and relations of I4.0 standards. The RAMI vocabulary models the layer in which a standard is classified in the RAMI architecture. Datatype and object properties are represented by green and blue squares, respectively; red arrows represent inverse functional properties...... 139
9.7 I4.0 Standards related to AML. Relations between I4.0 standards are visualized using graphs; continuous and dashed directed arrows represent explicit and inferred relations, respectively. The inference model relies on the transitive and symmetric properties of sto:relatedTo. (a) Known relations between AML and I4.0 standards are explicitly described using the STO object property sto:relatedTo. (b) Relations between I4.0 standards connected to AML with dashed directed arrows, colored in a different color, are inferred. The relation between AML (IEC 62714) and the I4.0 standard named Measurement and Control Devices (IEC 61499) has been validated [148]...... 141
9.8 Relations between I4.0 Standards. Relations between I4.0 standards are visualized using graphs; continuous and dashed lines represent explicit and inferred relations, respectively. The inference model relies on the transitive and symmetric properties of sto:relatedTo. For readability, only symmetric relations are represented using undirected lines. (a) Known relations between I4.0 standards are explicitly described using the STO object property sto:relatedTo. (b) Relations between I4.0 standards are inferred; the graph comprises 74 edges: 50 are inferred while 24 are explicit...... 142

10.1 The implemented architecture. It comprises several layers hosting various components: a) ontology layer; b) data access layer; c) mapping layer; d) data source layer; and e) application layer...... 150
10.2 Various views of the tool management application. a) Geographical depiction of the places where the company is located; b) list of tools stored in a paternoster system; and c) detailed information about a machine located within a factory...... 152
10.3 Various views of the analytics application. a) Work order data for a given machine in a time interval of one day; and b) energy consumption of a given work order within a day...... 154


11.1 The VoColReg Architecture. It is composed of several layers: a) persistence layer; b) web interface layer; and c) container host layer. Each layer consists of a number of different components...... 158
11.2 Various statistics. Four different statistics about the nine ontologies: a) average number of classes, properties and individuals per ontology; b) average number of commits, contributors, issues, branches, modules and languages per ontology; c) average number of commits per user per ontology; and d) average number of classes and properties per module per ontology...... 162

12.1 The architecture of VoCol-as-a-Service from a technical perspective. It is composed of five layers containing a number of different components and services: 1) persistence; 2) orchestration; 3) validation and synchronization; 4) federation; and 5) presentation...... 170

List of Tables

3.1 Coverage of the ontology development aspects. Comparison of methodologies with respect to their coverage of the aspects important for ontology development...... 38

4.1 Roles and permissions. The various active and passive roles involved in the collaborative ontology development process, together with their permissions...... 52
4.2 Widely used ontologies. A list of 20 ontologies frequently used across different domains, including their name, prefix and domain...... 53

5.1 Common activities in collaborative ontology development. A number of activities performed by different contributors during ontology construction...... 63
5.2 Roles and their primary activities. A stakeholder team may comprise roles such as ontology engineers, domain experts, translators and common users, which can work on different types of issues...... 65

6.1 Predefined Queries. Examples of the predefined queries for constraint checking, such as finding duplicate labels or comments, checking for forbidden words, etc...... 88

7.1 Ontology Description. Ontologies of different sizes, described in terms of the number of triples, subjects, properties and objects...... 105
7.2 Ontology Changes. Basic changes of type Addition, Modification and Deletion performed during the ontology development process, together with their respective examples...... 106

8.1 EffTE Notations. Symbols and their respective definitions used to describe the test-driven development approach EffTE...... 112
8.2 Ontology Description. Different ontology sizes in terms of the number of triples, subjects, properties and objects...... 117
8.3 Experimental Set-Up. Three scenarios varying: ontology size, dependency graph topology, and number of test cases (TCs). Different faulty TCs are generated for the first four instances of time (TI), whereas at TI-5, zero faulty TCs are enforced...... 119

9.1 Exemplar Descriptions of I4.0 Standards in the I4.0 Standard Landscape. An I4.0 standard is described in terms of its classification level in the reference architectures, e.g., RAMI and ISA95, as well as basic properties like license and publisher. The same I4.0 standard can be classified by two reference architectures, e.g., IEC 62264...... 142


10.1 Survey questions and given scores. A Likert scale of 1 to 5 (1 = not at all, 5 = very much) is used to categorize the scores given by stakeholders. M represents the mean value and SD represents the standard deviation...... 154

11.1 Ontologies being developed in VoColReg. Overview of the ontologies representing various domains in terms of accessibility, hosting platform, expressivity and the number of triples...... 160
11.2 Ontology statistics. Metrics of the ontologies currently being developed in terms of the number of classes and the types of properties, such as object properties, datatype properties and annotation properties from the OWL schema, and properties from the RDF schema...... 161
11.3 Development statistics. Overview of the ontologies currently being developed in terms of the number of commits, contributors, issues, branches, modules and languages...... 161
