Ontology Pattern-Based Data Integration
Wright State University
CORE Scholar
Browse all Theses and Dissertations, 2015

Adila Alfa Krisnadhi, Wright State University

Part of the Computer Engineering Commons, and the Computer Sciences Commons

Repository Citation
Krisnadhi, Adila Alfa, "Ontology Pattern-Based Data Integration" (2015). Browse all Theses and Dissertations. 1632.
https://corescholar.libraries.wright.edu/etd_all/1632

A dissertation submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy

By ADILA ALFA KRISNADHI
S.Kom., Universitas Indonesia, 2002
M.Sc., Technische Universität Dresden, 2007

2015
Wright State University
Dayton, Ohio, USA

WRIGHT STATE UNIVERSITY
SCHOOL OF GRADUATE STUDIES
December 18, 2015

I HEREBY RECOMMEND THAT THE DISSERTATION PREPARED UNDER MY SUPERVISION BY Adila Alfa Krisnadhi ENTITLED Ontology Pattern-Based Data Integration BE ACCEPTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy.

Pascal Hitzler, Ph.D.
Dissertation Director

Michael L. Raymer, Ph.D.
Director, Computer Science and Engineering PhD Program

Robert E.W. Fyffe, Ph.D.
Vice President of University Research and Dean of Graduate School

Committee on Final Examination
Pascal Hitzler, Ph.D.
Krzysztof Janowicz, Ph.D.
Krishnaprasad Thirunarayan, Ph.D.
Michelle A. Cheatham, Ph.D.

ABSTRACT

Krisnadhi, Adila Alfa. Ph.D. Department of Computer Science and Engineering, Wright State University, 2015. Ontology Pattern-Based Data Integration.
Data integration is concerned with providing unified access to data residing at multiple sources. Such unified access is realized by a global schema together with a set of mappings between the global schema and the local schema of each data source; the mappings specify how user queries posed against the global schema are translated into queries against the local schemas. Data sources are typically developed and maintained independently and are thus highly heterogeneous. This makes integration difficult because interoperability is lacking in architecture, data format, and the syntax and semantics of the data. This dissertation presents a study of how small, self-contained ontologies, called ontology design patterns, can be employed to provide semantic interoperability in a cross-repository data integration system. The idea of this so-called ontology pattern-based data integration is that a collection of ontology design patterns can act as a global schema that still contains sufficient semantics but is also flexible and simple enough to be used by linked data providers. On the one hand, this differs from existing ontology-based solutions, which are based on large, monolithic ontologies that provide very rich semantics but enforce overly restrictive ontological choices and hence are shunned by many data providers. On the other hand, it also differs from purely linked-data-based solutions, which offer simplicity and flexibility in data publishing but too little semantic interoperability. We demonstrate the feasibility of this idea through the actual development of a large-scale data integration project involving seven ocean science data repositories from five institutions in the U.S. In addition, we make two contributions as part of this dissertation work, which also play crucial roles in the aforementioned data integration project.
First, we develop a collection of more than a dozen ontology design patterns that capture the key notions in ocean science occurring in the participating data repositories. These patterns contain axiomatizations of the key notions and were developed with intensive involvement from domain experts. The patterns were modeled following a systematic workflow to ensure modularity, reusability, and flexibility of the whole pattern collection. Second, we propose so-called pattern views, which allow data providers to publish their data in a very simple intermediate schema, and show that they can greatly assist data providers in publishing their data without requiring a thorough understanding of the axiomatization of the patterns.

Contents

1 Introduction
  1.1 Semantic Web and Data Integration
  1.2 Research Questions
  1.3 Dissertation Overview
2 Semantic Web and Ontology Languages
  2.1 The Web
  2.2 Semantic Web
    2.2.1 Data Interchange Layer
    2.2.2 Query Language: SPARQL
  2.3 Ontology Languages
    2.3.1 RDF Schema (RDFS)
    2.3.2 The Web Ontology Language (OWL)
    2.3.3 Datalog
3 Data Integration
  3.1 Data Integration in Databases
    3.1.1 Formal Definition of Data Integration
    3.1.2 Architecture of Data Integration Systems
  3.2 Ontology-based Data Integration
  3.3 Linked Data Integration
4 Ontology Design Patterns for Data Integration
  4.1 Ontology Design Patterns
    4.1.1 General Definition
    4.1.2 Content Patterns
  4.2 Collection of Content Patterns for Global Schema
  4.3 Application Context: Oceanography Data Integration
  4.4 Collaborative Modeling Approach
    4.4.1 The Modeling Workflow
    4.4.2 Graphical Notation
  4.5 Selected Modeling Details
    4.5.1 Person
    4.5.2 Agent Role
    4.5.3 Cruise
  4.6 Discussion
5 Pattern Views
  5.1 Introduction
  5.2 Consumer View
  5.3 Producer View
  5.4 Expressing the Mapping in OWL and SPARQL
  5.5 Views for GeoLink Patterns
  5.6 Discussion
6 Evaluation
  6.1 Qualitative Evaluation: Rationale
  6.2 Data Collection Procedures
  6.3 Questions
  6.4 Participants' Responses
  6.5 Findings and Discussion
7 Conclusion
  7.1 Summary
    7.1.1 Ontology Pattern-based Data Integration Framework
    7.1.2 Content Patterns for Ocean Science
    7.1.3 Pattern Views
  7.2 Future Work
A GeoLink Pattern Collection
  A.1 Agent
    A.1.1 Description
    A.1.2 Axiomatization
    A.1.3 Alignment Axioms
  A.2 Agent Role
    A.2.1 Description
    A.2.2 Axiomatization
    A.2.3 Alignment
  A.3 Event
    A.3.1 Description
    A.3.2 Axiomatization
    A.3.3 Alignment
  A.4 Person
    A.4.1 Description
    A.4.2 Axiomatization
    A.4.3 Alignment
  A.5 Personal Info Item
    A.5.1 Description
    A.5.2 Axiomatization
    A.5.3 Alignment
  A.6 Person Name
    A.6.1 Description
    A.6.2 Axiomatization
    A.6.3 Alignment
  A.7 Identifier
    A.7.1 Description
    A.7.2 Axiomatization
  A.8 Information Object
    A.8.1 Description
    A.8.2 Axiomatization
    A.8.3 Alignment
  A.9 Organization
    A.9.1 Description
    A.9.2 Axiomatization
    A.9.3 Alignment
  A.10 Funding Award
    A.10.1 Description
    A.10.2 Axiomatization
    A.10.3 Alignment
  A.11 Program
    A.11.1 Description
    A.11.2 Axiomatization
    A.11.3 Alignment
  A.12 Place
    A.12.1 Description
    A.12.2 Axiomatization