Ontology-Driven Semantic Data Integration in Open Environment
Western University
Scholarship@Western
Electronic Thesis and Dissertation Repository
8-17-2020 1:00 PM

Ontology-Driven Semantic Data Integration in Open Environment

Islam M. Ali, The University of Western Ontario
Supervisor: McIsaac, Kenneth, The University of Western Ontario

A thesis submitted in partial fulfillment of the requirements for the Doctor of Philosophy degree in Electrical and Computer Engineering
© Islam M. Ali 2020

Follow this and additional works at: https://ir.lib.uwo.ca/etd
Part of the Data Storage Systems Commons, and the Other Computer Engineering Commons

Recommended Citation
Ali, Islam M., "Ontology-Driven Semantic Data Integration in Open Environment" (2020). Electronic Thesis and Dissertation Repository. 7230. https://ir.lib.uwo.ca/etd/7230

This Dissertation/Thesis is brought to you for free and open access by Scholarship@Western. It has been accepted for inclusion in Electronic Thesis and Dissertation Repository by an authorized administrator of Scholarship@Western. For more information, please contact [email protected].

Abstract

Collaborative intelligence in the context of information management can be defined as "A shared intelligence that results from the collaboration between various information systems". In open environments, these collaborating information systems can be heterogeneous, dynamic, and loosely coupled. Information systems in open environments can also possess a certain degree of autonomy. Integrating the data that resides in these heterogeneous information systems is essential for deriving intelligence efficiently and accurately. Because open environments are heterogeneous, loosely coupled, and dynamic, integration between these information systems at the data level is not efficient. Several approaches and models have been proposed to perform the task of data integration.
Many of the existing approaches to data integration are designed for closed environments, tightly coupled systems, and enterprise data integration. They make explicit or implicit assumptions about the semantic structure of the data. Because open environments are heterogeneous and loosely coupled, such assumptions are deemed unintuitive. Data integration approaches based on models that are extensional in nature are also inadequate for open environments, because they do not account for their dynamic nature. The need for an adequate model for describing data integration systems in open environments is quite evident. Intensional-based modeling is found to be an adequate and natural choice for modeling in open environments, because it addresses their dynamic and loosely coupled nature.

In this work, an intensional model for conceptualization is presented. This model is based on the theory of Properties, Relations, and Propositions (PRP). The proposed description takes concepts, relations, and properties as primitive and, as such, irreducible entities. A formal intensional account of both Ontology and Ontological Commitment is also proposed in light of the intensional model for conceptualization. An intensional model for ontology-driven mediated data integration in open environments is also proposed. The proposed model accounts for the dynamic nature of open environments and also intensionally describes the information of data sources. The interface between global and local ontologies and the formal intensional semantics of query answering are then described.

Keywords

Collaborative Intelligence, Data Integration, Data Integration Systems, Open Environment, Conceptualization, Ontology, Ontological Commitment, Extensional Logic, Intensional Logic, Extensionalization, Intensional Model, Epistemic Logic, Intensional Epistemic Logic.
Summary for Lay Audience

In today’s world, data can be found anywhere: in databases, web pages, email inboxes, and many other types of data sources. Some of these data sources are structured, i.e. they have tables and fields, as is the case with databases. Other data sources are unstructured. This is the case with information that resides on a webpage or in your email inbox. This means that these data sources are heterogeneous. Another factor that affects the heterogeneity is that even the structured data sources are created by different parties. These parties created their data sources with different needs in mind, and so they tailored each data source to satisfy those particular needs.

When it comes to generating intelligence for the purpose of driving decision making, one should attempt to take advantage of all available data sources. For example, it has been found that most of the information about customer satisfaction or frustration with a business can’t be found in an enterprise database. Rather, most of this information is on web pages, blogs, forums, or in the email inbox of a customer care representative.

Communication on the web is also very dynamic nowadays. Agents, computers, phones, servers, and other equipment can connect to or disconnect from the web at any time. This is an example of what we refer to as an open environment. In an open environment, agents can enter and leave at any time, and the environment should still continue to function.

As mentioned earlier, in order to generate intelligence, one should attempt to utilize the data from various data sources. To do so, the data from the various data sources needs to be aligned and combined somehow. This can be referred to as data integration. In this work, we propose a model for data integration that accounts for the characteristics of what was referred to earlier as an open environment.

Acknowledgments

I want to express my deep gratitude to my advisor Prof.
Kenneth McIsaac for his continuous and endless support for several years. Prof. McIsaac’s fruitful and constructive comments have been very helpful and added noticeable value to my work. I am also grateful to Anh Brown, Joan Aldis, Dr. Wendy Dickinson, and Jennifer Meister for helping me navigate through the university. I also offer many thanks to my friend and brother Dr. Ahmed Alassuity for his compassion and perseverance and his trust in my work. I also want to thank my loving wife Safae Malki, who has been my companion and a driving force behind my success. I am also grateful to my father, Mohamed Abdelsalam, who gave me all the love, care, and confidence that I needed to complete this work. Finally, I want to thank my late mother Nazlah Abu Zaid, who never stopped giving until she died. She had one wish before she died: she wanted to attend my graduation commencement. I ask God to deliver the news of my graduation to her in Paradise.

Table of Contents

Abstract
Summary for Lay Audience
Acknowledgments
Table of Contents
List of Tables
List of Figures
Chapter 1
1 Introduction
  1.1 Collaborative Semantic Intelligence
  1.2 Ontology-Based Data Integration
  1.3 Formal Treatment of Conceptualization and Ontology
  1.4 Ontology Mapping
  1.5 Research Issues and Objectives
    1.5.1 Formal modeling of Conceptualization and Ontology
    1.5.2 Surveying the Ontology Matching Algorithms
    1.5.3 Modeling the Semantic Data Integration Framework in Open Environment
    1.5.4 Addressing the Dynamic and Loosely-Coupled Nature of Open Environment
  1.6 Thesis Organization
Chapter 2
2 Literature Review
  2.1 Data Integration
    2.1.1 Federated Data Integration
    2.1.2 Mediator-Based