The Need for Ontologies: Bridging the Barriers of Terminology and Data Structure

Total Page:16

File Type:pdf, Size:1020Kb

The Need for Ontologies: Bridging the Barriers of Terminology and Data Structure spe482-10 page 1 The Geological Society of America Special Paper 482 2011 The need for ontologies: Bridging the barriers of terminology and data structure Leo Obrst* The MITRE Corporation, 7515 Colshire Drive, McLean, Virginia 22102-7508, USA Patrick Cassidy MICRA, Inc., 735 Belvidere Ave., Plainfield, New Jersey 07062, USA ABSTRACT This chapter describes the need for complex semantic models, i.e., ontologies of real-world categories, referents, and instances, to go beyond the barriers of terminol- ogy and data structures. Terms and local data structures are often tolerated in infor- mation technology because these are simpler, provide structures that humans can seemingly interpret easily and easily use for their databases and computer programs, and are locally salient. In particular, we focus on both the need for ontologies for data integration of databases, and the need for foundational ontologies to help address the issue of semantic interoperability. In other words, how do you span disparate domain ontologies, which themselves represent the real-world semantics of possibly multiple databases? We look at both sociocultural and geospatial domains and pro- vide rationale for using foundational and domain ontologies for complex applications. We also describe the use of the Common Semantic Model (COSMO) ontology here, which is based on lexical-conceptual primitives originating in natural language, but we also allow room for alternative choices of foundational ontologies. The emphasis throughout this chapter is on database issues and the use of ontologies specifically for semantic data integration and system interoperability. INTRODUCTION Nationwide, lack of semantic interoperability is estimated to have cost U.S. businesses over $100 billion per year in lost produc- Differing terminology and different database structures tivity (http://en.wikipedia.org/wiki/Semantic_interoperability). within various communities create serious barriers to the timely Such estimates do not consider loss of opportunity for timely transfer of information among communities. Barriers to accurate action due to slow information sharing, nor do they measure lost transfer of information between communities, also called seman- opportunities that would arise from existence of a standard of tic interoperability, can be so large that they present a practical meaning that would allow new, independently developed pro- impossibility, given the limited resources available for most tasks. grams to interoperate accurately. The effects of the lack of seman- tic interoperability on national security are difficult to measure, but they add to this economic inefficiency. *[email protected] Obrst, L., and Cassidy, P., 2011, The need for ontologies: Bridging the barriers of terminology and data structure, in Sinha, A.K., Arctur, D., Jackson, I., and Gundersen, L., eds., Societal Challenges and Geoinformatics: Geological Society of America Special Paper 482, p. 1–26, doi:10.1130/2011.2482(10). For permis- sion to copy, contact [email protected]. © 2011 The Geological Society of America. All rights reserved. 1 spe482-10 page 2 2 Obrst and Cassidy The difficulty is not in transferring bits and bytes of data but bership could indicate a significant link. If those two agencies in transferring information between separately developed systems share their data, neither program will recognize the significance in a form that can be correctly machine-interpreted and used for of the alternative way of specifying an organization’s member- automated inference. Optimally, the information to be transferred ships. The two organizations represent the relations in an inverse must be in a neutral form and independent of the original purpose fashion, so even a cross-reference table that maps columns of for gathering and recording the data. If recorded in such a form, it identical meaning would be unable to catch this relationship. The can be reused in an automated process by any system regardless meaning (semantics) of social relations are locked in the pro- of its origin. Computational ontologies, developed over the past cedural code of the programs that perform actions based on the 30 yr, provide the current state of the art for expressing meanings implied, but not explicit, meaning of the data tables. However, in such an application-neutral form. This report discusses why if both databases are properly mapped to an ontology, even sim- adoption of a common foundation ontology (FO) throughout an ple ontology formalisms, such as the Web Ontology Language enterprise can enable quick and accurate exchange of machine- (OWL) (Bechhofer et al., 2004), will automatically generate the interpretable information without modifying local users’ termi- inverses of the specified relations. Consequently, the attempt of nologies or practices, and the principle is briefly explored as it that potential terrorist trying to enter the country could be recog- relates to sociocultural and geospatial information. nized immediately and automatically, in spite of differences in the A foundation ontology as described here is an ontology that way information is encoded in different databases. represents the most basic and abstract concepts (classes, proper- Another situation may arise where one agency has a data- ties, relations, functions) that are required to logically describe base of social relations containing the field “CloseRelatives,” the concepts in any domain ontology that is mapped to it. The which lists all relatives with up to two links of family relations principles described here apply to sharing of information among (i.e., parent, uncle, aunt, brother-in-law, etc.), but another agency all domains, including geospatial information. may only list specific relations individually. In the ontology, the The need for an ontological approach can be better under- specific relations will be subrelations of the more general relation stood by examining the limits of some common approaches to (e.g., an uncle will also be recognized as a “close relation”). A interoperability. Commercial vendors have developed systems query for “CloseRelatives” on the database will return the more that reduce this inefficiency. The syntactic and structural pro- specific relatives from other databases mapped to the ontology, cesses are called “data integration” or “enterprise information using the inference mechanism of the ontology. Other examples integration.” Among the tactics proposed is the process of “data and scenarios illustrating the advantages of using ontologies for warehousing.” In this process, data from multiple databases are database integration are provided later in this paper. Appendix A extracted, transformed, and loaded into a new database that pro- will describe the similarities between ontologies and databases. vides an integrated view of the local enterprise. This tactic is Appendix B will describe the use of ontologies for federated que- practical only within a single enterprise that has a central direc- ries across multiple databases. Appendix C will provide an exam- tor that can enforce data sharing and adequately fund the inte- ple of a query translation from the ontology to multiple databases grated database. In addition, it typically does not allow real-time using that ontology. updates of the integrated database. Furthermore, there are no When different geospatial communities develop ontologies explicit semantics associated with this approach; the semantics or databases derived for different purposes, sharing their data remains implicit, individuals must inspect the structural schemas, will require the logical descriptions of geospatial entities as the appropriate data dictionaries, and the application procedural well as other concepts to be carefully specified. In order for code that surrounds access to or processing of the data from the logical inference to work consistently among different ontology- database in order to understand the intended semantics. based applications, the concepts, especially the relations, to be Here is an example. One agency may have a database that shared in common must be logically represented in terms with includes a table of terrorist organizations, with individual mem- an agreed-upon meaning, and in formats that, if not identical, bers related to each organization by the column (attribute) “mem- can be accurately translated. One common concept is that of ber” in the “terrorist organization” table. One record might specify “distance.” However, to date, there is little agreement even on a member of such an organization and the member’s position a common representation for units of measure, among which in the organization. The program using that database may also units of distance, time, and force (gravity, geomagnetism) will access lists of people trying to enter the United States, and it will be used frequently in geospatial applications. Note that there are immediately notify border agents to detain anyone who appears existing or emerging ontologies for units of measure, e.g., the in the “member” relation of a terrorist organization for question- emerging OASIS (Organization for the Advancement of Struc- ing. Another agency may have a database with a table of indi- tured Information Standards) Quantities and Units of Measure viduals considered “persons of interest.” A column “member-of” Ontology Standard (QUOMOS, 2011) Technical Committee, might specify the organizations to which an individual belongs. the National Aeronautics and Space Administration (NASA) The
Recommended publications
  • Bridging the Gap Between User Generated Spatial Content and the Semantic Web
    Gianfranco Gliozzo Msc GIMA Master thesis Bridging the gap between user generated spatial content and the semantic web Bridging the gap between user generated spatial content and the semantic web Master of Science Thesis Gianfranco Gliozzo December 10 th 2010 Professor: Prof. Dr. M.J. (Menno-Jan) Kraak, Department of Geoinformation Processing Faculty of Geoinformation Science and Earth Observation University of Twente Supervisor: Dr. Ir. Rob Lemmens Department of Geoinformation Processing Faculty of Geo-Information Science and Earth Observation University of Twente Supervisor: Aldo Gangemi Senior Researcher Semantic Technology Lab (STLab) Institute for Cognitive Science and Technology, Italian National Research Council (ISTC-CNR) Reviewer: Mw. drs. M.E. (Marian) de Vries OTB Research Institute for the Built Environment Delft University of technology i Gianfranco Gliozzo Msc GIMA Master thesis Bridging the gap between user generated spatial content and the semantic web ii Gianfranco Gliozzo Msc GIMA Master thesis Bridging the gap between user generated spatial content and the semantic web Acknowledgments The present work could not have been completed without the loving support of my parents and my sister, old and new friends made on the occasion of the thesis. Let me quote my cousin, Alfio, always ready to stimulate and to suggest visions from a different perspective, he is the “guilty”, who suggested me the way of the semantic web and introduced me to Aldo Gangemi. Aldo Gangemi I will always be grateful for his generous support. The fantastic community of OpenStreetMap with in particular I want to thank simone (Simone Cortesi) for our long and pleasant conversations on OpenStreetMap story and perspectives and for his kind suggestions, tosky as well (Luigi Toscano) and David Paleino for their generous and tireless support in the struggle with open source applications and operating systems.
    [Show full text]
  • ABSTRACT an ADAPTATION METHODOLOGY for REUSING ONTOLOGIES by Vishal Bathija Ontologies Are an Emerging Means of Knowledge Repres
    ABSTRACT AN ADAPTATION METHODOLOGY FOR REUSING ONTOLOGIES By Vishal Bathija Ontologies are an emerging means of knowledge representation that can improve information organization and management in an application. They have demonstrated their value in numerous application areas such as intelligent information integration or information brokering since they offer the technical support for sharing and exchanging information between human and/or software agents. Despite their successes, their time- consuming and expensive development process deters the prevalent use of ontologies. Thus, research is focusing on ways to improve the process of ontology construction which involves recognizing, representing, and recording concept definitions and their relationships. This thesis research investigates existing methods for ontology learning and develops ontology adaptation software architecture for transforming an ontology from one domain to a related or similar domain. Using this software, the SEURAT’s Argument Ontology for the domain of software engineering is adapted to create an initial ontology that supports engineering design for the specific problem domain of spacecraft design. AN ADAPTATION METHODOLOGY FOR REUSING ONTOLOGIES A Thesis Submitted to the Faculty of Miami University in partial fulfillment of the requirements for the degree of Master of Computer Science Department of Computer Science & Systems Analysis by Vishal Bathija Miami University Oxford, Ohio 2006 Advisor______________________________________ Dr. Valerie Cross Reader_______________________________________
    [Show full text]
  • Overview of Ontologies and Semantic
    Ontologies for the Intelligence Community (OIC) 2009 Tutorial: Information Semantics 101: Semantics, Semantic Models, Ontologies, Knowledge Representation, and the Semantic Web Dr. Leo Obrst Information Semantics Information Discovery & Understanding Command & Control Center MITRE [email protected] October 20, 2009 Copyright © Leo Obrst, MITRE, 2009 Overview • This course introduces Information Semantics, i.e., Semantics, Semantic Models, Ontologies, Knowledge Representation, and the Semantic Web • Presents the technologies, tools, methods of ontologies • Describes the Semantic Web and emerging standards Brief Definitions (which we‘ll revisit): • Information Semantics: Providing semantic representation for our systems, our data, our documents, our agents • Semantics: Meaning and the study of meaning • Semantic Models: The Ontology Spectrum: Taxonomy, Thesaurus, Conceptual Model, Logical Theory, the range of models in increasing order of semantic expressiveness • Ontology: An ontology defines the terms used to describe and represent an area of knowledge (subject matter) • Knowledge Representation: A sub-discipline of AI addressing how to represent human knowledge (conceptions of the world) and what to represent, so that the knowledge is usable by machines • Semantic Web: "The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." - T. Berners-Lee, J. Hendler, and O. Lassila. 2001. The Semantic Web. In The Scientific American, May, 2001.
    [Show full text]
  • A Fuzzy Grassroots Ontology for Improving Weblog Extraction
    A Fuzzy Grassroots Ontology for improving Weblog Extraction Edy Portmann, Andreas Meier Information Systems Research Group Department of Informatics University of Fribourg Journal of Digital Boulevard de Pérolles 90 Information Management CH-1700 Fribourg Switzerland {edy.portmann, andreas.meier}@unifr.ch ABSTRACT: This paper presents fuzzy clustering algorithms to top-down-approach of ontologies, because fuzzy sets are establish a grassroots ontology – a machine-generated weak more suitable for characterizing vague information. Murthy ontology – based on folksonomies. Furthermore, it describes and Biswas state in [3] that fuzzy logic is an addition to a search engine for vaguely associated terms and aggregates conservative logic and handles the concept of partial truth them into several meaningful cluster categories, based on the along with true and false, which is used for qualitative rather introduced weak grassroots ontology. A potential application than quantitative judgement. Hence fuzzy logic follows the of this ontology, weblog extraction, is illustrated using a simple way humans think and helps to handle real world complexi- example. Added value and possible future studies are discussed ties more efficiently. Therefore, it converts imprecise human in the conclusion. information to precise mathematical models. Consider, for example, the following paradox of a marriage: I Categories and Subject Descriptors cannot live with her, and I cannot live without her. Both state- I.2.3 [Deduction and Theorem Proving]; Uncertainty, ``fuzzy,'' and ments are – to a certain degree – true. The dynamic between probabilistic reasoning: H.3.3 [Information Search and Retrieval]; those statements is – along with other things – what keeps Clustering marriage interesting. That is what fuzzy logic deals with.
    [Show full text]
  • Axiomatized Relationships Between Ontologies by Carmen Chui A
    Axiomatized Relationships Between Ontologies by Carmen Chui A thesis submitted in conformity with the requirements for the degree of Master of Applied Science Graduate Department of Mechanical & Industrial Engineering University of Toronto c Copyright 2013 by Carmen Chui Abstract Axiomatized Relationships Between Ontologies Carmen Chui Master of Applied Science Graduate Department of Mechanical & Industrial Engineering University of Toronto 2013 This work focuses on the axiomatized relationships between different ontologies of varying levels of expressivity. Motivated by experiences in the decomposition of first- order logic ontologies, we partially decompose the Descriptive Ontology for Linguistic and Cognitive Engineering (DOLCE) into modules. By leveraging automated reasoning tools to semi-automatically verify the modules, we provide an account of the meta-theoretic relationships found between DOLCE and other existing ontologies. As well, we examine the composition process required to determine relationships between DOLCE modules and the Process Specification Language (PSL) ontology. Then, we propose an ontology based on the semantically-weak Computer Integrated Manufacturing Open System Ar- chitecture (CIMOSA) framework by augmenting its constructs with terminology found in PSL. Finally, we attempt to map two semantically-weak product ontologies together to analyze the applications of ontology mappings in e-commerce. ii Acknowledgements I would like to thank Professor Michael Gr¨uninger for his support and guidance over the course of my undergraduate and graduate studies. It is through our various discussions and brainstorming sessions that the theme for this thesis arose. His passion and enthu- siasm to explain concepts found within the fields of ontologies and logic have guided me throughout the course of writing this thesis.
    [Show full text]
  • Download Download
    Vol.2 / No.1 January-June 2019 An Exploration of the Creative Work Ontology for the Ontology Aggregation of Social Media Edward C. K. Hung Pages 37 - 50 Received: August 6, 2018, Received in revised form: September 27, 2018, Accepted: October 16, 2018 ABSTRACT In view of the great success of social media, this paper raises an interesting question about how we should strengthen the existing social media so that we can have an even better communication means for business, education, e- government, etc. , as suggested by some scholars. This question further leads us to investigate the possibility of having a social media ontology to reveal the room for creativity and the fundamentals that must be preserved for these possible new designs. However, forging a social media ontology may require ontology aggregation of various ontologies of the popular social media platforms and therefore, trigger the data loss issue involved. This paper aims at providing a solution to this data loss issue, and proposes the use of the properties of the Creative Work Ontology for the ontology aggregation of social media. It also provides an example of the proposed solution regarding the partial ontologies of Facebook and Snapchat. KEYWORDS: Creative Work Ontology, ontology aggregation, social media Edward C. K. Hung, Ph. D. ([email protected]) is an Assistant Professor at Department of Journalism and Communication, Hong Kong Shue Yan University, Hong Kong 37 Communication and Media in Asia Pacific (CMAP) Introduction In view of the successful stories of Facebook, Instagram, Snapchat, Line and other social media platforms (Miles, 2014; Salman, 2018; Wojdynski & Evans, 2016), how should we study social media collectively so that we can have a better understanding of them and get ready to strengthen them in order to offer an even better communication means? Garimella and his colleagues (2018) claim that many research methods of social media are domain- specific.
    [Show full text]
  • Ontology Evaluation
    Zur Erlangung des akademischen Grades eines Doktors der Wirtschaftswissenschaften (Dr. rer. pol.) von der Fakult¨atf¨urWirtschaftswissenschaften des Karlsruher Instituts f¨urTechnologie (KIT) vorgelegte Dissertation von Dipl.-Inform. Zdenko Vrandeˇci´c. Ontology Evaluation Denny Vrandeˇci´c Tag der m¨undlichen Pr¨ufung: 8. June 2010 Referent: Prof. Dr. Rudi Studer Erster Koreferent: Prof. James A. Hendler, PhD. Zweiter Koreferent: Prof. Dr. Christof Weinhardt Vorsitzende der Pr¨ufungskommission: Prof. Dr. Ute Werner This document was created on June 10, 2010 Koji su dozvolili da sanjam, koji su omogu´cilida krenem, koji su virovali da ´custi´ci, Vama. -2 Acknowledgements Why does it take years to write a Ph.D.-Thesis? Because of all the people who were at my side. TonˇciVrandeˇci´c,my father, who, through the stories told about him, made me dream that anything can be achieved. Perica Vrandeˇci´c,my mother, who showed me the truth of this dream. Rudi Studer, my Doktorvater, who is simply the best supervisor one can hope for. Martina N¨oth,for carefully watching that I actually write. Elena Simperl, for giving the most valuable advises and for kicking ass. Anupriya Ankolekar, for reminding me that it is just a thesis. Uta L¨osch, for significant help. Miriam Fernandez, for trans- lating obscure Spanish texts. Markus Kr¨otzsch, for an intractable amount of reasons. York Sure, for being a mentor who never restrained, but always helped. Sebastian Rudolph and Aldo Gangemi, for conceptualizing my conceptualization of conceptual- izations. Stephan Grimm for his autoepistemic understanding. Yaron Koren, for his support for SMW while Markus and I spent more time on our theses.
    [Show full text]