The Need for Ontologies: Bridging the Barriers of Terminology and Data Structure
The Geological Society of America Special Paper 482, 2011

Leo Obrst*
The MITRE Corporation, 7515 Colshire Drive, McLean, Virginia 22102-7508, USA

Patrick Cassidy
MICRA, Inc., 735 Belvidere Ave., Plainfield, New Jersey 07062, USA

*[email protected]

Obrst, L., and Cassidy, P., 2011, The need for ontologies: Bridging the barriers of terminology and data structure, in Sinha, A.K., Arctur, D., Jackson, I., and Gundersen, L., eds., Societal Challenges and Geoinformatics: Geological Society of America Special Paper 482, p. 1–26, doi:10.1130/2011.2482(10). For permission to copy, contact [email protected]. © 2011 The Geological Society of America. All rights reserved.

ABSTRACT

This chapter describes the need for complex semantic models, i.e., ontologies of real-world categories, referents, and instances, to go beyond the barriers of terminology and data structures. Terms and local data structures are often tolerated in information technology because they are simpler, are locally salient, and provide structures that humans can seemingly interpret and use easily in their databases and computer programs. In particular, we focus on both the need for ontologies for data integration of databases, and the need for foundational ontologies to help address the issue of semantic interoperability. In other words, how do you span disparate domain ontologies, which themselves represent the real-world semantics of possibly multiple databases? We look at both sociocultural and geospatial domains and provide a rationale for using foundational and domain ontologies for complex applications. We also describe the use of the Common Semantic Model (COSMO) ontology here, which is based on lexical-conceptual primitives originating in natural language, but we also allow room for alternative choices of foundational ontologies. The emphasis throughout this chapter is on database issues and the use of ontologies specifically for semantic data integration and system interoperability.

INTRODUCTION

Differing terminology and different database structures within various communities create serious barriers to the timely transfer of information among communities. Barriers to accurate transfer of information between communities, also called semantic interoperability, can be so large that they present a practical impossibility, given the limited resources available for most tasks.

Nationwide, lack of semantic interoperability is estimated to have cost U.S. businesses over $100 billion per year in lost productivity (http://en.wikipedia.org/wiki/Semantic_interoperability). Such estimates do not consider loss of opportunity for timely action due to slow information sharing, nor do they measure lost opportunities that would arise from the existence of a standard of meaning that would allow new, independently developed programs to interoperate accurately. The effects of the lack of semantic interoperability on national security are difficult to measure, but they add to this economic inefficiency.

The difficulty is not in transferring bits and bytes of data but in transferring information between separately developed systems in a form that can be correctly machine-interpreted and used for automated inference. Optimally, the information to be transferred must be in a neutral form and independent of the original purpose for gathering and recording the data. If recorded in such a form, it can be reused in an automated process by any system regardless of its origin. Computational ontologies, developed over the past 30 yr, provide the current state of the art for expressing meanings in such an application-neutral form. This report discusses why adoption of a common foundation ontology (FO) throughout an enterprise can enable quick and accurate exchange of machine-interpretable information without modifying local users’ terminologies or practices, and the principle is briefly explored as it relates to sociocultural and geospatial information.

A foundation ontology as described here is an ontology that represents the most basic and abstract concepts (classes, properties, relations, functions) that are required to logically describe the concepts in any domain ontology that is mapped to it. The principles described here apply to sharing of information among all domains, including geospatial information.

The need for an ontological approach can be better understood by examining the limits of some common approaches to interoperability. Commercial vendors have developed systems that reduce this inefficiency. The syntactic and structural processes are called “data integration” or “enterprise information integration.” Among the tactics proposed is the process of “data warehousing.” In this process, data from multiple databases are extracted, transformed, and loaded into a new database that provides an integrated view of the local enterprise. This tactic is practical only within a single enterprise that has a central director that can enforce data sharing and adequately fund the integrated database. In addition, it typically does not allow real-time updates of the integrated database. Furthermore, there are no explicit semantics associated with this approach; the semantics remain implicit, and individuals must inspect the structural schemas, the appropriate data dictionaries, and the application procedural code that surrounds access to or processing of the data from the database in order to understand the intended semantics.

Here is an example. One agency may have a database that includes a table of terrorist organizations, with individual members related to each organization by the column (attribute) “member” in the “terrorist organization” table. One record might specify a member of such an organization and the member’s position in the organization. The program using that database may also access lists of people trying to enter the United States, and it will immediately notify border agents to detain for questioning anyone who appears in the “member” relation of a terrorist organization. Another agency may have a database with a table of individuals considered “persons of interest.” A column “member-of” might specify the organizations to which an individual belongs. The [...] membership could indicate a significant link. If those two agencies share their data, neither program will recognize the significance of the alternative way of specifying an organization’s memberships. The two organizations represent the relations in an inverse fashion, so even a cross-reference table that maps columns of identical meaning would be unable to catch this relationship. The meaning (semantics) of social relations is locked in the procedural code of the programs that perform actions based on the implied, but not explicit, meaning of the data tables. However, if both databases are properly mapped to an ontology, even simple ontology formalisms, such as the Web Ontology Language (OWL) (Bechhofer et al., 2004), will automatically generate the inverses of the specified relations. Consequently, the attempt of that potential terrorist to enter the country could be recognized immediately and automatically, in spite of differences in the way information is encoded in different databases.

Another situation may arise where one agency has a database of social relations containing the field “CloseRelatives,” which lists all relatives with up to two links of family relations (e.g., parent, uncle, aunt, brother-in-law), but another agency may only list specific relations individually. In the ontology, the specific relations will be subrelations of the more general relation (e.g., an uncle will also be recognized as a “close relation”). A query for “CloseRelatives” on the database will return the more specific relatives from other databases mapped to the ontology, using the inference mechanism of the ontology. Other examples and scenarios illustrating the advantages of using ontologies for database integration are provided later in this paper. Appendix A will describe the similarities between ontologies and databases. Appendix B will describe the use of ontologies for federated queries across multiple databases. Appendix C will provide an example of a query translation from the ontology to multiple databases using that ontology.

When different geospatial communities develop ontologies or databases derived for different purposes, sharing their data will require the logical descriptions of geospatial entities as well as other concepts to be carefully specified. In order for logical inference to work consistently among different ontology-based applications, the concepts, especially the relations, to be shared in common must be logically represented in terms with an agreed-upon meaning, and in formats that, if not identical, can be accurately translated. One common concept is that of “distance.” However, to date, there is little agreement even on a common representation for units of measure, among which units of distance, time, and force (gravity, geomagnetism) will be used frequently in geospatial applications. Note that there are existing or emerging ontologies for units of measure, e.g., the emerging OASIS (Organization for the Advancement of Structured Information Standards) Quantities and Units of Measure Ontology Standard (QUOMOS, 2011) Technical Committee, the National Aeronautics and Space Administration (NASA)