A Volcano Erupts: Semantically Mediated Integration of Heterogeneous Volcanic and Atmospheric Data

Peter Fox
High Altitude Observatory, ESSL/NCAR
PO Box 3000, Boulder CO 80307-3000
[email protected]

Deborah McGuinness
Knowledge Systems & AI Lab, Stanford University
353 Serra Mall, Stanford, CA 94305
McGuinness Associates
20 Peter Coutts Circle, Stanford, CA 94305

Robert Raskin
JPL/NASA
Wilson Blvd., Pasadena, CA

Krishna Sinha
Virginia Polytechnic Institute, Department of Geology
Blacksburg, VA

ABSTRACT

We present a research effort into the application of semantic web methods and technologies to address the challenging problem of integrating heterogeneous volcanic and atmospheric data in support of assessing the atmospheric effects of a volcanic eruption.

This exemplary volcano eruption scenario highlights what is true for the vast majority of data-intensive Earth system investigations, which have limited ability to explore important and difficult problems. This is because they are forced to find and use data representing an event or phenomenon of interest through data collections at the data-element, or syntactic, level rather than at a higher scientific, or semantic, level. Even if relevant data is found in one collection, it may not be easy or possible to find similar, related data in another collection. In many cases, syntax-only interoperability is the state of the art and, at best, there are some instances of hard-wired but simple semantic enhancements (e.g. a special purpose web service wrapper around the data). Scientists and non-scientists are forced to learn details of the data schema, other people's naming schemes and syntax decisions, and details of differing web site interfaces. These constraints are limiting even when researchers are looking for information in their own discipline, but they present even greater challenges when researchers are looking for information spanning multiple disciplines, including some in which they are not extensively trained. The volcano eruption scenario exemplifies many of these challenges. In this paper we present research progress on how semantic enablement for scientific data integration is achieved. We present how ontologies implemented within existing distributed technology frameworks are providing the essential, reusable, and robust support necessary for interdisciplinary scientific research activities.

Categories and Subject Descriptors: H.2.5 Heterogeneous Databases: Data translation; I.2.4 Knowledge Representation Formalisms and Methods: Frames and scripts; Representation languages; Semantic networks

General Terms: Design

Keywords: informatics, knowledge representation, ontologies, semantic data integration, semantic mediation

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. CIMS'07, November 9, 2007, Lisboa, Portugal. Copyright 2007 ACM 978-1-59593-831-2/07/0011 ...$5.00.

1. INTRODUCTION

When a volcano erupts, there is a sequence of events and impacts that is diverse and complex. The characteristics of an eruption (size, type, and duration) all influence the effect on the local, regional, and global environment. These effects range from diminished air quality and hazards for human health and for ground and air transportation to effects on atmospheric composition and radiative blanketing. The contributions come from the smoke and ash, ejected gases, scattering, and numerous other processes. The location of the volcano (latitude and longitude) and its tectonic setting, on land or undersea, are also factors. There are an increasing number of online repositories of scientific data and information related to volcanoes, their present and past activity, and both direct and proxy measurements of the nature of their impact.

While numerous sources of monitoring and retrospective data are available that represent measurements of the above-mentioned quantities, they are presently stored in heterogeneous and highly distributed repositories. To realize the goal of integrating many of these diverse sources of data to address specific aspects of the volcano eruption scenarios, we need to address many factors concerning access to and interoperability of the online scientific data.

This work is aimed at providing scientists with the option of describing what they are looking for in terms that are meaningful and natural to them, instead of in a syntax that is not. The goal is not simply to facilitate search and retrieval, but also to provide an underlying framework that contains information about the semantics of the scientific terms used. These capabilities are expected to be used by scientists who want to do processing on the results of the integrated data; thus the system must provide access to how integration is done and what definitions it is using. The missing elements in previous systems for enabling the higher-level semantic interconnections are the technology of ontologies, ontology-equipped tools, semantically aware interfaces between science components, and explanations of knowledge provenance. We present the initial results of a project entitled Semantically-Enabled Science Data Integration (SESDI) [1], which uses semantic technologies to integrate data between these two discipline areas to assist in establishing causal connections as well as in exploring as yet unknown relationships.

We use as starting points many elements of semantic web methodologies and technologies that are based on our developments for the Virtual Solar-Terrestrial Observatory (VSTO) [2]. That work, which created a scalable environment for searching, integrating, and analyzing databases distributed over the Internet, required a high level of semantic interoperability. It implemented a semantic data framework built on OWL-DL [3] ontologies, using the Pellet [4] reasoner within a Java-Tomcat servlet engine, and made available via a Spring-based web portal and SOAP/WSDL [5] web services.

We also have significant experience with ontology packages and data registration from the Geosciences Network (GEON) [6]. Our present ontology development involved some new developments as well as iterations and augmentations of the background domain ontology: Semantic Web for Earth and Environmental Terminology (SWEET) [7]. We leverage the precise formal definitions of the terms in supporting semantic search and interoperability.

2. USE CASES

We have developed several underlying use cases [8] for the volcano-atmosphere data integration, and in this section we present the first of these, which addresses the signature of an eruption in the terrestrial lower atmosphere. This use case is: "determine the statistical signatures of both volcanic and solar forcings on the height of the tropopause".

This specific science template is motivated by the more general research direction of looking for indicators of the fallout of volcanic eruptions that may create changes in the atmosphere. The statistical signatures are such indicators, and the tropopause lies at the upper edge of the lower atmosphere (the boundary between the troposphere and the stratosphere), so it is a sensitive area in which to look for a signature.

A schematic of the use case is shown in Fig. 1, which indicates some of the important terms, concepts, and processes (and eventually underlying data) we need to represent.

Figure 1: Schematic of the events and processes beginning with a volcanic eruption and leading to the climate/atmospheric response.

3. METHODOLOGY

Our effort depends on machine processable specifications of the science terms that are used in the disciplines [...] experts in the areas. Prior to a workshop, we identify a small set of subject matter experts. We also provide some background material for reading about ontology basics. Additionally, prior to our face-to-face meetings with experts, we identify foundational terms in the discipline and provide a simple starting point for organizing the basic terminology. While we do not want to influence the domain experts on their terminology, we find that we make more progress if we provide simple starting points using well agreed upon terminology. We then bring together a small group of the chosen domain experts and science ontology experts with the goal of generating an initial ontology containing the terms and phrases typically used by these experts. We use our task of researching the impact of volcanoes on global climate to focus the discussions and to help determine scope and level of granularity.

We held a meeting with volcano experts and generated an initial ontology containing terms and phrases used to classify volcanoes, volcanic activities, and eruption phenomena. We use a graphical concept mapping tool [9] for capturing the terms and their relationships.

3.1 Volcanic and Plate Tectonic Semantics

Volcanoes can be classified by composition, tectonic setting, environmental setting, eruption type, activity, geologic setting, and landform. The ontology currently contains upper level terms in these areas; it is being expanded according to the needs of the project and is being reviewed by additional domain experts. The initial focus is on gathering terms, putting them into a generalization hierarchy (using isa links in the diagram), and connecting the terms through properties (using has links in the diagram) as well as sameas and ispartof.
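To make the four link types named in the text (isa, has, sameas, ispartof) concrete, the following is a minimal sketch of a term graph held in memory. It is not the project's OWL-DL ontology or its concept mapping tool; the class `TermGraph`, the function `build_example_graph`, and every term name used (for example `Stratovolcano` or `GeologicFeature`) are hypothetical placeholders chosen only for illustration.

```python
from collections import defaultdict

# The four link types mentioned in the text. Everything else in this
# sketch (class name, term names) is an illustrative assumption.
LINK_TYPES = {"isa", "has", "sameas", "ispartof"}


class TermGraph:
    """A tiny in-memory store of (subject, link, object) statements."""

    def __init__(self):
        self._links = defaultdict(set)  # (subject, link) -> set of objects

    def add(self, subject, link, obj):
        if link not in LINK_TYPES:
            raise ValueError(f"unknown link type: {link}")
        self._links[(subject, link)].add(obj)

    def objects(self, subject, link):
        """Terms directly connected to `subject` by `link` (as a copy)."""
        return set(self._links.get((subject, link), set()))

    def ancestors(self, term):
        """All terms reachable from `term` by following isa links upward."""
        seen = set()
        stack = [term]
        while stack:
            current = stack.pop()
            for parent in self.objects(current, "isa"):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def inherited(self, term, link="has"):
        """Objects of `link` on the term itself or on any isa ancestor."""
        result = self.objects(term, link)
        for ancestor in self.ancestors(term):
            result |= self.objects(ancestor, link)
        return result


def build_example_graph():
    """Build a small graph from hypothetical volcano terms."""
    graph = TermGraph()
    graph.add("Stratovolcano", "isa", "Volcano")
    graph.add("ShieldVolcano", "isa", "Volcano")
    graph.add("Volcano", "isa", "GeologicFeature")
    graph.add("Volcano", "has", "EruptionType")
    graph.add("Volcano", "has", "TectonicSetting")
    graph.add("Stratovolcano", "sameas", "CompositeVolcano")
    graph.add("Crater", "ispartof", "Volcano")
    return graph


if __name__ == "__main__":
    g = build_example_graph()
    print(sorted(g.ancestors("Stratovolcano")))
    print(sorted(g.inherited("Stratovolcano", "has")))
```

In this sketch a term inherits the has properties of its isa ancestors, which is one common reading of a generalization hierarchy; a description-logic reasoner working over an OWL-DL ontology derives such consequences from formal class definitions rather than from a hand-written traversal.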
