Deep Semantics in the Geosciences: Semantic Building Blocks for a Complete Geoscience Infrastructure

Brandon Whitehead¹,² and Mark Gahegan¹
¹ Centre for eResearch
² Institute of Earth Science and Engineering
The University of Auckland, Private Bag 92019, Auckland, New Zealand
{b.whitehead, m.gahegan}@auckland.ac.nz
Abstract. In the geosciences, the semantic models, or ontologies, that are available are typically narrowly focused structures fit for a single purpose. In this paper we discuss why this might be, and conclude that it is not sufficient to use semantics simply to provide categorical labels for instances: because of the interpretive and uncertain nature of geoscience, researchers need to understand how a conclusion has been reached in order to have any confidence in adopting it. Thus ontologies must address the epistemological questions of how (and possibly why) something is ‘known’. We provide a longer justification for this argument, make a case for capturing and representing these deep semantics, provide examples in specific geoscience domains and briefly touch on a visualisation program called Alfred, which we have developed to allow researchers to explore the different facets of ontology that can support them in applying value judgements to the interpretation of geological entities.
Keywords: geoscience, deep semantics, ontology-based information retrieval
1 Introduction
From deep drilling programs and large-scale seismic surveys to satellite imagery and field excursions, geoscience observations have traditionally been expensive to capture. As such, many disciplines related to the geosciences have relied heavily on inferential methods, probability and, most importantly, individual experience to help construct a continuous (or more complete) description of what lies between two data values [1].
In recent years, the technology behind environmental sensors and other data collection methods and systems has enabled a boom of sorts in the collection of raw, discrete and continuous geoscience data. As a consequence, many conventional geoscience domains that were once considered data poor now have more data than can be used efficiently, or even effectively. For example, according to Crompton [2], Chevron Energy Technology Corporation had over 6000 terabytes of data and derived products, such as reports, a volume that is rapidly expanding. This data deluge [3], while significant in its effect on capturing information related to complex Earth science processes, has become a Pyrrhic victory for geoscientists from a computational perspective. The digital or electronic facilitation of science, also known as eScience [4] or eResearch, coupled with the science of data [5], is fast becoming an indispensable aspect of the process of Earth science [6–8]. There are exemplar projects such as OneGeology¹, which translates (interoperates) regional geologic maps in an effort to create a single map of the world at 1:1 million scale, and the Geosciences Network² (GEON), which houses a vast array of datasets, workflows and tools for shared or online data manipulation and characterisation. Further, the National Science Foundation (in the U.S.A.) has funded EarthCube³, which seeks to meld the perspectives of geoscientists and cyberscientists to create a framework for locating and interoperating disparate, heterogeneous information about the entire Earth as a comprehensive system. The major contribution that eScience can make is to provide ways to communicate the semantics, context, capabilities and provenance of the datasets, workflows, information and tools, so that researchers have a firm understanding of the artefacts they are using and how they are using them.
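The distinction drawn in the abstract, between a bare categorical label and a record of how that label was reached, can be illustrated with a minimal linked-data-style sketch that carries provenance alongside semantics. The Python below is not Alfred and does not use any published vocabulary: the `ex:` prefix, the predicates `ex:classifiedBy`, `ex:method`, `ex:evidence` and `ex:interpreter`, the unit `ex:unit42` and the helper `explain` are all hypothetical, and an in-memory set of triples stands in for a real RDF store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One subject-predicate-object statement (frozen so it can live in a set)."""
    subject: str
    predicate: str
    obj: str


class Graph:
    """Hypothetical in-memory stand-in for a linked-data triple store."""

    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add(Triple(s, p, o))

    def objects(self, s, p):
        # All objects for a given subject and predicate, sorted for stable output.
        return sorted(t.obj for t in self.triples if t.subject == s and t.predicate == p)


g = Graph()
# Shallow semantics: a categorical label only.
g.add("ex:unit42", "rdf:type", "ex:Turbidite")
# Deep semantics (hypothetical predicates): how that label was reached.
g.add("ex:unit42", "ex:classifiedBy", "ex:interp7")
g.add("ex:interp7", "ex:method", "ex:CoreLogging")
g.add("ex:interp7", "ex:evidence", "ex:gradedBedding")
g.add("ex:interp7", "ex:evidence", "ex:soleMarks")
g.add("ex:interp7", "ex:interpreter", "ex:geologistA")


def explain(graph, entity):
    """Return an entity's labels plus the method, evidence and interpreter behind each."""
    report = {"types": graph.objects(entity, "rdf:type")}
    for interp in graph.objects(entity, "ex:classifiedBy"):
        report[interp] = {
            "method": graph.objects(interp, "ex:method"),
            "evidence": graph.objects(interp, "ex:evidence"),
            "interpreter": graph.objects(interp, "ex:interpreter"),
        }
    return report


print(explain(g, "ex:unit42"))
```

Querying only `rdf:type` returns the shallow answer (a turbidite); `explain` also returns the method, evidence and interpreter behind that label, which is the kind of context a researcher would want before adopting the classification.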
In this paper, we illustrate how multiple, multi-faceted semantic models are coordinated under the linked data paradigm to better reflect how geoscience researchers situate concepts within their own knowledge structures in an effort to contextualise observations, phenomena and processes. We seek to expose which semantic, or ontological, commitments are needed to glean how science artefacts relate to researchers, methods and products (as data, or via theory), in order to transfer what is known about a place, and how it is known, as a useful analog for geoscience discovery. We use an interactive computational environment, known as Alfred, to view disparate ontologies that each carry pieces of this ‘knowledge soup’ [9] as facets, and to expose the relationships among them for the discovery of new knowledge.
2 Geoscience Background
The geosciences are far from exact; the Earth as a living laboratory provides plenty of challenges, not least to the task of representing and communicating semantics. While geoscientists are remarkable in their ability to utilise disparate knowledge in mathematics, physics, chemistry and biology to create meaning from observed phenomena, their theories are bound by the inherent problems associated with scale and place, cause and process, and system response [10]. The Earth’s phenomena are complex: they often exhibit statistically unique signatures with several stable states, while mechanical, chemical and biological processes work in tandem or asynchronously. Because of these often contradictory complications, it has also been suggested that the Earth sciences exemplify a “case study to understand the nature and limits of human reasoning, scientific or otherwise” [11].
Adding to the complexity, “Geologists reason via all manner of maps, outcrop interpretation, stratigraphic relationships, and hypothetical inferences as to causation” [12], and they do this simultaneously across geographic and temporal scales. In order to discern the categories and components of the Earth as a system, the geoscientist requires a trained eye, what anthropologists call “professional vision” [13], which often takes years of experience and mentoring to develop. Viewing the world in this contextualised way involves taking a long view of time and becoming adept at distinguishing infrequent catastrophic events from more frequent ones via the feedback loops between processes and components [13]. However, these feedback loops are often not well understood because of the fragmented nature of geoscience observation and data. This has required the geoscience community of practice to develop its own means of understanding its observations. Most notably, instead of constructing a specific research question and testing it, geoscientists often use the method of ‘multiple working hypotheses’ [14] and work toward reducing what is not known, rather than toward some axiomatic truth. Indeed, the ability to abstract Earth processes into a rational, metaphoric justification could be considered an art form. As such, geology is often referred to as an interpretive science [15]: where empirical evidence is not possible, a story often emerges. Interpreting meaning in the geosciences revolves heavily around the allusion inherent in hypotheses, methods, models and motivations and, often more importantly, around experience.
¹ http://www.onegeology.org/
² http://www.geongrid.org
³ http://earthcube.ning.com/
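The elimination logic behind multiple working hypotheses can be sketched as a toy filter: every hypothesis stays in play until an observation rules it out, so the outcome is a shrinking set of candidates rather than a single proven answer. This is a hedged illustration, not a method from the paper; the labels `H1` to `H3`, the observations and the "ruled out by" sets are hypothetical placeholders rather than geological criteria.

```python
# Toy sketch of multiple working hypotheses. Labels, observations and the
# "ruled out by" sets are hypothetical placeholders, not geological criteria.
HYPOTHESES = {
    "H1": {"obs_a"},           # H1 is inconsistent with observation obs_a
    "H2": {"obs_c"},           # H2 is inconsistent with obs_c
    "H3": {"obs_b", "obs_c"},  # H3 is inconsistent with obs_b or obs_c
}


def surviving(hypotheses, observations):
    """Return, sorted, every hypothesis that no current observation rules out."""
    return sorted(
        name
        for name, ruled_out_by in hypotheses.items()
        if not (ruled_out_by & observations)
    )


evidence = set()
print(surviving(HYPOTHESES, evidence))  # no evidence yet: all three remain
evidence.add("obs_a")
print(surviving(HYPOTHESES, evidence))  # H1 eliminated; H2 and H3 remain
evidence.add("obs_b")
print(surviving(HYPOTHESES, evidence))  # H3 eliminated; only H2 remains
```

A hypothesis that survives is only not yet contradicted, which mirrors working toward reducing what is not known rather than toward an axiomatic truth.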
Understanding the knowledge any researcher creates requires understanding that person’s research methods and the rationale behind their decision processes. This in turn requires that knowledge components be able to change roles as one tries to demystify the scale, context and perceptions by which they are constrained. Often, what is determined to be a result is steeped in probability as a function of a desired resource. To date, the research and research tools used throughout geoscience domains are largely situational, capturing tightly coupled observations and computations that become disjointed when the view, filter or purpose is altered, even slightly, towards one more representative of Earth system science.
3 Semantic Modelling in the Geosciences
As the previous section suggests, the semantic nature of geoscientific ideas, concepts, models and knowledge is steeped in experiential subjectivity and is often characterised by what can or cannot be directly observed, what can be directly or indirectly inferred and, in many cases, the goals of the research. As the Semantic Web [16] has gained traction and support, a subset of Earth science researchers have been intrigued by the possibility of standards, formal structure and, ultimately, ontologies in geoscience domains, mainly because, as Sinha et al. have stated, “From a scientific perspective, making knowledge explicit and computable should sharpen scientific arguments and reveal gaps and weaknesses in logic, as well as to serve as a computable reflection of the state of current shared understanding” [17]. As evidenced by the dearth of semantic models, or ontologies, in the Earth sciences [18], the often-conflicting ideals and knowledge schemas are proving to be significant hurdles for ontological engineers. Most of the semantic models in Earth science communities would be considered weak [19], lightweight (sometimes referred to as ‘informal’) [20, 21] or implicit [22].
These include taxonomies or controlled vocabularies, such as the index of terms⁴ of the American Geophysical Union (AGU), as well as glossaries [23], thesauri [24] and typical database schemas. Conversely, semantic models created with the aspiration of eventually becoming strong, heavyweight or formal ontologies are limited in number. Where published formal domain ontologies do exist [25], they are often not openly available within the community. One openly available ontology of note is the upper-level ontology SWEET: Semantic Web for Earth and
