72 Geoinformatics 2008—Data to Knowledge

Figure 1. Diagram showing the architecture of the ontological geosciences.

a system of hierarchies of geoscience knowledge. The hierarchy of the architecture is not limited within the geoscience concept model; it extends to the geoscience knowledge levels, including the levels of the knowledge base that can be used to design the model or applications. Second, it is a system that is self-improving through the collaborative efforts between system users and computers. The computer network facilities provide a convenient mechanism for communication and sharing knowledge among the system users. The users gain knowledge from the system and improve it according to their knowledge of geosciences. Some of the most important learning and improving processes take place between the hierarchical levels. The users learn the knowledge patterns from the processes occurring in the upper layer and transform them into the lower-level model or applications. Last, the middle layer of the architecture can be regarded as “dictionary-based geosciences,” which is that aspect of the geosciences that is only represented by the terminologies provided in the dictionary. The dictionary provides the meaning of terms for geoscience data in the databases as well; therefore, the ontological geosciences can be built on the basis of geoscience data via the dictionary.

In summary, ontological geosciences is a structured geoscience subdiscipline that is based on geoscience data and is recognizable and operable on computer systems. In the current environment of research and development, we regard it as being essentially a “human-being-centered” geoscience. It is the people who learn and generalize the knowledge patterns, who determine what geoscience content is appropriate for computers to represent, and who teach the computers to understand the ontological geosciences that in turn support geoscientific research.

Acknowledgments

The author would like to thank China Geological Survey for their support and financial contribution to this research project.

A Volcano Erupts—Semantic Data Registration and Integration

By Peter Fox,1 A. Krishna Sinha,2 Deborah L. McGuinness,3,4 Rob Raskin,5 and Abdelmounaam Rezgui2

1High Altitude Observatory, Earth and Sun Systems Laboratory, National Center for Atmospheric Research, Boulder, Colo.
2Department of Geosciences, Virginia Polytechnic Institute and State University, Blacksburg, Va.
3Department of , Rensselaer Polytechnic Institute, Troy, N.Y.
4McGuinness Associates, Latham, N.Y.
5Jet Propulsion Laboratory, Pasadena, Calif.

We present a progress report in a research effort (Semantically-Enabled Scientific Data Integration, or SESDI) into the application of methods and technologies to the challenging problem of integrating heterogeneous volcanic and atmospheric chemical-compound data, which are used to assess the atmospheric effects of a volcanic eruption. One requirement to accomplish this is the semantic registration of datasets to domain and integrative ontologies. We demonstrate how ontologies are implemented by leveraging existing distributed semantic technology frameworks.

Introduction

The goal of our project is to enable the next generation of interdisciplinary and discipline-specific data and information systems to answer many challenging science questions requiring data from widely distinct fields. Our initial focus was on the integration of volcanic and atmospheric data sources in
support of investigations into relationships between volcanic activity and global climate (McGuinness and others, 2006, 2007; Fox, McGuinness, and others, 2007; Fox, Sinha, and others, 2007; Sinha and others, 2007). Another goal was to facilitate search and retrieval using an underlying framework that contains information about the semantics of the scientific terms that are used in the search. We also focused on the registration of disciplinary datasets in order to fully facilitate the integration of the volcanic and atmospheric data sources. We developed a tool to aid data providers with registering the data without explicitly knowing about the underlying ontologies.

Semantic Data Integration Methodology

We followed a methodology reported in previous work (Benedict and others, 2007) because our effort depended on machine-processable specifications of the scientific terms that are used in the study of volcanoes and the atmosphere. We identified specific ontology modules that needed construction in the areas of volcanoes, plate tectonics, atmosphere, and climate, which draw heavily on existing ontologies. We used ontologies in the form of modules from the Jet Propulsion Laboratory’s Semantic Web for Earth and Environmental Terminology (SWEET), Virtual Solar-Terrestrial Observatory (VSTO), and Geosciences Network (GEON). Our attention has been focused on the “Atmosphere-Volcano Use Case,” whose goal is as follows: To determine the statistical signatures of both volcanic and solar forcings on the height of the tropopause.

We convened small workshop groups along these topic lines. We started with use cases and elements of the existing vocabularies or ontologies (where available) and proceeded to develop the knowledge representation. We used CmapTools, a concept mapping tool from the Institute for Human and Machine Cognition (IHMC, http://cmap.ihmc.us/coe) that reads and writes Web Ontology Language (OWL)-based ontologies and provides OWL-based predicate assistance for adding relations between concepts. Figure 1 shows how the ontologies were packaged, indicating the direction of importing and package dependency (see figure caption for more details).

Figure 1. Schematic diagram showing the packaging (that is, referencing and importing of ontologies) to make the necessary knowledge concepts and relations known to the application program. The concept-mapping tool CMAP was used to generate this figure. CMAP allows the embedding of more detailed concepts within a grouping (for example, the SemanticFilter and SWEETOntologyPackage). Abbreviations and symbols are as follows: SWEET, Semantic Web for Earth and Environmental Terminology; << (to the right of the boxes), allows the box to be “closed” and only display a single box with the higher level name; boxes can be expanded by clicking on a corresponding >> icon.

We leveraged the VSTO framework (Fox and others, 2006; McGuinness and others, 2007) by replacing the solar-terrestrial-
specific ontology and data sources with appropriate volcano and atmospheric ontologies and data and catalog sources.

Our work required an even more modular approach to ontology re-use; the result was that we were able to conceive a new conceptual decomposition starting with SWEET 1.2 (http://sweet.jpl.nasa.gov). This effort will lead to the next version of the framework, which was based on broad community input and participation guided by the principle of re-use by other applications.

Data Registration

We base our data registration effort on the work from GEON (http://www.geongrid.org) and VSTO (http://www.vsto.org). Data registration consists of three levels:

1. Discovery of data resources, which requires registration through use of high-level index terms.

2. Discovery of item-level databases, which requires registration at data-type-level ontologies.

3. Item-detail-level registration (required for semantic integration).

We developed a software application known as the Semantically-Enabled Data Registration Engine (SEDRE) to implement the registrations. The tool is intended to be used by people with a variety of skill levels. We are using an ontology for the registration workflow for SEDRE for levels 1, 2, and 3 and the two “disciplines (+ sub-disciplines)”: Geosphere+Geochemistry and Atmosphere+Atmospheric Chemistry.

Figure 2 shows one phase of registration of sulfur-dioxide data from a level-2 swath product from the European satellite-mounted Scanning Imaging Absorption Spectrometer for Atmospheric Chartography (SCIAMACHY). A user opens a file and selects Atmosphere > Atmospheric Chemistry, and the screen in figure 2 is displayed.

Figure 2. The SEDRE software tool as shown on the computer. There are four main panes in the window: (1) the upper left shows key level-1 and level-2 concepts such as location and related for observations; (2) the lower left shows common compounds, the periodic table, and oxides (those that the user selects and associates with elements in the data table); (3) the upper right shows a preview of the data, where column headers are selectable so as to be associated with the concepts on the left two panes; and (4) the lower right is a display of the accumulated set of mapped relations (for example, SCD (Slant Column Density) is mapped to sulfur-dioxide as well as units, and so on).

Discussion and Conclusion

We have presented the latest progress in an effort that uses modular ontologies to capture meanings of terms in distinct but related science domains with the goal of facilitating research into relationships between the domains and re-use of the modular ontologies. We have leveraged the existing starting points for reference ontologies in atmospheric science and partly developed ontologies for volcanoes and plate tectonics. We have held workshops to vet the ontologies among the multiple communities and completed a series of ontology mapping and merging exercises to arrive at the current modularization. The key element of data registration using developed ontologies is a new capability within the scientific Semantic Web community. The SEDRE tool we have developed is still evolving and we have only limited experience with user testing, but to date, the results and feedback are encouraging.

Acknowledgments

This work is supported by the National Aeronautics and Space Administration’s (NASA’s) Advancing Collaborative Connections for Earth System Science (ACCESS) and Advanced Information Systems Technology (AIST) programs.

References Cited

Benedict, J.L., McGuinness, D.L., and Fox, Peter, 2007, A Semantic Web-based methodology for building conceptual models of scientific information: Eos, Transactions of the American Geophysical Union, v. 88, no. 52, Fall Meeting Supplement, Abstract IN53A–0950, p. F774.

Fox, Peter, McGuinness, D.L., Middleton, Don, Cinquini, Luca, Darnell, J.A., Garcia, Jose, West, Patrick, Benedict, James, and Solomon, Stan, 2006, Semantically-enabled large-scale science data repositories, in Cruz, Isabel, Decker, Stefan, Allemang, Dean, Proest, Chris, Schwabe, Daniel, Mika, Peter, Uschold, Mike, and Arroyo, Lora, eds., The Semantic Web—ISWC 2006, Proceedings, Fifth International Semantic Web Conference, Athens, Ga., November 5–9, 2006: Lecture Notes in Computer Science, v. 4273, p. 792–805.

Fox, Peter, McGuinness, D.L., Raskin, Rob, and Sinha, Krishna, 2007, A volcano erupts—Semantically mediated integration of heterogeneous volcanic and atmospheric data, in Proceedings of the ACM First Workshop on Cyberinfrastructure—Information Management in eScience, Lisbon, Portugal, November 9, 2007: New York, N.Y., Association for Computing Machinery, p. 1–6.

Fox, Peter, Sinha, Krishna, Raskin, Rob, McGuinness, D.L., Ammann, Caspar, Venezky, Dina, and Schwander, Florian, 2007, Semantic mediation and integration of volcanic and atmospheric data—In search of statistical signatures: Eos, Transactions of the American Geophysical Union, v. 88, no. 52, Fall Meeting Supplement, Abstract IN240A–05.

McGuinness, D.L., Fox, Peter, Cinquini, Luca, West, Patrick, Garcia, Jose, and Benedict, J.L., 2007, The Virtual Solar-Terrestrial Observatory—A deployed semantic web application case study for scientific research, in Cheetham, William, and Goker, Mehmet, eds., Proceedings of the Nineteenth Conference on Innovative Applications of Artificial Intelligence, Vancouver, British Columbia, Canada, July 22–26, 2007: Menlo Park, Calif., Association for the Advancement of Artificial Intelligence Press, p. 1,730–1,737.

McGuinness, D.L., Sinha, A.K., Fox, Peter, Raskin, Rob, Heiken, Grant, Barnes, Calvin, Wohletz, Ken, Venezky, Dina, and Lin, Kai, 2006, Towards a reference volcano ontology for semantic scientific data integration: Eos, Transactions of the American Geophysical Union, v. 87, no. 36, Joint Assembly Supplement, Abstract IN42A–03, also available online at http://www.agu.org/. (Accessed August 21, 2008.)

McGuinness, D.L., Fox, Peter, Sinha, A.K., and Raskin, Robert, 2007, Semantic integration of heterogeneous volcanic and atmospheric data, in Brady, S.R., Sinha, A.K., and Gundersen, L.C., eds., Proceedings, Geoinformatics 2007—Data to Knowledge, San Diego, Calif., May 17–19, 2007: U.S. Geological Survey Scientific Investigations Report 2007–5199, p. 10–13.

Sinha, A.K., McGuinness, D.L., Fox, Peter, Raskin, Robert, Condie, Kent, Stern, Robert, Hanan, Barry, and Seber, Dogan, 2007, Towards a reference plate tectonics and volcano ontology for semantic scientific data integration, in Brady, S.R., Sinha, A.K., and Gundersen, L.C., eds., Proceedings, Geoinformatics 2007—Data to Knowledge, San Diego, Calif., May 17–19, 2007: U.S. Geological Survey Scientific Investigations Report 2007–5199, p. 43–46.
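
As an illustration of the three registration levels described under Data Registration, the following Python sketch shows one way a registration record could accumulate index terms (level 1), a data type (level 2), and column-to-concept mappings (level 3), and how two datasets registered to the same concept might then be paired for integration. This is not the SEDRE implementation. All class names, function names, dataset names, column names other than SCD, concept identifiers, and units are hypothetical and chosen only for illustration, and treating the levels as cumulative is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class ColumnMapping:
    column: str                 # column header as it appears in the data file
    concept: str                # ontology concept the column is registered to (hypothetical ID)
    unit: Optional[str] = None  # unit of the values, if one was registered


@dataclass
class Registration:
    dataset: str
    discipline: str
    subdiscipline: str
    index_terms: Set[str] = field(default_factory=set)           # level 1
    data_type: Optional[str] = None                              # level 2
    mappings: List[ColumnMapping] = field(default_factory=list)  # level 3

    def map_column(self, column: str, concept: str, unit: Optional[str] = None) -> None:
        """Record that a column header is associated with an ontology concept."""
        self.mappings.append(ColumnMapping(column, concept, unit))

    def level(self) -> int:
        """Highest level completed, assuming each level builds on the one before."""
        if not self.index_terms:
            return 0
        if self.data_type is None:
            return 1
        return 3 if self.mappings else 2


def shared_concepts(a: Registration, b: Registration) -> Dict[str, Tuple[str, str]]:
    """Pair up columns of two datasets that are registered to the same concept."""
    by_concept = {m.concept: m.column for m in a.mappings}
    return {m.concept: (by_concept[m.concept], m.column)
            for m in b.mappings if m.concept in by_concept}


# Hypothetical atmospheric dataset (illustrative values only).
atmos = Registration("sciamachy_so2_swath", "Atmosphere", "Atmospheric Chemistry")
atmos.index_terms.add("sulfur dioxide")
atmos.data_type = "Level2Swath"
atmos.map_column("SCD", "SulfurDioxide", unit="molecules/cm^2")
atmos.map_column("lat", "Latitude", unit="degree")

# Hypothetical volcanic dataset (illustrative values only).
volcano = Registration("eruption_gas_samples", "Geosphere", "Geochemistry")
volcano.index_terms.add("volcanic gas")
volcano.data_type = "PointSample"
volcano.map_column("SO2_ppm", "SulfurDioxide", unit="ppm")

print(atmos.level())
print(shared_concepts(volcano, atmos))
```

In this sketch, only datasets that reach level 3 contribute anything to `shared_concepts`, which mirrors the statement that item-detail-level registration is required for semantic integration. The differing units on the two sulfur-dioxide columns are recorded but not reconciled here.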