Vocabularies and Semantics Scenario Engineering Report GEOSS Architecture Implementation Pilot, Phase 3
Version 1.5 - FINAL
Content developed by the GEO Architecture Implementation Pilot
Licensed under a Creative Commons Attribution 3.0 License

GEO Architecture Implementation Pilot, Phase 3
Vocabularies and Semantics Scenario Engineering Report
Version: 1.5
Date: 01/02/2011

Revision History

Version | Date | Editor and content providers | Comments
1.0 | 12/11/2010 | Cristiano Fugazza | Initial draft
1.1 | 9/12/2010 | Masahiko Nagai | Contribution by the University of Tokyo
1.2 | 15/12/2010 | Cristiano Fugazza | Revision and harmonization
1.3 | 13/01/2011 | Cristiano Fugazza | Final version for revision by partners
1.4 | 30/01/2011 | Will Pozzi | Revision and integration of new content
1.5 | 01/02/2011 | Cristiano Fugazza | Finalization of report

Document Contact Information

If you have questions or comments regarding this document, you can contact:

Name | Organization
Cristiano Fugazza | European Commission, JRC-IES
Masahiko Nagai | The University of Tokyo, Japan
Stefano Nativi | Istituto di Metodologie per l’Analisi Ambientale, CNR
Mattia Santoro | Istituto di Metodologie per l’Analisi Ambientale, CNR
Will Pozzi | GEO IGWCO

Table of Contents

1. Introduction
1.1 Scope of this document
1.2 GEOSS AIP
1.3 Summary of SBA development
1.4 Future work
2. Community SBA Objectives
3. Scenario: Vocabularies and Semantics
3.1 Actors
3.2 Context and pre-conditions
3.3 Scenario Events
3.4 Post-Conditions
3.5 Special Requirements
4. System Model of the Scenario
5. Use Cases
5.1 AIP Transverse Use Cases
5.2 Specialized Use Cases
5.2.1 Semantic Mediation Use Case
5.2.2 Semantic Enabled Search
5.2.3 Semantic MediaWiki development
6. Implementation
6.1 Deployed Components
6.2 Interoperability Arrangements
6.3 Use of the GCI
6.4 Demonstrations
6.5 Future plans for deployment
7. References

Vocabularies and Semantics Scenario

1. Introduction

1.1 Scope of this document

This document summarizes the outcome of the first exploratory use of semantics (the new integrative technologies being deployed over the World Wide Web) to provide more user-friendly, seamless integration across cross-cutting activities within the Group on Earth Observations (GEO) Societal Benefit Areas (SBAs). The Biodiversity and Drought thematic areas were directly addressed.

The Global Earth Observation System of Systems (GEOSS) is a multinational endeavor whose members use hundreds of different languages. Semantics offers capabilities to preserve the underlying concepts, such as drought, across all these languages. Semantics also provides the capability to automate the process of finding the exact information that the user needs.

The semantic activities within GEO consist partly of the development of the Semantic Network Dictionary and the associated gazetteer activity (see below) and partly of the semantic enrichment of the Discovery Augmentation Component (GI-DAC) of the EuroGEOSS discovery broker. The water (and developing biodiversity) ontology bridges both of these efforts. This semantics activity is critical because GEOSS is a global system: it involves scientific (hydrologic) variables collected in multiple countries with multiple languages, as well as geographic place names in multiple languages for the same place. The GEO semantic activity includes the development of the semantic network dictionary to bridge multiple scientific disciplines. The counterpart semantic development by the Joint Research Centre and the Italian National Research Council takes place within the existing framework of the drought, biodiversity, and forestry focal areas selected by the European Community.
The semantic network dictionary is proposed as a trans-disciplinary dictionary.

1.1.1 GEOSS Architecture and Data Semantic Activities

The Data Integration and Analysis System (DIAS) is Japan’s contribution to GEO, including development of an ontology registry system, a data model registry, and a gazetteer. As noted in the AIP-3 Drought and Water Working Group Engineering Report, land surface models, distributed hydrologic models, and application software built on satellite-based data sources generate water budget data. In addition, geospatial map layers and variables are stored within Geographic Information Systems. The ontology provides a means to organize data within the water cycle domain. This “division” is recognized as the distinction between: 1) the lexicographic ontology and 2) the geographic ontology.

On the one hand, the lexicographic ontology organizes the data stored in multiple national hydrometeorological ministry databases, such as the World Meteorological Organization (WMO) telecommunications network, the European Centre for Medium-Range Weather Forecasts (ECMWF), and the US National Oceanic and Atmospheric Administration (NOAA) Global Forecast System (GFS). On the other, it organizes data that are processed in order to map the presence of drought.

The geographic ontology falls within the category of gazetteer services: place name directories that record triples comprising: i) multiple place names, ii) geographic “footprints” (i.e., locations), and iii) representations of real world geographic entities. The place name is a handle to support communication. One key requirement may be to have each place uniquely identified in Resource Description Framework (RDF) data using a Uniform Resource Identifier (URI).
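As an illustration of the gazetteer structure just described, the following minimal Python sketch models one entry as a set of place names, a footprint, and an entity type, all attached to a single URI, and flattens it into subject-predicate-object triples. The base URI, the `ex:` predicate names, the `GazetteerEntry` class, and the sample data are assumptions made for this example; the report does not prescribe a URI scheme or a vocabulary.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical namespace: the report does not prescribe a URI scheme.
BASE_URI = "http://example.org/gazetteer/place/"

Triple = Tuple[str, str, str]


@dataclass
class GazetteerEntry:
    """One real-world entity with its place names and footprint."""
    place_id: str
    names: Dict[str, str]            # language tag -> place name
    footprint: Tuple[float, float]   # (latitude, longitude) in degrees
    entity_type: str                 # kind of real-world entity

    @property
    def uri(self) -> str:
        # The URI, not any single name, identifies the place.
        return BASE_URI + self.place_id

    def to_triples(self) -> List[Triple]:
        lat, lon = self.footprint
        # Predicate names (ex:...) are illustrative placeholders.
        triples = [
            (self.uri, "ex:entityType", self.entity_type),
            (self.uri, "ex:footprint", f"POINT({lon} {lat})"),
        ]
        # One triple per language-tagged name, in a stable order.
        for lang, name in sorted(self.names.items()):
            triples.append((self.uri, "ex:placeName", f"{name}@{lang}"))
        return triples


# Illustrative data only; coordinates are approximate.
tokyo = GazetteerEntry(
    place_id="tokyo",
    names={"en": "Tokyo", "ja": "東京", "de": "Tokio"},
    footprint=(35.68, 139.69),
    entity_type="city",
)
for triple in tokyo.to_triples():
    print(triple)
```

Because every triple shares the same subject URI, names in different languages remain handles for one and the same place, which is the property the URI requirement is meant to guarantee.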
Semantic enablement of gazetteers relies on Linked Data spread across gazetteers, each having an RDF-based Linked Data interface whose content can be converted into RDF triples more readily, instead of on single, isolated, silo-like gazetteers.

Besides the semantically enriched geographic gazetteers, there are the lexicographic domain ontologies that characterize each discipline. An ontology arranges the terms of a discipline with respect to one another using “class relations” such as those of object-oriented programming languages (see Figure 1). The “semantic network” definitions (concepts within the scientific terminology of hydrology and the names used within database schemata) are represented as “nodes”, while the relationships between terms are represented as “links”. The structure of such a dictionary is a simple network among terms, so browsing the dictionary resembles following hyperlinks in a web browser or calling a web API (Application Programming Interface). It is also easy to export the dictionary in XML and RDF formats. For example, “water” is the overarching concept, which is divided into the subclass processes “evapotranspiration”, “precipitation”, and “runoff”. Expressed differently in semantics terminology, “evapotranspiration” is a part of “water”, and so on.

Why bother with semantics at all? Semantic enrichment of the search and discovery process is intended to improve the accuracy of the “hits” returned for queries, but recourse to semantics is not limited to search and discovery. For example, semantics can be extended to application components and modules that are orchestrated into workflows over a framework. In other words, semantics can play a role in the retrieval (search and discovery) process within GEOSS, as well as in the design of decision support services and tools within each of the Societal Benefit Areas (SBAs).
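To make the node-and-link structure of the semantic network dictionary concrete, the sketch below stores the “water” example as typed links between terms, browses it by following links, and exports it to XML. It is only an illustration: the `SemanticNetwork` class, the `partOf` relation name, and the XML element and attribute names are hypothetical, since the report does not specify the dictionary's internal model or export schema.

```python
import xml.etree.ElementTree as ET


class SemanticNetwork:
    """Terms are nodes; typed relationships between terms are links."""

    def __init__(self):
        self.links = []  # list of (source term, relation, target term)

    def add_link(self, source, relation, target):
        self.links.append((source, relation, target))

    def targets(self, source, relation):
        """Follow outgoing links of one type, like clicking a hyperlink."""
        return [t for s, r, t in self.links if s == source and r == relation]

    def sources(self, relation, target):
        """Follow links of one type backwards, to see what points at a term."""
        return [s for s, r, t in self.links if r == relation and t == target]

    def to_xml(self):
        # Element and attribute names are assumptions made for this sketch.
        root = ET.Element("semanticNetwork")
        for source, relation, target in self.links:
            ET.SubElement(root, "link", source=source,
                          relation=relation, target=target)
        return ET.tostring(root, encoding="unicode")


net = SemanticNetwork()
for process in ("evapotranspiration", "precipitation", "runoff"):
    net.add_link(process, "partOf", "water")

print(net.sources("partOf", "water"))   # the processes grouped under "water"
print(net.targets("runoff", "partOf"))  # the concept "runoff" belongs to
print(net.to_xml())
```

Keeping the links as plain triples means the same data could be serialized to RDF as easily as to XML, which is consistent with the export options mentioned in the text.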
Figure 1 – GEOSS Data Semantics Activities

Figure 2 – Ontology Registry System

1.1.2 Ontology Registry System

The ontology registry is a comprehensive, authoritative reference for information about data models, data specifications, data definitions, and their relation to observation data, as shown in Figure 2. The ontology registry supports the creation and implementation of data models that are designed to encourage the efficient sharing of observation data.

1.2 GEOSS AIP

The GEOSS Architecture Implementation Pilot (AIP) task develops process and infrastructure components for the GCI and the broader GEOSS architecture as a means of coordinating cross-disciplinary interoperability deployment. The AIP Task provides phased delivery of components to GEOSS operations, with each phase consisting of: architecture refinement based on user interactions; component deployment and interoperability testing; and SBA-focused demonstrations. This Engineering Report (ER) is a key result of the third phase of AIP. AIP-3 was conducted from January 2010 to December 2010. A separate ER describes the overall process and results of AIP-3 and thereby provides a context for this Community SBA ER.

1.3 Summary of SBA development

The development of the Vocabularies and Semantics Scenario in AIP-3 made it possible to extend the capabilities of the GI-CAT [1] service broker, developed in the context of AIP-2, in order to ground multilingual, semantics-aware queries without requiring any modification to the protocols and data formats employed by resource catalogs in the SDI domain. The transparency of the process suggests that it is possible to leverage state-of-the-art techniques for providing advanced discovery functionalities. The development of these functionalities involved the structuring of queries issued to the GENESIS Vocabulary Service [8], a repository of SDI-related thesauri expressed in the SKOS format. Achieving this