The Role of Semantics in a Spatial Data Infrastructure

Total Page:16

File Type:pdf, Size:1020Kb

The Role of Semantics in a Spatial Data Infrastructure

Emmanuel Stefanakis Dr. Emmanuel Stefanakis is an Assistant Professor at Harokopio University of Athens in the Department of Geography. His main research interests include geographic information systems, spatial decision support systems, knowledge and database systems, spatial data infrastructures, cartography, geovisualization, and web technology.

Poulicos Prastacos Dr. Poulicos Prastacos is Director of Research in the Regional Analysis Division of the Institute of Applied and Computational Mathematics, FORTH. His main research interests include GIS applications for transportation, environment, regional planning; statistical, geographic and environmental databases; web GIS and applications on the internet; development of land use, transportation, and regional economic models; implementation of decision support systems.

SEMANTIC-BASED SPATIAL INFORMATION INFRASTRUCTURES: INTEGRATING DATA AND SERVICES INTO A SINGLE COLLECTION Emmanuel Stefanakis1 and Poulicos Prastacos2 1Department of Geography, Harokopio University, 70, El. Venizelou Ave., 17671 Athens, Greece, [email protected] 2Institute of Applied and Computational Mathematics, Foundation for Research and Technology, Vassilika Vouton, P.O. Box 1527, 71110 Heraklion, Greece, [email protected]

Abstract. Nowadays, special attention is given towards the development of effective spatial data infrastructures (SDI’s) at regional, national or international level. The main scope of such an infrastructure is to promote the accessibility and usability of geospatial data. In parallel, there is an urgent need to share geospatial services. This leads to the development of spatial information infrastructures or service-driven infrastructures. The paper provides an overview of the role of semantics in a spatial data infrastructure or service-driven infrastructure (we adopt a common acronym, i.e., SDI), and the alternative architectures of the middleware between the clients (users and applications) and the servers (data and services).

1 Keywords. Semantics, Interoperability, Spatial Data Infrastructures (SDI), Spatial Information Infrastructures.

1. INTRODUCTION Nowadays special attention is given towards the development of effective spatial data infrastructures (SDIs) at regional, national or international level (Williamson et al., 2004). SDIs are frameworks of policies, institutional arrangements, data, services, technologies, and people with a common scope, i.e., to promote the accessibility and usability of geospatial content (data and services).

The construction of an SDI presumes that the participating organizations have agreed on the adoption of common vocabularies, practices, standards, technical specifications and operational components (Nebert, 2004). Hence, an SDI is not a simple data set or repository. Instead, an SDI hosts: (a) geographic content (data and services); (b) sufficient description of this content (metadata); (c) effective methods to discover and evaluate this content (data catalogues); (d) tools to visualize the data (e.g., web mapping); and (e) services and software tools to support specific application domains.

An SDI architecture is usually conceptualized as a three-tier architecture (Evans, 2003). At the top layer (the client) reside the users and applications; at the bottom layer (the server) reside the geospatial data; and at the middle layer (the middleware) reside all services that assist the accessibility to the data repositories (Figure 1). Obviously, the middle layer is the one that facilitates the worldwide access to and the use of online geospatial data.

Users and Applications top layer

Discovery & retrieval services middle layer

Geospatial data bottom layer

Figure 1. The three-tier architecture of current SDI.

2 Currently, SDI paradigm focuses on the exchange of geospatial data; i.e., the middle layer supports the discovery and retrieval of data available at the bottom layer; and this is consistent to the term spatial data infrastructure.

On the other hand, it is widely recognized (e.g., Bernard et al., 2005), that future SDI research must focus more on the users’ needs and the design of appropriate geoinformation services and service architectures to satisfy these needs. In other words, the development of service-driven infrastructures; or spatial information infrastructures will enhance infrastructures capabilities.

This may be accomplished by enriching both the bottom and middle layers (Figure 2). The bottom layer will also accommodate geospatial services (i.e., analytical software tools) along with data. As regards to the middle layer, existing discovery and retrieval services will be extended to apply to both the data and services available at the bottom layer. Additionally, the middle layer will incorporate software tools and services that may be used by the top layer (users and applications) and hence enhance the whole SDI functionality.

In both architectures the middle layer plays the role of the middleware between the clients (users and applications) and the servers (data and services). Its role is supported by a set of mechanisms, as follows (Nebert, 2004): (a) sufficient descriptions of the geospatial content (both data and services), i.e., the metadata; (b) discovery of distributed collections of geospatial content through their metadata descriptions, i.e., the catalogue gateway; and (c) metaphors to discover and access the geospatial content, i.e., the interfaces.

Users and Applications top layer

Geoprocessing services middle layer

Geospatial content bottom layer

3 Figure 2. The three-tier architesture of future SDI.

All mechanisms above must be constructed in such a way that enable the effective sharing and use of geospatial content. The incorporation of rich semantics at all these components affects definitely the efficiency of the middleware (Nebert, 2004). Semantics enable the interoperability between different data collections and services. The scope of this paper is to highlight the presence of semantics into these components, and discuss the alternative architectures of the middleware.

In the sequel, we use the single acronym SDI to represent both spatial data and information infrastructures. Although, several other acronyms can be found in the literature (such as SII standing for spatial information infrastructures) we maintain the widely used acronym SDI, which can also stand for service-driven infrastructures.

The discussion is organized as follows. Sections 2 to 4 comment the role of semantics and related methods in the components of an SDI and how they assist the role of the middleware. Specifically, Section 2 focuses on metadata; Section 3 on the catalogue gateway; and Section 4 on the interfaces. Section 5 presents briefly the efforts and products of the organizations developing standards for SDIs. Then Section 6 shows the alternative architectures for SDIs. Finally, Section 7 concludes the discussion.

2. METADATA As regards to the metadata, they convey – by nature – the semantics of the SDI content (Nogueras-Iso et al. 2005). Metadata helps people who use geospatial content find the data and services they need and determine how best to use it. Ideally, the metadata structures and definitions should be referenced to a standard. Metadata standards (Nogueras-Iso et al., 2005) for geospatial content – although in use – are currently under continuous revision and testing.

Professional communities have developed metadata standards for geospatial content. These standards describe both the structure of data and services and the intended semantic content for specific data themes. Currently, the most prominent metadata standards are (Nebert, 2004): the Content Standard for Digital Geospatial Metadata – CSDGM (FGDC, 1998); and the ISO 19115 (ISO-TC211, 2003).

4 These standards define the schema for describing geographic information and services and provide information about the identification, the extent, the quality, the spatial and temporal schema, the spatial reference, and the distribution of digital geographic data. Additionally, both standards contain guidelines to develop metadata profiles, which allow their customization to specific community needs.

CSDGM has been adopted and implemented in the U.S., Canada, U.K. and other countries around the world. ISO 19115 has been expressed through its companion specification, ISO 19139, into eXtensible Markup Language (XML) and is currently used for the creation of a North American Profile of Metadata for the U.S., Canada, and Mexico. The same standard will also be used by the INfrastructure for SPatial Information in Europe (INSPIRE, 2004).

3. CATALOGUE GATEWAY As regards to the catalogue gateway, semantics play again a key role, provided that they assist and guide the discovery and evaluation of useful content from an SDI. The catalogue exploits the metadata of the multiple resources hosted in an SDI.

Obviously, catalogues can be achieved through the use of common descriptive vocabularies (metadata), a common discovery and retrieval protocol and a registration system for metadata collections (Nebert, 2004). At this point the issue of interoperability between metadata schemas emerges. According to the ISO, interoperability is defined as the capability to communicate, execute programs, or transfer data among various functional units in a manner that requires the user to have limited or no knowledge of the characteristics of those units.

The catalogue gateway handles multiple metadata collections, which are distributed and heterogeneous. The heterogeneity of the metadata collections may be considered as syntactic, schematic or semantic heterogeneity (Bishr, 1998). Syntactic heterogeneity refers to the differences in the logical data models (e.g., relational versus object-oriented) or in spatial representations (e.g., raster versus vector). Schematic heterogeneity refers to the differences in the conceptual data models (the data are organized in different schemas and structures). Semantic heterogeneity refers to the differences in meaning, interpretation or usage of the same or related data. Semantic heterogeneity is further divided into naming

5 (i.e., synonyms and homonyms) and cognitive (i.e., different conceptualizations) heterogeneity.

Semantic heterogeneity between metadata collections causes most integration problems. These problems refer to (Bernard et al., 2003): (a) metadata interpretation, and (b) data interpretation. As regards to metadata interpretation, different providers may use different terms. This makes difficult for both the humans and the computers to recognize the coherence between similar terms. As regards to data interpretation, the properties of the entities involved in an application domain are expressed based in various standardized (e.g., CORINE vocabulary for land cover classification; EAA, 2000) or not vocabularies. Data and metadata interpretation problems must be overcome when data from different sources have to be integrated into a single collection. This requires the adoption of common terms, which are usually provided by a standardized vocabulary with unchanged terms. All terms coming from other vocabularies will be translated in the adopted vocabulary using tools for semantic translation.

The use of ontologies (Fonseca and Egenhofer, 1999; Guarino, 1998; Wache, 2001) as semantic translators is a possible approach to overcome the problem of semantic heterogeneity (Bernard et al. 2003). Ontologies play a critical role in associating meaning with data such as computers can understand and process data automatically. Ontologies may be used (a) to assist the translation of a term (i.e., a concept) from one vocabulary (i.e., one ontology) into a term from another vocabulary; and (b) to derive super concept and sub-concept relationships.

However, currently ontologies are at the best able to define data semantics under very controlled situations. They cannot yet fully support semantic translation. They fail to cope with the semantics of services (Bernard et al., 2005). Nowadays GI community addresses the following three research issues (Fonseca and Sheth, 2003): (a) the creation and management of domain specific geo-ontologies for both data and services; (b) the matching geo-concepts available in the web to geo-ontologies; and (c) the integration of different ontologies as regards to both the geographic dimension and the non-geographic domain.

6 4. INTERFACES As regards to the interfaces, geospatial applications have special requirements at the user interface level. This is because the appropriate visualization of geospatial data may assist people to discover or evaluate the geospatial content (Adrienko and Adrienko, 1999; Kraak and Brown, 2000). Hence, the quality of geovisualization products must be promoted.

Semantics associated to geospatial content may be exploited at this stage and lead to effective geo-visualizations. Current research on the development of metaphors and GUI’s as well as on issues related to multiple representation and interoperability of geospatial data (e.g., ISRPS, 2006) may lead to enhanced visualizations.

Last, but not least, the services and software tools hosted by an SDI must be semantic based in order to be more efficient and assist advanced application domains. The development of semantic-based software for geospatial applications is an emerging field of research. Although many semantic-based tools have been developed in the past to assist geospatial applications, these tools have been designed and implemented patchily without big interference with other tools or an established research agenda. Currently this situation changes dramatically, with several workshops aligned to this new trend, e.g., the Geovisiualization Conference (EURESCO, 2004), the SeBGIS Workshop (OTM, 2005), etc.

5. DEVELOPING STANDARDS AND SPECIFICATIONS Open Geospatial Consortium (OGC) in parallel with World Wide Web Consortium (W3C) and ISO develop standards and specifications to support the interoperability between repositories with geospatial content. Table 1 provides some representative standards/specifications related to information content.

Table 1. Standards and Specifications for geospatial content. Organization Standard/Specification name W3C HyperText Transfer Protocol (HTTP) W3C Simple Object Access Protocol (SOAP) W3C Web Services Description Language (WSDL) OGC Web Map Server (WMS)

7 OGC Web Coverage Server (WCS) OGC Web Feature Service (WFS) OGC Geography Markup Language (GML) OGC Catalog Service (CS) ISO-TC211 Rules for Application Schema (DIS19109) ISO-TC211 Methodology for Feature Cataloguing (DIS19110) ISO-TC211 Spatial Referencing by Geographical Identifiers (CD19112) ISO-TC211 Metadata (IS19115) ISO-TC211 Metadata Implementation Specification (DTS19139)

The scope of these standards and specifications is to enable an application developer to use any geospatial content (data and services) available on “the net” within a single environment and a single workflow (McKee and Buehler, 1998). However, standardization may not solve the problem of interoperability by itself, because of the following reasons (Stoimenov and Dordevic-Kajan, 2002):  The construction and maintenance of a single and integrated model is a hard task.  The requirement to communicate with geospatial sources that do not conform the adopted standard will be always present.  Existing geospatial sources have their own models which may not always be mapped to the common model without information loss.  The standards/specifications are subject to continuous change, but systems will not all simultaneously change to conform.

6. ESTABLISHING THE SYNERGY OF GEOSPATIAL CONTENT SOURCES The problem of interoperability may be overcome by the adoption of one of the well- established methods that allow the synergy of heterogeneous information sources (Garcia Molina et al, 2000) (Figure 3): (a) federated databases; (b) data warehouses; and (c) mediators.

Federated databases adopt the simplest architecture. Information sources are independent to each other. The synergy is accomplished by a separate middleware component (connector) per source pair (Figure 3a). This component is responsible for the

8 interoperability between the two sources. This architecture is ideal when a limited number of sources are involved; otherwise numerous connectors must be built and being maintained. Notice that for N sources, Nx(N-1) connectors are required for a full network.

In data warehouses, the contents of two or more sources are collected and combined into a common schema. Then a single repository, called warehouse, accommodates all information content (Figure 3b). The population of the warehouse is accomplished by an individual content extractor per source and a combiner, which maps the content of each source into the common schema.

Mediators, similarly to warehouses, provide a unified view to the contents of two or more sources. However, a mediator does not accommodate any content. Key component of mediator architecture (Figure 3c) is the middleware, which is called wrapper. A wrapper plays the role of the translator between the mediator and an individual source. Notice that the mediator may be able to judge if a query posed by the user may find useful content to an individual source. This is accomplished through the use of appropriate metadata and meta-services at the mediator level. Mediator technology is more flexible and would be ideal for the development of an SDI (Stoimenov and Derdevic-Kajan, 2002). Obviously, it implementation raises all issues related to semantic heterogeneity as presented in the previous Sections.

query result

Warehouse Source A Source B

Combiner

Extractor Extractor Source C Source D

Source A Source B

(a) (b)

9

query result

Mediator

query query result result

Wrapper Wrapper

query result result query

Source A Source B

(c) Figure 3. Alternative methods to allow the synergy of heterogeneous information sources: (a) federated databases (the arrows represent the connectors); (b) data warehouses (the arrows represent content flow); and (c) mediators (the arrows represent content flow).

7. DISCUSSION The paper provides an overview of the role of semantics in a spatial data infrastructure or service-driven infrastructure (SDI), and the alternative architectures of the middleware between the clients (users and applications) and the servers (data and services).

It is widely recognized, that the integration of semantics (of both data and services) at all components of an SDI middleware (metadata, gateways and interfaces) may promote the SDI functionality and usefulness. Currently, technology provides enhanced tools and methods for implementing this task (models, repositories, s/w tools); however GI science is not mature to make full use of them, provided that the issues of semantic heterogeneity must be overcome.

REFERENCES Adrienko, G., and Adrienko, N., 1999. Interactive maps for visual data exploration. International Journal of Geographical Information Science, vol. 13 (4), pp.355-374. Taylor-Francis. Bernard, L., Craglia, M., Gould, M., and Kuhn, W., 2005. Towards an SDI research agenta. In the Proceedings of the 11th EC-GI & GIS Workshop – ESDI: Setting the Framework. Sardinia, Italy. Bernard, L., Georgiadou, Y., and Wytzisk, A., (Eds) 2005. Position Papers. First Research Workshop on Cross-learning on Spatial Data Infrastructures and Information Infrastructures. ITC, The Netherlands. Bernard, L., Einspanier, U., Haubrock, S., Hübner, S., Kuhn, W., Lessing, R., Lutz, M. and Visser, U., 2003. Ontologies for Intelligent Search and Semantic Translation in

10 Spatial Data Infrastructures. Photogrammetrie - Fernerkundung - Geoinformation. Ausgabe 6. Seite(n) 451-462. http://ifgi.uni- muenster.de/~lutzm/pfg03_bernard_et_al.pdf Bishr, Y., 1998. Overcoming the semantic and other barriers to GIS interoperability. International Journal of Geographical Information Science. Special Issue on Interoperability in GIS. Vol. 12, No 4. pp. 299-314, Taylor & Francis. EAA, 2000. CORINE Land Cover: Technical Guide. European Environmental Agency. EURESCO, 2004. Geovisualisation: EuroConference on Methods to Support Interaction in Geovisualisation Environments. Kolymbari, Crete, Greece. ESF. Evans, J.D., 2003. Geospatial Interoperability Reference Model. FGDC, WG GAI. http://gai.fgdc.gov/girm/v1.0/ FGDC, 1998. Content Standard for Digital Geospatial Metadata. Version 2.0. Federal Geographic Data Committee. Fonseca, F., and Egenhofer, M., 1999. Ontology-driven information systems. In the Proceedings of the 7th ACM Symposium on Advances in GIS. Kansas, MO, pp. 14-19. ACM Press. Fonseca, F., and Sheth, A., 2003. The GeoSpatial Semantic Web for UCGIS research priorities. UCGIS. http://www.ucgis.org/ priorities/research/2002researchPDF/shortterm/e_geosemantic_web.pdf Guarino, N., 1998. Formal ontology in information systems. In the Proceedings of the Formal Ontology in Information Systems (FOIS). Trento, Italy, pp. 3-15. IOS Press. INSPIRE, 2004. The INSPIRE Proposal. The INfrastructure for SPatial InfoRmation in Europe. http://www.ec-gis.org/inspire/ ISO-TC211, 2003. Text for FDIS 19115 Geographic Information – Metadata. Final Draft Version. International Organization for Standardization. http://www.isotc211.org/ ISPRS, 2006. Workshop on Multiple Representation and Interoperability of Spatial Data, Hanover, Germany, February 22-24, 2006. ISPRS. http://www.ikg.uni- hannover.de/isprs/ workshop.htm Kraak, M.J., and Brown, A., 2000. Web Cartography. CRC Press. McKee, L., and Buehler, R., 1998. The Open GIS Guide. Open GIS Consortium, Inc. http://www.OpenGIS.org/techno/ specs.htm Garcia Molina, H., Ullman, J.D., and Widom, J., 2000. Database Systems Implementation. Prentice Hall. Nebert, D.D., (Ed.) 2004. Developing Spatial Data Infrastructures: The SDI Cookbook. Version 2. GSDI Publications. http://www.gsdi.org/docs2004/Cookbook/cook- bookV2.0.pdf Nogueras-Iso, J., Zarazaga-Soria, F., and Muro-Medrano, P.R., 2005. Geographic Information Metadata for Spatial Data Infrastructures: Resources, Interoperability and Information Retrieval. Springer. OTM, 2005. 1st International Workshop on Semantic-based GIS. Agia Napa, Cyprus, November 3-4, 2005. http://www.cs.rmit.edu.au/fedconf/sebgis2005cfp.html Stoimenov, L., and Derdevic-Kajan, S., 2002. Framework for semantic GIS interoperability. Facta Universitatis. Ser. Math. Inform. 17, pp. 107-125. Wache, H., Voegele, T., Visser, U., Stuckensmhmidt, H., Scuster, G., Neumann, H., and Huebner, S., 2001. Ontology-based integration of information, a survey of existing approaches. Workshop on Ontologies and Information Sharing, IJCAI’01. pp. 108- 117. Williamson, I.P., Rajabifard, A., and Feeney, M.E.F, (Eds) 2004. Developing Spatial Data Infrastructures: From Concept to Reality. Taylor & Francis Group.

11

Recommended publications