Geographic Feature Pipes

Marcell Roth

Institute for Geoinformatics, University of Muenster, Germany [email protected]

Abstract. Aggregating and combining data from different Web sources to create ad-hoc information is known as “piping” data. Linked Data facilitates browsing through related information and provides technologies to easily pipe data included in this Web of Data. The Open Geospatial Consortium (OGC) has established standards for the storage, retrieval, and processing of geospatial data. These standards act as the foundation for Spatial Data Infrastructures. The integration of existing geospatial data into the Data Web is still missing. The presented Geographic Feature Pipes (GFP) is an API deployed as a free Web service that works towards closing this gap. It translates sensor data based on the OGC’s Observations and Measurements specification, as well as geospatial data served by OGC Web Feature Services, into RDF representations. This enables complex queries and browsing through related geospatial data sources, and provides means of merging information about geographic features with related sensor data into one document. The translated data is based on ontologies that provide the vocabulary for defining the data entities. The presented approach shows that, in conjunction with semantic annotations, we are able to bridge the gap between geospatial applications and Semantic Web technologies and move toward the development of the Geospatial Semantic Web.

1 Introduction

The Web is based on URLs as unique identifiers for documents and other data. These links allow users to browse the Web in order to retrieve information. Despite the advantages the Web offers, published data (information) is primarily embedded in HTML Web pages. HTML is concerned with the layout of content and cannot type the links that connect an entity of a Web document to related entities [3]. Hyperlinks indicate that two documents are related, but leave it to the user to infer the nature of the relationship. Linked Data is a solution for creating shared and structured information spaces [3], which include links between related information that state the nature of the connection. Its purpose is to create and connect related data on the Web with typed links, as if it were one global database. To realize such a Web of Data, published data has to follow the Linked Data principles first outlined by Berners-Lee in 2006 [2]: the “raw” data is encoded in machine-readable RDF [17], the data is Web addressable via URIs, and data is linked with other data via RDF links. RDF is a graph-based data model that represents information as subject-predicate-object expressions (also called triples). An RDF link is one type of RDF triple and states that one data entity has some kind of relation to another data entity [4]. Linked Data promotes the reuse of information and reduces redundancy. It facilitates the discovery of relevant information within the variety of information resources. Instead of following hyperlinks, users follow RDF links. The SPARQL Protocol and RDF Query Language (SPARQL) [25] supports users in formulating more sophisticated queries. Information about relationships stated in various RDF documents can be retrieved by querying across different sources. SPARQL also provides capabilities to easily combine information from different sources by merging two sets of triples into a single RDF graph [12].
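
As a minimal sketch of the triple model and the merge described above, the following Python snippet represents two RDF graphs as plain sets of (subject, predicate, object) tuples and merges them by set union. The identifiers (`ex:Rhine`, `ex:averageWidth`, `ex:featureOfInterest`, `ex:waterLevel`) and the values are hypothetical and chosen only for illustration; a real application would use an RDF library and a SPARQL engine rather than hand-written set operations.

```python
# Each triple is a (subject, predicate, object) tuple; a graph is a set of triples.
# All identifiers and values below are made up for illustration.
features = {
    ("ex:Rhine", "rdf:type", "ex:River"),
    ("ex:Rhine", "ex:averageWidth", 300),
}
observations = {
    ("ex:obs1", "rdf:type", "ex:Observation"),
    ("ex:obs1", "ex:featureOfInterest", "ex:Rhine"),  # typed RDF link
    ("ex:obs1", "ex:waterLevel", 3.2),
}


def merge(*graphs):
    """Merge graphs by set union; identical triples collapse into one."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def objects(graph, subject, predicate):
    """Return all objects of triples matching the subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def subjects(graph, predicate, obj):
    """Return all subjects of triples matching the predicate and object."""
    return {s for s, p, o in graph if p == predicate and o == obj}


merged = merge(features, observations)

# Follow the typed link from the river to its observations, then read a value.
for obs in subjects(merged, "ex:featureOfInterest", "ex:Rhine"):
    print(obs, objects(merged, obs, "ex:waterLevel"))
```

The merged set can answer a question that neither input graph answers alone, namely which water level observations belong to a given river feature.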
Thus, new information can be created from the resulting dataset.

Location is ubiquitous [11] and is an issue in many of the problems decision makers must solve [16]. Such problems range from simple questions like “Where are my friends now?” to complex ones, for example determining which areas are prone to floods in order to reduce potential damage. Such geographic problems illustrate the increasing interest in geographic information (GI) in recent years. Geobrowsers like GoogleEarth¹ or Microsoft’s Bing Maps² are responses to user needs for location-based information services. They are part of the Geospatial Web [26], which makes GI shareable, searchable and ubiquitous for users and decision makers [8] by using the infrastructure of the Web. In the Geospatial Web, a distinction is made between geospatial data and services that facilitate the use of GI in many domain applications [14]. The datasets containing GI range from simple map images to complex vector or sensor data. The Open Geospatial Consortium³ (OGC) developed the XML-based Geography Markup Language (GML) [24] as a data modeling and encoding standard for GI, in particular when modeled as features, following the ISO/OGC reference model [23]. Vector data is conceived of as a feature, which is an abstraction of a real-world phenomenon. When associated with a location relative to the Earth, it is called a geographic feature. Examples include buildings, streets, and rivers. Sensor data is stored and published using the OGC’s Observations and Measurements (O&M) [6] model. Much of this data has been made available through Web services in recent decades. Web services are an important component in the fabric of the Geospatial Web [14], since they enable the sharing of geospatial data across organizational boundaries over the Web [29]. Furthermore, they act on data and support discovery, retrieval and processing functionality.
The OGC specifies implementation standards for such geospatial Web services. They are divided into various types: Web Feature Services (WFS) [28] serve vector-based data. A Sensor Observation Service (SOS) [22] provides a Web service interface to access observation results measured by sensors and sensor systems. Web Processing Services (WPS) process or analyze geospatial data, e.g. the complex calculation of roadway noise. Various other OGC Web Services (OWS) exist and are listed on the OGC Web site⁴. These services can be combined into Spatial Data Infrastructures (SDI) to improve the interoperability between various data providers and users by smoothly exchanging and integrating GI.

Despite the benefits the Geospatial Web provides, several open issues have to be discussed. GI is not yet well integrated into the Geospatial Web. It is possible to request GI from an OWS via a unique URL, e.g. a feature collection served by a WFS, but the features (data entities) included in this dataset cannot be dereferenced by people or clients. Consequently, links to information related to such a feature do not exist either, although they would facilitate Geographic Information Retrieval (GIR) [15]. Different OGC standards also raise compatibility issues across different applications. Merging different datasets, such as O&M and GML, into one dataset is very difficult.

The transformation and publication of the OpenStreetMap [1] and Ordnance Survey [10] data according to the Linked Data principles have added a new dimension to the Web of Data. This work also adds spatial and temporal dimensions to the Web of Data. It draws on the benefits of the Web of Data to address the issues of the Geospatial Web mentioned above. In this paper we present our implementation of the Geographic Feature Pipes (GFP), which translate O&M-based observations and GML features into RDF. This provides options to discover spatiotemporal data and possibly related information by following RDF links, or even to merge information related to a geographic feature into one document. GFP is a proxy-based solution [13] and a first step to bridge the gap between geospatial data included in the Geospatial Web and the Linked Data community. It increases the accessibility to non-OGC data sources. Features might be linked to Geonames⁵ entries. DBpedia⁶ entries might be connected with real-time sensor data. Providing features and observations as Linked Data makes them accessible to a broad audience that may not be aware of the geospatial Web services defined by the OGC. Linking them to entries included in the LinkedGeoData dataset [1] also bridges the gap between the emerging Volunteered Geographic Information (bottom-up) data formats [9] and top-down standards like GML or O&M.

Our approach adds extra knowledge to the datasets by using RDF Schema (RDF-S) [5] ontologies for the definition of the geospatial linked data entities. Ontologies provide domain-specific terms for describing types of things in the real world and the relations among them. As a result, even more powerful queries can be formulated. Linking information of the geographic domain to the Web of Data bridges the gap to the Semantic Web community as well.

The remainder of this paper is structured as follows. Section 2 introduces a brief application scenario that illustrates the benefit of creating geospatial linked data entities. The implementation of GFP is described in Section 3, before we summarize and outline future work in Section 4.

1 See http://earth.google.com/
2 See http://www.bing.com/maps/
3 See http://www.opengeospatial.org/

4 See http://www.opengeospatial.org/standards
5 See http://www.geonames.org/
6 See http://dbpedia.org/

2 Application Scenario

Creating GI following the Linked Data principles has several advantages. Geospatial linked data entities can easily be linked to related information, and two datasets can be merged to infer more information. Users are supported in constructing sophisticated queries, and if the data is semantically annotated, the queries are even more powerful. The semantic enrichment of the underlying data models by linking them to formally specified vocabularies such as ontologies is called semantic annotation [21,18]. Semantic query processing performed by reasoning engines like IRIS⁷ on semantically annotated data returns more precise discovery results. In the following we present an application scenario which illustrates the benefit of O&M-based observations and GML features provided as linked data. Here we assume that the underlying data models, which are defined as RDF-S ontologies, are semantically annotated. The data model additionally includes a domain reference linking to global domain concepts that capture the data’s relation to reality. Features are linked to Geonames entries as well.

Bob works for the Federal Institute of Hydrology in Germany. Due to a prolonged dry season in North Rhine-Westphalia, he has to determine the navigability of rivers in this state for vessels with a minimum width of 10 meters. To accomplish his task, he requires information about the average width of the rivers as well as about their continuously changing properties such as water level and flow rate. The latter data comes, for example, from sensors offering real-time observations served by a SOS. Instead of finding and requesting each OGC Web service that offers the needed information, he navigates to our website, which provides a SPARQL endpoint where he can specify his query. This endpoint is connected to a database including RDF-encoded observation data and river features served by a WFS.

First, he enters a query to find all rivers within his specified area which are wide enough for the vessels defined above. Bob retrieves URLs pointing to RDF documents describing the selected features. Next, Bob wants to know which of these rivers are deep enough for vessels with a loaded draft of at least 4 meters. Hence, he searches for rivers with a current water level below this threshold. The flow rate should be less than one meter per second as well. Bob rephrases the query at our website to filter the rivers accordingly. After executing the query he gets URLs of RDF datasets containing information only about rivers which are not navigable under his specified conditions. These rivers will be closed for such vessels. The merged datasets are composed of vector-based river data as well as water level and flow rate data served by a SOS, both previously translated into RDF.
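
Bob's two query steps can be sketched in a few lines of Python. This is only an illustration of the filter logic under stated assumptions: the river records, field names, URLs, and values are invented, and in GFP the same conditions would be expressed as a SPARQL query against the RDF data rather than as Python list comprehensions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class River:
    uri: str              # hypothetical URL of the RDF document
    avg_width_m: float    # would come from the WFS feature data
    water_level_m: float  # would come from the latest SOS observation
    flow_rate_m_s: float  # would come from the latest SOS observation


MIN_WIDTH_M = 10.0   # vessel width from the scenario
DRAFT_M = 4.0        # loaded draft from the scenario
MAX_FLOW_M_S = 1.0   # flow rate threshold from the scenario


def wide_enough(rivers):
    """Step 1: rivers that are wide enough for the vessels."""
    return [r for r in rivers if r.avg_width_m >= MIN_WIDTH_M]


def not_navigable(rivers):
    """Step 2: rivers whose water level and flow rate are below the thresholds."""
    return [
        r for r in rivers
        if r.water_level_m < DRAFT_M and r.flow_rate_m_s < MAX_FLOW_M_S
    ]


# Invented sample data.
rivers = [
    River("http://example.org/river/a", 120.0, 5.1, 1.4),
    River("http://example.org/river/b", 35.0, 3.2, 0.6),
    River("http://example.org/river/c", 8.0, 2.0, 0.3),
]

candidates = wide_enough(rivers)
closed = not_navigable(candidates)
print([r.uri for r in closed])
```

With these sample values only river b is returned: it is wide enough, but its water level and flow rate are below the thresholds, so it would be closed for the vessels in question.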

7 See http://www.iris-reasoner.org/

3 Geographic Feature Pipes - Implementation

GFP is based on a Java API that translates O&M- and GML-based GI into RDF descriptions. The two prominent packages of this library are depicted in Figure 1 as blue boxes. The figure also shows that the API is built on Sesame⁸ (pink and yellow components), since each component depends on the components beneath it. Sesame is an open source Java framework for storing and querying RDF data; it includes RDF-S inferencers, query result formats, query languages such as SPARQL, various RDF storage backends, and various RDF file formats. Further information about each Sesame component is available in chapter 3 of its user guide⁹. The components of primary interest are the packages GFP.Translator and GFP.Pipes. The former is responsible for creating geospatial linked data. The latter provides means to retrieve the translated and possibly merged RDF documents by executing SPARQL queries.

[Figure 1, component labels from top to bottom: GFP.Translator, GFP.Pipes, HTTP Server; Repository API, HTTP Repository; RDF I/O, HTTP Client; RDF Model]

Fig. 1. Overview of components the GFP is based on.

The translation procedure is illustrated in Figure 2. It requires a Procedure-Oriented Service Model (POSM)¹⁰ for a WFS or SOS, including a reference to the data model on which the created data entities are based. These data model ontologies represent either the O&M data model or the OGC Feature Model and might be semantically annotated by linking to domain ontologies that capture their meaning in the real world. The POSM describes a Web service in RDF and is used in the European research project ENVISION¹¹ to semantically annotate environmental models [20]. ENVISION provides a Service Model Translator (SMT), implemented as a Java API, which creates such service models for OGC-compliant Web services like WFS, SOS or WPS. The SMT creates a respective POSM for each FeatureType or observedProperty. Its libraries can be downloaded from the “ENVISION Portal” source code repository of ENVISION’s open source project¹² and directly integrated into Java applications if Maven¹³ is used for the project’s build process. Further descriptions of this API, the POSM, the data model ontologies, and the annotation procedure are given in Deliverable 4.2¹⁴ of this research project.

Fig. 2. The translation procedure.

In step (1), a user registers a POSM for either a WFS or a SOS implementation. The Web service generates a context identifier (contextID), an IRI that identifies the translated RDF dataset, which can thus be obtained after the translation with SPARQL queries using the FROM clause. A context, which is supported by Sesame as well, is also used to identify which dataset has to be updated in the repository and to determine which statements have to be removed or replaced. The Web service name, coupled with the feature type, forms the contextID used to retrieve an RDF-encoded feature collection served by the WFS. The first URL in Figure 3 represents an example contextID. A contextID required for obtaining linked sensor data is represented by the second URL. It consists of the Web service name, the observedProperty identifier, and the featureOfInterest. GFP reads the URL of the service and the data type, which are stored in the POSM, and opens a connection to the underlying Web service to retrieve the corresponding data entities (2). If those are observations, the GetCapabilities document of the SOS is parsed first to get all featureOfInterest identifiers related to the observedProperty parameter. For each pair of both parameters, an observation collection is then requested. In step (3), the obtained data is read, translated into RDF, and finally added to a Sesame repository (stored either locally or on a Sesame server) associated with the contextID. Open source products are used for parsing the data. While features are read with a GML parser provided by Geotools¹⁵, observations are accessed, queried and parsed with the OX-Framework¹⁶.

Fig. 3. ContextIDs used to query the RDF datasets.

The contextIDs are sent back to the user who registered the POSM and are also mapped to the SPARQL queries used to retrieve the translated datasets. Each query is stored as one RDF statement in the Sesame repository, associated with a query identifier (queryID). A queryID consists of the contextID and an additional query parameter. It is required to retrieve the SPARQL query and the request parameters, which are also stored as RDF statements. The latter are needed to retrieve the original geospatial data again if the translated data is out of date. Our approach assumes that geospatial data, particularly dynamic sensor data, has to be updated regularly; the time of the last creation is therefore added as an RDF statement to the translated dataset. Finally, a user can get the translated data via the contextID. The service resolves the given ID to the queryID, retrieves the associated SPARQL query stored in the repository, and executes it to get the translated data. If the data is up to date, the stream is directly forwarded to the user. Otherwise, the original data is requested with the previously queried request parameters, translated into RDF, stored in the Sesame repository and finally sent back to the user.

8 See http://www.openrdf.org/
9 See http://www.openrdf.org/doc/sesame2/users/
10 See http://www.wsmo.org/ns/posm/0.1/
11 See http://www.envision-project.eu/
12 See http://kenai.com/projects/envision/pages/Home
13 See http://maven.apache.org/
14 See http://www.envision-project.eu/wp-content/uploads/2011/03/D4.2-1.0.pdf

The GFP.Translator API makes use of existing vocabularies for defining the geospatial linked data entities. While observations are based on an O&M ontology¹⁷, which in turn is aligned to the Semantic Sensor Network (SSN) ontology¹⁸, GML features are defined by an OGC Feature Ontology¹⁹ based on the GML profile 2.0 [27]. Feature types are modeled as subclasses of an AbstractFeature class, which can have a geometry and other properties. Features are then serialized as instances of the feature type and inherit its properties. Each geometry is an instance of an AbstractGeometry class.
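
The retrieval procedure described above (resolve the contextID, check the creation time, and translate stale data again) can be sketched as follows. This is a simplified Python illustration, not the GFP Java implementation: the identifier layout in `make_context_id`, the base URL, the `max_age_s` parameter, the `typeName` request parameter, and the in-memory dictionaries standing in for the Sesame repository are all assumptions. The actual contextID format is the one shown in Figure 3.

```python
import time
from urllib.parse import quote

BASE = "http://example.org/gfp"  # hypothetical base URL


def make_context_id(service, *parts):
    """Join the service name and further parts (a feature type, or an
    observedProperty and a featureOfInterest) into one identifier.
    The layout is an assumption made for this sketch."""
    return "/".join([BASE] + [quote(p, safe="") for p in (service, *parts)])


class PipeStore:
    """In-memory stand-in for the repository: one dataset per contextID."""

    def __init__(self, translate, max_age_s, clock=time.time):
        self._translate = translate  # callable: request params -> RDF text
        self._max_age_s = max_age_s
        self._clock = clock
        self._params = {}    # contextID -> stored request parameters
        self._datasets = {}  # contextID -> (creation time, RDF text)

    def register(self, context_id, params):
        """Store the request parameters and translate the data once."""
        self._params[context_id] = params
        self._refresh(context_id)

    def _refresh(self, context_id):
        rdf = self._translate(self._params[context_id])
        self._datasets[context_id] = (self._clock(), rdf)

    def get(self, context_id):
        """Return the dataset, translating it again first if it is out of date."""
        created, _ = self._datasets[context_id]
        if self._clock() - created > self._max_age_s:
            self._refresh(context_id)
        return self._datasets[context_id][1]


# Usage with a fake translator and a controllable clock.
calls = []


def fake_translate(params):
    calls.append(params)
    return f"# RDF for {params['typeName']} (version {len(calls)})"


now = [0.0]
store = PipeStore(fake_translate, max_age_s=60, clock=lambda: now[0])
cid = make_context_id("riverWFS", "River")
store.register(cid, {"typeName": "River"})
print(cid)
print(store.get(cid))  # still fresh, served from the store
now[0] = 120.0
print(store.get(cid))  # stale, translated again before being returned
```

Keeping the request parameters next to the dataset is what makes the refresh possible without asking the user to register the service again, which mirrors the role of the stored request parameters in the text.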
Geometries represent the geographic location and shape of a feature, often defined by a set of geographic coordinates. Since our approach focuses on merging feature properties instead of finding a common solution for the representation of geometries, we represent them as Well-known text (WKT) strings. WKT reduces the number of triples in the dataset, yielding a faster RDF translation and more efficient merging of RDF documents.

In accordance with the O&M schema, observations are also modeled as subclasses of AbstractFeature. This enables the merging of observations and features with SPARQL, because observations as well as features are RDF instances of the same class. A sample query merging geospatial linked data describing river features with related linked sensor data (observedProperty) is shown in Figure 4. Ensuring that observations belong to the translated GML feature is done by a spatial match on the geometries of the feature and the feature of interest, and through additional logic-based reasoning on the semantically annotated data models. However, we expect that this process has been carried out during a previous design phase. Users are able to register such queries in the Sesame repository as well. Once registered, the merged data can be retrieved using a contextID that is defined by a combination of the merged dataset identifiers. The process to retrieve the data is similar to the previously introduced one, which ensures that the user gets up-to-date information.

Fig. 4. A SPARQL query to merge RDF-encoded GML features with related O&M based observation data.

15 See http://www.geotools.org/
16 See http://52north.org/communities/sensorweb/oxf/index.html
17 See http://purl.org/ifgi/om#
18 See http://purl.oclc.org/NET/ssnx/ssn
19 See http://purl.org/ifgi/gml/0.2#

4 Conclusion

The vision of the Data Web assumes that information is published by following the Linked Data principles. Providing geospatial data as Linked Data acts as a bridge between that Web and the existing Geospatial Web. In conjunction with semantic annotations, it is a move towards the Geospatial Semantic Web introduced by Egenhofer [7]. It is a solution to bring the benefits of semantics coupled with RDF data to existing geospatial infrastructures.

In this paper, we discussed the problems of the Geospatial Web and introduced a first solution that addresses them with the benefits of the Data Web. The presented approach targets RDF-based geospatial data served by a SOS or a WFS. Describing satellite images, maps or even raster-based data such as digital terrain models was not covered, but is also required to move the Geospatial Web towards the Geospatial Semantic Web. The introduced Web service supports the creation of linked sensor data and linked GML-based features by using ontologies and open source products as a common base. We use existing ontologies for the definition of the linked data entities, as this ensures that the data can be smoothly integrated by data consumers. Furthermore, they provide the opportunity of merging sensor data with feature properties, since both features and observations are described by the same concept. The approach also makes it possible for users from other communities to gain access to geographic information. Since geospatial data, particularly sensor data, changes continuously, the solution also addresses the challenge of keeping the data up to date. Geospatial linked data, semantically annotated by including references to domain ontologies, helps to bridge the vocabulary gap between different domains [19] and supports a more efficient GIR. Querying the translated data with SPARQL offers a sophisticated way to explore and aggregate information. These benefits have been illustrated with a scenario. The approach also offers users a SPARQL query registry for storing individual queries that represent their needs.

Future work will target the development of a Web interface which offers the registration of a POSM and of SPARQL queries. The latter may be used to merge features with related sensor data without the need for a geographic information system. We will also work on an interface that allows users to retrieve the geospatial linked data via the registered contextID.

References

1. S. Auer, J. Lehmann, and S. Hellmann. LinkedGeoData: Adding a spatial dimension to the Web of Data. The Semantic Web-ISWC 2009, pages 731–746, 2009.
2. T. Berners-Lee. Linked Data, 2009. Personal view available from http://www.w3.org/DesignIssue/LinkedData.html.
3. C. Bizer, T. Heath, and T. Berners-Lee. Linked Data-The Story So Far. International Journal on Semantic Web and Information Systems (IJSWIS), 5(3):1–22, 2009.
4. C. Bizer, T. Heath, K. Idehen, and T. Berners-Lee. Linked Data on the Web (LDOW2008). In Proceeding of the 17th international conference on World Wide Web, pages 1265–1266, New York, USA, 2008. ACM.
5. D. Brickley and R. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation, World Wide Web Consortium (W3C), 2004. Retrieved from http://www.w3.org/TR/rdf-schema/.
6. S. Cox. OGC Implementation Specification 07-022r1: OpenGIS Observations and Measurements - Part 1: Observation schema. Technical report, Open Geospatial Consortium Inc., 2007.
7. M. J. Egenhofer. Toward the Semantic Geospatial Web. In Proceedings of the 10th ACM international symposium on Advances in geographic information systems, GIS ’02, pages 1–4. ACM, 2002.
8. M. Gerlek and M. Fleagle. Imaging on the Geospatial Web Using JPEG 2000. In A. Scharl and K. Tochtermann, editors, The Geospatial Web, pages 27–38. Springer Verlag, 2007.
9. M. Goodchild. Citizens as sensors: the world of volunteered geography. GeoJournal, 69(4):211–221, 2007.
10. J. Goodwin, C. Dolbear, and G. Hart. Geographical Linked Data: The Administrative Geography of Great Britain on the Semantic Web. Transactions in GIS, 12:19–30, 2008.
11. G. Hart and C. Dolbear. What’s So Special about Spatial? In A. Scharl and K. Tochtermann, editors, The Geospatial Web, pages 39–44. Springer Verlag, 2007.
12. T. Heath and C. Bizer. Linked Data: Evolving the Web into a Global Data Space. Synthesis Lectures on the Semantic Web: Theory and Technology, 1(1):1–136, 2011.
13. J. Kahan, M. Koivunen, E. Prud’Hommeaux, and R. Swick. Annotea: An open RDF infrastructure for shared web annotations. In WWW ’01: Proceedings of the 10th international conference on World Wide Web, pages 623–632. ACM Press, 2001.
14. R. Lake and J. Farley. Infrastructure for the Geospatial Web. In A. Scharl and K. Tochtermann, editors, The Geospatial Web, pages 15–26. Springer Verlag, 2007.
15. R. Larson. Geographic Information Retrieval and Spatial Browsing. GIS and Libraries: Patrons, Maps and Spatial Information, pages 81–124, 1996.
16. P. A. Longley, M. F. Goodchild, D. J. Maguire, and D. W. Rhind. Geographic Information Systems and Science. John Wiley & Sons, 2005.
17. F. Manola and E. Miller. RDF Primer. W3C Recommendation, World Wide Web Consortium (W3C), 2004. Retrieved from http://www.w3.org/TR/rdf-primer/.
18. P. Maué, H. Michels, and M. Roth. Injecting semantic annotations into (geospatial) Web service descriptions. Semantic Web - Interoperability, Usability, Applicability, 1, 2010.
19. P. Maué and J. Ortmann. Getting across information communities. Earth Science Informatics, 2:217–233, 2009.
20. P. Maué and D. Roman. The ENVISION Environmental Portal and Services Infrastructure. In Proceedings of International Symposium on Environmental Software Systems (ISESS), 2011. Not yet published.
21. P. Maué, S. Schade, and P. Duchesne. Semantic Annotations in OGC Standards. Open Geospatial Consortium (OGC), 2008.
22. A. Na and M. Priest. OGC Implementation Specification 06-009r6: OpenGIS Sensor Observation Service (SOS). Technical report, Open Geospatial Consortium Inc., 2007.
23. OGC. OGC Reference Model (ORM) - Version 2.0, 2008.
24. C. Portele. OGC Implementation Specification 07-036: OpenGIS Geography Markup Language (GML) Encoding Standard. Technical report, Open Geospatial Consortium Inc., 2007.
25. E. Prud’Hommeaux and A. Seaborne. SPARQL Query Language for RDF. W3C Recommendation, 2004. Retrieved from http://www.w3.org/TR/rdf-sparql-query/.
26. A. Scharl. Towards the Geospatial Web: Media Platforms for Managing Geotagged Knowledge Repositories. In A. Scharl and K. Tochtermann, editors, The Geospatial Web, volume 2, pages 3–14. Springer Verlag, 2007.
27. L. van den Brink, C. Portele, and P. A. Vretanos. OpenGIS Implementation Standard Profile 10-100r2: Geography Markup Language (GML) simple features profile. Technical report, Open Geospatial Consortium Inc., 2010.
28. P. P. A. Vretanos. OGC Implementation Specification 09-025r1: OpenGIS (WFS). Technical report, Open Geospatial Consortium Inc., 2010.
29. P. Zhao, G. Yu, and L. Di. Geospatial Web Services. In B. N. Hilton, editor, Emerging Spatial Information Systems and Applications, pages 1–35. Idea Group Publishing, 2007.