Undefined 0 (0) 1 1 IOS Press

Injecting semantic annotations into (geospatial) Web service descriptions

Editor(s): Thomas Lukasiewicz, University of Oxford, UK Solicited review(s): Jacek Kopecký, The Open University, UK; Tudor Groza, The University of Queensland, Australia; Marinos Kavouras, National Technical University of Athens, Greece

Patrick Maué a Henry Michels a and Marcell Roth a a Institute for Geoinformatics (ifgi) University of Münster, Germany Weseler Str. 253 48151 Münster Germany e-mail: fi[email protected]

Abstract. Geospatial Web services comply with well- 1. Introduction established standards to support seamless integration into applications ranging from commercial Geographic Efficient discovery relies on search engines with Information Systems (GIS) to open source web map- sophisticated indexing and scoring techniques. ping clients. Descriptions of service capabilities con- These techniques can not be simply applied on tain information about the provided data. Updates to structured data without textual content such as the underlying result in changing descrip- tions. To ensure compatibility with existing solutions, geospatial data, or any kind of binary data, in- semantic enablement of Geospatial Web services has cluding pictures or videos. Finding such content to reflect both, standards and changing . Se- relies on annotations which associate descriptive mantic annotations link between legacy non-semantic metadata with the resource [1]. The metadata is Web service descriptions and their semantic counter- then again used for common indexing and scor- parts. The open source Semantic Annotations Proxy ing techniques. Semantic annotations link to for- (SAPR) is a light-weight RESTful API deployed as mally specified vocabularies capturing the con- free service which “injects” semantic annotations into tent’s meaning. Besides the traditional informa- existing Web service descriptions without breaking the tion retrieval techniques based on indexing and standards. This approach decouples the annotations from the original metadata, which ensures the separa- string matching, semantic annotations linking to tion of concerns between data providers and end users ontologies support logic-based reasoning. In the with different and sometimes conflicting views on an- case of finding information, reasoning engines such notations. In addition, the service is robust regarding as Pellet [27] can then precisely match the seman- changes of the service descriptions. The presented ap- tic queries to the semantically annotated content, proach is focusing on W3C- and OGC-compliant Web and ensure better precision of search results [15]. services, but can be theoretically applied on any kind We understand semantic annotation as a ref- of information source with structured metadata. erence which establishes a Link Annotation [1] Keywords: Semantic Annotation, , between the application-specific metadata and a Tools, Geospatial Semantic Web, Semantic Web Ser- shared external vocabulary. They enable reasoning vices on the linked ontologies and support non-domain experts to better understand what the data rep- resents. In this paper we focus on annotations for Web services which describe their interfaces with standardized metadata. The latter is semantically

0000-0000/0-1900/$00.00 c 0 – IOS Press and the authors. All rights reserved 2 Injecting Semantic Annotations annotated to integrate the Web services into se- scriptions ensures a clear Separation of Concern mantically enabled applications. (SoC). The proposed solution for loosely-coupled Research on semantic annotations for Web doc- metadata (separating between client-specific an- uments [16,11], multimedia [29,3], geospatial data notations and the provider-specific metadata) sup- [15,12,19], and Web services [30,28,17] has been ports multiple annotations for one service descrip- around for years. The description of Semantic Web tion. This makes different application scenarios Services (SWS) eventually led to the standard “Se- feasible, without relying on the service provider for mantic Annotations for Web Service Description specifying the annotations. The introduced imple- Language (SAWSDL)” [14]. It specifies extension mentation reflecting the standards and changing points for W3C-compliant Web service metadata metadata, as well as the discussion of SoC for Web encoded with the Web Service Description Lan- service annotations should be considered as main guage (WSDL). Although the W3C standards are contribution of this paper. widespread and well-established, domain-specific The following section 2 introduces two applica- standardization organizations have come up with tion scenarios which illustrate the benefit of inject- their own solutions. Geospatial Data Services, for ing annotations. A more in depth discussion about example, comply to the standards of the Open the need for injecting references is following in sec- Geospatial Consortium (OGC)1. These standards tion 3. The implementation of SAPR is introduced define their own approach for XML-based meta- in section 4, followed by an evaluation in section 5. data. Depending on the type of the spatial resource Section 6 lists related work about Semantic Web (i.e. features for vector-based data, coverages for Services. The paper concludes with an outlook and raster based data, and geoprocessing methods), a summary in section 7. different OGC Web service types exist. In this paper we present the implementation of the Semantic Annotations Proxy (SAPR)2. It 2. Application Scenarios builds on (and is part of) the Sapience API3, which comprises libraries used to extract, store, Annotations support better interpretation and and inject semantic annotations within Web ser- evaluation of the referenced data. Using a proxy ensures that different sets of annotations for appli- vice metadata without breaking the underlying cations deployed in different scenarios can be re- standards. SAPR exposes the Sapience function- quested. In the following two sections we illustrate ality following a Software-as-a-Service approach. one typical application scenario for semantic an- It is a conceptually simple proxy-based [11] solu- notations (linking to vocabularies describing the tion for injecting semantic annotations. Similar to data’s relation to reality) and one for data qual- traditional HTTP proxies (e.g. for Web browsing), ity annotations (linking to vocabularies describing SAPR acts on behalf of the proxied Web service aspects such as trust or uncertainty). and either redirects client requests to the origi- nal Web service or updates the returned meta- 2.1. Annotations for capturing data data. End users are not aware of its existence: the proxy takes the identity of the proxied service. The most prevalent application of annotations The “injection”-procedure refers to the semantic is semantic enrichment: the individual entities in enrichment of XML-based documents by writing a data model are linked to concepts in shared Link Annotations directly into the data stream. vocabularies to explain what the data represents We explain how the proxy-based solution for the in reality. Semantic annotations combined with semantic enablement of existing Web service de- semantic-enabled applications can be useful for a varying field of applications such as Web service 1The reason why OGC has not (yet) adapted W3C stan- discovery, automatic integration into existing busi- dards is simple: the OGC standards have been developed ness or scientific workflows, logic data integrity years before SOAP/WSDL emerged. tests, semantic validation of Web service composi- 2SAPR is a free service, available at: http:// tions, or inferring data visualization strategies. A semantic-proxy.appspot.com 3See http://purl.org/net/sapience/docs for the ac- more in depth discussion of these applications can cess to the source code, issue tracking, and documentation. be found in [20]. Injecting Semantic Annotations 3

The linked vocabularies can be commonly ac- tially avoid) semantic conflicts in geospatial work- cepted thesauri of one particular domain, shared flows. domain ontologies developed for particular use Semantic annotations linking to local applica- cases, or local application-specific ontologies. The tion ontologies have been proven useful to cap- domain ontologies should be understood as for- ture the sometimes complex functional dependen- mal specifications of reality, in particular of our cies within data models [19]. In the given exam- geographic environment. A Web Feature Service ple, CODE identifies the geological formation (i.e., (WFS) [31] could, for example, provide XML all features representing geological layers formed schema describing the data model of the fea- in the pleistocene era have the code “2”). Built-in ture type GEOL50KType with the two attributes ontology constructs such as constraints on proper- CODE and FORMATION (see Figure 1). These at- ties in OWL help to re-model this inner relation- tribute labels are unfortunately cryptic, its mean- ships of the data. To make these local ontologies ing remains hidden to non-experts (or experts useful, they have to be aligned to globally shared from different information communities). Request- ontologies. The FORMATION could then, for exam- ing the actual data returns “2” for CODE and “Plio- ple, be linked to a domain concept GeologicEra. Pléistocène” for FORMATION, which does not re- 4 veal any meaning as well. This example is taken 2.2. Annotations for describing data quality from one of the scenarios of the European research project ENVISION. It adequately reflects the cur- Communicating quality aspects of data is cru- rent situation: today’s use of (geospatial) Web ser- cial for evaluating its usefulness for the intended vices is significantly impaired by the lack of mean- application. Typical data quality metadata cov- ingful metadata [23]. ered in the following are data provenance and un- GIS users directly load and visualize feature certainty. data from such a WFS. In this sense, the data Geospatial data is the result of a measurement is useable, but far from being useful. Client ap- procedure (or sensor observation), and each mea- plications can be semantically enabled (i.e. sup- surement comes with some sort of error. The origi- porting logic inference and visualization of the nal input for the data representing geologic layers, referenced knowledge). But without semantic an- for example, derives from core samples. A three- notations, these clients have no means to iden- dimensional interpolation algorithm estimates the tify and retrieve the shared vocabularies to infer distribution of the layers according to these sam- what these attributes represent. And without a precise knowledge about the underlying data mod- ples. This deviation from truth is commonly called els, the data itself can be hardly used for criti- the Uncertainty of geographic information. Appli- cal geospatial tasks like decision making or risk cations in need for high accuracy of certain prop- modelling. In Figure 1, the attributes have an ad- erties, e.g. urban planning, should be aware of im- ditional attribute sawsdl:modelReference="..". precise data. Otherwise, the error is hidden in the This model reference is the link annotation which result (and high precision is simulated), leading is not part of the original schema, but has been eventually to wrong calculations or more serious injected afterwards. Semantic-enabled clients fol- issues. UncertML [32] has been developed as ex- low these links pointing to classes in RDF-based tension for geospatial data models to formalize the ontologies. On this level, descriptive metadata and uncertainty. The Link Annotations then point to meaningful (and multilingual) labels exist which the appropriate definitions in the UncertML dic- 5 are needed for the interpretation of the data mod- tionary . Such information might not be neces- els. More importantly, reasoning with common en- sary for some applications (i.e. simple navigation gines such as Pellet for OWL-DL[27] ease integra- tasks). Their clients won’t be able to process the tion. They help to select appropriate visualization updated uncertainty information. With the help strategies (e.g. features representing water bodies of the proxy, this information about the uncer- are commonly coloured blue) or detect (and poten- tainty can simply injected during runtime, and

4The service can be found at: http://envision. 5Examples and the dictionary are available at: http:// brgm-rec.fr/Data_Geology.aspx dictionary.uncertml.org/ 4 Injecting Semantic Annotations

Fig. 1. Semantic Annotations for XML (GML) schema only (technically) compatible clients can request intensive task. That service providers will update this data if needed. and re-deploy existing Web services to include as- Specifying provenance of (geospatial) data helps pects such as semantic annotations cannot be ex- to build confidence (or trust) into the data. Such pected. The proxy-based solution of SAPR has ini- metadata includes, amongst others, detailed infor- tially been a pragmatic approach to integrate these mation about the publishing organization and a legacy Web services into semantic-enabled appli- description of the data lineage. The former could cations. SAPR enables client application develop- be as simple as a link to a publicly shared FOAF ers in need for semantically enriched Web services profile. Clients might want to learn about the to simply annotate Web service in question and data provider’s reputation to infer its usefulness register it to the proxy. The original Web service is for critical applications. Data lineage refers to the not modified and the outcome remains standards- actual process for creating the data. Here, Link compliant. The only modification from a client’s Annotations point to external (but application- perspective is the change of the URL representing specific) documents containing detailed informa- the Web service location. tion about the creation process. Which links to Selecting an appropriate strategy to benefit either FOAF profiles or Data Provenance doc- from annotations requires that applications under- uments are required has to be decided by the stand the annotation’s intended purpose. Bech- client application. The presented approach sepa- hofer et al. [1] state that “we must be explicit rates between original provider-specific metadata about the assumptions that we make and the and application-specific annotations, making it context within such annotations should be in- possible to support semantically-enabled applica- terpreted”. They distinguish between Decoration, tions and uncertainty-aware applications simulta- neously. Linking, Instance Identification and Instance Ref- erence, Aboutness and Pertinence as typical an- notation types. Marshall [16] draws a line be- 3. Separation of Concerns for Semantic tween formal/informal and explicit/tacit anno- Annotations tations. The presented approach is restricted to Link Annotations pointing to formal (and explicit) Legal directives such as INSPIRE (Infrastruc- metadata. ture for Spatial Information in Europe)and the The limitation on Link Annotations for the in- U.S. NSDI (National Spatial Data Infrastructure) jection is based on the two following assumptions: call for a Web-based provision of spatial data from (1) Link Annotations support loose coupling of the public sector. In the last decade, large-scale the data models and the domain-specific meta- spatial data infrastructures (SDI) have been de- data. This enables separation of concerns and del- ployed by public mapping agencies. But the migra- egation. (2) Annotations are pointing to explicit tion from local installations to Web-enabled infras- specifications, either captured in ontologies or in tructures has been (and still is) a tedious and cost- shared vocabularies encoded in a well-known for- Injecting Semantic Annotations 5 mat (e.g. in the Resource Description Framework service metadata. Depending on the client request, RDF). different sets of annotations may be added to the metadata. Experts coming from different informa- 3.1. Decoupling metadata tion communities are able to define their annota- tions, without risking conflicts with other anno- Separation of Concerns (SoC) refers to the idea tations. The presented implementation does not of separating distinct features in software appli- support the domain expert in the actual semantic cations. Typical concerns are concurrency, persis- annotation; this is expected to happen before the tence, or failure recovery [9]. In the case of meta- service is registered to SAPR. data, the already mentioned information about Clients for OGC Web services don’t separate data semantics, uncertainty, and trust could rep- between the location of the Web service and its resent the different concerns. Link Annotations description. Descriptions are typically generated loosely couple implementation-specific data mod- on the fly, since updates to underlying data sets els with application-specific (or domain-specific) are common. Each OGC Web service implements metadata. The client application has to specify the GetCapabilities-operation providing the ser- which annotations are to be injected. The pre- vice metadata. Supported operations and service- sented proxy generates a unique identifier for each specific information like feature types are listed set of annotations, and expects this identifier as in this document. W3C-compliant Web services, parameter to allocate the according annotations in on the other hand, support separation of descrip- the repository. tions from services. Best practice may imply that SoC supports delegation of the annotation from WSDL descriptions are provided by the services the data provider to the domain experts. Library (by concatenating the “?wsdl” to the URL). But Research has always faced the problem of the in- this is not part of the standard; SoC is thus in- formation gap between potential readers and in- herently supported by W3C Web services. How- dexing catalogers due to differing backgrounds [8]. ever, updating such service would require to lo- It is the librarian who is responsible for indexing cate and update all WSDL descriptions linking to a book for a library. She knows best how potential it. Hence, services typically have implementation readers might be looking for it, and what search and description at the same location, which again terms they might be using. The book’s author, raises the issue of SoC (and makes the presented on the other hand, might have had a very spe- proxy useful for semantic annotations). cific reader in mind; hence his description of book would be semantically narrow. Creating metadata 3.2. Semantics of Link Annotations for Web services is facing the similar issue: the data provider is an expert in data acquisition and The categorization of books in libraries is follow- authoring, but she might lack skills in identify- ing a well define scheme which has been adopted ing appropriate ontologies for semantic annota- by the popular standard for resources tions or the correct equations to describe the un- on the Web. Capturing the semantics of data, and certainty inherent in the data6. Furthermore, data describing the inner relationships of data entities, providers usually have very a specific end user in is considerably more complex. The Link Annota- mind when creating the metadata. It requires do- tion has been proposed as means to connect the main experts to close the information gap between individual elements in the data schema to the ap- the data providers and the potential users. It is the propriate ontology concepts in shared vocabular- domain expert who, on behalf of the data provider, ies. But the link itself does not indicate the nature identifies the appropriate shared vocabularies and of the linked resource, or how the client application creates the semantic annotations. has to process the annotations. The resource may The ad-hoc injection procedure allows for de- be just a different representation of the annotated coupling semantic annotations from the original item (the Instance Identification according to [1]). Or it provides information about it, e.g. how to use 6This discussion is focusing on the separation between it (the Aboutness). The SAWSDL standard pro- the two roles of the data provider and the domain expert. poses the ModelReference to add semantic anno- One person (or organization) can still play both roles. tations into XML schema or WSDL elements [14]. 6 Injecting Semantic Annotations

Its main purpose is the separation between two dif- proxy. All other requests - if supported by the se- ferent encodings of the same entity (even though lected HTTP method - are redirected to the prox- the standard itself is intentionally unconstrained). ied service. The client then directly interacts with The standard also recommends to complement the the original service to, for example, request the ac- reference with pointers to scripts which translate tual data. In all other cases (including invocation between the different encodings. In [20], we discuss requests using HTTP Post, which doesn’t support theDomainReference as extension to the Model- redirects) the proxy forwards the request to the Reference to align local implementation-specific services, and streams the result back. data models with globally shared domain models. The introduced proxy-based solution for in- Another option to let the client know about jecting annotations into Web service descriptions the nature of the referenced resource is to link has been implemented as a Web service itself. only to well-defined instances in a vocabulary. A The first URL in Figure 2 represents a typi- link pointing to a resource modelled as instance cal request for the capabilities description of an of a Person from the FOAF vocabulary [5] can be OGC Web service. Once registered to the proxy used to infer trust information (if it is injected in service, the semantically annotated description the appropriate location). Links to SKOS concepts can be retrieved using the second URL by using may be used to enhance discovery (e.g. brows- the original request parameter (in this case “re- ing through categories). A similar approach can quest=GetCapabilities”) and the service id 7.A be taken for place names by pointing to entities list of registered Web services and a list of all refer- served by gazetteers. Having all these different op- ences for one Web service could then be requested tions could potentially result in a plethora of dif- using the URLs (3) and (4). ferent kinds of Link Annotations with different identifiers. Avoiding this either requires an ontol- 4.1. Extracting the annotations ogy of Link Annotations or committing to one commonly used method. In the case of the lat- The following scenario assumes that the doc- ter, only the ModelReference from the SAWSDL ument with the annotations listed in Figure 1 standard may be used. The referenced resource is uploaded to the proxy using the manual up- then has to be encoded in a way to let common load form (URL (5) in Figure 2). The XML reasoning engines infer its type. For example, in- stream is forwarded to the reference extractor, stead of introducing a new reference for places which first identifies the service type by scanning (e.g. the http://.../PlaceReference), the Mod- the XML header. Each service type is configured elReference points to the OWL instance Paris, through an identification pattern (a regular ex- which is modelled as instance of the class City pression) and the patterns for the reference loca- (which itself is a sub-class of Place). tions as specified in the according metadata stan- dard. These locations are path expressions based on the XPath [2] syntax. In this example, the 4. The Semantic Annotations Proxy (SAPR) metadata is provided by an OGC WFS (version 1.1.0). The according pattern for the header is The concept of a network proxy has been around listed as (1) in Figure 3. Each of the three given for decades [26]. A proxy acts on behalf of the ser- patterns (e.g. .*Capabilities.*) has to be part vice it encapsulates by taking its identity. Clients of the document’s root element. For the WFS can access the original service only through the 1.1.0, the path expressions (2) and (3) in Fig- proxy; the proxied service stays hidden behind the ure 3 are configured. The first describes the lo- proxy. Clients are not aware of the existence of the cation of a modelReference (as defined in the proxy. Even though the service location changes SAWSDL standard) as attribute of a simpleType- (including the query part with the parameters), Element in XML schema. The second specifies the the client interacts with the proxy as with the MetadataURL in the FeatureType element as valid original service . This approach is similar to Web browsers accessing the Web through a proxy. The 7These are examples; the service identifiers change con- semantic annotations proxy only acts on a requests tinuously and the repository is purged often. Use URL (3) for metadata which has been registered to the for a list of valid services. Injecting Semantic Annotations 7

(1) http://www.example.com/wfs?request=GetCapabilities&... (2) http://semantic-proxy.appspot.com/api/get?request=GetCapabilities&...&sid=ae34aa09 (3) http://semantic-proxy.appspot.com/api/list/services (4) http://semantic-proxy.appspot.com/api/list/references?sid=ae34aa09 (5) http://semantic-proxy.appspot.com/html/upload.html

Fig. 2. Example URLs to access the Semantic Proxy destination. The nature of the reference is con- open set of parameters. The second URL in Fig- catenated, and used to identify the location in the ure 2 is an example how to request a WSDL-based original metadata document: child indicates that description of a Web service. The proxy first ex- the location of the discovered annotation should tracts the service identifier from the request and point to the parent element, i.e., the annotation retrieves the annotations as well as the original should later be injected as child of the given loca- service request from the cache (which is built ini- tion. In the case of attribute or sibling the full loca- tially from the data store on startup). If the ser- tion is stored; the metadata is then either injected vice id is not registered, an exception is returned as attribute into the element located by the path to the client. In the next step, the other request expression, or directly after it as new element. parameters (e.g. the request-parameter of an OGC The extraction and injection procedures make service) are extracted and compared to the param- use of the Streaming API for XML (STAX)8. The eter in the original service request. If they don’t extractor computes the path, implemented as a match, the client receives a redirection response stack of XML elements, for every element in the (HTTP Code 302) with the original location in the XML stream and computes the match with the HTTP header. If they match, the service meta- configured patterns (also represented in stacks). data is retrieved from its original source and for- The elements in the stacks are recursively com- warded to the injection engine. The XML stream pared. Namespaces of the individual elements are from the source is read element by element (us- ignored. In the case of a match, the location is ex- ing again StAX). Similar to the extraction proce- tracted and persistently stored together with the dure, a path expression is generated and matched reference. In Figure 3, Pattern (4) represents such against the registered annotations. In the case of a a location. In consists of the path expression de- match, the reference itself (which could be either scribing the location within the XML tree and attribute or a new XML element) is written into the location indicator (sibling, attribute, or child). the output stream. The resulting stream is directly This example states that the associated reference forwarded to the client. is injected after the element Name (since it is con- Any errors during this procedure, e.g. due to figured as sibling) with the text “Geol50k”. The missing parameters or parsing problems, cause ap- result of the extraction procedure is the service propriate HTTP status errors. It is the client’s re- identifier, which is required to retrieve the annota- sponsibility to communicate the error to the end tions. Each registration results in a new service id. user. Requests to the service which do not serve One document (with different sets of annotations) metadata (e.g. requesting the feature data from an can therefore be uploaded multiple times without OGC Web service) are redirected. risking conflicts. 4.3. Reflecting change 4.2. Injecting the annotations Changes in the original metadata (e.g. new fea- The injection procedure is illustrated in Figure ture types have been added) don’t affect the in- 4. The client software requests annotated meta- jection procedure as long as the stored reference data using the proxy, a service identifier, and an locations (the XPath statements) are still valid. The injection procedure works on the stream, and 8We use Woodstox, a “high-performance validating only stops if the XPath Stack at the current po- namespace-aware StAX-compliant (JSR-173) Open Source sition in the stream matches a XPath expression XML-processor”, see http://woodstox.codehaus.org/. stored with the annotations. New feature types, or 8 Injecting Semantic Annotations

(1) ".*Capabilities.*",".*xmlns.*http://www.opengis.net/wfs.*",".*version=\"1.1.0\".*" (2) //xsd:simpleType[@sawsdl:modelReference]/attribute() (3) //wfs:FeatureTypeList/wfs:FeatureType/wfs:MetadataURL/sibling() (4) //Capabilities/FeatureTypeList/FeatureType/Name/text()=\’Geol50k\’/sibling()

Fig. 3. Patterns for the extraction of annotations. new operations in a WSDL file, are ignored; pre- But annotating, for example, a process description viously existing metadata will still be annotated. coming from a WPS requires the injection of mul- If, however, the location of the stored XPath el- tiple lines of XML. ement changes (e.g., if the feature type which is SAPR additionally comes with a RESTful API annotated is renamed, and the name attribute is to interact (e.g. uploading, searching) with the reg- in the path), the annotation has to be updated istered services. The URLs (3), (4), and (5) in Fig- accordingly. ure 2 are examples. (3) can be used to list all ser- vices currently registered with SAPR. (4) lists ex- tracted semantic annotations for one service. The response, encoded in JSON, is a set of bindings between a location (the XPath-Expression) and a reference which has to be injected at this location (either an attribute or a whole chunk of XML). To register an annotated document, the file can be uploaded via the upload form available at (5). Clients are also able to directly register annotated metadata through the API.

5. Evaluation of the injection approach

The focus on the implementation is the dynamic injection of semantic annotations into structured metadata. It does neither contain an automatism how to create the annotations, nor do we add any functionality which makes use of them (e.g. in- clude reasoning on the semantic annotations). In the research project SWING [25], a visual inter- face has been implemented which supports the user in semantically annotating OGC Web Fea- ture Services [6]. The semantic annotations have Fig. 4. The injection procedure been used in SWING for the semantic discovery. In the GDI-Grid project [21], we implemented an The current implementation supports semantic Eclipse-based plugin for the semantic validation annotations for common Web services compliant of service compositions. The WSDL documents of to the W3C standards and described using the the Web services in the workflow were semantically WSDL standard. In addition, the OGC standards annotated using the SAWSDL standard. SAPR for Web Processing Services (WPS), Sensor Ob- has been initially developed in GDI-Grid to sup- servation Services (SOS), and Web Feature Ser- port the semantic validation. In the ENVISION vices (WFS) have been configured. Support for project [18], SAPR is one core component for new service types (or new extension points for se- building a semantically enabled infrastructure for mantic annotations) is added through XML-based designing and publishing environmental models as configuration files. Annotations are usually simple Web services. The originally desktop-based seman- attributes which are added to existing elements. tic annotation interface implemented in SWING Injecting Semantic Annotations 9 is currently migrated to the Web. It directly in- Working directly on the streams ensures very tegrates with SAPR, making it possible for even fast responses. The time difference between ac- non-ICT experts to semantically annotate Web cessing the meta-data from the original source or service metadata. Being a standard component for the annotated document from SAPR is negligi- the semantic enablement of existing service infras- ble, external factors, such as network latency or tructures is the long-term vision for the Semantic server response time, have a larger impact on re- Annotations Proxy. sponsiveness. Figure 5 depicts the average (50 runs SAPR has been deployed as a Software-as-a- each) response times (vertical axis) for request- Service (SaaS) using the Google App Engine9. ing the original metadata or the injected meta- This approach comes with some limitations on the data from the proxy (which includes the former). consumed resources, which have only a theoret- In particular the third (very slow) service illus- ical impact in the case of SAPR. The require- trates that other factors have a greater impact on ments for bandwidth, processing time, or storage the response then the proxy. In the fourth case (requesting one file from the SAWSDL-Testsuite), are negligible for injecting references into meta- the proxy request was consistently faster then the data streams. The benefits of cloud infrastructures original. This is probably due to a more efficient such as scalability, availability, and performance implementation of HTTP request handling within [4] outbalance the drawbacks caused by flexibility the Google App Engine SDK (since the tests ran constraints. Google App Engine manages load bal- on a local setup, any benefits from the proxy run- ancing by dynamically adding instances covering ning on the Google infrastructure are not reflected increased load. To further improve performance, in this chart). the internal caching service of the App Engine is The required support of different service meta- used for the retrieval of the annotations. SAPR is data standards has been thouroughly tested with a free Web service. In addition, it is free software module tests using appropriate test suites from and can be deployed in other locations as well. the various standards (e.g. the already mentioned SAWSDL Test Suite). But the strict dependence on standards is also a problem: Web services with invalid service descriptions are common, and far too often we experienced unexpected errors during the XML parsing. In addition, the slow response times can be a limiting factor within the Google App Engine. The API cuts off requests exceeding ten seconds response time, which is not uncommon for Web services running not within large-scale in- frastructures. SAPR has been implemented to be robust in respect to changes of the source meta- data. Changing the metadata does not affect the injection procedure, as long as no elements are re- moved which are used within the extracted XPath expressions.

6. Related Work

Early work on semantic annotations was focused on the manual semantic annotation of Web content [11]. Semantic annotations for Web services, and the concept of Semantic Web Services (SWS), have Fig. 5. Impact of the proxy on the response time. been introduced by [22] and [28]. This work was primarily concerned about the semantically en- 9Available at http://code.google.com/appengine abling W3C-compliant Web services, which even- 10 Injecting Semantic Annotations tually led to the development of the W3C recom- maining compliant to the standards, existing non- mendation for Semantic Annotations for WSDL semantic clients won’t be locked out. The proxy (SAWSDL) [14]. Enriching Geospatial Web ser- itself is not semantically enabled: it does neither vices with ontologies has been investigated by [15] enable users to create semantic annotations, nor and [12]. does it perform reasoning on the semantic annota- The long-term vision of SWS is the automatic, tions. The API supports registration of annotated reasoner-supported integration of Web services. documents and simple operations such as delet- Reasoning engines depend on specifications ex- ing or listing annotations. It does not support the pressed in a logic-based language, e.g. the Web discovery, since we assume that existing semanti- Ontology Language (OWL). Enabling semantics is cally enabled catalogues and service registries al- therefore the task of either directly creating Web ready perform this task. Registering a Web service service descriptions in such a language, or by link- semantically annotated with SAPR to such cata- ing existing XML-based descriptions to these on- logues then enables the semantic discovery of the tologies. The OWL-S Ontology (OWL for Services, annotations. [17] helps to (re-)model W3C-compliant Web ser- vices. The Web Service Modelling Ontology [24] Merging different annotations is not supported. (WSMO), and its recent descendant WSMO-Lite During the registration of annotated metadata, [13] are similar (but more sophisticated) solutions SAPR assigns a unique identifier (the SID). It is following the same approach. In this case, the later required to retrieve the annotated document, capabilities of a Web services, as well as its in- which could then, for example, be registered to formation model, are completely covered by the a concept-based search engine. A document reg- ontologies. In the long run, semantically enabled istered multiple times with different annotations Web services will directly deliver these ontolo- results in different SIDs, and consequently new gies (aligned to shared domain ontologies). XML- service metadata URLs. An URL pointing to the based metadata may then only be used to sustain same document (but having a different SID) with backwards-compatibility. data quality references would accordingly be used Coupling Semantic Web technologies with often in trust-aware IR systems. SAPR is only providing heterogeneous Spatial Data Infrastructures (SDI) means to inject annotations into legacy metadata. has been subject of various research [19,7,12,23, It is in the responsibility of the client application 15,25]. In [10], a semantic-enablement layer (SEL) to propagate the resulting new service metadata is proposed to complement existing SDI compo- links to appropriate applications like search en- nents with Web services for ontology access and gines. reasoning. SAPR can be considered as one impor- The presented approach targets XML-based tant step towards SEL, since it provides the link metadata of Web service descriptions. We have between existing spatial data services and the SEL neither addressed the actual XML data nor other components. content types not encoded in XML. The former is simply an issue of configuring the proxy accord- ingly. The mentioned example of injecting details 7. Conclusion of data quality into the actual data is one target application of SAPR. Injecting annotations into The vision of the Semantic Web relies on con- tent which is either directly semantically repre- non-XML based data was not covered, but is also sented (i.e., data encoded in an ontology language) required to integrate information hidden in raw or has been semantically annotated. Legacy in- data (e.g. sensor streams, satellite images, audio frastructures comply to well-established (and often files). Here, the annotations could be injected as domain-specific) standards. Here, semantic repre- additional fields into existing metadata (e.g. the sentations of the data would require updates to EXIF metadata for image files). the standards and accordingly the running infras- Dynamic injection relies on a reproducible way tructures. Injecting the semantic annotations with to identify the annotation’s location. Injecting an- SAPR is a non-intrusive solution to bring the ben- notations into unstructured data (i.e. data with- efits of semantics to these systems. And by re- out a structure compliant to a schema) is difficult Injecting Semantic Annotations 11 to achieve. and RDFa 10 are pro- also for applications with high demands for per- posed markup formats for XHTML, and therefore formance. means to annotate websites. The benefits of sepa- rating the annotations from the source data may also apply here, but are not realized in SAPR. 8. Acknowledgements SAPR is currently used in the ENVISION 11 The presented research has been funded by the project . In ENVISION we aim to semantically BMBF project GDI-Grid (BMBF 01IG07012, see enrich and chain environmental Web services, with http://www.gdi-grid.de) and the European re- the long-term goal to migrate existing environ- search project ENVISION (FP7-249170, see http: mental models into the Web. We have also re- //www.envision-project.eu). Contributions by 12 leased an API which supports the translation Carsten Kessler, Simon Scheider, and Mohammed of existing Web services into RDF-based service Bishr have been of great help. models. Ontology alignment tools allow for linking these service models to shared domain vocabular- ies. References to these service models are then References added to the service metadata and uploaded to [1] S. Bechhofer, L. Carr, C. A. Goble, S. Kampa, and SAPR. The resulting new locations are registered T. M. Board. The Semantics of Semantic Annota- to a search engine supporting concept-based and tion. In R. Meersman and Z. Tari, editors, Proceed- spatial query processing. In addition, we are cur- ings of on the Move to Meaningful Systems rently investigating the semantic annotation of bi- 2002: CoopIS, DOA, and ODBASE, volume 2519 of nary data formats such as NetCDF13, which are Lecture Notes in Computer Science, pages 1152–1167. Springer, Berlin / Heidelberg, 2002. common for environmental services. Future work [2] A. Berglund, S. Boag, D. Chamberlin, M. F. Fernán- will investigate how these formats can be sup- dez, M. Kay, J. Robie, and J. Siméon. XML Path ported in SAPR. Language (XPath) 2.0. W3C Recommendation, World In this paper an implementation for a decou- Wide Web Consortium (W3C), 2007. [3] S. Bloehdorn, K. Petridis, C. Saathoff, N. Simou, pled solution for semantic annotations of exist- Y. Avrithis, S. H, Y. Kompatsiaris, and M. G. Strintzis. ing Web services has been presented. The proxy- Semantic annotation of images and videos for multi- based approach enables semantically aware clients media analysis. In Proceedings of the Second Euro- to benefit from the injected semantic annotations pean Semantic Web Conference, ESWC 2005, Herak- into legacy metadata without touching the original lion, Crete, Greece, volume 3532 of Lecture Notes in Computer Science, pages 592–607. Springer, Berlin / services. The proxy acts on behalf of the hidden Heidelberg, 2005. Web service; the client does not have to manage [4] H. Erdogmus. Cloud Computing: Does Nirvana Hide the two separate sources. Requests are mediated behind the Nebula? IEEE Software, 26(2):4–6, 2009. and result either in updated metadata or redirects [5] M. Graves, A. Constabaris, and D. Brickley. FOAF: Connecting People on the Semantic Web. Cataloging to the original source. SAPR is used for the se- & classification quarterly, 43(3):191–202, Apr. 2007. mantic annotation of W3C- and OGC compliant [6] M. Grčar, E. Klien, and B. Novak. Using Term- Web services. The implementation was conceptu- Matching Algorithms for the Annotation of Geo- ally designed to be non-intrusive, making it possi- services. In B. Berendt, D. Mladenic, M. de Gemmis, ble to stay compliant to the underlying standards G. Semeraro, M. Spiliopoulou, G. Stumme, V. Svátek, and F. Železný, editors, Knowledge Discovery En- for Web service descriptions. Finally, the stream- hanced with Semantic and Social Information, volume based injection approach and the deployment in 220 of Studies in Computational Intelligence, pages the cloud ensured a very responsive system useful 127–143. Springer Berlin/Heidelberg, 2009. [7] C. A. Henson, J. K. Pschorr, A. P. Sheth, and K. Thirunarayan. SemSOS: Semantic sensor Obser- 10RDAa is a W3C recommendation for annotating web- vation Service. In Proceedings of 2009 International sites. More information can be found here: http://www.w3. Symposium on Collaborative Technologies and Sys- org/TR/-in-html/ tems (CTS 2009), pages 44–53, Baltimore, MA, 2009. 11See http://www.envision-project.eu IEEE Computer Society. 12See http://kenai.com/projects/envision for the [8] F. Heylighen. and its Imple- source code mentation on the Web: Algorithms to Develop a Col- 13A description of this format is available at http://www. lective Mental Map. Computational & Mathematical unidata.ucar.edu/software/netcdf/ Organization Theory, 5(3):253–280, Oct. 1999. 12 Injecting Semantic Annotations

[9] W. Hürsch and C. V. Lopes. Separation of Concerns. [21] A. Padberg and C. Kiehle. Spatial Data Infrastruc- Technical report, Computer Science Dept., Boston, tures and Grid Computing : the GDI-Grid project. MA, 1995. In Proceedings of EGU General Assembly 2009, vol- [10] K. Janowicz, S. Schade, A. Bröring, C. Keß ler, ume 11 of Geophysical Research Abstracts, page 4242, P. Maué, and C. Stasch. Semantic Enablement for Vienna, Austria, 2009. European Geosciences Union. Spatial Data Infrastructures. Transactions in GIS, [22] M. Paolucci, K. Sycara, and T. Kawamura. Deliv- 14(2):111–129, Apr. 2010. ering Semantic Web Services. In G. Hencsey and [11] J. Kahan, M.-R. Koivunen, E. Prud’Hommeaux, and B. White, editors, Proceedings of the Twelfth Inter- R. Swick. Annotea: an open RDF infrastructure national Conference, pages 111–118, for shared Web annotations. Computer Networks, Budapest, Hungary, 2003. ACM. 39(5):589–608, Aug. 2002. [23] F. Reitsma and J. Albrecht. Modeling with the Seman- [12] E. Klien, D. Fitzner, and P. Maué. Baseline for Reg- tic Web in the Geosciences. IEEE Intelligent Systems, istering and Annotating Geodata in a Semantic Web 20(2):86–88, 2005. Service Framework. In S. I. Fabrikant and M. Wachow- [24] D. Roman, J. De Bruijn, A. Mocan, H. Lausen, icz, editors, Proceedings of 10th AGILE International J. Domingue, C. Bussler, and D. Fensel. WWW: Conference on Geographic Information Science, vol- WSMO, WSML, and WSMX in a nutshell. In Z. Shi ume 3 of Lecture Notes in Geoinformation and Car- and F. Giunchiglia, editors, The Semantic Web - tography, page 0, Aalborg, Denmark, 2007. Springer ASWC 2006, volume 4185 of Lecture Notes in Com- Berlin/Heidelberg. puter Science, pages 516–522, Beijing, China, 2006. [13] J. Kopecký and T. Vitvar. WSMO-Lite: Lowering Springer Berlin/Heidelberg. the Semantic Web Services Barrier with Modular and [25] D. Roman and E. Klien. SWING - A Semantic Light-Weight Annotations. In Second IEEE Interna- Framework for Geospatial Services. In A. Scharl and tional Conference on , volume 0, K. Tochtermann, editors, The Geospatial Web, Ad- pages 238–244, Santa Clara, CA, Aug. 2008. IEEE vanced Information and Knowledge Processing, chap- Computer Society. ter 23, pages 229–234. Springer London, 2007. [14] J. Kopecký, T. Vitvar, C. Bournez, and J. Farrell. [26] M. Shapiro. Structure and Encapsulation in Dis- SAWSDL: Semantic Annotations for WSDL and XML tributed Systems: the Proxy Principle. In Proceedings Schema. IEEE Internet Computing, 11(6):60–67, Nov. of the 6th International Conference on Distributed 2007. Computing Systems, pages 198–204, Cambridge, MA, [15] M. Lutz. Ontology-Based Descriptions for Semantic USA, 1986. IEEE Computer Society. Discovery and Composition of Geoprocessing Services. [27] E. Sirin, B. Parsia, B. C. Grau, A. Kalyanpur, and GeoInformatica, 11(1):1–36, Mar. 2007. Y. Katz. Pellet: A practical OWL-DL reasoner. Web [16] C. C. Marshall. Toward an Ecology of An- Semantics: Science, Services and Agents on the World notation. In HYPERTEXT âĂŹ98: Proceedings of the Wide Web, 5(2):51–53, June 2007. ninth ACM conference on Hypertext and Hypermedia [28] K. Sivashanmugam, K. Verma, A. Sheth, and J. Miller. : Links, Objects, Time and Space - Structure in Hy- Adding semantics to web services standards. In L.- permedia Systems, pages 40–49, Pittsburgh, PA, 1998. J. Zhang, editor, Proceedings of the 1st International ACM. Conference on Web Services (ICWS’03), pages 395– [17] D. Martin, M. Burstein, D. McDermott, S. McIlraith, 401, Las Vegas, USA, 2003. CSREA Press. M. Paolucci, K. Sycara, D. McGuinness, E. Sirin, and [29] G. Stamou, J. van Ossenbruggen, J. Pan, and N. Srinivasan. Bringing Semantics to Web Services G. Schreiber. Multimedia Annotations on the Seman- with OWL-S. World Wide Web, 10(3):243–277, Sept. tic Web. IEEE Multimedia, 13(1):86–90, 2006. 2007. [30] K. Verma and A. Sheth. Semantically Annotating a [18] P. Maué and D. Roman. The ENVISION Environmen- Web Service. IEEE Internet Computing, 11(2):83–85, tal Portal and Services Infrastructure. In J. Hrebícek;, 2007. G. Schimak;, and R. Denzer, editors, Environmental [31] P. Vretanos. OpenGIS Web Feature Service (WFS) Software Systems. Frameworks of eEnvironment, vol- Implementation Specification. {OGC} implementation ume 359 of IFIP Advances in Information and Com- specification, Open Geospatial Consortium (OGC), munication Technology, pages 280–294, Brno, Czech 2005. Republic, 2011. Springer Berlin/Heidelberg. [32] M. Williams, L. B. D. Cornford, and B. Ingram. [19] P. Maué and S. Schade. Data Integration in the Describing and Communicating Uncertainty within Geospatial Semantic Web. Journal of Cases on Infor- the Semantic Web. In F. Bobillo, editor, Proceed- mation Technology (JCIT), 11(4):100–122, Oct. 2009. ings of the Fourth International Workshop on Uncer- [20] P. Maué, S. Schade, and P. Duchesne. Semantic Anno- tainty Reasoning for the Semantic Web, volume 423 tations in OGC Standards. {OGC} discussion paper, of CEUR Workshop Proceedings, Karlsuhe, Germany, Open Geospatial Consortium (OGC), July 2009. 2008. CEUR-WS.