RDF in the JRS Server
Simon Johnston, Martin Nally, Edison Ting
May 16, 2008

This document describes the manner in which the JRS Server leverages, and exposes, RDF. JRS is not an RDF server; however, it makes extensive use of RDF in the way it indexes resources and exposes indexed properties to client applications. This document covers not only the manner in which clients interact with this RDF view but also the motivation for using RDF and some of the implementation details.

1 Motivation

The Jazz REST Services (JRS) Server is a set of REST storage services that allow for the secure storage, indexing and query of resources, supporting the development of Application Lifecycle Management (ALM) application clients. A JRS server should not place any restriction on the type or representation of resources[1] stored, and yet indexing of resources does require knowledge of the “raw” format (XML vs. plain text, for example) and of the specific domain schema (requirement XML vs. test case XML, for example). To this end, when a resource is stored JRS invokes a set of indexer tasks that are specific to a raw format and which may be further configured with declarative rules for a domain schema. For example, the JRS server provides an image indexer that extracts EXIF[2] properties from photographs and indexes a fixed subset; the XML indexer, however, must be configured by a client to extract specific elements and attributes using a set of XPath expressions. These indexer tasks are responsible for extracting properties from the resource, which the JRS server then makes available for queries.

In the design of the JRS server a number of particular needs were defined that affect the design of the indexer tasks and the server's persistence of the index properties themselves.

1. The notion of an index property should be as simple as possible while still conveying useful information.
2. Index properties should be retrievable for a given resource; that is, the server should be able to answer the question “what index properties have been extracted from this resource?”
3. The representation of properties returned to a client should be in a standard format, if possible, rather than something new invented by JRS.
4. The server should support standard query languages that can operate on these index properties, allowing client applications to perform complex queries.

[1] We use the definitions of “resource” and “representation” from http://www.w3.org/TR/webarch/ (sections 2.2 and 3.2).
[2] http://www.exif.org

In the prototype server that pre-dates JRS, the first concern above started us down a path of using key/value pairs and storing the value only as a string. This proved too simple and too restrictive for two reasons. First, properties needed to be correctly typed so that operators such as “>” and “<” would work correctly. Second, in the case of XML resources some properties were attached to secondary resources[3] within the resource, and so the property needed to store this additional identifier. This led us to a design that used the {subject, predicate, object} triple notion common in knowledge representation schemes, and to an initial implementation known as the “universal table”, in which we stored all triples in a single database table. This single table led to terrible performance: any meaningful query required many joins, and the design simply did not scale. In JRS we changed the design so that each resource in the repository has its own set of typed triples (types are limited to String, URI, Boolean, Integer and Timestamp).

In the JRS Server it is possible to request a resource that contains the set of all index properties stored for a resource by using the URI form “resource-uri?properties”, where the query parameter “properties” causes a property document to be returned rather than the resource itself.
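To make the typed-triple design concrete, here is a minimal Python sketch; it is not the JRS implementation. The class and function names (URI, Triple, PropertyIndex, properties_uri), the predicate names and the artist URI are hypothetical. Only the five value types and the “resource-uri?properties” form come from the text above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Union


@dataclass(frozen=True)
class URI:
    """Hypothetical wrapper marking a value as a resource link, not a plain string."""
    value: str


# The five value types named in the text: String, URI, Boolean, Integer, Timestamp.
Value = Union[str, URI, bool, int, datetime]


@dataclass(frozen=True)
class Triple:
    subject: str    # the resource, or a secondary resource within it
    predicate: str  # the property name
    object: Value   # the typed value


class PropertyIndex:
    """Hypothetical in-memory index: one list of typed triples per resource."""

    def __init__(self) -> None:
        self._by_resource: Dict[str, List[Triple]] = {}

    def add(self, resource_uri: str, triple: Triple) -> None:
        self._by_resource.setdefault(resource_uri, []).append(triple)

    def properties_of(self, resource_uri: str) -> List[Triple]:
        # Answers "what index properties have been extracted from this resource?"
        return list(self._by_resource.get(resource_uri, []))


def properties_uri(resource_uri: str) -> str:
    """Build the 'resource-uri?properties' form described in the text."""
    return resource_uri + "?properties"


# Illustrative data; the predicate names and the artist URI are made up.
album = "/jazz/resources/musicdb/albums/album-1"
index = PropertyIndex()
index.add(album, Triple(album, "tracks", 10))
index.add(album, Triple(album, "artist",
                        URI("/jazz/resources/musicdb/artists/artist-1")))

# Why typing matters: string comparison orders "10" before "9".
string_comparison = "10" > "9"   # compares character by character
integer_comparison = 10 > 9      # compares numerically
```

The last two lines show the problem the prototype ran into: with values stored only as strings, “>” compares character by character, so a typed Integer is needed for numeric ordering.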
It is also possible, when executing a JRS simple query (that is, a simple conjunctive, URL-encoded query), to request that not only the URL of a hit be returned but also the properties of that hit. This second form allows client applications to query for resources and retrieve index properties such as name, content-type and so on, allowing query results to be rendered more appropriately for a user. The initial format of this response was a custom format, which was unsatisfactory, and so we looked for a more appropriate replacement.

It seemed reasonable, as the internal format of the indexed properties was very much inspired by RDF[4], that the RDF XML format be used to represent properties in the responses described above. To this end we defined a set of RDF generation patterns (described below) that the JRS server uses so that all property documents are returned in a regular manner. This move to an RDF XML representation also spurred us to remove the custom names we had used for certain system-maintained properties, using a mixture of RDF and Dublin Core[5] properties instead.

[3] See http://www.w3.org/TR/webarch/ section 3.2.2
[4] http://www.w3.org/RDF/
[5] http://dublincore.org/documents/dcmi-terms/

The last step was to decide how to make these indexed properties more easily accessible and available for query by client applications. The resulting decision was to provide a query service in JRS that allows client applications to POST a query, in a standard query language, to the server and have it run against an “RDF Store”, that is, against some collection of all the possible RDF documents used to describe the resources in the JRS repository. The key here is that an implementation of JRS can provide multiple such query languages and can choose whether or not to actually store the RDF as XML data; these details are not client visible.
The benefit to the client is that the shape of this RDF store and the behavior of the query service are defined by the JRS specifications, and so how the query service is implemented is immaterial as long as it conforms to these behavioral requirements.

Putting all of these together, we discovered that RDF has taken a central role in our design: any resource stored in JRS, regardless of its representation, has an associated set of index properties (even if these are only the system-defined properties), and these properties are accessible through the “?properties” request, the URL-encoded query and the RDF store query. As we expect client applications to make extensive use of links between resources and to define “virtual collections” of resources through standard queries, understanding these services is key to the development of JRS client applications.

The rest of this document describes the format of properties, the property and query APIs and finally some of the details of the current JRS implementation that uses DB2 PureXML.

2 Indexing and Properties

This section describes the set of generation patterns used to express JRS indexed properties in RDF. The purpose of these patterns is to constrain the use of RDF to forms that clients can expect, and therefore to reduce their parsing requirements. All examples have been tested against the W3C RDF Validation Service (http://www.w3.org/RDF/Validator/) and written as a set of test cases with RDFLib (http://rdflib.net). The examples in this section show indexes extracted from the more complete music example presented later in this paper.

2.1 General Patterns

The following describe common patterns derived from patterns seen in indexes themselves.

2.1.1 Empty Set of Index Properties

The following is the representation returned for resources that either have not been indexed or for which no indexer contributed properties. This is the simplest form of RDF description document.
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about="/jazz/resources/musicdb/albums/album-1"/>
    </rdf:RDF>

Note that the rdf:RDF element is optional according to the RDF/XML Syntax Specification (http://www.w3.org/TR/rdf-syntax-grammar/, section 2.6). We would prefer not to require it, and so clients should not expect to receive the outer RDF tag. Users should note, however, that some validators, including the W3C online validation service, do require the outer RDF tag.

    To create a complete RDF/XML document, the serialization of the graph into XML is usually contained inside an rdf:RDF XML element which becomes the top-level XML document element. Conventionally the rdf:RDF element is also used to declare the XML namespaces that are used, although that is not required. When there is only one top-level node element inside rdf:RDF, the rdf:RDF can be omitted although any XML namespaces must still be declared.

Therefore the basic form would be simpler, as in the following listing.

    <rdf:Description xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                     rdf:about="/jazz/resources/musicdb/albums/album-1"/>

2.1.2 Simple Index Properties

Once index properties have been stored for a resource, they are encoded as true XML elements with the predicate name and namespace extracted by the indexer. The difference between this format and that of the current JRS indexer is that RDF makes a conscious, semantic distinction between literal values and resource links, and this is reflected in the format as shown below. This form uses the RDF convention of storing literal values as element content but storing references to other resources as a URL in an attribute of the element (seen here in the difference between ns:name and ns:artist).
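A client reading such a property document can recover this distinction mechanically. As a hedged illustration only, the following Python sketch (standard library only) accepts a document with or without the outer rdf:RDF element and classifies each property as a literal or a resource link. The function name read_properties, the http://example.org/music# namespace bound to ns, the album name and the artist URI are assumptions made for this sketch; they are not taken from JRS.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def read_properties(xml_text: str) -> List[Tuple[str, str, str, str]]:
    """Return (subject, predicate, kind, value); kind is 'literal' or 'resource'."""
    root = ET.fromstring(xml_text)
    # Accept both forms: rdf:Description wrapped in rdf:RDF, or bare at top level.
    if root.tag == f"{{{RDF_NS}}}RDF":
        descriptions = root.findall(f"{{{RDF_NS}}}Description")
    else:
        descriptions = [root]

    result = []
    for desc in descriptions:
        subject = desc.get(f"{{{RDF_NS}}}about", "")
        for prop in desc:
            link = prop.get(f"{{{RDF_NS}}}resource")
            if link is not None:
                # Reference to another resource: URL carried in an attribute.
                result.append((subject, prop.tag, "resource", link))
            else:
                # Literal value: carried as element content.
                result.append((subject, prop.tag, "literal", prop.text or ""))
    return result


# The empty, wrapped form from section 2.1.1.
WRAPPED_EMPTY = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="/jazz/resources/musicdb/albums/album-1"/>
</rdf:RDF>"""

# A bare document with hypothetical property values (namespace and values are made up).
BARE_WITH_PROPERTIES = """<rdf:Description
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ns="http://example.org/music#"
    rdf:about="/jazz/resources/musicdb/albums/album-1">
  <ns:name>Example Album</ns:name>
  <ns:artist rdf:resource="/jazz/resources/musicdb/artists/artist-1"/>
</rdf:Description>"""

empty = read_properties(WRAPPED_EMPTY)
properties = read_properties(BARE_WITH_PROPERTIES)
```

ElementTree reports each predicate in its expanded form, namespace URI in braces followed by the local name, so a client does not depend on the prefix the server happened to choose.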