Resource-Level Versioning in Administrative Geography RDF Data
Total Page:16
File Type:pdf, Size:1020Kb
Resource-Level Versioning in Administrative Geography RDF Data Alex Lohfink, Duncan McPhee University of Glamorgan, faculty of Advanced Technology, Dept. of Computing and Maths, Pontypridd, CF37 1DL Tel. +44443 482950 (alohfink, dmcphee)@glam.ac.uk ABSTRACT The Resource Description Framework (RDF) is a base technology of the semantic web, providing a web infrastructure that links distributed resources with semantically meaningful relationships at the data level. An issue with RDF data is that resources and their properties are subject, as with all data, to evolution through change, and this can lead to linked resources and their properties being either removed or outdated. In this paper we describe a simple model to address such an issue in administrative geography RDF data using RDF containers. The model version-enables such data at the resource level without disturbing the inherent simplicity of the RDF graph, and this translates to the querying of data, which retrieves version-specific properties by matching simple graph-patterns. Further, both information and non-information resources can be versioned by utilizing the model, and its inherent simplicity means that it could be implemented easily by data consumers and publishers. The model is evaluated using some UK administrative geography RDF datasets. Keywords: Key words: administrative geography; semantic web; versioning; RDF; linked data Page 1 1. Introduction as single entities. The model does not The publication of data in Linked Data interfere with the inherent simplicity of the (RDF) form has increased significantly graph-based structure imposed by the RDF recently and in the UK this process has been data model, and this simplicity is evident accelerated through the Government‟s when formulating queries about versioned „Making Public Data Public‟ initiative, resources, where standard SPARQL (W3C which is encouraging Government Agencies 2008) queries can be used to retrieve to publish data in linked form. However, no versioned data by matching simple graph real solution currently exists to enable data patterns. We believe the model offers the to be versioned other than at dataset level, potential to provide a practical and scalable and being able to version at the resource solution to versioning in Linked Data level is likely to provide significant datasets whilst retaining sufficient simplicity advantages to both publishers and to be implemented by data publishers and consumers of Linked Data by facilitating consumers. It also has the potential to be access to different versions of specific data. extended to incorporate other issues in resource-level versioning such as ambiguity The Ordnance Survey1(OS) have published (where two or more resources share the its UK administrative geography data in same name) and variance (different RDF format, and an issue that has arisen representations of the same resource). with this data is that the administrative units The rest of this paper is organised as represented change frequently (boundaries, follows: in the next section we describe the for example, are released twice a year), but problem which our model addresses and these changes are not adequately represented previous work, in section 3 we describe our in the data as there is no logical organisation model, and in section 4 we describe an between different versions of units. Because implementation and evaluation of versioned of this, different users of the administrative datasets based on our. Finally, in section 5 geography datasets could link to different we present our concluding remarks. versions of administrative units represented in whichever version of the data they are 2. Previous work accessing, meaning that inconsistencies will Versioning for RDF can be viewed from two be apparent between different RDF datasets. perspectives: web ontology versioning There is also a requirement for applications (classes) and instance versioning. Instance to link to other versions of a resource that versioning can be further divided into may not necessarily be the latest version. In model-based, statement-based (triple), or this paper we describe a mechanism that resource-level versioning. Model-based could solve this problem by version- versioning applies to a group of triples that enabling RDF administrative geography data form part of a logical unit. Statement-based at the resource level. This has been in the versioning applies to individual statements form of a practical and simple Linked Data (triples), while resource level (or datum- model that can potentially provide solutions level) versioning applies to the versioning of to the issues described here. The model uses individual resources within an RDF graph. It RDF container and collection elements to is within the field of resource-level represent versioned resources within a RDF versioning that this paper is primarily graph, representing collections of resources concerned. 1 This research is sponsored by Ordnance Survey Page 2 Previous work in the field of versioning in in place of URIs. This concept is taken Linked Data has been centred mainly on further in the EvoPat system (Christoph web ontology versioning (versioning of RieB, Norman Heino et al.) where evolution classes), and on the managing of change in patterns are applied as a form of „software RDF graphs, and to a lesser degree on refactoring‟. Evolution patterns can be basic, resource-level versioning. The SemVersion accommodating atomic change, or (Volkel 2005) model focuses on managing compound, where atomic changes or other change in ontologies where users can compound evolution patterns are sequenced suggest different classes to include in the to represent complex changes in ontology. A ontology. SemVersion can manage such basic evolution pattern is composed of a changes and reconcile them into a new SPARQL select query and a SPARQL version of the ontology. SemVersion update query (this uses a modified SPARQL employs model-based versioning. Delta query processor). These are applied (Berners-Lee and Connolly 2001) is a according to a library of pre-defined system designed to identify differences patterns. Another version model described between RDF graphs, and uses functions to by (Ludwig, Kuster et al. 2008) uses an compute these differences. Differences extension to the Topic Maps data model between graphs are produced in the form of (ISO 2008) to potentially implement a delta which represents the changes only. versions in RDF. Topic Maps represent This means that a delta derived from a topics (or subjects), attributes, and knowledge base can be applied to a subset of associations as an entity-relationship model. this knowledge base and update it, with This model uses a structure called the accurate results. RDF Difference Models VersionInfo Object, or VIO, to record start (deVos 2002) also uses groups of triples to and end dates for a specific version of a describe changes that have occurred topic map object. The model is stated as between two graphs, but goes further in being applicable to RDF triples by grouping specifying “forward difference statements” triples into logical units and linking them to and “reverse difference statements”, and a VIO, although no example or evaluation of adds preconditions that must be present in this technique is specified. Changesets (Talis both the current graph and the difference 2011) is a resource-centric approach, and statements before the desired version can be uses RDF reification to describe changes to derived. This can be used as a type of a resource. A Changeset applies to a named version management or locking. RDFSync resource, and describes triples that are added (Tummarello, Morbidoni et al. 2007) and removed in the updated resource. The provides an algorithm to synchronise locally Changeset can also describe the reason for held RDF graphs with updates to linked the change. This is a powerful representation RDF datasets across a network, but although but is complex, requiring a large number of updates are preserved there is no access to triples to represent a single change, and previous versions. Evolution patterns are would be correspondingly difficult to query. used by (Soren Auer and Herre 2005) to The UK Government data developers represent changes in ontologies. Here types (Tennison 2010; HMGovernment 2011) use of changes that can occur in a RDF graph named graphs to record a set of changes to a are defined and applied to ontology group of resources. (Named graphs are evolution by the specification of graph groups of triples that share a common patterns representing the change. Graph identifier). Each named graph links to what patterns are graphs where variables appear it replaces. A set of changes is therefore a Page 3 set of named graphs, and a dataset is the implementation is specified, and it is not result of an RDF merge of the individual clear how alternatives (that is versions graphs. HTTP content negotiation has been where the representations are not necessarily described by (Sompel, Sanderson et al. time dependent) would be handled. It also 2010) as a method to represent versioned organises VIOs according to a sequence, the resources, where the default linkage is organisation of which is not specified. always the current version. Previous ChangeSets focus on versioning at the versions are timestamped and accessed by resource level, but achieves this by specifying a time in the HTTP request, using recording different graph configurations a timegate, which supports date-time content representing the resource‟s properties for a negotiation. given version, and does not provide direct access or linkage to previous versions of a Delta, SemVersion, and RDF Difference resource. Also the technique of this models are aimed at managing change to approach and that of RDF Difference web ontologies or managing and reconciling Models, that of using collections of triples to differences between RDF graphs, rather than represent changes in particular graphs, is addressing the need to be able to reference verbose, leading to more statements being different versions of the same statements or added to the graph than the triples being resources within the same RDF dataset.