Resource-Level Versioning in Administrative Geography RDF Data

Alex Lohfink, Duncan McPhee University of Glamorgan, faculty of Advanced Technology, Dept. of Computing and Maths, Pontypridd, CF37 1DL Tel. +44443 482950 (alohfink, dmcphee)@glam.ac.uk

ABSTRACT The Resource Description Framework (RDF) is a base technology of the semantic web, providing a web infrastructure that links distributed resources with semantically meaningful relationships at the data level. An issue with RDF data is that resources and their properties are subject, as with all data, to evolution through change, and this can lead to linked resources and their properties being either removed or outdated. In this paper we describe a simple model to address such an issue in administrative geography RDF data using RDF containers. The model version-enables such data at the resource level without disturbing the inherent simplicity of the RDF graph, and this translates to the querying of data, which retrieves version-specific properties by matching simple graph-patterns. Further, both information and non-information resources can be versioned by utilizing the model, and its inherent simplicity means that it could be implemented easily by data consumers and publishers. The model is evaluated using some UK administrative geography RDF datasets.

Keywords: administrative geography; semantic web; versioning; RDF; linked data


1. Introduction

The publication of data in Linked Data (RDF) form has increased significantly recently, and in the UK this process has been accelerated through the Government's 'Making Public Data Public' initiative, which is encouraging Government Agencies to publish data in linked form. However, no real solution currently exists to enable data to be versioned other than at dataset level, and being able to version at the resource level is likely to provide significant advantages to both publishers and consumers of Linked Data by facilitating access to different versions of specific data.

The Ordnance Survey¹ (OS) has published its UK administrative geography data in RDF format, and an issue that has arisen with this data is that the administrative units represented change frequently (boundaries, for example, are released twice a year), but these changes are not adequately represented in the data as there is no logical organisation between different versions of units. Because of this, different users of the administrative geography datasets could link to different versions of administrative units represented in whichever version of the data they are accessing, meaning that inconsistencies will be apparent between different RDF datasets. There is also a requirement for applications to link to other versions of a resource that may not necessarily be the latest version. In this paper we describe a mechanism that could solve this problem by version-enabling RDF administrative geography data at the resource level. The mechanism takes the form of a practical and simple Linked Data model that can potentially provide solutions to the issues described here. The model uses RDF container and collection elements to represent versioned resources within an RDF graph, representing collections of resources as single entities. The model does not interfere with the inherent simplicity of the graph-based structure imposed by the RDF data model, and this simplicity is evident when formulating queries about versioned resources, where standard SPARQL (W3C 2008) queries can be used to retrieve versioned data by matching simple graph patterns. We believe the model offers the potential to provide a practical and scalable solution to versioning in Linked Data datasets whilst retaining sufficient simplicity to be implemented by data publishers and consumers. It also has the potential to be extended to incorporate other issues in resource-level versioning such as ambiguity (where two or more resources share the same name) and variance (different representations of the same resource).

The rest of this paper is organised as follows: in the next section we describe the problem which our model addresses and previous work, in section 3 we describe our model, and in section 4 we describe an implementation and evaluation of versioned datasets based on our model. Finally, in section 5 we present our concluding remarks.

2. Previous work

Versioning for RDF can be viewed from two perspectives: web ontology versioning (classes) and instance versioning. Instance versioning can be further divided into model-based, statement-based (triple), or resource-level versioning. Model-based versioning applies to a group of triples that form part of a logical unit. Statement-based versioning applies to individual statements (triples), while resource-level (or datum-level) versioning applies to the versioning of individual resources within an RDF graph. It is with the field of resource-level versioning that this paper is primarily concerned.

1 This research is sponsored by Ordnance Survey


Previous work in the field of versioning in Linked Data has been centred mainly on web ontology versioning (versioning of classes), and on the managing of change in RDF graphs, and to a lesser degree on resource-level versioning. The SemVersion (Volkel 2005) model focuses on managing change in ontologies where users can suggest different classes to include in the ontology. SemVersion can manage such changes and reconcile them into a new version of the ontology. SemVersion employs model-based versioning. Delta (Berners-Lee and Connolly 2001) is a system designed to identify differences between RDF graphs, and uses functions to compute these differences. Differences between graphs are produced in the form of a delta which represents the changes only. This means that a delta derived from a knowledge base can be applied to a subset of this knowledge base and update it, with accurate results. RDF Difference Models (deVos 2002) also uses groups of triples to describe changes that have occurred between two graphs, but goes further in specifying "forward difference statements" and "reverse difference statements", and adds preconditions that must be present in both the current graph and the difference statements before the desired version can be derived. This can be used as a type of version management or locking. RDFSync (Tummarello, Morbidoni et al. 2007) provides an algorithm to synchronise locally held RDF graphs with updates to linked RDF datasets across a network, but although updates are preserved there is no access to previous versions. Evolution patterns are used by (Soren Auer and Herre 2005) to represent changes in ontologies. Here types of changes that can occur in an RDF graph are defined and applied to ontology evolution by the specification of graph patterns representing the change. Graph patterns are graphs where variables appear in place of URIs. This concept is taken further in the EvoPat system (Christoph Rieß, Norman Heino et al.) where evolution patterns are applied as a form of 'software refactoring'. Evolution patterns can be basic, accommodating atomic change, or compound, where atomic changes or other compound evolution patterns are sequenced to represent complex changes in ontology. A basic evolution pattern is composed of a SPARQL select query and a SPARQL update query (this uses a modified SPARQL query processor). These are applied according to a library of pre-defined patterns. Another version model described by (Ludwig, Kuster et al. 2008) uses an extension to the Topic Maps data model (ISO 2008) to potentially implement versions in RDF. Topic Maps represent topics (or subjects), attributes, and associations as an entity-relationship model. This model uses a structure called the VersionInfo Object, or VIO, to record start and end dates for a specific version of a topic map object. The model is stated as being applicable to RDF triples by grouping triples into logical units and linking them to a VIO, although no example or evaluation of this technique is specified. Changesets (Talis 2011) is a resource-centric approach, and uses RDF reification to describe changes to a resource. A Changeset applies to a named resource, and describes triples that are added and removed in the updated resource. The Changeset can also describe the reason for the change. This is a powerful representation but is complex, requiring a large number of triples to represent a single change, and would be correspondingly difficult to query. The UK Government data developers (Tennison 2010; HMGovernment 2011) use named graphs to record a set of changes to a group of resources. (Named graphs are groups of triples that share a common identifier). Each named graph links to what it replaces. A set of changes is therefore a

set of named graphs, and a dataset is the result of an RDF merge of the individual graphs. HTTP content negotiation has been described by (Sompel, Sanderson et al. 2010) as a method to represent versioned resources, where the default linkage is always the current version. Previous versions are timestamped and accessed by specifying a time in the HTTP request, using a timegate, which supports date-time content negotiation.

Delta, SemVersion, and RDF Difference Models are aimed at managing change to web ontologies or managing and reconciling differences between RDF graphs, rather than addressing the need to be able to reference different versions of the same statements or resources within the same RDF dataset. RDFSync is able to handle updates to RDF graphs and synchronise locally held graphs across a network, but provides no capability to store or retrieve previous versions of data. The evolution patterns approach is again aimed specifically at ontology evolution and attempts to use patterns to interpret structure and therefore define change. However, this contradicts the inherent, semi-structured nature of the RDF data model, and relies on mathematical definitions of change. The EvoPat system provides a practical implementation of evolution patterns and claims to be relatively simple, but it relies on a modified SPARQL query engine and a pre-defined catalog of patterns, which could easily be problematical if the catalog was not applicable to a particular domain. The model described by (Ludwig, Kuster et al. 2008) attempts to provide a method for versioning resources, by linking logical units of triples to VIOs. In this case, the suggestion is that the VIO would contain start and end dates relevant to the statements in the referenced logical unit, in effect extending the graph to incorporate versioning objects. However, no implementation is specified, and it is not clear how alternatives (that is, versions where the representations are not necessarily time dependent) would be handled. It also organises VIOs according to a sequence, the organisation of which is not specified. Changesets focus on versioning at the resource level, but achieve this by recording different graph configurations representing the resource's properties for a given version, and do not provide direct access or linkage to previous versions of a resource. Also the technique of this approach and that of RDF Difference Models, that of using collections of triples to represent changes in particular graphs, is verbose, leading to more statements being added to the graph than the triples being described. The named graphs technique used by UK Government data developers versions at the graph level, and even at the dataset level, and has no capability to handle resource-level versions. This means that current RDF graphs about a particular study area are derived by merges, and direct linkage to previous or other versions is not possible. The model described by (Sompel, Sanderson et al. 2010) versions at the resource level, and has the advantage that the default URI is always the latest version. However, this would be problematical if linkage was required to a non-default version. Also, this model only distinguishes versions using date-time values, and would therefore not handle versions not distinguished by time properties. To our knowledge no model can adequately represent versions at the resource level, and provide direct linkage or access to versions other than the latest version, whilst retaining the inherent simplicity of the RDF data model.

3. Versioning administrative geographic data


The data used as the basis of our model were OS UK Administrative Geography data containing information resources describing administrative units (such as boroughs, regions, and districts) and their associated properties (such as areas, identification numbers, and topological relationships to other administrative units). Consequently, our model aimed to be applicable to these data and also fulfill our objectives in versioning resources. These objectives were as follows:

• The data should provide easy access for consumers who are only interested in the latest or most recent version of a resource.
• Previous versions of resources can be easily accessed and queried using standard SPARQL queries.
• Non-information resources can be incorporated into the model if required and similarly accessed and queried.

To achieve these objectives we intended to devise an appropriate model using existing RDF elements defined by (W3C 2004) and based on the example dataset. Our sample data contained two versions of an RDF dataset, the most recent data containing resources with newly minted URIs.

There are several mechanisms and features within RDF and RDF Schema (RDFS (W3C 2004), the specification of the classes and properties of RDF) that offer the potential to model resource-level versions in administrative geography data, and provide the features of linkage to default or alternative versions that may or may not be time dependent. The following were considered.

Inferencing

RDFS allows inferencing based on defined properties such as subClassOf and subPropertyOf. At the simplest level, this provides a mechanism to create a version hierarchy based on inheritance, where new versions of an item are defined using the subClassOf (or 'is_a') relationship. Inferencing allows RDF to infer from the subClassOf relationship that the resource is a member of the superclass. For example, in Ordnance Survey Administrative Geography data, a Civil Parish is defined as a subClassOf a Civil Administrative Area. It can thus be inferred that the Civil Parish of Chelmsley Wood is also a Civil Administrative Area. This feature provides type propagation, and could be used to define a version hierarchy of RDFS classes.

It is also possible with some RDF implementation environments to define inferencing rules. This kind of inferencing goes beyond the scope of the RDFS inferencing capabilities, and allows the definition of specific, text-based rules by which implicit relationships can be inferred. This could allow version-specific rules to enhance a version hierarchy, such as version_of, derivative_of, and alternative_to, based on criteria derived from the differences between versions of an administrative unit.

Named graphs

Named graphs allow groups of triples to be identified as belonging to a specified graph within a larger RDF graph, and as described in the previous section, have been employed by (HMGovernment 2011) to version at the graph and dataset level. This is achieved by tagging the triples with an identifier that specifies the named graph with which each is associated, in effect making the triples


“quads”. This means that a group of triples could be coupled together as a named version graph. The RDF query language, called the SPARQL protocol and RDF query language (SPARQL (W3C 2008)), has a FROM NAMED clause which can query named graphs.

RDF containers and collections

RDF containers and collection elements provide the capability to represent collections of resources as single entities, and link to them forming new triples. This means that it is possible to model relationships between multiple versions of data by defining appropriate RDF container or collection classes. An RDF container simply uses a blank node from which to link to the resources that belong to that container. Of particular interest here is the rdf:Alt container, which is used to describe a list of alternative values of a resource.

It is our contention that of these three mechanisms, RDF containers and collections provide the most appropriate structures to represent versions. RDFS inferencing provides a simple mechanism to deduce versions, but does not provide any version-specific relationships between versioned resources. Rule-based inferencing, on the other hand, is mainly aimed at getting more meaning from existing relationships between resources, and would require the definition of specific conditions upon which inferences could be made, which would not be expressed within triples. Named graphs disturb the inherent simplicity of the triple by tagging each one with an identifier to identify it with a particular graph, and would also require some kind of logical naming convention to facilitate querying. In addition, relationships between versions within the named graph would still need to be defined, negating the need to name the version graph. RDF containers and collections, on the other hand, provide a simple mechanism that integrates seamlessly into an RDF graph without disturbing its simplicity, and version-specific properties can be defined within this structure. Containers and collections also provide flexibility in the mode of the representation. For example, containers can be defined as rdf:Alt, rdf:Bag, or rdf:Seq. rdf:Alt denotes that the contained resources are alternatives, rdf:Bag denotes that there is no significance to the order in which the contained resources are represented, and rdf:Seq denotes that the contained resources are set in sequential order. (It should be noted that there is no implicit behaviour associated with any of these containers, and that these conventions exist to provide consistency in their use.) Containers and collections may also be nested. Further, the simplicity offered by the use of containers and collections should be reflected in the queries used in retrieving version-specific properties. Also, this simplicity could offer the possibility of take-up by data consumers as its implementation should be less complex than the methods discussed earlier.

Information and non-information resources

The type of resource being versioned, specifically whether it is an information resource or a non-information resource, has an impact on how the resource is updated. For information resources, a new URI can be minted for the new version, preserving the URI of the replaced version, which can still be linked to. Alternatively the resource can be updated and keep its existing URI, the previous version being lost (destructive update). Non-information resources usually imply a finer granularity in the extent of the change ((Tennison 2010) provides the

example of school statistics where class sizes and staff change frequently) and so require special provision. (Tennison 2010) also concludes that timestamping individual property versions and holding them in a multi-value element becomes prohibitively complex. We, therefore, have taken a broader approach to modeling the versioning of both information and non-information resources to negate this complexity, with some notable compromises. More specifically, our approach does not explicitly distinguish between information resources and non-information resources. Rather, it distinguishes between versions where the new version has a newly minted URI, and versions where the new version retains the existing URI.

Versioning resources where the new version has a newly minted URI

The datasets used provided newly minted URIs for the updated resources. According to our criteria for versioning, the model must provide easy linkage or access to the latest version of the resource. In this case we used an rdf:Alt container to represent versions of a resource, and used Dublin Core (DCMI 2010) version properties to establish version-specific properties. The use of the rdf:Alt container to represent versioned information resources within an RDF graph is shown in Figure 1. The model shown represents two versions of a resource described in two separate RDF datasets produced by the OS. The diagram uses the following prefixes:

dcterms: http://purl.org/dc/terms/
admingeo: http://www.ordnancesurvey.co.uk/ontology/AdministrativeGeography/v2.0/AdministrativeGeography.rdf#
rdfs: http://www.w3.org/2000/01/rdf-schema#
rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#

The versioned resource has the URI http://data.ordnancesurvey.co.uk/id/7000000000010769, and the property rdfs:label with the value of The London Borough of Bexley. The property dcterms:hasVersion links the versioned resource to the rdf:Alt container, specifically a blank node defined as type rdf:Alt. The property rdf:_1 of this container specifies the alternate version for the London Borough of Bexley resource, which has the URI admingeo:osr7000000000010759. By the convention defined by (W3C 2004), all container members are identified by the properties rdf:_1, rdf:_2, and so on, but the significance of the ordering is defined by the container type, as specified in the previous section. Further versions would be represented using such properties. The property dcterms:isVersionOf gives the reverse property, showing which resource this resource is a version of. The versioned resource is given the property dcterms:isPartOf to show which administrative authority this version was or is a part of. In this example, it denotes that this version of The London Borough of Bexley is part of the Greater London authority, which has the URI admingeo:osr7000000000041441. Although in the example OS data used in this work the alternative version of The London Borough of Bexley is part of the same Greater London authority, the model allows for a version to be part of a different administrative unit.
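As an illustration of the structure just described, the following is a minimal Python sketch that writes out the triples of the rdf:Alt model as plain tuples. It is not the authors' implementation: the function name `version_triples`, the blank-node label `_:versions`, and the choice to attach dcterms:isPartOf to each contained version are assumptions made for the example. The URIs are the ones quoted in the text.

```python
# Namespace strings taken from the prefixes listed in the text.
DCTERMS = "http://purl.org/dc/terms/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def version_triples(current, label, versions, container="_:versions"):
    """Return (subject, predicate, object) tuples for one versioned resource.

    `versions` is a list of (version_uri, part_of_uri) pairs. `container`
    is a hypothetical blank-node label standing for the rdf:Alt container.
    """
    triples = [
        (current, RDFS + "label", label),
        (current, DCTERMS + "hasVersion", container),
        (container, RDF + "type", RDF + "Alt"),
    ]
    # Container members are numbered rdf:_1, rdf:_2, ... by convention.
    for index, (version, part_of) in enumerate(versions, start=1):
        triples.append((container, f"{RDF}_{index}", version))
        triples.append((version, DCTERMS + "isVersionOf", current))
        # Assumption: isPartOf is recorded on each contained version.
        triples.append((version, DCTERMS + "isPartOf", part_of))
    return triples


bexley = version_triples(
    current="http://data.ordnancesurvey.co.uk/id/7000000000010769",
    label="The London Borough of Bexley",
    versions=[("admingeo:osr7000000000010759", "admingeo:osr7000000000041441")],
)
for triple in bexley:
    print(triple)
```

Because the default resource carries a single dcterms:hasVersion link, adding a further version only appends one more numbered member to the container.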


Figure 1 RDF graph showing the use of the rdf:Alt container to represent two versions of the resource called 'The London Borough of Bexley'

It can be seen that date values have been linked to each administrative unit resource using the property dcterms:date. There is currently little consistency or agreement on the definition of date values that are included in OS administrative data. The values provided here represent the dates on which the datasets were first assembled. Boundary lines for administrative units are typically released twice a year, although it can be seen that there are seventeen months between our two datasets. This is in part due to the absence of a suitable mechanism to represent versions.

Versioning resources where the new version retains the existing URI

Although not present in our example data, new versions which retain the URI of the resource can be represented by the use of nested rdf:Collection elements. Figure 2 shows such an example. The same prefixes are used as in the previous example. The figure shows an rdf:Alt container being used as the dcterms:hasVersion property of the resource (admingeo:7000000000010759), with a nested rdf:Collection element as its first (in this case only) property representing a multi-value collection of properties and their values.

Figure 2 RDF graph showing the use of the rdf:Alt container with nested rdf:Collection element to represent versioned properties of a resource

For clarity, only a representative subset of the properties of the resource is shown in the diagram. The property dcterms:hasVersion links to the rdf:Alt container, which contains all previous versions of the resource's properties. The versioned properties in this example are represented by the rdf:_1 element. It can be seen that the non-information properties dcterms:date and admingeo:hasArea are different versions from the parent version. The parent properties admingeo:hasAreaCode and admingeo:hasUnitID are not present in the version as these were newly added in the parent. This highlights a restriction in the model: there is no recording of any data to

determine properties that have been added or removed, nor is there data concerning how or why the change occurred. The representation does offer, however, simple querying. An alternative model would be to represent only those properties that exhibit change in the versioned properties, but this would make querying at best far more complex (and possibly impossible), although the forthcoming SPARQL 1.1 (W3C 2011) may mitigate this.

4. Implementation

To evaluate the version model we implemented triplestores containing versioned resources based on the models described in the previous section. Two such triplestores were implemented, one for resources where the new version has a newly minted URI, and one for resources where the new version retains the URI of the parent version. A record linkage algorithm was developed to generate the version-specific RDF links. The datasets to be produced were formed by merging versions from the two example datasets: the most recent dataset (referred to henceforth as the base dataset) consisted of UK administrative geography data from September 2009, and the second dataset (henceforth referred to as the update dataset) consisted of UK administrative geography data from April 2008. As stated previously, the date values refer to the dates on which the datasets were first compiled, and will be represented in the versioned resources by a dcterms:Created property.

The algorithm followed these stages:

1. Iterate over the base dataset URIs and identify resources of type LondonBorough.
2. Iterate over the update dataset URIs and identify resources of type LondonBorough.
3. Create the appropriate multi-value RDF element (either rdf:Alt or rdf:Collection) for each base dataset resource URI that has a matching ID in the update dataset URI.
4. Update the base dataset with the newly created container.

The datasets were implemented in an AllegroGraph (Franz 2009) triple store, chosen for its strong Java and Python APIs and the strength of its documentation. It also has a triplestore browser that allows the examination and verification of implemented triplestores, which will be used to show versioned resources in the versioned datasets.

Figure 3 shows the resultant structure of a versioned information resource in the implemented dataset.

Figure 3 Screenshot of a versioned resource with distinct URIs as viewed in the triplestore browser
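The merge stages listed in this section can be sketched in a few lines of Python for the case where versions have distinct URIs. This is only a sketch under stated assumptions, not the record linkage code used in the evaluation: datasets are modelled as lists of (subject, predicate, object) tuples, the function names are hypothetical, and `unit_id` is an assumed caller-supplied function that returns the identifier used to match a base resource with its counterpart in the update dataset (the paper does not specify how the matching ID is obtained).

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = RDF + "type"


def resources_of_type(triples, type_uri):
    """Collect the subjects that are declared to be of the given type."""
    return {s for (s, p, o) in triples if p == RDF_TYPE and o == type_uri}


def merge_versions(base, update, type_uri, unit_id):
    """Link each base resource to its matching update resource via rdf:Alt."""
    base_units = resources_of_type(base, type_uri)      # stage 1
    update_units = resources_of_type(update, type_uri)  # stage 2
    update_by_id = {unit_id(uri): uri for uri in update_units}

    merged = list(base)
    for n, uri in enumerate(sorted(base_units), start=1):
        previous = update_by_id.get(unit_id(uri))
        if previous is None:
            continue  # no earlier version of this unit in the update dataset
        container = f"_:alt{n}"  # stage 3: hypothetical blank-node label
        merged += [
            (uri, DCTERMS + "hasVersion", container),
            (container, RDF_TYPE, RDF + "Alt"),
            (container, RDF + "_1", previous),
            (previous, DCTERMS + "isVersionOf", uri),
        ]
        # Carry the earlier version's own triples into the merged dataset.
        merged += [t for t in update if t[0] == previous]
    return merged  # stage 4: the base dataset plus the new containers


# Hypothetical toy data: names stand in for the matching ID.
names = {"new:bexley": "Bexley", "old:bexley": "Bexley", "old:bromley": "Bromley"}
base = [("new:bexley", RDF_TYPE, "LondonBorough")]
update = [
    ("old:bexley", RDF_TYPE, "LondonBorough"),
    ("old:bromley", RDF_TYPE, "LondonBorough"),
]
merged = merge_versions(base, update, "LondonBorough", names.get)
```

In this toy run only the Bexley resource gains a container, since the base dataset holds no counterpart for the Bromley entry.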


Figure 4 Screenshot of a versioned resource where the URI is retained as viewed in the triplestore browser

Although our example datasets consisted of resources that had new URIs for the different versions, as a proof of concept we implemented the two datasets as versioned resources that retained the same URI in both versions, according to our model outlined in section 3. In this case we attached the properties and their values of the version to the nested rdf:Collection element representing the rdf:_1 property of the hasVersion property of the parent version. Figure 4 shows a versioned resource as viewed in the triplestore browser. Some of the properties have been omitted for clarity. It can be seen that the hasArea and Created properties are different from those in the parent version, but the type (LondonBorough) and hasCensusCode values are duplicated, illustrating the limitation of the model.

Potential to scale

The sample datasets used for the implementation were small by semantic web standards. To provide a guide as to how our model would scale we timed the implementation of the two triplestores and measured their sizes before and after the merges.

Table 1 shows the results we obtained. The merge times of over an hour show that the implementation would become considerably time-consuming in datasets containing millions of triples, although this could be mitigated by more regular and/or phased updates. Also, the update where the new versions retain the existing URI more than doubles the size of the store (from 150k to 325k triples), which might be a problem with large datasets.

Table 1 Merge times for combining the two datasets into one dataset containing versioned resources

Update type                      Base size (triples)   Resources to update   Merged size (triples)   Merge time (secs)
URIs distinct in both versions   150k                  11,761                207k                    4017.94
URI retained by both versions    150k                  11,761                325k                    4043.35

Querying data in the versioned datasets

Querying the dataset demonstrates some of the proposed benefits of the model. Firstly, because the model uses standard RDF containers, the implemented data could be queried using standard SPARQL queries. Secondly, version-related properties can be retrieved along with other properties, meaning that version-specific predicates need not necessarily be defined in the query. Thirdly, because the inherent simplicity of the data structure imposed by the RDF triple has not been disturbed, version-specific attributes can be retrieved by matching simple graph patterns.
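To make "matching simple graph patterns" concrete, the sketch below evaluates a two-triple pattern of the form `?y dcterms:hasVersion ?x . ?x rdf:_1 ?z` against a small in-memory graph. It is a toy matcher written for illustration, not a SPARQL engine and not part of the evaluated implementation; the function name `match` and the shortened identifiers such as `os:10769` are hypothetical.

```python
def match(pattern, triples, binding=None):
    """Yield variable bindings satisfying every triple pattern in `pattern`.

    Terms beginning with '?' are variables; all other terms must match
    the corresponding triple component exactly.
    """
    binding = binding or {}
    if not pattern:
        yield binding
        return
    first, rest = pattern[0], pattern[1:]
    for triple in triples:
        candidate = dict(binding)
        matched = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its earlier binding.
                if candidate.setdefault(term, value) != value:
                    matched = False
                    break
            elif term != value:
                matched = False
                break
        if matched:
            yield from match(rest, triples, candidate)


# Hypothetical, abbreviated graph for one versioned resource.
graph = [
    ("os:10769", "dcterms:hasVersion", "_:b1"),
    ("_:b1", "rdf:type", "rdf:Alt"),
    ("_:b1", "rdf:_1", "os:10759"),
    ("os:10759", "dcterms:isVersionOf", "os:10769"),
]

# The variable ?x plays the role of the container's blank node.
first_version = [("?y", "dcterms:hasVersion", "?x"), ("?x", "rdf:_1", "?z")]
results = list(match(first_version, graph))
```

Replacing `"rdf:_1"` in the second pattern with a further variable would also match the container's rdf:type triple, so a real query would need to filter for the numbered membership properties.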

Querying the dataset in which versions have distinct URIs

For example, the query below retrieves all the properties and their values for the resource with the URI http://data.ordnancesurvey.co.uk/id/7000000000041441:

    SELECT ?x ?y
    WHERE { <http://data.ordnancesurvey.co.uk/id/7000000000041441> ?x ?y }

The results include the version-related properties (for example the isVersionOf and isPartOf properties) if present. These can in turn be queried using a similarly simple query. For example, the following query retrieves the version of a resource and its parent version (prefixes are as previously defined):

    SELECT ?x ?y
    WHERE { ?x dcterms:isVersionOf ?y }

Here the variable ?x retrieves the version, and the variable ?y retrieves the parent version. The exception is when querying the hasVersion property, which must access the version of the resource via the container, which is a blank node. This query is shown in the following example:

    SELECT ?y ?z
    WHERE { ?y dcterms:hasVersion ?x . ?x rdf:_1 ?z }

This query retrieves the URIs of a versioned resource and its first version, represented by the variables ?y and ?z respectively. The variable ?x represents the blank node of the container, and the graph pattern matches two triples, delineated by a period (?y dcterms:hasVersion ?x and ?x rdf:_1 ?z). Replacing the rdf:_1 value with a variable would retrieve all versions of the resource.

Querying the dataset in which new versions retain the URI

Querying this data is slightly more complicated due to the nested rdf:Collection element. The following query retrieves the first version (denoted by the rdf:_1 property) for a resource:

    PREFIX adgeo: <...>
    SELECT ?p ?n
    WHERE {
      adgeo:7000000000010759 dcterms:hasVersion ?node_variable_1 .
      ?node_variable_1 rdf:_1 ?node_variable_2 .
      ?node_variable_2 ?p ?n .
    }

The implementation demonstrates that the model is practical and offers the potential to scale. The simplicity of the structure means that a simple pattern-based algorithm can be used to merge versions, and the simplicity evident in the queries should impact minimally on performance. It could be argued that the use of blank nodes in the rdf:Alt and rdf:Collection containers introduces a degree of vagueness or indirection into the graph, as blank nodes have no URI and therefore cannot be directly linked to. It is also claimed by (Bizer, Cyganiak et al. 2008) that the use of blank nodes hinders the merging of data from different sources. (Bizer, Cyganiak et al. 2008) also discourage the use of containers (presumably because they utilize blank nodes), and suggest using multiple triples with the same predicate instead. In

the case of versions where each version has its own URI, this would mean omitting the blank node from the model and having multiple hasVersion links between the default version (in our case the most recent) and its versions. In the case where versions retain the URI of the parent, the rdf:Collection element's blank node would need to be retained. We argue that our model retains the semantics of the container that defines the versions as alternatives to the linked resource, and reduces complexity in that the default version will have only one hasVersion property. Also, the use of multi-value elements enables the implementation of nested collections, providing the mechanism to version collectively the properties of a resource.

We maintain that the versioning mechanisms described offer the facility to version resources for small to medium datasets, and retain the inherent simplicity imposed by the RDF data model, by making certain compromises. These are:

• Data redundancy, as in the versioning of resources which retain the existing URI of the previous version.
• An absence of information relating to the nature of changes to a resource.
• The use of blank nodes, which may impede the data's suitability to be merged with other datasets.

5. Conclusions

In this paper we have given a brief background to RDF, and discussed some of the possible methods which may be adopted for the purposes of versioning RDF at the resource level in administrative geography data. We have described a versioning model based on RDF containers and collection elements that addresses a specific issue that has arisen in the RDF representation of administrative geographic data, that is, the requirement of a linkage to a default version of an administrative unit, and logical and easy access to previous or alternative versions of that unit. The proposed model utilises standard RDF container and collection elements in its structure, and exploits industry standard Dublin Core metadata for its version-specific properties, negating the need to mint new properties specific to versioning. The model represents versions without interfering with the simplicity of the graph structure imposed by the RDF triple, and this simplicity is evident in the queries, which are able to utilise standard SPARQL syntax and retrieve version-specific properties by matching simple graph patterns. The evaluation establishes that the model is practical and relatively simple to implement, but that scaling might be problematical in larger datasets. For this reason, we would only recommend our method for versioning small to medium size datasets with update intervals of at least six months. The model is also potentially applicable to two issues related to versioning, those of variants and ambiguity. Variants differ from versions in that rather than being an updated representation of a resource they are a different representation of a current resource that may or may not be distinguished by time. For example, a resource may have both administrative and financial representations. Ambiguity applies to resources that are physically distinct but are commonly identified by the same name. For example, there are two places in Hampshire called Hook. Someone searching for Hook on the internet would retrieve URIs for both places with no distinction between the two. Both these scenarios will be investigated in future work.

References

Berners-Lee, T. and D. Connolly. (2001). "Delta: an ontology for the distribution of differences between RDF graphs." Retrieved 3/11/2009, from http://www.w3.org/DesignIssues/Diff.

Bizer, C., R. Cyganiak, et al. (2008). "How to Publish Linked Data on the Web." from http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/.

Christoph Rieß, Norman Heino, et al. "EvoPat - Pattern-Based Evolution and Refactoring of RDF Knowledge Bases." Retrieved 7th July, 2011, from http://svn.aksw.org/papers/2010/ISWC_Evolution/public.pdf.

DCMI. (2010). "The Dublin Core Metadata Initiative." Retrieved 26th April, 2010, from http://dublincore.org/.

deVos, A. (2002). "RDF Difference Models." from http://www.langdale.com.au/CIMXML/DifferenceModelsR05.pdf.

Franz. (2009). "AllegroGraph RDFStore." Retrieved 3/11/2009, from http://www.franz.com/agraph/allegrograph/.

HMGovernment. (2011). "data.gov.uk." Retrieved April 18, 2011, from http://data.gov.uk/.

ISO. (2008). "Information Technology - Topic Maps - Part 2: Data Model." Retrieved 3/11/2009, from http://www.isotopicmaps.org/sam/.

Ludwig, C., M. W. Kuster, et al. (2008). Versioning in Distributed Semantic Registries. iiWAS2008: 493-499.

Sompel, H. V. d., R. Sanderson, et al. (2010). An HTTP-Based Versioning Mechanism for Linked Data. LDOW2010.

Soren Auer and H. Herre. (2005). "A Versioning and Evolution Framework for RDF Knowledge Bases." Retrieved 7th July, 2011, from http://uni-leipzig.academia.edu/S%C3%B6renAuer/Papers/242787/A_Versioning_and_Evolution_Framework_for_Rdf_Knowledge_Bases.

Talis. (2011). "Changesets." Retrieved April 25th, 2011, from http://n2.talis.com/wiki/Changesets.

Tennison, J. (2010, 27-02-2010). "Versioning (UK Government) Linked Data." Retrieved 07-07-2011, from http://www.jenitennison.com/blog/node/141.

Tummarello, G., C. Morbidoni, et al. (2007). RDFSync: efficient remote synchronization of RDF models. 6th International and 2nd Asian Semantic Web Conference.

Volkel, M. (2005). "SemVersion – Versioning RDF and Ontologies." Retrieved 3/11/2009, from http://semversion.ontoware.org/kwebd233a.pdf.

W3C. (2004). "RDF Vocabulary Description Language 1.0: RDF Schema." Retrieved 3/11/2009, from http://www.w3.org/TR/rdf-schema/.

W3C. (2004). "RDF/XML Syntax Specification (Revised)." Retrieved 3/11/2009, from http://www.w3.org/TR/rdf-syntax-grammar/.

W3C. (2008). "SPARQL Query Language for RDF." Retrieved 3/11/2009, from http://www.w3.org/TR/rdf-sparql-query/.

W3C. (2011). "SPARQL 1.1 Query Language." Retrieved August 9th, 2011, from http://www.w3.org/TR/sparql11-query/.

About the authors

Alex Lohfink is a lecturer at the University of Glamorgan, and gained a PhD there in December 2008. His current research interests are the semantic web and spatio-temporal databases. Duncan McPhee is a senior lecturer at the University of Glamorgan. His research interests are in computer-based learning, databases, data mining, and the semantic web.
