Everything You Always Wanted to Know About Blank Nodes ∗ Aidan Hogan A, Marcelo Arenas B, Alejandro Mallea B, Axel Polleres C

Total Page:16

File Type:pdf, Size:1020Kb

Everything You Always Wanted to Know About Blank Nodes ∗ Aidan Hogan A, Marcelo Arenas B, Alejandro Mallea B, Axel Polleres C Everything You Always Wanted to Know About Blank Nodes ∗ Aidan Hogan a, Marcelo Arenas b, Alejandro Mallea b, Axel Polleres c, aDepartment of Computer Science, Universidad de Chile bDepartment of Computer Science, Pontificia Universidad Católica de Chile cVienna University of Economics and Business (WU), Welthandelsplatz 1, 1020 Vienna, Austria Abstract In this paper we thoroughly cover the issue of blank nodes, which have been defined in RDF as ‘existential variables’. We first introduce the theoretical precedent for existential blank nodes from first order logic and incomplete information in database theory. We then cover the different (and sometimes incompatible) treatment of blank nodes across the W3C stack of RDF-related standards. We present an empirical survey of the blank nodes present in a large sample of RDF data published on the Web (the BTC–2012 dataset), where we find that 25.7% of unique RDF terms are blank nodes, that 44.9% of documents and 66.2% of domains featured use of at least one blank node, and that aside from one Linked Data domain whose RDF data contains many “blank node cycles”, the vast majority of blank nodes form tree structures that are efficient to compute simple entailment over. With respect to the RDF-merge of the full data, we show that 6.1% of blank-nodes are redundant under simple entailment. The vast majority of non-lean cases are isomorphisms resulting from multiple blank nodes with no discriminating information being given within an RDF document or documents being duplicated in multiple Web locations. Although simple entailment is NP-complete and leanness-checking is coNP-complete, in computing this latter result, we demonstrate that in practice, real-world RDF graphs are sufficiently “rich” in ground information for problematic cases to be avoided by non-naive algorithms. Key words: blank nodes, rdf, simple entailment, leanness, skolemisation, semantic web, linked data Introduction rent definition of blank nodes appropriate for the needs of the Web community? Although the adoption of RDF [46] has broad- The standard semantics for blank nodes in- ened on the Web in recent years [11], one of its core terprets them as existential variables [36], de- features—blank nodes—has been sometimes misun- noting the existence of some unnamed resource. derstood, sometimes misinterpreted, and sometimes These semantics make even “simple” entailment ignored by implementers, other standards, and the checking—entailment without further well-defined broader Semantic Web community. This lack of con- vocabularies—intractable [36]. RDF and RDFS en- sistency between the standard and its actual use tailment are based on simple entailment, and are calls for investigation: are the semantics and the cur- thus also intractable due to blank nodes [36]. However, in the documentation for the RDF stan- dard (e.g., RDF/XML [9], RDF Primer [49]), the ? But Were Afraid to Ask Email addresses: [email protected] (Aidan Hogan), existentiality of blank nodes is not directly treated; [email protected] (Marcelo Arenas), ambiguous phrasing such as “blank node identifiers” [email protected] (Alejandro Mallea), is used, and examples for blank nodes focus on rep- [email protected] (Axel Polleres). Preprint submitted to Elsevier 15 December 2016 resenting resources that do not have a natural URI. remark on trends of adoption. Furthermore, the standards built upon RDF some- §6 We look at the role of blank nodes in pub- times have different treatment and requirements for lishing, their prevalence of use in real-world blank nodes. As we will see, standards and tools are Linked Data, and what blank node morpholo- often, to varying degrees, ambivalent to the existen- gies exist in the wild. tial semantics of blank nodes, where, e.g., the stan- §7 We give a detailed analysis of the prevalence dard query language SPARQL can return different of blank nodes that are made redundant by results for two graphs considered equivalent by the simple entailment, designing methods to effi- RDF semantics [4] and takes seemingly contradic- ciently identify non-leanness in RDF graphs tory positions on whether or not (named) graphs and discussing the results for a large sample can share blank nodes. of real-world Linked Data. Being part of the RDF specification, blank nodes §8 Finally, in light of the needs of the various are now a core aspect of Semantic Web technol- stakeholders already introduced, we discuss ogy: they are featured in several W3C standards, a some alternatives for handling blank nodes. wide range of tools, and hundreds of datasets across This paper extends a previously published confer- the Web, but not always with the same meaning ence paper [48]. Herein, we provide extended discus- (or at least, with the same intent). Dealing with sion throughout, we update our empirical analysis for the issue of blank nodes is thus not only impor- a more recent dataset, and we provide detailed tech- tant and timely, but also inherently complex and niques and results relating to classifying blank nodes potentially costly: before weighing up alternatives as lean or non-lean. for blank-nodes, their interpretation and adoption— across legacy standards, tool, and published data— must be considered. Running Example. Throughout this paper, we Given the complexity of the situation currently use the RDF graph given in Figure 1 to illustrate surrounding blank nodes—which spans several stan- our discussion. This graph states that the tennis dards, several perspectives, a plethora of tools and player :Federer won the :FrenchOpen in 2009; it hundreds of RDF publishers—our goal in this paper also states that he won :Wimbledon where one such is to bring clarity to this thorny issue by covering win was in 2003. 1 all aspects in one article. We provide a comprehen- sive overview of the current situation surrounding Preliminaries blank nodes, from theoretical background to stan- dardisation. We also provide novel techniques and results for analysing the blank nodes published in We begin by introducing the abstract representa- real-world Linked Data. Our outline is as follows: tion [32, 54] of the formal RDF model [36, 46] used in our theoretical discussion. We also introduce the §2 We provide some formal preliminaries relating semantics of RDF graphs containing blank nodes. to RDF and blank nodes. §3 We discuss blank nodes from a theoretical per- spective, relating them to background con- The RDF data model cepts from logic and database theory and, in so doing, we recapitulate some core worst- We assume the existence of pairwise disjoint infi- case complexity results. We further discuss nite sets U (URIs), L (literals) and B (blank nodes), Skolemisation in the context of RDF. where we write UB for the union of U and B, and §4 We look at how tasks such as simple entail- similarly for other combinations. 2 An RDF triple is ment and leanness checking can be performed in practice, where we discuss why worst-case 1 Throughout, we will omit the prefix declarations in complexity results are rarely encountered and SPARQL queries for brevity. The default prefix is used for remark on how SPARQL (the standard RDF example URIs. Other standard prefixes can be checked at query language) can be used for such tasks. the service http://prefix.cc. All URLs mentioned in this pa- per were retrieved at the time of writing: 2013/11/10. §5 We then survey how blank-nodes are treated 2 We generally stick to the term “URI” instead of “IRI” in the Semantic Web standards, how they are (supported by RDF 1.1) to follow conventions familiar from interpreted, what features rely on them, and the literature. For the purposes of this paper, URIs and IRIs can be considered interchangeable. 2 :TennisPlayer _:b1 year event "2003" type wins precededBy :Federer wins _:b2 event :FrenchOpen type :GrandSlamTournament type :Wimbledon year name wins precededBy "2009" "Roger Federer" _:b3 event Fig. 1. An RDF graph for our running example. In this graph, URIs are preceded by ‘:’, blank nodes by ‘_:’ and literals are enclosed in quotation marks. a tuple (s, p, o) ∈ UB × U × UBL where s is called instance µ(G) of G is proper if µ(G) has fewer blank the subject, p the predicate and o the object. nodes than G. This occurs when µ maps blank nodes to URIs or literals and/or maps two or more blank Definition 2.1 An RDF graph (or simply “graph”, nodes to the same blank node. where unambiguous) is a finite set of RDF triples. We now define two important operations on The set of terms of a graph G, denoted by graphs. The union of G1 and G2, denoted G1 ∪ G2, terms(G), is the set of elements of UBL that oc- is the set theoretical union of their sets of triples. A 4 cur in the triples of G.A vocabulary is a subset of merge of G1 and G2, denoted G1 +G2, is the union 0 0 0 0 UL. The vocabulary of G, denoted by voc(G), is de- G1 ∪ G2, where G1 and G2 are isomorphic copies of fined as terms(G) ∩ UL. Given a vocabulary V and G1 and G2, respectively, and where the sets of blank 0 0 a graph G, we say that G is a graph over V when- nodes in G1 and G2 are disjoint from each other. All ever voc(G) ⊆ V . A graph G is ground if it does not possible merges of G1 and G2 are pair-wise isomor- contain blank nodes (i.e., terms(G) ∩ B = ∅). phic such that G1 +G2 is unique up to isomorphism, A map is a partial function µ : UBL → UBL and we thus call it the merge of G1 and G2.
Recommended publications
  • Semantics Developer's Guide
    MarkLogic Server Semantic Graph Developer’s Guide 2 MarkLogic 10 May, 2019 Last Revised: 10.0-8, October, 2021 Copyright © 2021 MarkLogic Corporation. All rights reserved. MarkLogic Server MarkLogic 10—May, 2019 Semantic Graph Developer’s Guide—Page 2 MarkLogic Server Table of Contents Table of Contents Semantic Graph Developer’s Guide 1.0 Introduction to Semantic Graphs in MarkLogic ..........................................11 1.1 Terminology ..........................................................................................................12 1.2 Linked Open Data .................................................................................................13 1.3 RDF Implementation in MarkLogic .....................................................................14 1.3.1 Using RDF in MarkLogic .........................................................................15 1.3.1.1 Storing RDF Triples in MarkLogic ...........................................17 1.3.1.2 Querying Triples .......................................................................18 1.3.2 RDF Data Model .......................................................................................20 1.3.3 Blank Node Identifiers ..............................................................................21 1.3.4 RDF Datatypes ..........................................................................................21 1.3.5 IRIs and Prefixes .......................................................................................22 1.3.5.1 IRIs ............................................................................................22
    [Show full text]
  • Rdfa in XHTML: Syntax and Processing Rdfa in XHTML: Syntax and Processing
    RDFa in XHTML: Syntax and Processing RDFa in XHTML: Syntax and Processing RDFa in XHTML: Syntax and Processing A collection of attributes and processing rules for extending XHTML to support RDF W3C Recommendation 14 October 2008 This version: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 Latest version: http://www.w3.org/TR/rdfa-syntax Previous version: http://www.w3.org/TR/2008/PR-rdfa-syntax-20080904 Diff from previous version: rdfa-syntax-diff.html Editors: Ben Adida, Creative Commons [email protected] Mark Birbeck, webBackplane [email protected] Shane McCarron, Applied Testing and Technology, Inc. [email protected] Steven Pemberton, CWI Please refer to the errata for this document, which may include some normative corrections. This document is also available in these non-normative formats: PostScript version, PDF version, ZIP archive, and Gzip’d TAR archive. The English version of this specification is the only normative version. Non-normative translations may also be available. Copyright © 2007-2008 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply. Abstract The current Web is primarily made up of an enormous number of documents that have been created using HTML. These documents contain significant amounts of structured data, which is largely unavailable to tools and applications. When publishers can express this data more completely, and when tools can read it, a new world of user functionality becomes available, letting users transfer structured data between applications and web sites, and allowing browsing applications to improve the user experience: an event on a web page can be directly imported - 1 - How to Read this Document RDFa in XHTML: Syntax and Processing into a user’s desktop calendar; a license on a document can be detected so that users can be informed of their rights automatically; a photo’s creator, camera setting information, resolution, location and topic can be published as easily as the original photo itself, enabling structured search and sharing.
    [Show full text]
  • An Introduction to RDF
    An Introduction to RDF Knowledge Technologies 1 Manolis Koubarakis Acknowledgement • This presentation is based on the excellent RDF primer by the W3C available at http://www.w3.org/TR/rdf-primer/ and http://www.w3.org/2007/02/turtle/primer/ . • Much of the material in this presentation is verbatim from the above Web site. Knowledge Technologies 2 Manolis Koubarakis Presentation Outline • Basic concepts of RDF • Serialization of RDF graphs: XML/RDF and Turtle • Other Features of RDF (Containers, Collections and Reification). Knowledge Technologies 3 Manolis Koubarakis What is RDF? •TheResource Description Framework (RDF) is a data model for representing information (especially metadata) about resources in the Web. • RDF can also be used to represent information about things that can be identified on the Web, even when they cannot be directly retrieved on the Web (e.g., a book or a person). • RDF is intended for situations in which information about Web resources needs to be processed by applications, rather than being only displayed to people. Knowledge Technologies 4 Manolis Koubarakis Some History • RDF draws upon ideas from knowledge representation, artificial intelligence, and data management, including: – Semantic networks –Frames – Conceptual graphs – Logic-based knowledge representation – Relational databases • Shameless self-promotion : The closest to RDF, pre-Web knowledge representation language is Telos: John Mylopoulos, Alexander Borgida, Matthias Jarke, Manolis Koubarakis: Telos: Representing Knowledge About Information Systems. ACM Trans. Inf. Syst. 8(4): 325-362 (1990). Knowledge Technologies 5 Manolis Koubarakis The Semantic Web “Layer Cake” Knowledge Technologies 6 Manolis Koubarakis RDF Basics • RDF is based on the idea of identifying resources using Web identifiers and describing resources in terms of simple properties and property values.
    [Show full text]
  • On Blank Nodes
    On Blank Nodes Alejandro Mallea1, Marcelo Arenas1, Aidan Hogan2, and Axel Polleres2;3 1 Department of Computer Science, Pontificia Universidad Católica de Chile, Chile Email: {aemallea,marenas}@ing.puc.cl 2 Digital Enterprise Research Institute, National University of Ireland Galway, Ireland Email: {aidan.hogan,axel.polleres}@deri.org 3 Siemens AG Österreich, Siemensstrasse 90, 1210 Vienna, Austria Abstract. Blank nodes are defined in RDF as ‘existential variables’ in the same way that has been used before in mathematical logic. However, evidence suggests that actual usage of RDF does not follow this definition. In this paper we thor- oughly cover the issue of blank nodes, from incomplete information in database theory, over different treatments of blank nodes across the W3C stack of RDF- related standards, to empirical analysis of RDF data publicly available on the Web. We then summarize alternative approaches to the problem, weighing up advantages and disadvantages, also discussing proposals for Skolemization. 1 Introduction The Resource Description Framework (RDF) is a W3C standard for representing in- formation on the Web using a common data model [18]. Although adoption of RDF is growing (quite) fast [4, § 3], one of its core features—blank nodes—has been sometimes misunderstood, sometimes misinterpreted, and sometimes ignored by implementers, other standards, and the general Semantic Web community. This lack of consistency between the standard and its actual uses calls for attention. The standard semantics for blank nodes interprets them as existential variables, de- noting the existence of some unnamed resource. These semantics make even simple entailment checking intractable. RDF and RDFS entailment are based on simple entail- ment, and are also intractable due to blank nodes [14].
    [Show full text]
  • RDF—The Basis of the Semantic Web 3
    CHAPTER RDF—The basis of the Semantic Web 3 CHAPTER OUTLINE Distributing Data across the Web ........................................................................................................... 28 Merging Data from Multiple Sources ...................................................................................................... 32 Namespaces, URIs, and Identity............................................................................................................. 33 Expressing URIs in print..........................................................................................................35 Standard namespaces .............................................................................................................37 Identifiers in the RDF Namespace........................................................................................................... 38 Higher-order Relationships .................................................................................................................... 42 Alternatives for Serialization ................................................................................................................. 44 N-Triples................................................................................................................................44 Turtle.....................................................................................................................................45 RDF/XML..............................................................................................................................................
    [Show full text]
  • Inscribing the Blockchain with Digital Signatures of Signed RDF Graphs. a Worked Example of a Thought Experiment
    Inscribing the blockchain with digital signatures of signed RDF graphs. A worked example of a thought experiment. The thing is, ad hoc hash signatures on the blockchain carry no information linking the hash to the resource for which the hash is acting as a digital signature. The onus is on the inscriber to associate the hash with the resource via a separate communications channel, either by publishing the association directly or making use of one of the third-party inscription services that also offer a resource persistence facility – you get to describe what is signed by the hash and they store the description in a database, usually for a fee, or you can upload the resource for storage, definitely for a fee. Riccardo Cassata has a couple of technical articles that expose more of the practical details: https://blog.eternitywall.it/2016/02/16/how-to-verify-notarization/ So what's published is a hexadecimal string indistinguishable from all the others which purportedly matches the hash of Riccardo's mugshot. But which? https://tineye.com/search/ed9f8022c9af413a350ec5758cda520937feab21 What’s needed is a means of creating an inscription that is not only a signature but also a resovable reference to the resource for which it acts as a digital signature. It would be even better if it were possible to create a signature of structured information, such as that describing a social media post – especially if we could leverage off’ve the W3’s recent Activity Streams standard: https://www.w3.org/TR/activitystreams-core/ An example taken from the
    [Show full text]
  • Semantic Integration and Analysis of Clinical Data-1024
    Semantic integration and analysis of clinical data Hong Sun, Kristof Depraetere, Jos De Roo, Boris De Vloed, Giovanni Mels, Dirk Colaert Advanced Clinical Applications Research Group, Agfa HealthCare, Gent, Belgium {hong.sun, kristof.depraetere, jos.deroo, boris.devloed, giovanni.mels, dirk.colaert }@agfa.com Abstract. There is a growing need to semantically process and integrate clinical data from different sources for Clinical Data Management and Clinical Deci- sion Support in the healthcare IT industry. In the clinical practice domain, the semantic gap between clinical information systems and domain ontologies is quite often difficult to bridge in one step. In this paper, we report our experi- ence in using a two-step formalization approach to formalize clinical data, i.e. from database schemas to local formalisms and from local formalisms to do- main (unifying) formalisms. We use N3 rules to explicitly and formally state the mapping from local ontologies to domain ontologies. The resulting data ex- pressed in domain formalisms can be integrated and analyzed, though originat- ing from very distinct sources. Practices of applying the two-step approach in the infectious disorders and cancer domains are introduced. Keywords: Semantic interoperability, N3 rules, SPARQL, formal clinical data, data integration, data analysis, clinical information system. 1 Introduction A decade ago formal semantics, the study of logic within a linguistic framework, found a new means of expression, i.e. the World Wide Web and with it the Semantic Web [1]. The Semantic Web provides additional capabilities that enable information sharing between different resources which are semantically represented. It consists of a set of standards and technologies that include a simple data model for representing information (RDF) [2], a query language for RDF (SPARQL) [3], a schema language describing RDF vocabularies (RDFS) [4], a few syntaxes to represent RDF (RDF/XML [20], N3 [19]) and a language for describing and sharing ontologies (OWL) [5].
    [Show full text]
  • RDF and Property Graphs Interoperability: Status and Issues
    RDF and Property Graphs Interoperability: Status and Issues Renzo Angles1;2, Harsh Thakkar3, Dominik Tomaszuk4 1 Universidad de Talca, Chile 2 Millennium Institute for Foundational Research on Data, Chile 3 University of Bonn, Germany 4 University of Bialystok, Poland [email protected], [email protected], [email protected] Abstract. RDF and Property Graph databases are two approaches for data management that are based on modeling, storing and querying graph-like data. In this paper, we present a short study about the inter- operability between these approaches. We review the current solutions to the problem, identify their features, and discuss the inherent issues. 1 Introduction RDF [24] and graph databases [37] are two approaches for data management that are based on modeling, storing and querying graph-like data. Several database systems based on these models are gaining relevance in the industry due to their use in several domains where graphs and network analytics are required [6]. Both, RDF and graph database systems are tightly connected as they are based on graph-oriented database models. On the one hand, RDF database sys- tems (or triplestores) are based on the RDF data model [24], their standard query language is SPARQL [19], and there are languages to describe structure, restric- tions and semantics on RDF data (e.g. RDF Schema [13], OWL [18], SHACL [25], and ShEx [11]). On the other hand, most graph database systems are based on the Property Graph (PG) data model [7], there is no standard query language (although there are several proposals [4]), and the notions of graph schema and integrity constraints are limited [32].
    [Show full text]
  • The Resource Description Framework and Its Schema Fabien Gandon, Reto Krummenacher, Sung-Kook Han, Ioan Toma
    The Resource Description Framework and its Schema Fabien Gandon, Reto Krummenacher, Sung-Kook Han, Ioan Toma To cite this version: Fabien Gandon, Reto Krummenacher, Sung-Kook Han, Ioan Toma. The Resource Description Frame- work and its Schema. Handbook of Semantic Web Technologies, 2011, 978-3-540-92912-3. hal- 01171045 HAL Id: hal-01171045 https://hal.inria.fr/hal-01171045 Submitted on 2 Jul 2015 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. The Resource Description Framework and its Schema Fabien L. Gandon, INRIA Sophia Antipolis Reto Krummenacher, STI Innsbruck Sung-Kook Han, STI Innsbruck Ioan Toma, STI Innsbruck 1. Abstract RDF is a framework to publish statements on the web about anything. It allows anyone to describe resources, in particular Web resources, such as the author, creation date, subject, and copyright of an image. Any information portal or data-based web site can be interested in using the graph model of RDF to open its silos of data about persons, documents, events, products, services, places etc. RDF reuses the web approach to identify resources (URI) and to allow one to explicitly represent any relationship between two resources.
    [Show full text]
  • Rdflib Documentation
    rdflib Documentation Release 5.0.0-dev RDFLib Team Nov 01, 2018 Contents 1 Getting started 3 2 In depth 19 3 Reference 33 4 For developers 157 5 Indices and tables 171 Python Module Index 173 i ii rdflib Documentation, Release 5.0.0-dev RDFLib is a pure Python package work working with RDF. RDFLib contains most things you need to work with RDF, including: • parsers and serializers for RDF/XML, N3, NTriples, N-Quads, Turtle, TriX, RDFa and Microdata. • a Graph interface which can be backed by any one of a number of Store implementations. • store implementations for in memory storage and persistent storage on top of the Berkeley DB. • a SPARQL 1.1 implementation - supporting SPARQL 1.1 Queries and Update statements. Contents 1 rdflib Documentation, Release 5.0.0-dev 2 Contents CHAPTER 1 Getting started If you never used RDFLib, click through these 1.1 Getting started with RDFLib 1.1.1 Installation RDFLib is open source and is maintained in a GitHub repository. RDFLib releases, current and previous are listed on PyPi The best way to install RDFLib is to use pip (sudo as required): $ pip install rdflib Support is available through the rdflib-dev group: http://groups.google.com/group/rdflib-dev and on the IRC channel #rdflib on the freenode.net server The primary interface that RDFLib exposes for working with RDF is a Graph. The package uses various Python idioms that offer an appropriate way to introduce RDF to a Python programmer who hasn’t worked with RDF before. RDFLib graphs are not sorted containers; they have ordinary set operations (e.g.
    [Show full text]
  • RDF 1.1 Turtle
    25/3/2014 RDF 1.1 Turtle RDF 1.1 Turtle Terse RDF Triple Language W3C Recommendation 25 February 2014 This version: http://www.w3.org/TR/2014/REC-turtle-20140225/ Latest published version: http://www.w3.org/TR/turtle/ Test suite: http://www.w3.org/TR/2014/NOTE-rdf11-testcases-20140225/ Implementation report: http://www.w3.org/2013/TurtleReports/index.html Previous version: http://www.w3.org/TR/2014/PR-turtle-20140225/ Editors: Eric Prud'hommeaux, W3C Gavin Carothers, Lex Machina, Inc Authors: David Beckett Tim Berners-Lee, W3C Eric Prud'hommeaux, W3C Gavin Carothers, Lex Machina, Inc Please check the errata for any errors or issues reported since publication. The English version of this specification is the only normative version. Non-normative translations may also be available. Copyright © 2008-2014 W3C® (MIT, ERCIM, Keio, Beihang), All Rights Reserved. W3C liability, trademark and document use rules apply. Abstract The Resource Description Framework (RDF) is a general-purpose language for representing information in the Web. This document defines a textual syntax for RDF called Turtle that allows an RDF graph to be completely written in a compact and natural text form, with abbreviations for common usage patterns and datatypes. Turtle provides levels of compatibility with the N-Triples [N-TRIPLES] format as well as the triple pattern syntax of the SPARQL W3C Recommendation. Status of This Document This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
    [Show full text]
  • Delta: an Ontology for the Distribution of Differences Between RDF Graphs
    Delta: an ontology for the distribution of differences between RDF graphs Tim Berners-Lee and Dan Connolly? MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) Abstract. The problem of updating and synchronizing data in the Se- mantic Web motivates an analog to text diffs for RDF graphs. This paper discusses the problem of comparing two RDF graphs, generating a set of differences, and updating a graph from a set of differences. It discusses two forms of difference information, the context-sensitive “weak” patch, and the context-free “strong” patch. It gives a proposed update ontology for patch files for RDF, and discusses experience with proof of concept code. 1 Introduction The use of text files to record programs, documents, and other artifacts is sup- ported by version control systems such as RCS[1] and CVS[2] that are based on the ability to compute the difference between two text files and represent it as diff[3], i.e. a set of editing instructions. The use of database tables to record bank accounts and records of all sorts is supported by the relational calculus[4] and its expression as SQL statements. In both cases, the data goes thru a se- quence of states; not only are the states represented explicitly (as text files or database tables) but also the transitions from one state to the other can be represented explicitly (either as editing instructions or SQL insert/update state- ments). Difference (∆) and sum (Σ) functions are ubiquitous in computing and, like differentiation and integration, are inverse in the sense that: v1 = Σ(v0, ∆(v0, v1)) (1) Since the transitions can be represented much more compactly than the pairs of states, and the sigma function is straightforward to compute, the deltas are useful for efficiently updating data distributed among two or more peers.
    [Show full text]