The Semantic Web

Linked Data vs Semantic Web

• While the Semantic Web, or Web of Data, is the goal or the end result of this process, Linked Data provides the means to reach that goal.

http://tomheath.com/papers/bizer-heath-berners-lee-ijswis-linked-data.pdf

What is Linked Data supposed to be?

• The basic idea of Linked Data is relatively simple. Tim Berners-Lee’s note on Linked Data describes four rules for publishing data on the Web:

1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
4. Include links to other URIs, so that they can discover more things.
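As a rough illustration of the four rules, the sketch below models a tiny "web" in memory. The `example.org` URIs, the `PUBLISHED` table and the helper names are hypothetical; the FOAF property URIs are real vocabulary terms. Things are named by HTTP URIs (rules 1 and 2), looking up a URI returns triples about it (rule 3), and some of those triples link to other URIs that can be looked up in turn (rule 4).

```python
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"

# Hypothetical things, each named by an HTTP URI (rules 1 and 2).
ALICE = "http://example.org/people/alice"
BOB = "http://example.org/people/bob"

# Hypothetical stand-in for the Web: what a server would return for each URI.
PUBLISHED = {
    ALICE: [(ALICE, FOAF_NAME, "Alice"), (ALICE, FOAF_KNOWS, BOB)],
    BOB: [(BOB, FOAF_NAME, "Bob")],
}


def look_up(uri):
    """Rule 3: looking up a URI yields useful information (here, triples)."""
    return PUBLISHED.get(uri, [])


def linked_uris(uri):
    """Rule 4: collect the other URIs mentioned in the description of `uri`."""
    return [obj for (_, _, obj) in look_up(uri)
            if obj.startswith("http://") and obj != uri]


def crawl(start):
    """Follow links breadth-first, discovering more things from one URI."""
    seen, queue = [], [start]
    while queue:
        uri = queue.pop(0)
        if uri in seen:
            continue
        seen.append(uri)
        queue.extend(linked_uris(uri))
    return seen


print(crawl(ALICE))
```

Starting from Alice's URI alone, the crawl also reaches Bob, which is the "discover more things" part of rule 4.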

http://www.w3.org/DesignIssues/LinkedData.html

What do we need Semantic Web for?

• Linked Data lies at the heart of what Semantic Web is all about: large scale integration of, and reasoning on, data on the Web.

http://www.w3.org/standards/semanticweb/data#uses

Problems with Linked Data

• Identity (Crisis?)
• Publishing Data
• Consuming Data

http://milicicvuk.com/blog/2011/07/26/problems-of-linked-data-14-identity/

Identity: What is Linked Data really?

• There is currently considerable ambiguity as to the exact nature of Linked Data. The debate primarily centres around whether Linked Data must adhere to the four principles outlined in Tim Berners-Lee’s “Linked Data Design Issues”, and in particular whether use of RDF and SPARQL is mandatory. Some argue that RDF is integral to Linked Data; others suggest that while it may be desirable, use of RDF is optional rather than mandatory.

http://wiki.cetis.ac.uk/images/1/1a/The_Semantic_Web.pdf

Identity: Where does the problem arise?

• The problem arose because the Linked Data rules are defined imprecisely and can be interpreted in different ways
• Interpretation of the third rule:
– 3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
• The concept of ‘linking data’ is confused with the mechanisms and specifications that support it

Identity: They’re more guidelines than rules!

• The four rules are not actually rules, but expectations of behaviour:

– Breaking them will not destroy anything, but misses an opportunity to make data interconnected.

• But, what about Quality?

http://www.w3.org/DesignIssues/LinkedData.html

Identity: 5 Star Rating System

http://www.w3.org/DesignIssues/LinkedData.html

Identity: Linked Data or ?

• Linked Data is actually linked only when data is rated with five stars
• The name “Linked Data” doesn’t make much sense for the lower rated data.
• The “3 star” data is thus interpreted as Open Data (data published under an open licence and in non-proprietary formats)

Identity: Let’s call it Open Data so!

• The remaining problem is that between Open Data (3 stars) and Linked Data (5 stars) there is a requirement to use RDF, implying that Linked Data must be based on RDF.

Identity: Let’s soften that requirement...

http://blog.soton.ac.uk/webteam/2011/07/17/linked-data-vs-open-data-vs-rdf-data/

Identity: Composability

• A data format that allows composability is one that:

1. has no schema
2. is self-describing
3. is “object centric”. In order to integrate information about different entities, data must be related to these entities
4. is graph-based, because object-centric data sources, when composed, result in a graph, in the general case

• Stefan Decker claims that “any data format that fulfils the requirements (thus enabling the data Web) is “more or less” isomorphic to RDF”
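A minimal sketch of what composability means in practice, assuming data is held as plain (subject, predicate, object) tuples. The two sources and the helper names are hypothetical; the Germany/capital/Berlin triple uses the DBpedia URIs that appear later in these slides, and the label triple is illustrative. Because neither source needs a schema and every statement is attached to an entity URI, composing them is just a set union, and the result is a graph that can be traversed across the original source boundaries.

```python
GERMANY = "http://dbpedia.org/resource/Germany"
BERLIN = "http://dbpedia.org/resource/Berlin"
CAPITAL = "http://dbpedia.org/ontology/capital"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Two independently published, schema-less sources (hypothetical).
source_a = {(GERMANY, CAPITAL, BERLIN)}
source_b = {(BERLIN, LABEL, "Berlin")}


def compose(*sources):
    """Composing object-centric sources is a plain union of their triples."""
    merged = set()
    for source in sources:
        merged |= source
    return merged


def objects(graph, subject, predicate):
    """Return all objects for a given subject and predicate, sorted."""
    return sorted(o for (s, p, o) in graph if s == subject and p == predicate)


graph = compose(source_a, source_b)

# Traverse an edge from source_a, then an edge from source_b.
capital = objects(graph, GERMANY, CAPITAL)[0]
print(objects(graph, capital, LABEL))
```

The traversal at the end only works on the composed graph: neither source on its own can answer "what is the label of Germany's capital?".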

http://jodischneider.com/blog/2011/07/08/enabling-a-data-web-is-rdf-the-only-choice/

Identity: Alternatives to RDF?

• “It seems that RDF is paying the price for constantly isolating itself from the other related concepts over the years. For instance, is RDF really so different from the OO model, that objects are almost never mentioned in the context of describing RDF? How many web developers would have understood RDF better if it was explained in terms of the similarity/difference to the OO model? How much better and useful RDF would be if the Semantic Web and OO communities worked together?”

http://milicicvuk.com/blog/2011/07/26/problems-of-linked-data-14-identity/

Publishing Data: Is still hard

• Publishing Linked Data is often perceived as unduly difficult, demotivating people interested in publishing data.
• An average potential publisher has been “spoiled” by much simpler solutions on the Web. She is used to getting quick explanations, and learning from 5-minute tutorials.
• People have no other option than to learn how to publish Linked Data from 100-page books and 3-hour lectures. It seems it’s not possible to explain Linked Data in less time, and that’s what we should worry about.

http://milicicvuk.com/blog/2011/08/02/problems-of-linked-data-34-publishing-data/

Publishing Data: Why link?

• Create links for the greater good?

OR

• Because linking enhances my data...

Publishing Data: What types of links?

1. Relationship Links point at related things in other data sources, for instance, other people, places or genes. For example, relationship links enable people to point to background information about the place they live, or to bibliographic data about the publications they have written.
2. Identity Links point at URI aliases used by other data sources to identify the same real-world object or abstract concept. Identity links enable clients to retrieve further descriptions about an entity from other data sources. Identity links have an important social function as they enable different views of the world to be expressed on the Web of Data.
3. Vocabulary Links point from data to the definitions of the vocabulary terms that are used to represent the data, as well as from these definitions to the definitions of related terms in other vocabularies. Vocabulary links make data self-descriptive and enable Linked Data applications to understand and integrate data across vocabularies.
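To make the three link types concrete, here is a small sketch with one example triple of each kind. The `example.org` URIs and the `link_type` helper are hypothetical, and classifying a link purely by its predicate is a simplifying assumption rather than a rule from the source. The `owl:sameAs`, `rdf:type`, `rdfs:subClassOf` and FOAF URIs are real vocabulary terms.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
FOAF_BASED_NEAR = "http://xmlns.com/foaf/0.1/based_near"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"

# Assumption: these predicates are treated as pointing at vocabulary terms.
VOCABULARY_PREDICATES = {RDF_TYPE, RDFS_SUBCLASS_OF}

ME = "http://example.org/people/me"  # hypothetical URI

TRIPLES = [
    # Relationship link: points at a related thing in another data source.
    (ME, FOAF_BASED_NEAR, "http://dbpedia.org/resource/Berlin"),
    # Identity link: points at an alias for the same real-world object.
    ("http://dbpedia.org/resource/Berlin", OWL_SAME_AS,
     "http://example.org/places/berlin"),
    # Vocabulary link: points from data to the definition of a term.
    (ME, RDF_TYPE, FOAF_PERSON),
]


def link_type(triple):
    """Rough heuristic: classify a link by looking only at its predicate."""
    _, predicate, _ = triple
    if predicate == OWL_SAME_AS:
        return "identity"
    if predicate in VOCABULARY_PREDICATES:
        return "vocabulary"
    return "relationship"


for triple in TRIPLES:
    print(link_type(triple), triple[2])
```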

http://linkeddatabook.com/editions/1.0/#htoc18

Publishing Data: Should be easier...

• Linked Data is trying to follow the principles of the original Web, but instead of focusing on the most important one, simplicity, it insists on the implementation of various relatively complex and geeky technologies of the Web architecture.

http://www.w3.org/DesignIssues/Principles.html

Consuming Data: Should be easy, right?

• Imagine a user (or a software agent) wanting to find out what the capital of Germany is
• Let’s assume that the user already knows the URI references that represent
– the concept of Germany (http://dbpedia.org/resource/Germany)
– the property “has capital” (http://dbpedia.org/ontology/capital).

Consuming Data: Example

• We would like to ask something like

<http://dbpedia.org/resource/Germany> <http://dbpedia.org/ontology/capital> ?object

• Instead it’s a six-step process...

Consuming Data: Example

1. First, the user sends an HTTP request to http://dbpedia.org/resource/Germany. In the HTTP headers he specifies the RDF notation (format) in which he wants to receive the description of the resource.
2. The server answers:
   HTTP/1.1 303 See Other
   Location: http://dbpedia.org/data/Germany.rdf
   – This is a 303 redirect, which tells the client that a Web document containing a description of the requested (non-information) resource, in the requested format, can be found at the URI http://dbpedia.org/data/Germany.rdf
3. Next, the client tries to dereference the new URI, http://dbpedia.org/data/Germany.rdf, given in the response from the server.
4. The server then responds with a “200 OK” message, telling the client that the response contains the representation of the information resource.
   – The “Content-Type:” header indicates the requested RDF/XML format, and the rest of the message contains the representation describing the desired non-information resource, i.e. the triples encoded in the RDF/XML notation. This description can be of significant size: in this particular case (http://dbpedia.org/data/Germany.rdf) it weighs nearly half a megabyte (428KB).
5. When the download is complete, the description must be parsed, which requires a special library. The usual procedure is that the triples are loaded into a local graph, and queries are then performed, depending on the implementation, via API methods or SPARQL.
6. Finally, the desired information is obtained: the URI reference of the capital of Germany is http://dbpedia.org/resource/Berlin (34 bytes). If you need some additional information describing Berlin, you have to repeat the entire procedure with a new URI, http://dbpedia.org/resource/Berlin.

Consuming Data: Alternatives
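For reference, the six-step lookup described above can be sketched in Python roughly as follows, using only the standard library. The function names and the embedded `SAMPLE` document are hypothetical. The parsing deliberately handles only the simplest `rdf:Description` / `rdf:about` shape of RDF/XML, so it is a stand-in for the "special library" mentioned in step 5, not a real RDF parser. Whether the live server still answers exactly as described on the slide is not guaranteed.

```python
import urllib.request
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
GERMANY = "http://dbpedia.org/resource/Germany"
CAPITAL = "http://dbpedia.org/ontology/capital"


def build_request(uri):
    """Step 1: request the resource, asking for RDF/XML in the HTTP headers."""
    return urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})


def fetch_description(uri):
    """Steps 2-4: urlopen follows the 303 redirect and returns the 200 body."""
    with urllib.request.urlopen(build_request(uri)) as response:
        return response.read()


def objects_of(rdf_xml, subject, predicate):
    """Steps 5-6: parse the document and pick out the matching triples.

    Assumption: triples are written as child elements of an
    rdf:Description element carrying an rdf:about attribute.
    """
    split_at = max(predicate.rfind("/"), predicate.rfind("#")) + 1
    tag = "{%s}%s" % (predicate[:split_at], predicate[split_at:])
    results = []
    for description in ET.fromstring(rdf_xml).iter("{%s}Description" % RDF_NS):
        if description.get("{%s}about" % RDF_NS) != subject:
            continue
        for prop in description.findall(tag):
            resource = prop.get("{%s}resource" % RDF_NS)
            results.append(resource if resource is not None else (prop.text or ""))
    return results


# Hypothetical, heavily shortened stand-in for the 428KB document.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dbo="http://dbpedia.org/ontology/">
  <rdf:Description rdf:about="http://dbpedia.org/resource/Germany">
    <dbo:capital rdf:resource="http://dbpedia.org/resource/Berlin"/>
  </rdf:Description>
</rdf:RDF>"""

print(objects_of(SAMPLE, GERMANY, CAPITAL))
```

Against the live service, the call would be `objects_of(fetch_description(GERMANY), GERMANY, CAPITAL)`, and learning anything about Berlin means running the whole procedure again with the Berlin URI.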

• SPARQL is neater:

SELECT ?object WHERE { <http://dbpedia.org/resource/Germany> <http://dbpedia.org/ontology/capital> ?object . }
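A query of this shape can be sent to a SPARQL endpoint in a single HTTP request. The sketch below fills in the Germany and “has capital” URIs from the earlier slide as subject and predicate. It assumes the public DBpedia endpoint at `http://dbpedia.org/sparql` and the standard SPARQL JSON results format. The helper names and the `SAMPLE_RESULTS` value are hypothetical.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://dbpedia.org/sparql"  # assumed public endpoint

QUERY = """SELECT ?object WHERE {
  <http://dbpedia.org/resource/Germany> <http://dbpedia.org/ontology/capital> ?object .
}"""


def build_query_url(endpoint, query):
    """Encode the query as the `query` parameter of a GET request."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_query(endpoint, query):
    """Send the query and decode the JSON result document."""
    request = urllib.request.Request(
        build_query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def bindings(results, variable):
    """Extract the values bound to one variable from a JSON result document."""
    return [row[variable]["value"]
            for row in results["results"]["bindings"]
            if variable in row]


# Hypothetical response, shaped like the SPARQL JSON results format.
SAMPLE_RESULTS = {
    "head": {"vars": ["object"]},
    "results": {"bindings": [
        {"object": {"type": "uri", "value": "http://dbpedia.org/resource/Berlin"}}
    ]},
}

print(bindings(SAMPLE_RESULTS, "object"))
```

With network access, `bindings(run_query(ENDPOINT, QUERY), "object")` replaces the redirect, download and parse steps with one request, at the cost of having to know SPARQL and its result format.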

• But it requires knowing another standard!

Consuming Data: Efficient?

• Linked Data is about combining data from several endpoints.

– This is slow!

Summary

• Publishing data by Linked Data rules is very hard for most people. Consuming data is hard. Understanding the underpinning theory is hard. Almost everything in Linked Data is hard.
• And what do you get?
– Traversing a graph and getting data is difficult and inefficient if done programmatically and almost impossible in a browser.