WebDB 2010 June 6th, 2010, Indianapolis, USA

The Web of Linked Data

A global public dataspace on the Web

Christian Bizer Freie Universität Berlin

Christian Bizer: The Web of Linked Data (6/6/2010)

Outline

1. Foundations of Dataspaces and Linked Data
   - Where do they overlap?

2. The Web of Linked Data
   - What data is out there?

3. Linked Data Applications
   - What is being done with the data?

4. Remarks on
   - Identity
   - Self-descriptive Data
   - Pay-as-you-go Integration

The Dataspace Vision

Alternative to classic systems in order to cope with a growing number of data sources.

Properties of dataspaces
- may contain any kind of data (structured, semi-structured, unstructured)
- require no upfront investment into a global schema
- provide for data-coexistence
- give best-effort answers to queries
- rely on pay-as-you-go data integration

Franklin, M., Halevy, A., and Maier, D.: From Databases to Dataspaces: A New Abstraction for Information Management, SIGMOD Rec. 2005.

Dataspace Architecture

Source: Franklin et al.: From Databases to Dataspaces, SIGMOD Rec. 2005.

Linked Data Principles

Set of best practices for publishing structured data on the Web in accordance with the general architecture of the Web.

1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful RDF information.
4. Include RDF statements that link to other URIs so that they can discover related things.

Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html, 2006
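Principles 2 and 3 can be pictured as an ordinary HTTP lookup that asks for RDF. The following is a minimal sketch, not part of the original slides: it only builds the request and does not send it, the helper name `build_lookup_request` is made up for illustration, and the use of an `Accept` header for RDF/XML is an assumption about how a client would typically ask for RDF.

```python
from urllib.parse import urldefrag
from urllib.request import Request


def build_lookup_request(uri: str) -> Request:
    """Build (but do not send) an HTTP GET request that asks for RDF."""
    # A fragment such as "#cygri" is never sent to the server, so drop it
    # and look up the document that contains the description.
    document_url, _fragment = urldefrag(uri)
    # Assumption: the client prefers RDF/XML and accepts anything as fallback.
    headers = {"Accept": "application/rdf+xml, */*;q=0.1"}
    return Request(document_url, headers=headers)


request = build_lookup_request("http://richard.cyganiak.de/foaf.rdf#cygri")
print(request.full_url)
print(request.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the actual lookup; whether the server answers with RDF depends on the publisher.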

Architecture of the classic Web

Single global information space

Small set of simple standards
1. HTML as document format
2. HTTP URLs as
   - globally unique IDs
   - retrieval mechanism
3. Hyperlinks to connect everything

[Figure: Web browsers and search engines access HTML documents on servers A, B, and C over HTTP; hyperlinks connect the documents.]

Web 2.0 APIs and Mashups

No single global dataspace

Shortcomings
1. APIs have proprietary interfaces
2. Mashups are based on a fixed set of data sources
3. You cannot set hyperlinks between data items within different APIs

[Figure: A mashup sits on top of four separate Web APIs in front of data sources A, B, C, and D.]

Web APIs slice the Web into Walled Gardens

Image: Bob Jagensdorf, http://flickr.com/photos/darwinbell/, CC-BY

Linked Data

Extend the Web with a single global dataspace
1. by using RDF to publish structured data on the Web
2. by setting links between data items within different data sources

[Figure: RDF data items in data sources A, B, C, D, and E, connected by RDF links across the sources.]

The RDF Data Model

pd:cygri  rdf:type         :Person
pd:cygri  foaf:name        "Richard Cyganiak"
pd:cygri  foaf:based_near  :Berlin

Flexible graph-based data model.
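The statements on this slide can be held in memory as plain (subject, predicate, object) tuples. The sketch below is only an illustration of the graph model, not code from the talk; the helper `describe` is a made-up name, and the prefixed names are kept as opaque strings.

```python
from collections import defaultdict

# Each RDF statement is one (subject, predicate, object) triple.
# Prefixed names are kept as plain strings, as written on the slide.
GRAPH = {
    ("pd:cygri", "rdf:type", ":Person"),
    ("pd:cygri", "foaf:name", "Richard Cyganiak"),
    ("pd:cygri", "foaf:based_near", ":Berlin"),
}


def describe(graph, subject):
    """Collect all predicate/object pairs attached to one node."""
    description = defaultdict(list)
    for s, p, o in graph:
        if s == subject:
            description[p].append(o)
    return dict(description)


print(describe(GRAPH, "pd:cygri"))
```

Because the model is just a set of triples, adding a new kind of statement needs no schema change; it is one more tuple in the set.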

Entities are identified with HTTP URIs

pd:cygri  rdf:type         foaf:Person
pd:cygri  foaf:name        "Richard Cyganiak"
pd:cygri  foaf:based_near  dbpedia:Berlin

HTTP URIs take the role of global primary keys.

pd:cygri = http://richard.cyganiak.de/foaf.rdf#cygri
dbpedia:Berlin = http://dbpedia.org/resource/Berlin
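The two expansions above follow one rule: replace the prefix with its namespace URI and append the local name. A minimal sketch of that rule follows; the `pd` and `dbpedia` namespaces are taken from the slide, the `foaf` namespace is the published FOAF one, and the function name `expand` is chosen here for illustration.

```python
# Prefix table: "pd" and "dbpedia" come from the slide, "foaf" is the
# published FOAF namespace.
PREFIXES = {
    "pd": "http://richard.cyganiak.de/foaf.rdf#",
    "dbpedia": "http://dbpedia.org/resource/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(name: str, prefixes=PREFIXES) -> str:
    """Expand a prefixed name such as 'pd:cygri' into a full HTTP URI."""
    prefix, separator, local_name = name.partition(":")
    if not separator or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {name!r}")
    return prefixes[prefix] + local_name


print(expand("pd:cygri"))
print(expand("dbpedia:Berlin"))
```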

Resolving URIs over the Web

pd:cygri        rdf:type         foaf:Person
pd:cygri        foaf:name        "Richard Cyganiak"
pd:cygri        foaf:based_near  dbpedia:Berlin
dbpedia:Berlin  dp:population    3,405,259
dbpedia:Berlin  skos:subject     dp:Cities_in_Germany

The HTTP protocol brings together identification and retrieval again.

Following Links deeper into the Web

pd:cygri          rdf:type         foaf:Person
pd:cygri          foaf:name        "Richard Cyganiak"
pd:cygri          foaf:based_near  dbpedia:Berlin
dbpedia:Berlin    dp:population    3,405,259
dbpedia:Berlin    skos:subject     dp:Cities_in_Germany
dbpedia:Hamburg   skos:subject     dp:Cities_in_Germany
dbpedia:Muenchen  skos:subject     dp:Cities_in_Germany

The Disco – Hyperdata Browser

Properties of the Web of Linked Data

- Global, distributed dataspace built on a simple set of standards
  - RDF, URIs, HTTP
- Entities are connected by links
  - creating a global data graph that spans data sources and
  - enables the discovery of new data sources.
- Provides for data-coexistence
  - Everyone can publish data to the Web of Linked Data
  - Everyone can express their personal view on things
  - Everybody can use the schemata that they like for this

2. Linked Data Deployment on the Web

Is this real?

[Figure: RDF data items in data sources A, B, C, D, and E, connected by RDF links across the sources.]

W3C Linking Open Data Project

- Grassroots community effort to
  - publish existing open license datasets as Linked Data on the Web
  - interlink things between different data sources

LOD Datasets on the Web: May 2007

- Over 500 million RDF triples
- Around 120,000 RDF links between data sources

LOD Datasets on the Web: September 2008

LOD Datasets on the Web: July 2009

- Over 13.1 billion RDF triples
- Over 142 million RDF links between data sources

DBpedia – An Interlinking Hub in the Web of Data

DBpedia

- community effort to extract structured information from Wikipedia.
- provides data about 3.4 million things
  - 312,000 persons
  - 140,000 organizations
  - 413,000 places
  - 94,000 music albums
  - 49,000 films
  - 146,000 species
  - …
- provides identifiers for many common things
  - http://dbpedia.org/resource/Calgary
- overlaps with many other data sources on the Web

The LOD effort is losing track with the diagram :-)

Uptake in Life Sciences

- W3C Linking Open Drug Data Effort
- Bio2RDF Project
- Allen Brain Atlas

Uptake in the Libraries Community

- Institutions publishing Linked Data
  - Library of Congress (subject headings)
  - German National Library (PND dataset and subject headings)
  - Swedish National Library (Libris catalog)
  - Hungarian National Library (OPAC and )
  - German Central Library of Economics (subject headings)

- Workshop: Semantic Web in Bibliotheken (SWIB09)
  - Cologne, November 24 and 25, 2009
  - http://www.swib09.de/

- W3C Library Linked Data Incubator Group
- Open Archives Object Reuse and Exchange (OAI-ORE) Standard

Uptake in the Media Industry

- publish data as RDF/XML and/or
- embed data into HTML using RDFa

The Structural Continuum

The Web of Linked Data is interwoven with the classic Web.

- Unstructured data: HTML
- Semi-structured data: RDFa embedded into HTML
- Structured data: RDF/XML

- Services using named entity recognition to annotate texts with Linked Data URIs
  - Open Calais (Thomson Reuters) for news
  - Zemanta (startup) for blog posts

3. Linked Data Applications

What can I do with this?

[Figure: Linked Data browsers, Linked Data mashups, and search engines work on top of things in data sources A, B, C, D, and E that are connected by typed links.]

Linked Data Browsers

Provide for navigating between data sources in order to explore the dataspace.

- Tabulator Browser (MIT, USA)
- Marbles (FU Berlin, DE)
- OpenLink RDF Browser (OpenLink, UK)
- Zitgist RDF Browser (Zitgist, USA)
- Disco Hyperdata Browser (FU Berlin, DE)
- Fenfire (DERI, Ireland)

DBpedia Mobile

- Displays DBpedia data on a map
- Provides for navigating into other data sources

Web of Data Search Engines

Crawl the dataspace and provide best-effort query answers over crawled data.

- Falcons (IWS, China)
- Sig.ma (DERI, Ireland)
- Swoogle (UMBC, USA)
- VisiNav (DERI, Ireland)
- Watson (Open University, UK)

What are the big players doing?

- Yahoo! and Google have started to crawl Linked Data in its RDFa serialization as well as .
- Yahoo!
  - provides access to crawled data through the Yahoo BOSS API
  - is using the data within Yahoo Search Monkey to make search results more useful and visually appealing.
- Google
  - uses crawled RDF data for its Social Graph API
  - uses crawled data to enhance search result snippets for reviews and people.

Yahoo! Search Monkey

Facebook’s Open Graph Protocol

- Facebook imports RDFa data from external web sites.
- For instance:
  - IMDb, Microsoft, NHL, Posterous
  - Rotten Tomatoes, TIME, Yelp

4. Remarks on

1. Identity
2. Self-descriptive Data
3. Pay-as-you-go Integration

Identity

Real world objects are identified with multiple URIs.

- Coupling of identification and retrieval.
- Data-coexistence: Everybody can say everything about everything.

Wrapper around the DBLP bibliography:
http://dblp.l3s.de/d2r/resource/authors/Christian_Bizer

Linked Data website of our research group:
http://www4.wiwiss.fu-berlin.de/is-group/resource/persons/Person4

Identity Resolution

Publication of owl:sameAs links on the Web.

owl:sameAs .

- Pay-as-you-go Identity Management
  - Cheap to set up: Just put a wrapper in front of your DB (for instance using D2R Server)
  - Later: You or somebody else invests effort into identity resolution
  - related approach: iTrail hints (Vaz Salles et al., VLDB 06, 07)
- How to create owl:sameAs links?
  - Automatically, based on declarative matching descriptions (for instance using the Silk Linking Framework)
  - Manually (for instance like within Ueberblick.org)
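Once owl:sameAs links have been published, a consumer can merge the URIs that denote the same real-world object. The sketch below is one possible way to do that with a union-find structure; it is not the method of Silk or of any tool named above. The two Christian Bizer URIs come from the Identity slide, the `example.org` URI is hypothetical, and choosing the lexicographically smallest URI as the canonical name is an arbitrary assumption.

```python
def find(parent, uri):
    """Return the canonical representative of the cluster containing uri."""
    parent.setdefault(uri, uri)
    while parent[uri] != uri:
        parent[uri] = parent[parent[uri]]  # path halving keeps chains short
        uri = parent[uri]
    return uri


def resolve_identities(same_as_links):
    """Map every URI in the links to one canonical URI per cluster."""
    parent = {}
    for a, b in same_as_links:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            # Assumption: the lexicographically smaller URI names the cluster.
            keep, drop = sorted((root_a, root_b))
            parent[drop] = keep
    return {uri: find(parent, uri) for uri in list(parent)}


DBLP = "http://dblp.l3s.de/d2r/resource/authors/Christian_Bizer"
FU = "http://www4.wiwiss.fu-berlin.de/is-group/resource/persons/Person4"
OTHER = "http://example.org/people/bizer"  # hypothetical third URI

canonical = resolve_identities([(FU, DBLP), (OTHER, FU)])
print(canonical[FU])
```

Links can be added at any time and by anyone, which matches the pay-as-you-go idea: the clusters simply grow as more owl:sameAs statements are found.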

Pay-As-You-Go Data Integration

1. Stage: RAW DATA NOW!
   - don't care too much about the schema
   - just publish your data as RDF on the Web

   http://www.ted.com/talks/tim_berners_lee_on_the_next_web.

2. Stage: Increase the usefulness of your data and ease data integration by making it self-descriptive.

Enable Clients to retrieve the Schema

Clients can resolve the URIs that identify vocabulary terms in order to get their RDFS or OWL definitions.

Some data on the Web:
foaf:name "Richard Cyganiak" ; rdf:type .

Resolve unknown term:
http://xmlns.com/foaf/0.1/Person

RDFS or OWL definition:
rdf:type owl:Class ; rdfs:label "Person"; rdfs:subClassOf ; rdfs:subClassOf .

Reuse Terms from Common Vocabularies

- Common Vocabularies
  - Friend-of-a-Friend for describing people and their social network
  - SIOC for describing forums and blogs
  - SKOS for representing topic taxonomies
  - Organization Ontology for describing the structure of organizations
  - GoodRelations for describing products and business entities
  - Music Ontology for describing artists, albums, and performances
  - Review Vocabulary provides terms for representing reviews

- Common sources of identifiers (URIs) for real world objects
  - LinkedGeoData and Geonames: Locations
  - GeneID and UniProt: Life science identifiers
  - DBpedia: Wide range of things

Publish Schema Mappings on the Web

Schema Mapping owl:equivalentClass .

- Simple Mappings: OWL
  - owl:equivalentClass, owl:equivalentProperty
- Complex Mappings: R2R
  - provides value transformation functions
  - structural transformations

- Pay-as-you-go Aspect
  1. Use a mix of common vocabularies and proprietary terms
  2. You or somebody else publishes schema mappings afterwards
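A simple owl:equivalentClass mapping can be applied by a consumer as a rewrite of rdf:type statements. The following is a minimal sketch of that idea only; it does not model R2R or OWL reasoning, and the proprietary class `ex:Human` and the subject `ex:alice` are hypothetical names introduced for the example.

```python
RDF_TYPE = "rdf:type"

# A published mapping: a proprietary class declared equivalent to a common one.
# "ex:Human" is a hypothetical proprietary term, not taken from the slides.
EQUIVALENT_CLASS = {"ex:Human": "foaf:Person"}


def translate_types(triples, equivalent_class):
    """Rewrite rdf:type objects into the target vocabulary where a mapping exists."""
    translated = []
    for s, p, o in triples:
        if p == RDF_TYPE:
            o = equivalent_class.get(o, o)  # unmapped classes pass through
        translated.append((s, p, o))
    return translated


published = [
    ("ex:alice", "rdf:type", "ex:Human"),  # hypothetical data using the proprietary term
    ("pd:cygri", "rdf:type", "foaf:Person"),
    ("pd:cygri", "foaf:name", "Richard Cyganiak"),
]
print(translate_types(published, EQUIVALENT_CLASS))
```

The publisher can start with proprietary terms, and the data becomes easier to integrate as soon as anyone publishes the mapping.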

Somebody-Pays-As-You-Go

The overall data integration effort is split between the data publisher, the data consumer and third parties.

- Data Publisher
  - publishes data as RDF
  - publishes data in a self-descriptive fashion
  - sets links and publishes mappings
- Third Parties
  - set links pointing at your data
  - publish mappings to the Web
- Data Consumer
  - has to do the rest

[Figure: The fixed overall data integration effort is divided into the publisher's effort, the third-party effort, and the consumer's effort.]

Hands on: How to play around with Linked Data

1. Get some data using a crawler
   - for instance: LDspider (GPL license)
   - http://code.google.com/p/ldspider/
2. Store the data
   - using for instance: Virtuoso (GPL), Sesame (BSD), Jena TDB (BSD)
   - or any relational or column store you like
   - decision help: Berlin SPARQL Benchmark (Nov 2009)
3. Query and analyze the data
   - using the SPARQL query language
   - SPARQL 1.1 adds support for aggregates, subqueries, negation
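To get a feel for step 3 before setting up a store, a single triple pattern can be evaluated over an in-memory list. This is a toy stand-in written for illustration, not SPARQL and not the API of any store named above; the data reuses the DBpedia statements from the earlier slides, and the `match` helper is a made-up name.

```python
def match(store, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts like a variable."""
    return [
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


STORE = [
    ("dbpedia:Berlin", "skos:subject", "dp:Cities_in_Germany"),
    ("dbpedia:Hamburg", "skos:subject", "dp:Cities_in_Germany"),
    ("dbpedia:Muenchen", "skos:subject", "dp:Cities_in_Germany"),
    ("dbpedia:Berlin", "dp:population", 3405259),
]

# Roughly the pattern: ?city skos:subject dp:Cities_in_Germany
cities = sorted(t[0] for t in match(STORE, p="skos:subject", o="dp:Cities_in_Germany"))
print(cities)
# Counting the matches corresponds to a simple aggregate over the result.
print(len(cities))
```

A real triple store adds indexing, joins across several patterns, and the full query language; the matching idea per pattern stays the same.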

Shortcut: Billion Triples Challenge Dataset

- Download the Billion Triples Challenge Dataset
  - 3.2 billion triples (27GB gzipped)
  - crawled from the public Web of Linked Data in March/April 2010
  - http://challenge.semanticweb.org/

- If you do something interesting with the data
  - submit your results to the challenge until October 1st
  - present your results at the 9th International Semantic Web Conference (ISWC2010), November 2010, Shanghai, China
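A first profiling pass over a dump of this size has to stream the data rather than load it. The sketch below counts how often each predicate occurs; it assumes, without confirmation from the slides, that the dump is gzipped and line-oriented with one statement per line in an N-Triples or N-Quads style. The function names and the two sample lines (including their fourth, context element) are made up for illustration.

```python
import gzip
import io
from collections import Counter


def count_predicates(lines):
    """Count predicate occurrences in line-oriented RDF statements."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Subject and predicate contain no whitespace, so two splits suffice.
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # skip malformed lines
        counts[parts[1]] += 1
    return counts


def count_predicates_in_gzip(path_or_fileobj):
    """Stream a gzipped dump line by line without unpacking it to disk."""
    with gzip.open(path_or_fileobj, "rt", encoding="utf-8") as lines:
        return count_predicates(lines)


# Tiny in-memory sample standing in for a real dump file.
sample = (
    '<http://richard.cyganiak.de/foaf.rdf#cygri> <http://xmlns.com/foaf/0.1/name> '
    '"Richard Cyganiak" <http://richard.cyganiak.de/foaf.rdf> .\n'
    '<http://richard.cyganiak.de/foaf.rdf#cygri> <http://xmlns.com/foaf/0.1/based_near> '
    '<http://dbpedia.org/resource/Berlin> <http://richard.cyganiak.de/foaf.rdf> .\n'
)
counts = count_predicates_in_gzip(io.BytesIO(gzip.compress(sample.encode("utf-8"))))
print(counts.most_common())
```

For a real file, a path can be passed to `count_predicates_in_gzip` instead of the in-memory buffer.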

Summary

- Linked Data moves the dataspace vision to a global scale and adds the social/community aspect to it.
- The Web of Linked Data is growing rapidly
  - active deployment communities in different domains
  - might have exceeded the critical mass
- Great playground for experimentation
  - dataspace profiling
  - probabilistic and approximate schema mapping
  - data fusion, data quality, and trust
  - What will the user interfaces look like?
  - Will search engines turn into answer engines?

Thanks!

References
- Overview Article
  Christian Bizer, Tom Heath, Tim Berners-Lee: Linked Data – The Story So Far
  http://tomheath.com/papers/bizer-heath-berners-lee-ijswis-linked-data.pdf
- Linking Open Data Project Wiki
  http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
- Tutorial on How to Publish Linked Data on the Web
  http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/
- 3rd Linked Data on the Web Workshop at WWW2010
  http://events.linkeddata.org/ldow2010/
