WebDB 2010 June 6th, 2010, Indianapolis, USA
The Web of Linked Data
A global public dataspace on the Web
Christian Bizer Freie Universität Berlin
Christian Bizer: The Web of Linked Data (6/6/2010) Outline
1. Foundations of Dataspaces and Linked Data: Where do they overlap?
2. The Web of Linked Data: What data is out there?
3. Linked Data Applications: What is being done with the data?
4. Remarks on Identity, Self-descriptive Data, and Pay-as-you-go Integration
The Dataspace Vision
An alternative to classic data integration systems that copes with the growing number of data sources.
Properties of dataspaces
- may contain any kind of data (structured, semi-structured, unstructured)
- require no upfront investment into a global schema
- provide for data-coexistence
- give best-effort answers to queries
- rely on pay-as-you-go data integration
Franklin, M., Halevy, A., and Maier, D.: From Databases to Dataspaces: A New Abstraction for Information Management, SIGMOD Rec. 2005.
Dataspace Architecture
Source: Franklin et al.: From Databases to Dataspaces, SIGMOD Rec. 2005.
Linked Data Principles
Set of best practices for publishing structured data on the Web in accordance with the general architecture of the Web.
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful RDF information.
4. Include RDF statements that link to other URIs so that they can discover related things.
Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html, 2006
Architecture of the classic Web
Single global information space
Small set of simple standards
1. HTML as document format
2. HTTP URLs as globally unique IDs and retrieval mechanism
3. Hyperlinks to connect everything
[Figure: Web browsers and search engines retrieving HTML documents from servers A, B, and C over HTTP; the documents are connected by hyperlinks]
Web 2.0 APIs and Mashups
No single global dataspace
Shortcomings
1. APIs have proprietary interfaces
2. Mashups are based on a fixed set of data sources
3. You cannot set hyperlinks between data items within different APIs
[Figure: a mashup accessing four Web APIs on top of data sources A, B, C, and D]
Web APIs slice the Web into Walled Gardens
Image: Bob Jagensdorf, http://flickr.com/photos/darwinbell/, CC-BY
Linked Data
Extend the Web with a single global dataspace
1. by using RDF to publish structured data on the Web
2. by setting links between data items within different data sources
[Figure: RDF data items in data sources A, B, C, D, and E, connected by links]
The RDF Data Model
pd:cygri rdf:type foaf:Person
pd:cygri foaf:name "Richard Cyganiak"
pd:cygri foaf:based_near dbpedia:Berlin
Flexible graph-based data model.
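As a rough illustration of the graph-based model, the statements about pd:cygri can be held as subject-predicate-object triples. The Python sketch below is not tied to any RDF library; the helper name `describe` is made up for this example, and prefixed names are kept as plain strings.

```python
from collections import defaultdict

# Each RDF statement is a (subject, predicate, object) triple.
# Prefixed names are kept as plain strings for illustration only.
triples = {
    ("pd:cygri", "rdf:type", "foaf:Person"),
    ("pd:cygri", "foaf:name", "Richard Cyganiak"),
    ("pd:cygri", "foaf:based_near", "dbpedia:Berlin"),
}


def describe(subject, graph):
    """Return a predicate -> sorted list of objects mapping for one subject."""
    description = defaultdict(list)
    for s, p, o in graph:
        if s == subject:
            description[p].append(o)
    return {p: sorted(objs) for p, objs in description.items()}


# The model is flexible: a statement about another node can be added
# at any time without declaring a schema first.
triples.add(("dbpedia:Berlin", "skos:subject", "dp:Cities_in_Germany"))
```

Calling `describe("pd:cygri", triples)` returns the three properties of the person node; the statement about dbpedia:Berlin simply becomes another edge in the same graph.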
Entities are identified with HTTP URIs
pd:cygri rdf:type foaf:Person
pd:cygri foaf:name "Richard Cyganiak"
pd:cygri foaf:based_near dbpedia:Berlin
HTTP URIs take the role of global primary keys.
pd:cygri = http://richard.cyganiak.de/foaf.rdf#cygri
dbpedia:Berlin = http://dbpedia.org/resource/Berlin
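The prefixed names are shorthand for full HTTP URIs. A minimal sketch of that expansion follows, assuming a hand-written prefix table; the function name `expand` is hypothetical, and the foaf: namespace is the one used by the FOAF vocabulary.

```python
# Prefix-to-namespace table. The pd: and dbpedia: namespaces follow the
# slide; foaf: is the namespace of the FOAF vocabulary.
PREFIXES = {
    "pd": "http://richard.cyganiak.de/foaf.rdf#",
    "dbpedia": "http://dbpedia.org/resource/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(curie, prefixes=PREFIXES):
    """Expand a prefixed name such as 'pd:cygri' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local
```

With this table, `expand("pd:cygri")` yields the full URI that acts as the global key for the person.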
Resolving URIs over the Web
pd:cygri rdf:type foaf:Person
pd:cygri foaf:name "Richard Cyganiak"
pd:cygri foaf:based_near dbpedia:Berlin
dbpedia:Berlin dp:population 3.405.259
dbpedia:Berlin skos:subject dp:Cities_in_Germany
The HTTP protocol brings together identification and retrieval again.
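Looking up a URI can be sketched with Python's standard library. This is an illustration under assumptions: it presumes the server honours content negotiation and can return RDF/XML, and the function names `build_rdf_request` and `fetch_rdf` are invented for this example.

```python
import urllib.request
from urllib.parse import urldefrag


def build_rdf_request(uri):
    """Prepare an HTTP GET that asks for RDF rather than HTML."""
    # The fragment (#...) is never sent to the server, so strip it
    # to obtain the URL of the document that describes the entity.
    document_url, _fragment = urldefrag(uri)
    return urllib.request.Request(
        document_url,
        headers={"Accept": "application/rdf+xml"},
    )


def fetch_rdf(uri, timeout=10):
    """Dereference the URI and return the raw response body as bytes."""
    with urllib.request.urlopen(build_rdf_request(uri), timeout=timeout) as resp:
        return resp.read()


# Building the request does not touch the network; fetch_rdf() does.
request = build_rdf_request("http://dbpedia.org/resource/Berlin")
```

The same URI that names the entity is the one that is fetched, which is the sense in which identification and retrieval coincide.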
Following Links deeper into the Web
pd:cygri rdf:type foaf:Person
pd:cygri foaf:name "Richard Cyganiak"
pd:cygri foaf:based_near dbpedia:Berlin
dbpedia:Berlin dp:population 3.405.259
dbpedia:Berlin skos:subject dp:Cities_in_Germany
dbpedia:Hamburg skos:subject dp:Cities_in_Germany
dbpedia:Muenchen skos:subject dp:Cities_in_Germany
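Link following amounts to a graph traversal in which each newly discovered URI is looked up in turn. The sketch below replaces real HTTP lookups with an in-memory dictionary (`WEB`, a stand-in invented for this example) so that it runs offline; the function name `crawl` and the depth limit are likewise assumptions.

```python
from collections import deque

# Toy stand-in for the Web: URI -> triples returned when that URI is
# looked up. Only statements shown on the slides are included.
WEB = {
    "pd:cygri": [
        ("pd:cygri", "foaf:based_near", "dbpedia:Berlin"),
    ],
    "dbpedia:Berlin": [
        ("dbpedia:Berlin", "skos:subject", "dp:Cities_in_Germany"),
    ],
    "dp:Cities_in_Germany": [
        ("dbpedia:Hamburg", "skos:subject", "dp:Cities_in_Germany"),
        ("dbpedia:Muenchen", "skos:subject", "dp:Cities_in_Germany"),
    ],
}


def crawl(seed, lookup, max_depth=2):
    """Breadth-first link following; returns all triples discovered."""
    seen, found = {seed}, []
    queue = deque([(seed, 0)])
    while queue:
        uri, depth = queue.popleft()
        for triple in lookup(uri):
            if triple not in found:
                found.append(triple)
            if depth < max_depth:
                # Subjects and objects are candidate URIs to follow next.
                # A real crawler would skip literal values here.
                for node in (triple[0], triple[2]):
                    if node not in seen:
                        seen.add(node)
                        queue.append((node, depth + 1))
    return found


triples = crawl("pd:cygri", lambda uri: WEB.get(uri, []))
```

Starting from the person, two hops are enough to reach the other German cities, even though the starting document never mentions them.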
The Disco – Hyperdata Browser
Properties of the Web of Linked Data
Global, distributed dataspace built on a simple set of standards
- RDF, URIs, HTTP
Entities are connected by links
- creating a global data graph that spans data sources and enables the discovery of new data sources
Provides for data-coexistence
- Everyone can publish data to the Web of Linked Data
- Everyone can express their personal view on things
- Everybody can use the schemata that they like for this
2. Linked Data Deployment on the Web
Is this real?
[Figure: RDF data items in data sources A, B, C, D, and E, connected by links]
W3C Linking Open Data Project
Grassroots community effort to
- publish existing open license datasets as Linked Data on the Web
- interlink things between different data sources
LOD Datasets on the Web: May 2007
Over 500 million RDF triples
Around 120,000 RDF links between data sources
LOD Datasets on the Web: September 2008
LOD Datasets on the Web: July 2009
Over 13.1 billion RDF triples
Over 142 million RDF links between data sources
DBpedia – An Interlinking Hub in the Web of Data
DBpedia
- Community effort to extract structured information from Wikipedia
- Provides data about 3.4 million things
  - 312,000 persons
  - 140,000 organizations
  - 413,000 places
  - 94,000 music albums
  - 49,000 films
  - 146,000 species
  - …
- Provides identifiers for many common things, for example http://dbpedia.org/resource/Calgary
- Overlaps with many other data sources on the Web
The LOD effort is losing track with the diagram :-)
Uptake in Life Sciences
- W3C Linking Open Drug Data Effort
- Bio2RDF Project
- Allen Brain Atlas
Uptake in the Libraries Community
Institutions publishing Linked Data
- Library of Congress (subject headings)
- German National Library (PND dataset and subject headings)
- Swedish National Library (Libris catalog)
- Hungarian National Library (OPAC and Digital Library)
- German Central Library of Economics (subject headings)
Workshop: Semantic Web in Bibliotheken (SWIB09), Cologne, November 24 and 25, 2009, http://www.swib09.de/
W3C Library Linked Data Incubator Group
Open Archives Object Reuse and Exchange (OAI-ORE) Standard
Uptake in the Media Industry
publish data as RDF/XML and/or embed data into HTML using RDFa
The Structural Continuum
The Web of Linked Data is interwoven with the classic Web.
- Unstructured data: HTML
- Semi-structured data: RDFa embedded into HTML
- Structured data: RDF/XML
Services using named entity recognition to annotate texts with Linked Data URIs
- Open Calais (Thomson Reuters) for news
- Zemanta (startup) for blog posts
3. Linked Data Applications
What can I do with this?
[Figure: Linked Data browsers, Linked Data mashups, and search engines working on top of things in data sources A, B, C, D, and E, connected by typed links]
Linked Data Browsers
Provide for navigating between data sources in order to explore the dataspace.
- Tabulator Browser (MIT, USA)
- Marbles (FU Berlin, DE)
- OpenLink RDF Browser (OpenLink, UK)
- Zitgist RDF Browser (Zitgist, USA)
- Disco Hyperdata Browser (FU Berlin, DE)
- Fenfire (DERI, Ireland)
DBpedia Mobile
- Displays DBpedia data on a map
- Provides for navigating into other data sources
Web of Data Search Engines
Crawl the dataspace and provide best-effort query answers over crawled data.
- Falcons (IWS, China)
- Sig.ma (DERI, Ireland)
- Swoogle (UMBC, USA)
- VisiNav (DERI, Ireland)
- Watson (Open University, UK)
What are the big players doing?
Yahoo! and Google have started to crawl Linked Data in its RDFa serialization as well as Microformats.
Yahoo!
- provides access to crawled data through the Yahoo BOSS API
- is using the data within Yahoo Search Monkey to make search results more useful and visually appealing
Google
- uses crawled RDF data for its Social Graph API
- uses crawled data to enhance search result snippets for reviews and people
Yahoo! Search Monkey
Facebook’s Open Graph Protocol
Facebook imports RDFa data from external web sites.
For instance: IMDb, Microsoft, NHL, Posterous, Rotten Tomatoes, TIME, Yelp
4. Remarks on
1. Identity
2. Self-descriptive Data
3. Pay-as-you-go Integration
Identity
Real world objects are identified with multiple URIs.
Coupling of identification and retrieval. Data-coexistence: Everybody can say everything about everything.
Wrapper around the DBLP bibliography: http://dblp.l3s.de/d2r/resource/authors/Christian_Bizer
Linked Data website of our research group: http://www4.wiwiss.fu-berlin.de/is-group/resource/persons/Person4
Identity Resolution
Publication of owl:sameAs links on the Web.
Pay-as-you-go identity management
- Cheap to set up: just put a wrapper in front of your DB (for instance using D2R Server)
- Later: you or somebody else invests effort into identity resolution
- Related approach: iTrail hints (Vas Salles et al., VLDB 06, 07)
How to create owl:sameAs links?
- Automatically, based on declarative matching descriptions (for instance using the Silk Linking Framework)
- Manually (for instance like within Ueberblick.org)
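Once owl:sameAs links are published, a consumer can group the URIs that denote the same real-world object. The following is a minimal sketch that treats owl:sameAs as an equivalence relation and clusters URIs with a union-find structure; the function name `sameas_clusters` is invented here, and this is not how Silk or any particular tool works internally.

```python
def sameas_clusters(links):
    """Group URIs connected by owl:sameAs links into identity clusters."""
    parent = {}

    def find(uri):
        # Register unseen URIs as their own cluster representative.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for uri in parent:
        clusters.setdefault(find(uri), set()).add(uri)
    return list(clusters.values())


# The two URIs from the identity slide, linked by one owl:sameAs statement.
links = [
    ("http://dblp.l3s.de/d2r/resource/authors/Christian_Bizer",
     "http://www4.wiwiss.fu-berlin.de/is-group/resource/persons/Person4"),
]
clusters = sameas_clusters(links)
```

Because links can be added later by anyone, the clusters can simply be recomputed as more owl:sameAs statements are discovered, which fits the pay-as-you-go approach.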
Pay-As-You-Go Data Integration
1. Stage: RAW DATA NOW!
- don’t care too much about the schema
- just publish your data as RDF on the Web
http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html
2. Stage: Increase the usefulness of your data and ease data integration by making it self-descriptive.
Enable Clients to retrieve the Schema
Clients can resolve the URIs that identify vocabulary terms in order to get their RDFS or OWL definitions.
Some data on the Web
Resolve unknown term http://xmlns.com/foaf/0.1/Person
RDFS or OWL definition
Reuse Terms from Common Vocabularies
Common vocabularies
- Friend-of-a-Friend for describing people and their social network
- SIOC for describing forums and blogs
- SKOS for representing topic taxonomies
- Organization Ontology for describing the structure of organizations
- GoodRelations for describing products and business entities
- Music Ontology for describing artists, albums, and performances
- Review Vocabulary for representing reviews
Common sources of identifiers (URIs) for real-world objects
- LinkedGeoData and Geonames: locations
- GeneID and UniProt: life science identifiers
- DBpedia: wide range of things
Publish Schema Mappings on the Web
Schema Mapping
Simple mappings: OWL
- owl:equivalentClass, owl:equivalentProperty
Complex mappings: R2R provides
- value transformation functions
- structural transformations
Pay-as-you-go aspect
1. Use a mix of common vocabularies and proprietary terms
2. You or somebody else publishes schema mappings afterwards
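A simple equivalence mapping can be applied by a consumer as a term-rewriting step. The sketch below assumes the mappings have already been collected into a dictionary; the ex: terms are hypothetical proprietary terms invented for this illustration, and the function name `translate` is likewise made up. Complex mappings with value or structural transformations are out of scope here.

```python
# Published mappings, flattened to source term -> target term.
# In RDF these would be owl:equivalentProperty / owl:equivalentClass
# statements; the ex: terms are hypothetical.
EQUIVALENT_TERMS = {
    "ex:fullName": "foaf:name",   # owl:equivalentProperty
    "ex:Human": "foaf:Person",    # owl:equivalentClass
}


def translate(triples, mapping=EQUIVALENT_TERMS):
    """Rewrite predicates and rdf:type objects into the target vocabulary."""
    result = []
    for s, p, o in triples:
        new_p = mapping.get(p, p)
        # Class terms only appear as the object of rdf:type statements.
        new_o = mapping.get(o, o) if p == "rdf:type" else o
        result.append((s, new_p, new_o))
    return result


source = [
    ("ex:cygri", "ex:fullName", "Richard Cyganiak"),
    ("ex:cygri", "rdf:type", "ex:Human"),
]
target = translate(source)
```

Terms without a published mapping pass through unchanged, so data stays usable before anyone has invested in mapping it.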
Somebody-Pays-As-You-Go
The overall data integration effort is split between the data publisher, the data consumer and third parties.
Data Publisher
- publishes data as RDF
- publishes data in a self-descriptive fashion
- sets links and publishes mappings
Third Parties
- set links pointing at your data
- publish mappings to the Web
Data Consumer
- has to do the rest
[Figure: the fixed overall data integration effort divided into publisher's effort, third-party effort, and consumer's effort]
Hands on: How to play around with Linked Data
1. Get some data using a crawler, for instance LDspider (GPL license): http://code.google.com/p/ldspider/
2. Store the data using, for instance, Virtuoso (GPL), Sesame (BSD), Jena TDB (BSD), or any relational database or column store you like. Decision help: Berlin SPARQL Benchmark (Nov 2009)
3. Query and analyze the data using the SPARQL query language. SPARQL 1.1 adds support for aggregates, subqueries, and negation
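To give a feel for the query step without installing a triple store, the sketch below matches a single triple pattern against in-memory triples, which is the basic building block of a SPARQL query. It is a toy, not a SPARQL engine; the function name `match_pattern` and the sample data layout are assumptions made for this example.

```python
def match_pattern(pattern, triples):
    """Yield variable bindings for one triple pattern.

    Terms starting with '?' are variables; all other terms must match
    the corresponding position of the triple exactly.
    """
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


data = [
    ("dbpedia:Berlin", "skos:subject", "dp:Cities_in_Germany"),
    ("dbpedia:Hamburg", "skos:subject", "dp:Cities_in_Germany"),
    ("pd:cygri", "foaf:based_near", "dbpedia:Berlin"),
]

# Roughly: SELECT ?city WHERE { ?city skos:subject dp:Cities_in_Germany }
cities = [b["?city"] for b in
          match_pattern(("?city", "skos:subject", "dp:Cities_in_Germany"), data)]
```

A real store would join many such patterns and use indexes; the stores listed above do this at scale.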
Shortcut: Billion Triples Challenge Dataset
Download the Billion Triples Challenge Dataset
- 3.2 billion triples (27GB gzipped)
- crawled from the public Web of Linked Data in March/April 2010
- http://challenge.semanticweb.org/
If you do something interesting with the data
- submit your results to the challenge until October 1st
- present your results at the 9th International Semantic Web Conference (ISWC2010), November 2010, Shanghai, China
Summary
- Linked Data moves the dataspace vision to a global scale and adds the social/community aspect to it.
- The Web of Linked Data is growing rapidly
  - active deployment communities in different domains
  - might have exceeded the critical mass
- Great playground for experimentation
  - dataspace profiling
  - probabilistic and approximate schema mapping
  - data fusion, data quality, and trust
- What will the user interfaces look like?
- Will search engines turn into answer engines?
Thanks!
References
- Overview article: Christian Bizer, Tom Heath, Tim Berners-Lee: Linked Data – The Story So Far. http://tomheath.com/papers/bizer-heath-berners-lee-ijswis-linked-data.pdf
- Linking Open Data Project Wiki: http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
- Tutorial on How to Publish Linked Data on the Web: http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/
- 3rd Linked Data on the Web Workshop at WWW2010: http://events.linkeddata.org/ldow2010/