Linked Data, DBpedia and D2R Server

Research seminar: Neue Entwicklungen im Datenbankbereich und in der Bioinformatik (New Developments in Databases and Bioinformatics)
HU Berlin, 26.06.2007

Linked Data, DBpedia and D2R Server
Building Blocks for the Emerging Web of Data
Dr. Christian Bizer, Freie Universität Berlin

Christian Bizer: Linked Data, DBpedia and D2R Server (26.06.2007)

Outline
1. Approaches to Realize the Web of Data
2. Some Building Blocks
   1. DBpedia
   2. W3C Linking Open Data Project
   3. D2RQ and D2R Server
3. Discussion

What does the Web offer us today?
[Diagram labels: HTML, DB]

What do we actually want?
- Use the Web like a single, global database

Approaches to Realize the Web of Data
1. Google Base
2. Freebase
3. Semantic Web

1. Google Base
- Aims to be THE database for all data in the world
- Flexible item-based data model
  - Items have types and properties which can be defined by users
- Data is uploaded by many data providers
  - Lots of sales offers, event data, but also recipes
- Royalty-free access via
  - Graphical user interface
  - Application programming interface (API)
- Google interlinks the data with other datasets they crawl or own

Google Base User Interface

2. Freebase

3. Semantic Web
Tim Berners-Lee (MIT/W3C) outlined the Semantic Web vision twice:
- 2001, Scientific American article
  - Formal ontologies, hyper-intelligent agents, lots of AI
- 2007, Linked Data web architecture note
  - Go back to the basic Web architecture
  - Aim at a data web that is useful in the short term
Basic Ideas of Linked Data
- Publish pure data in addition to HTML pages on the Web
- Set links between data items within different data sources
Christian Bizer: Turning the Web into a Database. Freie Universität Berlin, 14.2.2007

Linked Data Principles
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful RDF information
4. Include RDF statements that link to other URIs so that they can discover related things
Tim Berners-Lee 2007, http://www.w3.org/DesignIssues/LinkedData.html

Every Triple is a Hyperlink!
  rc:cygri  rdf:type         foaf:Person
  rc:cygri  foaf:name        "Richard Cyganiak"
  rc:cygri  foaf:based_near  dp:city/Berlin

  GET /city/Berlin HTTP/1.0
  Accept: application/rdf+xml

Dereferencing URIs over the Web
Graph after looking up dp:city/Berlin:
  rc:cygri        rdf:type         foaf:Person
  rc:cygri        foaf:name        "Richard Cyganiak"
  rc:cygri        foaf:based_near  dp:city/Berlin
  dp:city/Berlin  dp:population    "3.405.259"
  dp:city/Berlin  skos:subject     dp:Cities_in_Germany
Graph after a further lookup (additional triples):
  dp:city/Hamburg   skos:subject  dp:Cities_in_Germany
  dp:city/Muenchen  skos:subject  dp:Cities_in_Germany

Browsing the Semantic Web
- Tabulator Browser (MIT, USA)
- Disco Hyperdata Browser (FU Berlin)
- OpenLink RDF Browser (OpenLink, UK)

The Disco – Hyperdata Browser

Querying the Semantic Web
[Diagram: FOAF, SIOC, DBpedia and geonames data sources connected by RDF links]
Three Options:
1. Virtual Integration
   - Query routing: split the SPARQL query and ask different data sources for the things that they might be able to answer.
   - HU Berlin: DARQ
   - Needs proper data source descriptions.
   - Complicated and slow.
2. Materialized Integration
   - Use links between data items to crawl all data into a single repository.
   - Fast, but requires huge RDF repositories.
   - Worked for HTML, worked for RSS, so why not for RDF?
3. Materialization On-the-Fly
   - Crawl only data that is needed while answering the query.
   - FU Berlin: Semantic Web Client Library
   - University of London: SWIC
   - Works, but is really slow.

Semantic Web Search Engines
- Currently under development
  - Zitgist (Zitgist, USA)
  - SWSE (DERI, Ireland)
  - Watson (Open University, UK)
  - Swoogle (UMBC, USA)
- Crawl RDF data by following RDF links
- Can offer sophisticated query capabilities
- Can offer nice user interfaces
- Hopefully get better this year!

2. Some Building Blocks
1. DBpedia – Extracting Structured Information from Wikipedia
2. W3C Linking Open Data Community Project
3. D2R Server – Publishing Relational Databases on the Web

2.1 DBpedia
DBpedia.org is a community effort to
- extract structured information from Wikipedia
- make this information available on the Web under an open license
Contributors
- Freie Universität Berlin (Germany)
- Universität Leipzig (Germany)
- OpenLink Software (UK)

Extracting Structured Information from Wikipedia
Wikipedia consists of
- 6.9 million articles
- in 251 languages
- monthly growth rate: 4%
Wikipedia articles contain structured information
- infoboxes which use a template mechanism
- images depicting the article's topic
- categorization of the article
- links to external webpages
- intra-wiki links to other articles
- inter-language links to articles about the same topic in different languages

Extracting Infobox Data
http://en.wikipedia.org/wiki/Calgary

  <http://dbpedia.org/resource/Calgary>
      dbpedia:native_name "Calgary" ;
      dbpedia:altitude "1048" ;
      dbpedia:population_city "988193" ;
      dbpedia:population_metro "1079310" ;
      mayor_name dbpedia:Dave_Bronconnier ;
      governing_body dbpedia:Calgary_City_Council ;
      ...

Altogether 9,100,000 RDF triples extracted from 754,000 infoboxes

Multi-Lingual Abstracts
The dataset contains a short and a long abstract for each concept.
Short abstracts
- English: 1,637,622
- German: 246,791
- French: 206,085
- Dutch: 133,746
- Polish: 118,874
- Italian: 113,950
- Spanish: 112,417
- Japanese: 106,610
- Portuguese: 104,842
- Swedish: 100,267
- Chinese: 54,991

The DBpedia Dataset
1,600,000 concepts including
- 58,000 persons
- 70,000 places
- 35,000 music albums
- 12,000 films
described by 93 million triples using 8,141 different properties.
- 557,000 links to pictures
- 1,300,000 links to relevant external web pages
- 207,000 Wikipedia categories
- 75,000 YAGO categories

Accessing the DBpedia Dataset over the Web
1. SPARQL Endpoint
2. Linked Data Interface
3. RDF Dumps for Download

The DBpedia SPARQL Endpoint
http://dbpedia.org/sparql
- Can answer SPARQL queries like
  - Give me all sitcoms from 1980 that are set in NYC.
  - All tennis players from Moscow?
  - All German musicians that were born in Berlin in the 19th century?
- Provides two extensions to SPARQL
  - free-text search within titles and abstracts
  - COUNT()
- Hosted on an OpenLink Virtuoso server

DBpedia Use Cases
1. Improving Wikipedia Search
2. Royalty-Free Data Source for other Applications
3. Interlinking Hub for the Emerging Web of Data

Improving Wikipedia Search

2.2 W3C SWEO Linking Open Data Project
Community effort to
- publish various open-license databases as RDF on the Web
- interlink data items between different data sources

Project Participants
Universities
- Freie Universität Berlin (DE)
- SIMILE, Massachusetts Institute of Technology (US)
- KMi, Open University (UK)
- DERI (IRE)
- DB Group, University of Pennsylvania (US)
- Universität Leipzig (DE)
- L3S, Universität Hannover (DE)
- University of London (UK)
Companies
- OpenLink (UK)
- Talis (UK)
- Zitgist (US)
- MetaWeb (US)
- Mondeca (FR)
Outreach
- Tim Berners-Lee (W3C, MIT)
- Ivan Herman (W3C)

Datasets on the Web
- Over one billion RDF triples served on the Web
- Around 120,000 RDF links between data sources

Example RDF Links
RDF links from DBpedia to other data sources:

  <http://dbpedia.org/resource/Berlin>
      owl:sameAs <http://sws.geonames.org/2950159> .
  <http://dbpedia.org/resource/Tim_Berners-Lee>
      owl:sameAs <http://www4.wiwiss.fu-berlin.de/dblp/resource/person/100007> .
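
As an illustration of how a client might follow such RDF links, here is a minimal Python sketch, not taken from the talk. It assumes the two example links are written out as N-Triples with the owl: prefix expanded, parses them with a simple regular expression (a real client would use a proper RDF parser), and prepares a lookup request with the Accept: application/rdf+xml header shown on the "Every Triple is a Hyperlink!" slide. The helper names parse_links, same_as_targets and rdf_request are hypothetical, and the request is only built, never sent.

```python
import re
from urllib.request import Request

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# The two example links from the slide, written as N-Triples with the
# owl: prefix expanded (an assumption about the serialization).
LINKS = """
<http://dbpedia.org/resource/Berlin> <http://www.w3.org/2002/07/owl#sameAs> <http://sws.geonames.org/2950159> .
<http://dbpedia.org/resource/Tim_Berners-Lee> <http://www.w3.org/2002/07/owl#sameAs> <http://www4.wiwiss.fu-berlin.de/dblp/resource/person/100007> .
"""

# Matches only triples whose subject, predicate and object are all URIs.
TRIPLE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.$")


def parse_links(text):
    """Return (subject, predicate, object) tuples for URI-only triples."""
    triples = []
    for line in text.splitlines():
        match = TRIPLE.match(line.strip())
        if match:
            triples.append(match.groups())
    return triples


def same_as_targets(triples, subject):
    """List the URIs that `subject` is linked to via owl:sameAs."""
    return [o for s, p, o in triples if s == subject and p == OWL_SAME_AS]


def rdf_request(uri):
    """Build (but do not send) a GET request that asks for RDF/XML."""
    return Request(uri, headers={"Accept": "application/rdf+xml"})


triples = parse_links(LINKS)
for target in same_as_targets(triples, "http://dbpedia.org/resource/Berlin"):
    request = rdf_request(target)
    print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Passing the prepared request to urllib.request.urlopen would perform the actual lookup; whether a given server still answers with RDF/XML today is not something this sketch can promise.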
