Turning the Web Into a Database

Turning the Web Into a Database

Disputationsvortrag 14.2.2007 Turning the Web into a Database Diplom Kaufmann Christian Bizer Christian Bizer: Turning the Web into a Database (14.2.2007) What does the Web offer us today? HTML DB Christian Bizer: Turning the Web into a Database (14.2.2007) 1 What do we actually want? Christian Bizer: Turning the Web into a Database (14.2.2007) Outline 1. Development of the Web 2. Overview of some Base Technologies 3. Integrating Relational Databases into the Web 4. Summary and Outlook Christian Bizer: Turning the Web into a Database (14.2.2007) 2 Development of the Web Semantic Web World Wide Web Web 2.0 Web of Data Web 3.0 1995 2005 ? Christian Bizer: Turning the Web into a Database (14.2.2007) Current Stage: Web 2.0 1. Read-and-Write Web z User-generated Content, Weblogs, Social Networks 2. Introduction of more structured content formats z Microformats: Small data islands within HTML pages (hCard, hCallender, hReview) z Really Simple Syndication (RSS): Adds explicit timestamp to Web content Increased structure enables more sophisticated information processing! Christian Bizer: Turning the Web into a Database (14.2.2007) 3 Really Simple Syndication (RSS) <?xml version="1.0"?> <rss version="2.0"> <channel> <title>CNN.com</title> <language>en-us</language> <item> <title>Bush: Iraq plan is 'best chance'</title> <link>http://edition.cnn.com/2007/POLITICS/... <description>President George W. Bush says ... <pubDate>Wed, 24 Jan 2007 07:34:44 EST</pubDate> </item> <item> <title>Rape charge for Israeli president</title> <link>http://edition.cnn.com/2007/POLITICS/... <description>Israel's attorney general has ... <pubDate>Wed, 24 Jan 2007 12:44:21 EST</pubDate> </item> ... Christian Bizer: Turning the Web into a Database (14.2.2007) Use Case: Trend Scouting Crawls 63.2 million RSS feeds Provides keyword search and trend analysis Christian Bizer: Turning the Web into a Database (14.2.2007) 4 Use Case: Financial Information Analysis Deutsche Telekom (ISIN: DE0005557508) Diploma Thesis Aggregates RSS feeds from: 296 analyst houses 700 newspapers 1 feed about insider trades Christian Bizer: Turning the Web into a Database (14.2.2007) Structure enables Information Processing Tim Berners-Lee (MIT/W3C) Basic Ideas z Publish pure data in addition to HTML pages on the Web z Set links between data items within different data sources Realization z Pushed by W3C with several standards z World wide research effort z Industry is getting involved (Oracle, HP, IBM) Base Technologies z Resource Description Framework (RDF) z Query Language for RDF (SPARQL) Christian Bizer: Turning the Web into a Database. Freie Universität Berlin, 14.2.2007 Christian Bizer: Turning the Web into a Database (14.2.2007) 5 Resource Description Framework (RDF) RDF provides a simple data model for publishing data on the Web. Information is represented in the form of triples. Triple: Chris knows Richard Subject:: http://www.bizer.de#chris Predicate: http://xmlns.com/foaf/0.1/knows Object: http://richard.cyganiak.de/foaf.rdf#cygri Triple: Chris was born 1973-09-20 Subject: http://www.bizer.de#chris Predicate: http://www.w3.org/2001/vcard-rdf/3.0#BDAY Object: "1973-09-20"^^http://www.w3.org/2001/XMLSchema#date Christian Bizer: Turning the Web into a Database (14.2.2007) Every Triple is a Hyperlink rdf:type rc:cygri foaf:Person foaf:name Richard Cyganiak foaf:based_near dp:city/Berlin GET /city/Berlin HTTP/1.0 Accept: application/rdf+xml Christian Bizer: Turning the Web into a Database (14.2.2007) 6 Every Triple is a Hyperlink rdf:type rc:cygri foaf:Person foaf:name Richard Cyganiak foaf:based_near dp:city/Berlin 3.398.888 dp:population dp:city/Berlin skos:subject dp:Cities_in_Germany Christian Bizer: Turning the Web into a Database (14.2.2007) Every Triple is a Hyperlink rdf:type rc:cygri foaf:Person foaf:name 3.398.888 Richard Cyganiak dp:population foaf:based_near dp:city/Berlin skos:subject skos:subject dp:city/Hamburg dp:Cities_in_Germany dp:city/Muenchen skos:subject geo:district geo:Freimann geo:district geo:Schwabing Christian Bizer: Turning the Web into a Database (14.2.2007) 7 The Disco – Hyperdata Browser Christian Bizer: Turning the Web into a Database (14.2.2007) Christian Bizer: Turning the Web into a Database (14.2.2007) 8 The SPARQL Query Language W3C standard query language for RDF data Example query Give me the name and birthday of all people that were born after January 1st, 1970. SELECT ?name ?birth WHERE { ?person foaf:name ?name . ?person vcard:BDAY ?birth . FILTER (?birth > "1970-01-01"^^xsd:date) } ?name ?birth "Chris" "1973-09-20"^^xsd:date "Andy" "1980-03-12"^^xsd:date ... ... Christian Bizer: Turning the Web into a Database (14.2.2007) What to query? 1. Querying a Single Data Source 2. Querying Multiple Data Sources 3. Querying the complete Semantic Web Christian Bizer: Turning the Web into a Database (14.2.2007) 9 Querying a Single Data Source Data source: dbpedia.org 19 million triples extracted from Wikipedia info boxes. Example Query Give me all persons from Wikipedia that were born in Berlin before 1900. SELECT ?name ?birth ?description WHERE {?person dbpedia:birthplace dbpedia:resource/city/Berlin . ?person dbpedia:birth ?birth . ?person foaf:name ?name . FILTER (?birth < "1900-01-01"^^xsd:date) } Christian Bizer: Turning the Web into a Database (14.2.2007) Christian Bizer: Turning the Web into a Database (14.2.2007) 10 Querying Multiple Data Sources sameAs dbpedia geonames sameAs Three Options: 1. Virtual Integration z Query Routing: Split query and ask data sources for the things that they can answer. z HU Berlin: DARQ z Complicated and slow. 2. Materialized Integration z Use Links between data items to crawl all data into a single repository. z Crawlers under development: Zitgist, OpenLink USA and SWSE, DERI Irland z Fast, but requires huge RDF repositories. 3. Materialization On-the-Fly z Crawl only data that is needed while answering the query. z FU Berlin: Semantic Web Client Library z Works, but is really slow. Christian Bizer: Turning the Web into a Database (14.2.2007) Querying the complete Semantic Web Two Options: 1. Materialized Integration z Use Links between data items to crawl all data into a single repository. z Worked for HTML (Google) and RSS (Technorati) 2. Materialization On-the-Fly z Crawl only data that is needed while answering the query. z Works, but is really slow. Christian Bizer: Turning the Web into a Database (14.2.2007) 11 Viewing the Web as RDF Now we have a data model and a query language. What is still needed is an RDF view on the complete Web. Web 1.0 Semantic Web Web 2.0 RDF GRDDL HTML Graph RDF Graph GRDDL RSS RDF Graph RDF ? Graph Relational RDF Database Graph Christian Bizer: Turning the Web into a Database (14.2.2007) Integrating Relational Databases into the Semantic Web What is needed? z A way to map data between the relational data model and RDF z Some software to make the resulting RDF data Web-accessible We developed two artifacts: z The D2RQ Mapping Language z D2R Server Christian Bizer: Turning the Web into a Database (14.2.2007) 12 Mapping Relational Data to RDF rdf:type People ex:Per12 foaf:Person ID name email foaf:name 12 Chris [email protected] Chris Papers foaf:mbox mailto:[email protected] ID title confID 312 D2R Server 132 dc:author ex:Paper312 Rel_People_Papers PersonID PaperID swc:conference dc:title 12 312 ex:Conf132 D2R Server Christian Bizer: Turning the Web into a Database (14.2.2007) The mapping has to bridge: Data Model Heterogeneity z relations versus a graph z primary keys versus URIs Schema Heterogeneity z name versus given name, surname Syntactic Heterogeneity z timestamp versus “2007-02-14”^^xsd:data Christian Bizer: Turning the Web into a Database (14.2.2007) 13 Value Correspondences People rdf:type ex:Pers12 foaf:Person ID name email 12 Chris [email protected] foaf:name Chris Papers foaf:mbox ID title confID mailto:[email protected] 312 D2R Server 132 dc:author ex:Paper312 Rel_People_Papers PersonID PaperID swc:conference 12 312 dc:title ex:Conf132 D2R Server D2RQ is a language for writing these correspondences down. Christian Bizer: Turning the Web into a Database (14.2.2007) Example D2RQ Mapping PropertyBridge foaf:name ClassMap People PropertyBridge foaf:mbox PropertyBridge dc:author rdf:type refersTo PropertyBridge dc:title ClassMap Papers PropertyBridge swc:conference rdf:type Christian Bizer: Turning the Web into a Database (14.2.2007) 14 Class Map People ID name email 12 Chris [email protected] :PeopleClassMap a d2rq:ClassMap; d2rq:class foaf:Person; d2rq:uriPattern "http://example.org/person@@People.ID@@". Christian Bizer: Turning the Web into a Database (14.2.2007) Property Bridge People ID name email 12 Chris [email protected] :PeopleEmailProperty a d2rq:PropertyBridge; d2rq:belongsToClassMap :PeopleClassMap; d2rq:property foaf:mbox; d2rq:uriPattern "mailto:@@People.email@@". Christian Bizer: Turning the Web into a Database (14.2.2007) 15 Joins People Papers ID name email ID title confID 12 Chris [email protected] 312 D2R Server 132 Rel_People_Papers PersonID PaperID 12 312 :PeoplePaperRelation a d2rq:PropertyBridge; d2rq:belongsToClassMap :PeopleClassMap; d2rq:property dc:author; d2rq:refersToClassMap :PapersClassMap; d2rq:join "People.ID=Rel_People_Papers.PersonID"; d2rq:join "Rel_People_Papers.PersonID=Papers.ID"; Christian Bizer: Turning the Web into a Database (14.2.2007) D2R Server open source project (GNU General Public License) around 1500 downloads (138 in January 2007) Christian Bizer: Turning the Web into a Database (14.2.2007) 16 Example Rewriting from SPARQL to SQL SELECT ?person ?mbox WHERE { ?person foaf:name "Chris" .

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    19 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us