
Wiltrud Kessler

Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart

Semantic Web Winter 2015/16

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

The Semantic Web Stack [W3C, Tim Berners-Lee]

Layers, from top to bottom:

User Interface, Software Agents
Trust
Proof
Logic, Rules; Ontology, OWL; SPARQL; RDFS
RDF
XML, XML Schema, Namespaces
URI; Unicode, UTF-8

Alongside the layers: Encryption, Digital Signatures

Semantic Web Applications

Add Metadata

Data Integration

Create Semantic Data

Finding Semantic Information

Suggested Reading

What are Semantic Web Applications?

Some propositions for a definition:

I SW applications use standardized SW languages, e.g., RDF, OWL, SPARQL.

I SW applications have something to do with metadata, data exchange and integration.

I SW applications follow the vision of having humans and machines interact in a web of knowledge.

I ... ?

Add Metadata

I Metadata is “data about data”:

  I author, contact, source, ...
  I creation date, location, ...
  I format, type of data, language, ...
  I legal information, ...
  I description of content, keywords, ...

I Metadata is only useful when it comes with semantics, i.e., is standardized across sources.

RSS – Markup for Blogs

I RSS originally stood for RDF Site Summary in RSS 0.9.

I With RSS 0.91 the name was changed to Rich Site Summary and the RDF elements were removed; this line developed into the XML-only RSS 2.0 (Really Simple Syndication 2.0).

I In 2003 an alternative format, Atom, was created.

I Used by many blogging services, e.g., WordPress, Blogger, . . .

I RDF File (RSS 1.0):

I RSS 2.0:

I Atom:
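To illustrate how a program might consume the XML-only RSS 2.0 metadata, here is a minimal Python sketch using only the standard library. The feed content, the example.org URLs, and the helper name `read_feed` are hypothetical and serve only as an illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written RSS 2.0 feed (not taken from a real blog).
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://example.org/blog</link>
    <description>A made-up feed for illustration</description>
    <item>
      <title>First post</title>
      <link>http://example.org/blog/1</link>
      <pubDate>Mon, 02 Nov 2015 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.org/blog/2</link>
      <pubDate>Tue, 03 Nov 2015 10:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def read_feed(xml_text):
    """Return the channel title and a list of (title, link) pairs, one per item."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    items = [
        (item.findtext("title"), item.findtext("link"))
        for item in channel.findall("item")
    ]
    return channel.findtext("title"), items


feed_title, feed_items = read_feed(SAMPLE_FEED)
```

The element names (`channel`, `item`, `title`, `link`, `pubDate`) carry the agreed meaning, which is what makes the metadata usable across blogging services.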

Microformats – Low-level semantic markup

I Use attribute class to add semantic markup from a defined vocabulary to HTML/XHTML tags.

I Community effort, no standardization body.

I Well-known microformats: geo (geographical coordinates), hCard (contact information), hCalendar (events)

I Web page:

I hCard creator:

I hCalendar creator: http: //

Microformats – Example hCard

... ...
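As an illustration of class-based markup, the following Python sketch reads a few hCard properties from an HTML fragment. The fragment, the person, and the class `HCardParser` are made up for this example; only the class names `vcard`, `fn`, `org`, `email` and `tel` come from the hCard vocabulary, and the parser handles just this simple, non-nested case.

```python
from html.parser import HTMLParser

# Hypothetical hCard fragment; the person and contact data are invented.
SAMPLE_HCARD = """
<div class="vcard">
  <span class="fn">Jane Doe</span>
  <span class="org">Example University</span>
  <a class="email" href="mailto:jane@example.org">jane@example.org</a>
  <span class="tel">+49 711 000000</span>
</div>
"""

# Subset of hCard property names this sketch looks for.
HCARD_PROPERTIES = {"fn", "org", "email", "tel"}


class HCardParser(HTMLParser):
    """Collect the text of elements whose class attribute names an hCard property."""

    def __init__(self):
        super().__init__()
        self.card = {}
        self._current = None  # property whose text we are waiting for

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in HCARD_PROPERTIES:
                self._current = name

    def handle_data(self, data):
        if self._current and data.strip():
            self.card[self._current] = data.strip()

    def handle_endtag(self, tag):
        self._current = None


parser = HCardParser()
parser.feed(SAMPLE_HCARD)
card = parser.card
```

The page still renders as ordinary HTML for humans; the `class` values are the only addition a machine needs.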

RDFa – RDF in Attributes

I Add attributes to HTML to embed RDF statements.

I Can embed arbitrary RDF, not limited to a vocabulary (like with microformats).

I W3C recommendation since 2008.

I RDFa Core recommendation:

I RDFa Primer:

RDFa – Example

In his latest book Wikinomics, Don Tapscott explains deep changes in technology, demographics and business. The book is due to be published in October 2006.
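The markup of this example did not survive in the text above. The following Python sketch shows how such a sentence could be annotated and read back as triples. The markup, the subject URI, and the class `SimpleRDFaParser` are assumptions for illustration, and the parser covers only a tiny subset of RDFa (`about`, `property`, `content`), not the full processing rules of the recommendation.

```python
from html.parser import HTMLParser

# Assumed RDFa annotation of the example sentence; URI and prefixes are illustrative.
SAMPLE_RDFA = """
<p about="http://example.org/books/wikinomics">
  In his latest book <cite property="dc:title">Wikinomics</cite>,
  <span property="dc:creator">Don Tapscott</span> explains deep changes in
  technology, demographics and business. The book is due to be published in
  <span property="dc:date" content="2006-10-01">October 2006</span>.
</p>
"""


class SimpleRDFaParser(HTMLParser):
    """Extract (subject, property, value) triples from about/property/content."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._subject = None
        self._pending = None  # property waiting for its element text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs:
            self._subject = attrs["about"]
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            # A content attribute overrides the visible text.
            self.triples.append((self._subject, prop, attrs["content"]))
        else:
            self._pending = prop

    def handle_data(self, data):
        if self._pending and data.strip():
            self.triples.append((self._subject, self._pending, data.strip()))
            self._pending = None

    def handle_endtag(self, tag):
        self._pending = None


rdfa_parser = SimpleRDFaParser()
rdfa_parser.feed(SAMPLE_RDFA)
rdfa_triples = rdfa_parser.triples
```

Note how the date is shown to the reader as "October 2006" while the machine-readable value comes from the `content` attribute.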

CC REL – Copyright Information

I Creative Commons offers free, easy-to-understand licenses.

I It also offers an RDF vocabulary to describe licenses, which makes copyright information machine-readable.

I Useful for: publishers, image search, display in browser, . . .

I Web page:

I RDF File:

CC REL – Example

@prefix cc: .

xhtml:license ; dc:title "The Lessig Blog" ; cc:attributionName "Larry Lessig" ; cc:attributionURL ; dc:type dcmitype:Text .

cc:permits cc:Reproduction ; cc:permits cc:Distribution .

FOAF – Friend of a Friend

I RDF/OWL vocabulary about persons and social networks.

I Used by many blogging services: WordPress, TypePad, . . .

I Used by Semantically-Interlinked Online Communities Project (SIOC) to interconnect blogs, forums and mailing lists.

I Crucial part of WebID, a set of proposed standards for identification and authentication on HTTP-based networks.

I Web page:

I Search FOAF:

I Create a FOAF profile:

FOAF – Example

@prefix rdf: .
@prefix rdfs: .

<#JW> rdf:type foaf:Person ;
    foaf:name "Jimmy Wales" ;
    foaf:mbox ;
    foaf:homepage ;
    foaf:nick "Jimbo" ;
    foaf:depiction ;
    foaf:interest ;
    foaf:knows [ rdf:type foaf:Person ; foaf:name "Angela Beesley" ] .
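To make the triple structure behind such a profile explicit, here is a small Python sketch that writes part of the description as N-Triples. The subject IRI under example.org and the helper names `iri`, `literal` and `to_ntriples` are hypothetical; the FOAF and RDF namespace IRIs are the standard ones. Properties whose values are missing above are left out.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value):
    """Format an IRI as an N-Triples term."""
    return "<" + value + ">"


def literal(text):
    """Format a plain string literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) terms, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


jw = iri("http://example.org/people#JW")  # hypothetical IRI standing in for <#JW>
friend = "_:b0"  # blank node for the anonymous person in foaf:knows [ ... ]

foaf_triples = [
    (jw, iri(RDF_TYPE), iri(FOAF + "Person")),
    (jw, iri(FOAF + "name"), literal("Jimmy Wales")),
    (jw, iri(FOAF + "nick"), literal("Jimbo")),
    (jw, iri(FOAF + "knows"), friend),
    (friend, iri(RDF_TYPE), iri(FOAF + "Person")),
    (friend, iri(FOAF + "name"), literal("Angela Beesley")),
]

foaf_document = to_ntriples(foaf_triples)
```

The bracketed `[ ... ]` in the Turtle notation corresponds to the blank node `_:b0`, which appears once as an object and twice as a subject.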

Data Integration

I Make sure we talk about the same thing by referencing formally specified, language-independent knowledge.

I “Knowledge” consists of two parts:

  I General world knowledge – upper ontologies.
  I Domain-specific knowledge – domain vocabulary / ontology.

I Often a common vocabulary is sufficient (RDF), but sometimes more semantics is needed (RDFS/OWL).

I Data integration is a major challenge and cost factor in today’s data-driven world!
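The case where a common vocabulary is sufficient can be sketched in a few lines of Python. The two sources, their field names, and the functions `to_common` and `merge` are hypothetical; the sketch assumes that both sources can be mapped to the same two FOAF terms and that the mailbox identifies a person.

```python
# Hypothetical mappings from each source's local field names to shared terms.
MAPPINGS = {
    "crm": {"full_name": "foaf:name", "email": "foaf:mbox"},
    "newsletter": {"Name": "foaf:name", "E-Mail": "foaf:mbox"},
}


def to_common(source, record):
    """Rename the fields of one record to the shared vocabulary; drop unmapped fields."""
    mapping = MAPPINGS[source]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def merge(records):
    """Merge records that share the same foaf:mbox value."""
    merged = {}
    for record in records:
        merged.setdefault(record["foaf:mbox"], {}).update(record)
    return list(merged.values())


crm_record = to_common(
    "crm", {"full_name": "Jane Doe", "email": "jane@example.org", "internal_id": 7}
)
newsletter_record = to_common(
    "newsletter", {"Name": "Jane Doe", "E-Mail": "jane@example.org"}
)
people = merge([crm_record, newsletter_record])
```

Once both records use the same terms, recognizing that they describe the same person becomes a simple key comparison. More semantics (RDFS/OWL) is needed when the sources disagree on what a term means, not just on what it is called.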

Upper Ontologies

Definition [Wikipedia] An upper ontology is an ontology which describes very general concepts that are the same across all knowledge domains.

I Goal: Support interoperability between specific ontologies – they all link back to an upper ontology.

I Problem: There is no agreement on a single such ontology; many different upper ontologies exist.

Cyc and OpenCyc

Cyc is an upper ontology; the project started in 1984. Parts of the project are released under an open-source licence as OpenCyc, which contains 239,000 concepts and 2,093,000 facts and can be browsed on the OpenCyc website.

I Project website:

I OpenCyc:

UMBEL – Upper Mapping and Binding Exchange Layer

UMBEL is an upper ontology of about 28,000 reference concepts extracted from OpenCyc. UMBEL is also a vocabulary for aiding ontology mapping, including expressions of likelihood relationships distinct from exact identity or equivalence. UMBEL has about 48,000 formal mappings to external vocabularies, including DBpedia, PROTON and GeoNames, and provides linkages to more than 2 million Wikipedia pages (English version).

I Project website:

Domain-specific Ontologies

Definition [Wikipedia] A domain ontology (or domain-specific ontology) represents concepts which belong to part of the world. Particular meanings of terms applied to that domain are provided by the domain ontology.

I Goal: Model exactly what is needed in the domain at a very detailed level and keep the representation of the field up to date.

I Problem: Different domain-specific ontologies are often incompatible, even if they model the same domain.

GeoNames – Geographical Database

The GeoNames database contains over 10,000,000 geographical names. Beyond names of places in various languages, data stored include latitude, longitude, elevation, population, administrative subdivision and postal codes.

I Developers: Marc Wick, Christophe Boutreux

I Project website:

Gene Ontology

The Gene Ontology is a major bioinformatics initiative to unify the representation of gene and gene product attributes across all species. It aims to serve as a platform where curators can agree on how and why a specific term is used, and how to apply it consistently, for example, to establish relationships between gene products. It is part of a larger classification effort, the Open Biomedical Ontologies.

I Gene Ontology Consortium:

I Open Biological and Biomedical Ontologies:

Ontology of Units of Measure

The Ontology of Units of Measure and related concepts (OM) models concepts and relations important to scientific research. It has a strong focus on units and quantities, measurements, and dimensions.

I OM ontology web page:

I Quantities, Units, Dimensions and Data Types Ontologies (QUDT):

I Unified Code for Units of Measure ontology (UCUM):

Semantic Document Management Systems

I Large companies have a wealth of knowledge contained in documents, but these documents are very heterogeneous.

I The challenge is the integration of this data.

I Using ontologies as semantic data models can bring together disparate data sources into one body of information.

I Documents can still be written with domain-specific vocabulary or even in different languages, but concepts can easily be accessed through the ontology.

Semantic DMS Example – BBC Web Sites

I Large number of web pages in different areas (TV, radio, news, gardening, music, wildlife, ...) that are not connected.

I Tag pages with URIs that link entities and topics from a controlled vocabulary to DBpedia.

I Make the data available and searchable.

I Project description: http: //

Creation of Semantic Data

I Goal: Offer large amounts of machine-readable data.

I What do we need to get there?

  I Help people to create semantically annotated data.
  I Extract semantically annotated data from existing sources.
  I Make data available, retrievable, findable.

I Support easy linking from one dataset to others.

I Big datasets are also a good starting point for data harmonization (see the previous section, Data Integration).

Semantic MediaWiki – A Semantic Extension for MediaWiki

Semantic MediaWiki (SMW) is a free, open-source extension to MediaWiki – the wiki software that powers Wikipedia – that lets you store and query data within the wiki’s pages. Semantic MediaWiki is also a full-fledged framework, in conjunction with many spinoff extensions, that can turn a wiki into a powerful and flexible “collaborative database”. All data created within SMW can easily be published via the Semantic Web, allowing other systems to use this data seamlessly.

I Developers: Karlsruhe Institute of Technology

I Project website:

GRDDL – Gleaning Resource Descriptions from Dialects of Languages

GRDDL (pronounced ’griddle’) is a W3C Recommendation, and enables users to obtain RDF triples out of XML documents, including XHTML. It became a Recommendation on September 11, 2007.

I Recommendation:

I GRDDL Primer:

I GRDDL Use Cases:

DBpedia – Structured Wikipedia

DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia, and to link other data sets on the Web to Wikipedia data. We hope this will make it easier for the amazing amount of information in Wikipedia to be used in new and interesting ways, and that it might inspire new mechanisms for navigating, linking and improving the encyclopaedia itself.

I Developers: University of Leipzig, University of Mannheim, OpenLink Software

I Project website:
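What a "sophisticated query against Wikipedia" could look like is sketched below in Python. The sketch only builds the request URL and sends nothing. It assumes that DBpedia's public SPARQL endpoint is reachable at `https://dbpedia.org/sparql` and accepts the standard `query` parameter of the SPARQL protocol; the chosen resource and the function name `build_request_url` are illustrative.

```python
from urllib.parse import urlencode

# Assumed location of the public DBpedia SPARQL endpoint.
ENDPOINT = "https://dbpedia.org/sparql"

# Ask for one English abstract of the resource describing Stuttgart.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?abstract WHERE {
  <http://dbpedia.org/resource/Stuttgart> dbo:abstract ?abstract .
  FILTER (lang(?abstract) = "en")
}
LIMIT 1
"""


def build_request_url(endpoint, query):
    """Return a GET URL that carries the SPARQL query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


request_url = build_request_url(ENDPOINT, QUERY)
```

The resulting URL could then be fetched with any HTTP client; the result format depends on the endpoint and the `Accept` header sent with the request.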

YAGO – Yet Another Great Ontology

YAGO is a huge semantic knowledge base, derived from Wikipedia, WordNet and GeoNames. Currently, YAGO has knowledge of more than 10 million entities (like persons, organizations, cities, etc.) and contains more than 120 million facts about these entities.

I Developers: Max Planck Institute, Saarbrücken

I Project website: departments/databases-and-information-systems/ research/yago-naga/yago/

Other Big Ontologies

ProBase by Microsoft Research, “above 2.7 million concepts” (no date given), used in Bing.

Freebase by Metaweb Technologies/Google, “44 million topics and 2.4 billion facts” (Jan ’14), used by Google Knowledge Graph until 2014.

Evi by TrueKnowledge/Evi/Amazon, “283 million facts about 9 million things” (Aug ’10), used by Siri.

Big Data Sources that are not Ontologies

Wolfram Alpha by Wolfram Research, structured factual data from different sources, used by Bing, DuckDuckGo, Siri, and others for factual questions.

Wikidata by Wikimedia Foundation, structured data of different categories, used by Wikipedia for infobox data and interlanguage links, and by Google (since 2015).

... Many more openly available datasets about weather, population, polls, etc., published by diverse statistics institutions around the world.

Linked (Open) Data

The goal of the W3C SWEO Linking Open Data community project is to extend the Web with a data commons by publishing various open data sets as RDF on the Web and by setting RDF links between data items from different data sources.

I Linked Open Data: CommunityProjects/LinkingOpenData

I LOD cloud:
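Setting RDF links between data items from different sources can be pictured with a short Python sketch. The dataset URIs under example.org and the function `equivalent_resources` are hypothetical; the sketch assumes the links are expressed with `owl:sameAs`, which is one common way to state that two URIs denote the same thing.

```python
from collections import defaultdict

SAME_AS = "owl:sameAs"

# Hypothetical links between three made-up datasets.
links = [
    ("http://example.org/datasetA/Stuttgart", SAME_AS,
     "http://example.org/datasetB/city/42"),
    ("http://example.org/datasetB/city/42", SAME_AS,
     "http://example.org/datasetC/places/stuttgart"),
    ("http://example.org/datasetA/Berlin", SAME_AS,
     "http://example.org/datasetB/city/7"),
]


def equivalent_resources(start, triples):
    """Return every URI reachable from start by following sameAs links in either direction."""
    neighbours = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == SAME_AS:
            neighbours[subject].add(obj)
            neighbours[obj].add(subject)
    seen = {start}
    todo = [start]
    while todo:
        node = todo.pop()
        for nxt in neighbours[node]:
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


stuttgart_uris = equivalent_resources("http://example.org/datasetA/Stuttgart", links)
```

A client that starts in one dataset can thus collect the facts that other datasets publish about the same entity, which is the idea behind the data commons.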

Linking Open Data cloud diagram 2014

Sindice – A Search Engine

Sindice collects Web Data in many ways, following existing web standards, and offers Search and Querying across this data, updated live every few minutes. Specialized APIs and tools are also available.

I Developers: DERI institute, Fondazione Bruno Kessler, OpenLink software

I Project website:

GoPubMed – Searching in Biomedical Texts

GoPubMed is a knowledge-based search engine for biomedical texts. The Gene Ontology and Medical Subject Headings serve as “Table of contents” in order to structure the millions of articles of the MEDLINE database. The knowledge behind GoPubMed comprises a total of 48 million concepts.

I Developers: Dresden University of Technology, Transinsight

I Project website:

Entity-centric Semantic Search: Broccoli


Entity-centric Semantic Search: Evi

Suggested Reading

[HKR09] Pascal Hitzler, Markus Krötzsch, and Sebastian Rudolph. Foundations of Semantic Web Technologies. Chapman & Hall/CRC, 2009. (Chapter 9)

Wiltrud Kessler Semantic Web Applications 44 / 44