
Introduction to the semantic web and to the web of linked data

ESTP course on Introduction to Linked Open Data

Prof. Eero Hyvönen, University of Helsinki and Aalto University, Helsinki, Finland, Sept 28, 2017

THE CONTRACTOR IS ACTING UNDER A FRAMEWORK CONTRACT CONCLUDED WITH THE COMMISSION

Eurostat Outline

• The Vision
• Technological Basis: metadata, ontologies, rules
• How Does It Work in Practice?
• What has been learned?

1. How to build an interoperable Web of Data?

2. How to build a more intelligent web?

1. approach
   • The contents stay as they are
   • The machines operate more human-like (Artificial Intelligence)
2. Contents represented in intelligent ways
   • The contents are easier to understand
   • Machines may stay more stupid

In practice, both ways are needed
• More intelligent systems process more intelligently represented contents

Web generations

1G WWW:
• WWW pages for human interpretation
• HTML
2G WWW:
• Structures for human/machine interpretation
• XML language
3G WWW: Semantic Web = foundation for intelligent web services
• Meanings for human/machine use
• RDF(S) language
• Semantic = “understandable” to machines
4G WWW: Ubiquitous web for humans and machines

Why Semantics?

NBA:H26069:467 cup and plate porcelain Germany Meissen

• This metadata cannot answer the following questions:
  • Find all vessels?
  • Find all ceramic products?
  • Find artifacts manufactured in Europe?
  • Does the city of Meissen manufacture ceramics?

NBA-H26069-467
    :object "cup and plate" ;
    :object_concept object:cup ;
    :object_concept object:plate ;
    :material "porcelain" ;
    :material_concept object:porcelain ;
    :creationPlace "Germany" ;
    :creationPlace_concept place:Germany ;
    :creator "Meissen" ;
    :creator_concept actor:Meissen .

[Figure: the record NBA-H26069-467 linked through its *_concept properties to four ontologies. Object ontology: object:cup and object:plate are rdfs:subClassOf object:vessel. Material ontology: material:porcelain. Place ontology: place:Meissen, place:Germany and place:Europe, connected by loc:partOf. Actor ontology: actor:Meissen. With these links the four questions above can be answered.]
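How the ontology links make the questions answerable can be sketched in a few lines of Python. This is only an illustration: the dictionaries are hand-written stand-ins for the object and place ontologies, and the link from place:Meissen to place:Germany is an assumption read from the figure, not data from a real vocabulary.

```python
# Hypothetical mini-ontologies, loosely following the slide's example.
SUBCLASS_OF = {
    "object:cup": "object:vessel",
    "object:plate": "object:vessel",
}
PART_OF = {
    "place:Meissen": "place:Germany",   # assumed from the figure
    "place:Germany": "place:Europe",
}
RECORDS = {
    "NBA-H26069-467": {
        "object_concept": ["object:cup", "object:plate"],
        "creationPlace_concept": ["place:Germany"],
    },
}

def ancestors(term, hierarchy):
    """Return the term itself plus everything reached by following the hierarchy upward."""
    seen = [term]
    while term in hierarchy:
        term = hierarchy[term]
        seen.append(term)
    return seen

def find(prop, target, hierarchy):
    """Ids of records whose value for prop is the target or lies below it."""
    return sorted(
        rid for rid, rec in RECORDS.items()
        if any(target in ancestors(value, hierarchy) for value in rec.get(prop, []))
    )

vessels = find("object_concept", "object:vessel", SUBCLASS_OF)
european = find("creationPlace_concept", "place:Europe", PART_OF)
```

Neither "vessel" nor "Europe" appears in the original metadata record; both matches come from walking the hierarchies.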

Case Rijksmuseum Amsterdam: CHIP Demonstrator

Example in notation
• VRA metadata schema (extension of Dublin Core)
• (Aroyo et al., 2007)

A resource in the TGN ontology / vocabulary

Amsterdam in TGN

An Ontology Hierarchy

Technological Basis of Semantic Web

The classical ”layer cake model”

Trust

Reasoning/logic

Vocabularies/ontologies

Metadata

(Tim Berners-Lee)

METADATA LEVEL

Why isn’t XML alone sufficient for the basis of Semantic web?
• The semantics of XML exists only in the human mind

Peter Programmer 123 456

• We need a markup language whose interpretation is:
  - Commonly agreed
  - Cross-domain
  - Machine-”understandable”
  - Aggregatable (documents can be combined)
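The point can be illustrated with a small Python sketch. The tag names below are invented, because the slide's original markup is not available, and the "ex:" identifiers in the second half are likewise hypothetical. A parser sees the same values in both documents but has no way to tell whether "123 456" is a phone number or a room number.

```python
import xml.etree.ElementTree as ET

# Two hypothetical encodings of the same values (tag names are assumptions).
doc_a = "<person><name>Peter Programmer</name><phone>123 456</phone></person>"
doc_b = "<employee><fullname>Peter Programmer</fullname><room>123 456</room></employee>"

def leaves(xml_text):
    """Return (tag, text) for every element that has no child elements."""
    root = ET.fromstring(xml_text)
    return [(el.tag, el.text) for el in root.iter() if len(el) == 0]

same_values = [text for _, text in leaves(doc_a)] == [text for _, text in leaves(doc_b)]
same_tags = [tag for tag, _ in leaves(doc_a)] == [tag for tag, _ in leaves(doc_b)]

# By contrast, statements that use agreed identifiers aggregate by set union.
source_1 = {("ex:peter", "ex:name", "Peter Programmer")}
source_2 = {("ex:peter", "ex:name", "Peter Programmer"),
            ("ex:peter", "ex:phone", "123 456")}
merged = source_1 | source_2
```

Here same_values is true while same_tags is false: the meaning of the tags lives outside the documents. The union of the two statement sets needs no per-source code, which is the aggregation property asked for in the last bullet.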

The Semantic web solution: RDF Resource Description Framework

• General metadata description language for web resources
• Relational model, not a syntax (as opposed to XML)
  RDF description = directed graph
• Semantics is defined based on logic
• Syntax/serialization
  XML-based RDF/XML, especially for machines
  Simple triple notations (N3, Turtle, N-Triples) for humans
• Standardized and commonly used
  W3C draft 1999
  W3C recommendation RDF 1.0, 10.2.2004
  W3C recommendation RDF 1.1, 25.2.2014

Metadata schemas

• Standardized formats for metadata descriptions
  • A set of elements/properties, and
  • Predefined value sets for them
• Different content types typically require different properties
  • E.g., book vs. song vs. museum item
• Problems
  • How are the element values represented?
    Tarja Halonen vs. Halonen T.
    11.9.2001 vs. Sept 11, 2001 vs. 2001/09/11
  • What do the values mean?
    ”glass”, ”nokia”, ”Pyhäjärvi”
  • How can different schema structures be combined?
    writer vs. creator as a property
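The value representation problem can be made concrete with the three date notations above. The following Python sketch maps them to a single ISO 8601 form. It assumes that the dotted notation is day.month.year and the slashed notation is year/month/day; a real aggregator would need to know the convention of each source.

```python
import re
from datetime import date

MONTHS = {"jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
          "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12}

def normalize_date(text):
    """Map the three notations from the slide to one ISO 8601 string."""
    text = text.strip()
    m = re.fullmatch(r"(\d{1,2})\.(\d{1,2})\.(\d{4})", text)       # 11.9.2001 (assumed day.month.year)
    if m:
        return date(int(m.group(3)), int(m.group(2)), int(m.group(1))).isoformat()
    m = re.fullmatch(r"([A-Za-z]+)\.? (\d{1,2}), (\d{4})", text)   # Sept 11, 2001
    if m:
        month = MONTHS[m.group(1)[:3].lower()]
        return date(int(m.group(3)), month, int(m.group(2))).isoformat()
    m = re.fullmatch(r"(\d{4})/(\d{2})/(\d{2})", text)             # 2001/09/11 (assumed year/month/day)
    if m:
        return date(int(m.group(1)), int(m.group(2)), int(m.group(3))).isoformat()
    raise ValueError(f"unrecognized date notation: {text!r}")

normalized = {normalize_date(s) for s in ["11.9.2001", "Sept 11, 2001", "2001/09/11"]}
```

All three strings collapse to one value, which is what a syntax encoding scheme achieves when it is agreed on before the data is recorded.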

Example: Dublin Core

• Set of 15 general properties for different contents
• Dublin Core Metadata Element Set (ISO Standard 15836)

Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Relation, Source, Language, Coverage, Rights

Dublin Core (2)

• DCMI Metadata Terms defines tens of qualifiers/refinements, which specialize the semantics of Dublin Core elements
  • E.g., accessRights < Rights
• Dumb-down principle
  • A qualified version can always be replaced with a more generic one
    I.e., a qualifier can only specify the meaning of an element
• The element values may have predefined encoding formats
  • Vocabulary encoding scheme
    Set of possible terms, e.g., listing of different resource types (text, image, …)
  • Syntax encoding scheme
    E.g., date ”2001-09-11”

Dublin Core (3)

• Application Profiles
  • Define a combination of DC elements and qualifiers + value encoding formats for a given domain (e.g., for library data)
  • Possible own extensions
  • E.g., Visual Resources Association (VRA) Core 4.0
    New elements, such as ”measurements”
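The dumb-down principle for qualified Dublin Core can be sketched as a simple lookup. The refinement table below lists a few DCMI terms and the element each one refines; it is deliberately partial, and the sample record and its values are hypothetical.

```python
# Partial table of DCMI refinements -> the element they refine.
REFINES = {
    "accessRights": "rights",
    "created": "date",
    "issued": "date",
    "alternative": "title",
    "abstract": "description",
    "spatial": "coverage",
}

def dumb_down(record):
    """Replace every qualified property with the generic element it refines."""
    simple = {}
    for prop, values in record.items():
        element = REFINES.get(prop, prop)   # unqualified elements pass through unchanged
        simple.setdefault(element, []).extend(values)
    return simple

# Hypothetical qualified record.
qualified = {
    "title": ["Maija's eyeglasses"],
    "issued": ["2001-09-11"],
    "accessRights": ["public"],
}
simple = dumb_down(qualified)
```

A client that only knows the 15 elements can still use the dumbed-down record: it loses the distinction between, say, issued and created, but never receives a wrong statement.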

Metadata Schema in HealthFinland

HealthFinland portal: Maija’s eyeglasses – PDF document on the web

Maija’s eyeglasses: metadata in RDF form

ONTOLOGY LEVEL

What is an ontology?

• “An ontology is an explicit specification of a conceptualization • ...definitions need to be couched in some common formalism” • (Gruber, 1993)

Explicit: machine can understand
Common (shared): communication is possible (not mentioned by Gruber)
Formal: precisely defined
• Defines the concepts/objects and their relations in a given domain
• A first requirement for humans and machines to understand each other

EXAMPLE OF AN ONTOLOGY

class-def animal
class-def plant
    subclass-of NOT animal
class-def tree
    subclass-of plant
class-def branch
    slot-constraint is-part-of has-value tree
class-def leaf
    slot-constraint is-part-of has-value branch
class-def defined carnivore
    subclass-of animal
    slot-constraint eats value-type animal
class-def defined herbivore
    subclass-of animal
    slot-constraint eats value-type plant OR (slot-constraint is-part-of has-value plant)
class-def herbivore
    subclass-of NOT carnivore
class-def giraffe
    subclass-of animal
    slot-constraint eats value-type leaf
class-def lion
    subclass-of animal
    slot-constraint eats value-type herbivore
class-def tasty-plant
    subclass-of plant
    slot-constraint eaten-by has-value herbivore, carnivore

Ontology types

[Figure: ontology types ordered by ontological complexity/depth, from human-understandable text to machine-understandable representations; further labels in the figure: Numbers, Philosophical]
• Glossary: word list, little structure
• Thesaurus: relations (NT, BT, RT etc.)
• Taxonomy: relations, inheritance
• Axiomatized theory: constraints, logic-based

W3C standards for Semantic web ontologies/vocabularies

• SKOS Simple Knowledge Organization System
  • Light-weight semantics
  • E.g., for representing existing glossaries, classification schemes, thesauri
• OWL Web Ontology Language
  • Rich semantics based on logic
  • Supports more reasoning

IEEE Standard Upper Merged Ontology (SUMO)
• Goals
  • Support in knowledge-based applications
  • Interoperability
    Define new data elements using SUMO and obtain mutual interoperability
    Interoperability between applications using domain specific ontologies (that use SUMO)
    Neutral interchange format for different systems
• Application areas
  • E-commerce
  • E-learning
  • Natural language understanding tasks
  • …

SUMO

SUMO principal distinctions

SUMO Object:

AAT Art & Architecture Thesaurus
- maintained by J. Paul Getty Trust
- 7 main classes, 125 000 concepts

Union List of Artist Names ULAN

• 120,000 instances • 293,000 names

Resolving Identities

Geonames
• Classes: 9 feature classes, 645 feature codes
• Instances:
  • 8 million geographical names, 6.5 million unique features, 2.2 million populated places, 1.8 million alternate names
• Registries and Wiki used for populating the ontology

Finnish Ontologies: ONKI

ONKI.fi -> Finto.fi

RULE LEVEL

The idea of rules

• “New” information can be derived from old by reasoning

• Semantic Web is based on logic!

SUMO knowledge representation

• Developed in KIF (Knowledge Interchange Format)
  • A version of first order predicate logic
  • Other versions exist (e.g., OWL)
• Size
  • 1006 terms
  • 4142 axioms
  • 814 rules

Standardized XML notation for rules
Rule Markup Language RuleML

Application example: MuseumFinland recommends links
• Rules tell the machine about the world
  • E.g., that ”student’s cap” is related to ”parties”
  • E.g., that entities are related to each other if their superclasses are related to each other
  • Etc.
• The machine can:
  • Infer interesting new relations between museum items, and
  • Provide them to end users as recommendation links
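The second rule above can be sketched in Python as follows. This is not MuseumFinland's actual rule base: the item identifiers and the class "party tableware" are invented for illustration, and only the ”student’s cap” / ”parties” relation comes from the slide.

```python
# Hypothetical class hierarchy: entity -> its direct superclass.
SUPERCLASS = {
    "item:cap_1920": "student's cap",
    "item:punch_bowl": "party tableware",
    "party tableware": "parties",
    "item:axe": "tools",
}
# Asserted world knowledge (from the slide).
RELATED = {("student's cap", "parties")}

def superclasses(entity):
    """All classes above an entity, following SUPERCLASS links upward."""
    found = []
    while entity in SUPERCLASS:
        entity = SUPERCLASS[entity]
        found.append(entity)
    return found

def related(a, b):
    """Rule: a and b are related if they, or any of their superclasses, are related."""
    ups_a = [a] + superclasses(a)
    ups_b = [b] + superclasses(b)
    return any((x, y) in RELATED or (y, x) in RELATED
               for x in ups_a for y in ups_b)

recommend = related("item:cap_1920", "item:punch_bowl")
```

No relation between the two items was ever recorded; the link is derived from one asserted fact plus the class hierarchy, and could be shown to the end user as a recommendation.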

Application example: Recommendations in MuseumFinland

(META)DATA + ONTOLOGIES = LINKED DATA

How Does It Work in Practice?

Finnish Biography center and libraries collect historical data of people

person  name                   occupation  birth place  …
P1      Akseli Gallen-Kallela  artist      Lemu
P2      Gustaf Mannerheim      marshal     Askainen
…

[Figure: the same rows as a graph]
P1  type         person
P1  name         ”Akseli Gallen-Kallela”
P1  occupation   artist
P1  birth place  Lemu
P2  type         person
P2  name         ”Gustaf Mannerheim”
P2  occupation   marshal
P2  birth place  Askainen

Museum catalogues paintings

Work  name                    creator                time  Topic              …
W1    Portrait of Mannerheim  Akseli Gallen-Kallela  1929  Gustaf Mannerheim
W2    Aino Triptych           Akseli Gallen-Kallela  1891  Aino, Kalevala
…

[Figure: row W1 as a graph]
W1  type     painting
W1  time     1929
W1  creator  (resource with name ”Akseli Gallen-Kallela”)
W1  topic    (resource with name ”Gustaf Mannerheim”)

Land survey maintains place registries

municipality  province
Askainen      Varsinais-Suomi
Helsinki      Uusimaa
Lemu          Varsinais-Suomi
Turku         Varsinais-Suomi
…

[Figure: the registry as a graph. Lemu, Askainen and Turku have type municipality and are part-of Varsinais-Suomi; Varsinais-Suomi has type province and is part-of Finland.]

National library builds ontologies

KOKO ontology

[Figure: subclassOf hierarchy under the root ”concept”, containing the classes endurant, perdurant, abstract, physical object, place, occupation, time, municipality, province, person, artist, painting and marshal.]

Semantic RDF graph combines them all:

Web of Data

[Figure: the person, painting, place and ontology graphs merged into a single RDF graph. P1 (”Akseli Gallen-Kallela”, artist, birth place Lemu) is the creator of painting W1 (time 1929), whose topic is P2 (”Gustaf Mannerheim”, marshal, birth place Askainen). The places are connected by part-of up to Varsinais-Suomi and Finland, and the types person, artist, marshal, painting, municipality and province sit in the subclassOf hierarchy under ”concept”.]

1+1>2

AI In Principle a Piece of Cake but …
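Once the datasets share identifiers, combining them is in principle just a set union, and a question that no single organization can answer becomes a graph traversal. The Python sketch below uses triples transcribed from the tables above. It assumes that the museum data already refers to persons by the identifiers P1 and P2, which is exactly the alignment problem raised next; the predicate names are informal labels, not a real vocabulary.

```python
# Triples from the three sources (identifiers assumed to be already shared).
people = {
    ("P1", "name", "Akseli Gallen-Kallela"), ("P1", "birth place", "Lemu"),
    ("P2", "name", "Gustaf Mannerheim"), ("P2", "birth place", "Askainen"),
}
paintings = {
    ("W1", "name", "Portrait of Mannerheim"), ("W1", "creator", "P1"),
    ("W1", "topic", "P2"), ("W1", "time", "1929"),
}
places = {
    ("Lemu", "part-of", "Varsinais-Suomi"),
    ("Askainen", "part-of", "Varsinais-Suomi"),
    ("Varsinais-Suomi", "part-of", "Finland"),
}

graph = people | paintings | places   # aggregation is a plain set union

def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}

def within(place, region):
    """True if place equals region or reaches it through part-of links."""
    if place == region:
        return True
    return any(within(parent, region) for parent in objects(place, "part-of"))

def paintings_about_people_born_in(region):
    """Names of works whose topic is a person born inside the region."""
    hits = set()
    for s, p, o in graph:
        if p == "topic" and any(within(b, region) for b in objects(o, "birth place")):
            hits |= objects(s, "name")
    return hits

result = paintings_about_people_born_in("Finland")
```

Answering this needs the museum catalogue (topic), the biography data (birth place) and the place registry (part-of) together, which is the sense of 1+1>2.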

How to align concepts (URIs) used by different organizations?

How to align metadata models used by different organizations?
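Both alignment questions can be sketched as lookup tables applied before merging. Everything in this Python example is hypothetical: the local identifiers, the example.org URIs and the property names "maker" and "writer" are invented to show the idea, and real alignment usually requires shared ontology services rather than hand-made tables.

```python
# Concept alignment: local identifiers -> shared URIs (all hypothetical).
SAME_AS = {
    "museum:Gallen-Kallela, Akseli": "http://example.org/person/P1",
    "biography:P1": "http://example.org/person/P1",
    "museum:Mannerheim, Gustaf": "http://example.org/person/P2",
    "biography:P2": "http://example.org/person/P2",
}
# Metadata model alignment: local property names -> one shared property.
PROPERTY_MAP = {"writer": "creator", "maker": "creator"}

def align(triple):
    """Rewrite one (subject, predicate, object) triple into the shared vocabulary."""
    s, p, o = triple
    return (SAME_AS.get(s, s), PROPERTY_MAP.get(p, p), SAME_AS.get(o, o))

museum = {("museum:W1", "maker", "museum:Gallen-Kallela, Akseli")}
biography = {("biography:P1", "occupation", "artist")}
merged = {align(t) for t in museum | biography}
```

After alignment the two sources talk about the same person node, so the painting and the occupation become connected in the merged graph.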

SHARED INFRA NEEDED!

Linked Data – Web of Data
• Utilization of distributed work
• Aggregating massive cross-domain contents
• Linked Open Data thinking
• Semantic portals

http://linkeddata.org

Application domains of Semantic web

• Recommender systems
• E-business and web services
• Profiling and customization
• …
• Virtually any field dealing with data!
• https://www.w3.org/2001/sw/sweo/public/UseCases/

Application Example: WarSampo – Linked Death Finnish WW2 on the Semantic Web

(Hyvönen et al., ESWC 2016)
https://vimeo.com/212249404

Conclusions
WHAT HAS BEEN LEARNED?

What is the Semantic web?

Content perspective: A new (meta)data layer on the web
• Web as a global system
• Web of Pages vs. Web of Data
Application perspective: Machine-understandable web
• Semantics for machines
• Enables human usage
  Intelligent web services
  Semantic interoperability
Technological perspective: Next layers above XML
• W3C standards: RDF(S), OWL, SPARQL, etc.

[Figure: layers, top to bottom: Rules, Ontology, Metadata]

Semantic Web Makes a Difference!

• End-user’s perspective
  • Global view of heterogeneous, distributed contents
  • Automatic content aggregation
  • Semantic browsing and recommendations
  • Other intelligent services (knowledge discovery, personalization, visualization, …)
• Content publisher’s perspective
  • Distributed content creation
  • Enriching each other’s contents semantically
  • Automated link maintenance
  • Shared content publication channel
  • Reusing aggregated content in other applications

But the Lunch is not Free

• More collaboration is needed -> complicates work
• Integration of semantic portals with legacy systems
• Manual annotations are costly and may not scale up
• Automatic annotation lowers data quality

Key Lesson Learned: create high quality semantic data when recording data

”Intellectuals solve problems - geniuses prevent them”

Albert Einstein

Questions