Introduction to the semantic web and to the web of linked data
ESTP course on Introduction to Linked Open Data
Prof. Eero Hyvönen, University of Helsinki and Aalto University, Helsinki, Finland, Sept 28, 2017
THE CONTRACTOR IS ACTING UNDER A FRAMEWORK CONTRACT CONCLUDED WITH THE COMMISSION
Eurostat Outline
• The Vision
• Technological Basis: Metadata, ontologies, rules
• How Does It Work in Practice?
• What Has Been Learned?
1. How to build an interoperable Web of Data?
2. How to build a more intelligent web?
1. Artificial Intelligence approach
  • The contents stay as they are
  • The machines operate in a more human-like way (Artificial Intelligence)
2. Contents represented in intelligent ways
  • The contents are easier to understand
  • Machines can stay simpler
In practice, both ways are needed
  • More intelligent systems process more intelligently represented contents
Web generations
1G WWW:
  • WWW pages for human interpretation
  • HTML language
2G WWW:
  • Structures for human/machine interpretation
  • XML language
3G WWW: Semantic Web
  • Meanings for human/machine use
  • RDF(S) language
4G WWW: Ubiquitous web for humans and machines
⇒ Semantics = foundation for intelligent web services
  • Semantic = “understandable” to machines
Why Semantics?
• This metadata cannot answer the following questions:
  • Find all vessels?
  • Find all ceramic products?
  • Find artifacts manufactured in Europe?
  • Does the city of Meissen manufacture ceramics?
NBA-H26069-467
    :object "cup and plate" ;
    :object_concept object:cup ;
    :object_concept object:plate ;
    :material "porcelain" ;
    :material_concept object:porcelain ;
    :creationPlace "Germany" ;
    :creationPlace_concept place:Germany ;
    :creator "Meissen" ;
    :creator_concept actor:Meissen .
[Figure: the record NBA-H26069-467 linked through its concept properties to four ontologies. Object ontology: object:cup and object:plate are rdfs:subClassOf object:vessel. Material ontology: material:porcelain. Place ontology: place:Meissen, place:Germany and place:Europe, connected by loc:partOf. Actor ontology: actor:Meissen. The figure repeats the four questions listed above.]
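How the concept links let a machine answer questions such as "Find all vessels?" can be sketched in a few lines of Python. This is only an illustration: the concept names follow the slide, but the dictionaries are a hypothetical stand-in for real ontologies, and the direction of the partOf links (Meissen in Germany, Germany in Europe) is an assumption read off the figure.

```python
# Hypothetical stand-ins for the object and place ontologies on the slide.
SUBCLASS_OF = {"object:cup": "object:vessel", "object:plate": "object:vessel"}
PART_OF = {"place:Meissen": "place:Germany", "place:Germany": "place:Europe"}

# The museum record, reduced to its concept-valued properties.
RECORDS = {
    "NBA-H26069-467": {
        "object_concept": ["object:cup", "object:plate"],
        "creationPlace_concept": ["place:Germany"],
    }
}


def ancestors(concept, hierarchy):
    """Return the concept plus everything reached by following the hierarchy upwards.

    Assumes the hierarchy has no cycles.
    """
    chain = [concept]
    while chain[-1] in hierarchy:
        chain.append(hierarchy[chain[-1]])
    return chain


def find(prop, target, hierarchy):
    """Ids of records whose value for prop is target or lies below it in the hierarchy."""
    return [
        record_id
        for record_id, record in RECORDS.items()
        if any(target in ancestors(c, hierarchy) for c in record.get(prop, []))
    ]


vessels = find("object_concept", "object:vessel", SUBCLASS_OF)
made_in_europe = find("creationPlace_concept", "place:Europe", PART_OF)
```

The literal values "cup and plate" and "Germany" alone would not match the queries "vessel" or "Europe"; the match only succeeds because the record points to concepts that sit in a hierarchy.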
Case Rijksmuseum Amsterdam: CHIP Demonstrator
Example in Turtle notation
• VRA metadata schema (extension of Dublin Core)
• (Aroyo et al., 2007)
A resource in the TGN ontology / vocabulary
Amsterdam in TGN
An Ontology Concept Hierarchy
Technological Basis of Semantic Web
The classical ”layer cake model”
[Figure: layer cake with the layers, from bottom to top: Metadata, Vocabularies/ontologies, Reasoning/logic, Trust (Tim Berners-Lee)]
METADATA LEVEL
Why isn’t XML alone sufficient as the basis of the Semantic web?
• The semantics of XML exists only in the human brain
• We need a markup language whose interpretation is:
  - Commonly agreed
  - Cross-domain
  - Machine-”understandable”
  - Suitable for aggregating data (documents can be combined)
The Semantic web solution: RDF (Resource Description Framework)
• General metadata description language for web resources
• Relational model, not a syntax (as opposed to XML)
  - RDF description = directed graph
• Semantics is defined based on logic
• Syntax/serialization
  - XML-based RDF/XML, especially for machines
  - Simple triple notations (N3, Turtle, N-Triples) for humans
• Standardized and commonly used
  - W3C draft 1999
  - W3C recommendation RDF 1.0, 10.2.2004
  - W3C recommendation RDF 1.1, 25.2.2014
Metadata schemas
• Standardized formats for metadata descriptions
  • A set of elements/properties, and
  • Predefined value sets for them
• Different content types typically require different properties
  • E.g., book vs. song vs. museum item
• Problems
  • How are the element values represented?
    - Tarja Halonen vs. Halonen T.
    - 11.9.2001 vs. Sept 11, 2001 vs. 2001/09/11
  • What do the values mean?
    - ”glass”, ”nokia”, ”Pyhäjärvi”
  • How can different schema structures be combined?
    - writer vs. creator as a property
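The value representation problem can be made concrete with the three date spellings listed above. The following is a minimal sketch, assuming that only these three notations occur and that ISO 8601 (YYYY-MM-DD) is the agreed target format; the function name `to_iso` is made up for the illustration.

```python
from datetime import date, datetime

# Month abbreviations accepted in the "Sept 11, 2001" style (assumption: English only).
MONTHS = {
    "jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
    "jul": 7, "aug": 8, "sep": 9, "sept": 9, "oct": 10, "nov": 11, "dec": 12,
}


def to_iso(text):
    """Normalize one of the three date notations from the slide to YYYY-MM-DD."""
    # Purely numeric notations: day.month.year and year/month/day.
    for fmt in ("%d.%m.%Y", "%Y/%m/%d"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            pass
    # Month-name notation such as "Sept 11, 2001".
    month_name, day, year = text.replace(",", " ").split()
    month = MONTHS[month_name.lower().rstrip(".")]
    return date(int(year), month, int(day)).isoformat()


normalized = {to_iso(s) for s in ("11.9.2001", "Sept 11, 2001", "2001/09/11")}
```

All three spellings collapse to one value only because an encoding was fixed in advance. The second problem, what a value such as ”nokia” means, cannot be solved by reformatting and needs references to shared concepts.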
Example: Dublin Core
• Set of 15 general properties for different contents
• Dublin Core Metadata Element Set (ISO Standard 15836)
  Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Relation, Source, Language, Coverage, Rights
Dublin Core (2)
• DCMI Metadata Terms defines tens of qualifiers/refinements, which specialize the semantics of Dublin Core elements
  • E.g., accessRights < Rights
• Dumb-down principle
  • A qualified version can always be replaced with a more generic one
    - I.e., a qualifier can only make the meaning of an element more specific
• The element values may have predefined encoding formats
  • Vocabulary encoding scheme
    - Set of possible terms, e.g., listing of different resource types (text, image, …)
  • Syntax encoding scheme
    - E.g., date ”2001-09-11”
Dublin Core (3)
• Application Profiles
  • Define a combination of DC elements and qualifiers + value encoding formats for a given domain (e.g., for library data)
  • Possible own extensions
  • E.g., Visual Resources Association (VRA) Core 4.0
    - New elements, such as ”measurements”
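The dumb-down principle for Dublin Core qualifiers can be sketched as a simple mapping step. In this illustration the pair accessRights < rights comes from the slides, the other refinement pairs are further examples from DCMI Metadata Terms, and the sample record and its values are invented.

```python
# Refinement -> the Dublin Core element it refines.
REFINES = {
    "accessRights": "rights",
    "created": "date",
    "abstract": "description",
    "alternative": "title",
}


def dumb_down(record):
    """Replace every qualified property with the generic element it refines."""
    simple = {}
    for prop, values in record.items():
        element = REFINES.get(prop, prop)  # unqualified elements pass through
        simple.setdefault(element, []).extend(values)
    return simple


# Invented record that mixes plain elements and qualifiers.
qualified = {
    "title": ["Example document"],
    "created": ["2001-09-11"],
    "accessRights": ["public"],
}
simple = dumb_down(qualified)
```

A client that knows only the 15 basic elements can still use the dumbed-down record. It loses precision (a creation date becomes just a date) but the statements stay true, which is why a qualifier is only allowed to narrow the meaning of its element.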
Metadata Schema in HealthFinland
HealthFinland portal: Maija’s eyeglasses – PDF document on the web
Maija’s eyeglasses: metadata in RDF form
ONTOLOGY LEVEL
What is an ontology?
• “An ontology is an explicit specification of a conceptualization”
• “...definitions need to be couched in some common formalism”
• (Gruber, 1993)
Explicit: machine can understand
Common (shared): communication is possible (not mentioned by Gruber)
Formal: precisely defined
• Defines the concepts/objects and their relations in a given domain
• A first requirement for humans and machines to understand each other
EXAMPLE OF AN ONTOLOGY
class-def animal
class-def plant
    subclass-of NOT animal
class-def tree
    subclass-of plant
class-def branch
    slot-constraint is-part-of has-value tree
class-def leaf
    slot-constraint is-part-of has-value branch
class-def defined carnivore
    subclass-of animal
    slot-constraint eats value-type animal
class-def defined herbivore
    subclass-of animal
    slot-constraint eats value-type plant OR (slot-constraint is-part-of has-value plant)
class-def herbivore
    subclass-of NOT carnivore
class-def giraffe
    subclass-of animal
    slot-constraint eats value-type leaf
class-def lion
    subclass-of animal
    slot-constraint eats value-type herbivore
class-def tasty-plant
    subclass-of plant
    slot-constraint eaten-by has-value herbivore, carnivore
Ontology types
[Figure: ontology types ordered by ontological complexity/depth, ranging from human-understandable text to machine-understandable formal systems; further labels in the figure: Numbers, Philosophical]
• Glossary: word list, little structure
• Thesaurus: constraints, relations (NT, BT, RT, etc.)
• Taxonomy: relations, inheritance
• Axiomatized theory: formal system, logic-based
W3C standards for Semantic web ontologies/vocabularies
• SKOS (Simple Knowledge Organization System)
  • Light-weight semantics
  • E.g., for representing existing glossaries, classification schemes, thesauri
• OWL (Web Ontology Language)
  • Rich semantics based on logic
  • Supports more reasoning
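What "supports more reasoning" means can be illustrated with the animal and plant ontology shown earlier. The sketch below is a hand-written simplification in Python, not OWL semantics and not the output of a real reasoner: it assumes that is-part-of is transitive (the example does not state this) and it checks only the single food class recorded for each animal.

```python
# Facts copied from the example ontology, reduced to simple lookups.
SUBCLASS_OF = {"tree": "plant", "giraffe": "animal", "lion": "animal"}
PART_OF = {"leaf": "branch", "branch": "tree"}   # slot-constraint is-part-of has-value
EATS = {"giraffe": "leaf", "lion": "herbivore"}  # slot-constraint eats value-type


def is_plant_or_part_of_plant(cls):
    """Follow is-part-of and subclass-of links upwards until 'plant' is reached."""
    while cls is not None:
        if cls == "plant":
            return True
        cls = PART_OF.get(cls) or SUBCLASS_OF.get(cls)
    return False


def is_herbivore(cls):
    """Defined class: an animal whose food is a plant or a part of a plant."""
    food = EATS.get(cls)
    return (
        SUBCLASS_OF.get(cls) == "animal"
        and food is not None
        and is_plant_or_part_of_plant(food)
    )


giraffe_is_herbivore = is_herbivore("giraffe")
lion_is_herbivore = is_herbivore("lion")
```

Nothing in the ontology says directly that a giraffe is a herbivore. The classification follows from the chain leaf, branch, tree, plant, which is the kind of conclusion a logic-based language lets a machine draw and a plain word list does not.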
IEEE Suggested Upper Merged Ontology (SUMO)
• Goals
  • Automated reasoning support in knowledge-based applications
  • Interoperability
    - Define new data elements using SUMO and obtain mutual interoperability
    - Interoperability between applications using domain-specific ontologies (that use SUMO)
    - Neutral interchange format for different systems
• Application areas
  • E-commerce
  • E-learning
  • Natural language understanding tasks
  • …
SUMO
SUMO principal distinctions
SUMO Object:
AAT Art & Architecture Thesaurus
  - maintained by J. Paul Getty Trust
  - 7 main classes, 125 000 concepts
Union List of Artist Names ULAN
• 120,000 instances
• 293,000 names
Resolving Identities
Geonames
• Classes: 9 feature classes, 645 feature codes
• Instances:
  • 8 million geographical names, 6.5 million unique features, 2.2 million populated places, 1.8 million alternate names
• Registries and Wiki used for populating the ontology
Finnish Ontologies: ONKI
ONKI.fi -> Finto.fi
RULE LEVEL
The idea of rules
• “New” information can be derived from old by reasoning
• Semantic Web is based on logic!
SUMO knowledge representation
• Developed in KIF (Knowledge Interchange Format)
  • A version of first order predicate logic
  • Other versions exist (e.g., OWL)
• Size
  • 1006 terms
  • 4142 axioms
  • 814 rules
Standardized XML notation for rules: Rule Markup Language (RuleML)
Application example: MuseumFinland recommends links
• Inference rules tell the machine about the world
  • E.g., that ”student’s cap” is related to ”parties”
  • E.g., that entities are related to each other if their superclasses are related to each other
  • Etc.
• The machine can:
  • Reason interesting new relations between museum items, and
  • Provide them to end users as recommendation links
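The superclass rule above can be sketched as follows. This is not the MuseumFinland implementation: only ”student’s cap” and ”parties” come from the slide, while the class names, the hierarchy and the stated relations are hypothetical.

```python
# Hypothetical class hierarchy: class -> its superclass.
SUPERCLASS = {
    "student's cap": "festive clothing",
    "punch bowl": "party tableware",
}

# Hypothetical relations that were stated explicitly between classes.
RELATED = {("festive clothing", "parties"), ("party tableware", "parties")}


def superclasses(cls):
    """The class itself followed by all of its superclasses."""
    chain = [cls]
    while chain[-1] in SUPERCLASS:
        chain.append(SUPERCLASS[chain[-1]])
    return chain


def related(a, b):
    """Rule: a and b are related if they or any of their superclasses are related."""
    return any(
        (x, y) in RELATED or (y, x) in RELATED
        for x in superclasses(a)
        for y in superclasses(b)
    )


def recommend(item_class, candidates):
    """Candidate classes to offer as recommendation links for item_class."""
    return [c for c in candidates if c != item_class and related(item_class, c)]


links = recommend("student's cap", ["parties", "punch bowl", "plough"])
```

No relation between ”student’s cap” and ”parties” is stored anywhere; it is derived because the cap's superclass is related to parties. In this sketch the cap and the punch bowl are not linked to each other, since the rule as written does not chain through a shared third class.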
Application example: Recommendations in MuseumFinland
(META)DATA + ONTOLOGIES = LINKED DATA
How Does It Work in Practice?
Finnish Biography center and libraries collect historical data of people
person   name                    occupation   birth place   …
P1       Akseli Gallen-Kallela   artist       Lemu
P2       Gustaf Mannerheim       marshal      Askainen
…
[Figure: the same rows drawn as a graph. P1 has type person, name ”Akseli Gallen-Kallela”, occupation artist and birth place Lemu; P2 has type person, name ”Gustaf Mannerheim”, occupation marshal and birth place Askainen.]
Museum catalogues paintings
Work   name                     creator                 time   topic               …
W1     Portrait of Mannerheim   Akseli Gallen-Kallela   1929   Gustaf Mannerheim
W2     Aino Triptych            Akseli Gallen-Kallela   1891   Aino, Kalevala
…
[Figure: W1 drawn as a graph. W1 has type painting, time 1929, a creator node with name ”Akseli Gallen-Kallela” and a topic node with name ”Gustaf Mannerheim”.]
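The step from catalogue rows to graph edges used in the two tables above can be sketched as a conversion to (subject, property, object) triples. This is a minimal illustration in plain Python with short strings as identifiers; a real RDF dataset would use URIs, and the function name `rows_to_triples` is made up.

```python
# The person table from the biography data.
PEOPLE = [
    {"id": "P1", "name": "Akseli Gallen-Kallela", "occupation": "artist", "birth place": "Lemu"},
    {"id": "P2", "name": "Gustaf Mannerheim", "occupation": "marshal", "birth place": "Askainen"},
]


def rows_to_triples(rows, row_type):
    """Turn table rows into (subject, property, object) triples.

    Each row becomes one node, each column one labelled edge, plus a type edge.
    """
    triples = set()
    for row in rows:
        subject = row["id"]
        triples.add((subject, "type", row_type))
        for prop, value in row.items():
            if prop != "id":
                triples.add((subject, prop, value))
    return triples


person_triples = rows_to_triples(PEOPLE, "person")
```

Because the result is just a set of edges, adding a column or another table does not require a schema change; it only adds triples.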
Land survey maintains place registries
municipality   province
Askainen       Varsinais-Suomi
Helsinki       Uusimaa
Lemu           Varsinais-Suomi
Turku          Varsinais-Suomi
…
[Figure: the registry drawn as a graph. Lemu, Askainen and Turku (type municipality) are part-of Varsinais-Suomi (type province), which is part-of Finland.]
National library builds ontologies
KOKO ontology
[Figure: class hierarchy connected by subclassOf links. Classes shown: concept, endurant, perdurant, abstract, physical object, place, occupation, time, municipality, province, person, artist, painting, marshal.]
Semantic RDF graph combines them all:
Web of Data
[Figure: the KOKO class hierarchy, the person data (P1, P2), the painting data (W1) and the place registry (Lemu, Askainen, Turku, Varsinais-Suomi, Finland) joined into a single RDF graph through their shared nodes.]
1+1>2 AI In Principle a Piece of Cake but …
How to align concepts (URIs) used by different organizations?
How to align metadata models used by different organizations?
SHARED INFRA NEEDED!
Linked Data – Web of Data
• Utilization of distributed work
• Aggregating massive cross-domain contents
• Linked Open Data thinking
• Semantic portals
http://linkeddata.org
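Why aggregation needs aligned identifiers can be sketched with the person, painting and place data from the earlier slides. The example assumes, purely for illustration, that the museum uses its own identifier `museum:agk` for the painter and that an alignment table maps it to P1; neither the identifier nor the table is part of the original data.

```python
# Triples published by three different organizations.
biography = {
    ("P1", "type", "person"),
    ("P1", "name", "Akseli Gallen-Kallela"),
    ("P1", "birth place", "Lemu"),
    ("P2", "name", "Gustaf Mannerheim"),
    ("P2", "birth place", "Askainen"),
}
places = {
    ("Lemu", "part-of", "Varsinais-Suomi"),
    ("Askainen", "part-of", "Varsinais-Suomi"),
    ("Varsinais-Suomi", "part-of", "Finland"),
}
museum = {
    ("W1", "type", "painting"),
    ("W1", "creator", "museum:agk"),  # hypothetical local identifier
    ("W2", "type", "painting"),
    ("W2", "creator", "museum:agk"),
}

# Hypothetical alignment: local identifier -> shared identifier.
ALIGNMENT = {"museum:agk": "P1"}


def align(triples, mapping):
    """Rewrite local identifiers to shared ones wherever a mapping exists."""
    return {tuple(mapping.get(term, term) for term in triple) for triple in triples}


def objects(graph, subject, prop):
    """All objects o such that (subject, prop, o) is in the graph."""
    return {o for (s, p, o) in graph if s == subject and p == prop}


def works_by_creators_born_in(graph, region):
    """Works whose creator was born in a place that is part-of the region."""
    works = set()
    for work, prop, creator in graph:
        if prop == "creator":
            for place in objects(graph, creator, "birth place"):
                if region in objects(graph, place, "part-of"):
                    works.add(work)
    return works


web_of_data = biography | places | align(museum, ALIGNMENT)
unaligned = biography | places | museum

answer = works_by_creators_born_in(web_of_data, "Varsinais-Suomi")
answer_without_alignment = works_by_creators_born_in(unaligned, "Varsinais-Suomi")
```

Merging is a plain set union, but the cross-domain question only gets an answer once the museum's creator and the biography's P1 are the same node. None of the three datasets can answer it alone, which is the 1+1>2 effect, and the alignment table is a small instance of the shared infrastructure the slides call for.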
Application domains of Semantic web
• Information retrieval
• Recommender systems
• Knowledge management
• E-business and web services
• Profiling and customization
• …
• Virtually any field dealing with data!
• https://www.w3.org/2001/sw/sweo/public/UseCases/
Application Example: WarSampo – Linked Death Finnish WW2 on the Semantic Web
(Hyvönen et al., ESWC 2016)
https://vimeo.com/212249404
Conclusions
WHAT HAS BEEN LEARNED?
What is the Semantic web?
Content perspective: A new (meta)data layer on the web
• Web as a global database system
• Web of Pages vs. Web of Data
Application perspective: Machine-understandable web
• Semantics for machines
• Enables human usage
  - Intelligent web services
  - Semantic interoperability
Technological perspective: Next layers above XML
• W3C standards: RDF(S), OWL, SPARQL, etc.
[Figure: layers Metadata, Ontology, Rules]
Semantic Web Makes a Difference!
• End-user’s perspective
  • Global view to heterogeneous, distributed contents
  • Automatic content aggregation
  • Semantic search
  • Semantic browsing and recommendations
  • Other intelligent services (knowledge discovery, personalization, visualization, …)
• Content publisher’s perspective
  • Distributed content creation
  • Enriching each other’s contents semantically
  • Automated link maintenance
  • Shared content publication channel
  • Reusing aggregated content in other applications
But the Lunch is not Free
• More collaboration is needed -> complicates work
• Integration of semantic portals with legacy systems
• Manual annotations are costly and may not scale up
• Automatic annotation lowers data quality
Key Lesson Learned: create high quality semantic data when recording data
”Intellectuals solve problems - geniuses prevent them”
Albert Einstein
Questions