
ISA²: 2016.16 - Public Multilingual Knowledge Management Infrastructure for the Digital Single Market (PMKI)

Semantic Interoperability for Multilingual DSIs 18/10/2018

Peter Schmitz, Enrico Francesconi, Najeh Hajlaoui, Brahim Batouche

Publications Office of the EU Unit – Common Data Repository Section A2.003 – Repository

Summary

• PMKI - Public Multilingual Knowledge Management Infrastructure

• Context and Motivations

• Activities
  o Semantic Alignment and Lexicalization Strategies of Semantic Resources

• Conclusions

Context

• Digital Single Market (priority of the Juncker Commission)
  o Bringing down barriers, including language barriers
  o Unlocking online, cross-border opportunities

• Situation
  o EU cross-border online services represent only 4% of the global Digital Market
  o Only 7% of SMEs in the EU are selling cross-border

• Actions
  o PMKI for interoperability between multilingual knowledge organization systems (KOS) through
    - Semantic Web technologies
    - KOS semantic alignments

PMKI in short

Type of activity: Common services, common frameworks (Creation of a Public Multilingual Knowledge Infrastructure (PMKI))

In charge: Publications Office of the European Union, OP.A2

Associated Services:
• DG CNECT.G3
• DIGIT.D2
• DGT.R3
• DG Traduction, Terminology Coordination unit

First approval of the 2nd 2016 proposal in the scope of the general presentation of the ISA² by the ISA² programme

Timeframe: 2016–2019

PMKI Objectives

• PMKI is an ISA² pilot project aiming to:
  o Create a knowledge management infrastructure for multilingual thesauri, vocabularies, etc.
  o Provide harmonization of their technical formats
  o Align concepts to facilitate interoperability and extensions
  o Set up a community and a structure allowing the integration of data sets

• The PMKI platform represents a public "one-stop-shop" for interoperability assets at European level.

Architecture for the DSM

PMKI in the common architecture for the multilingual DSM promoted by the EU Language Technology industry

Possible use-cases

PMKI Architecture

Users

ACCESS: SPARQL endpoint, RESTful API

KNOWLEDGE BASE: Concepts, relations

KNOWLEDGE MODEL: Core data model (Ontolex-Lemon)

SERVICES: Ingestion, alignment, data and knowledge model management, administration of the platform

Solutions

• Knowledge model: Standard representation and Core data model have been adopted (Ontolex-Lemon) (done)

• Knowledge base: gold standard dataset (done)

• Services: VocBench, an open source collaborative platform for thesaurus management (ongoing adaptation).

• Users, Access: dissemination platform (planned).
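
As an illustration of the adopted core data model, the sketch below renders one multilingual lexical entry in Ontolex-Lemon as Turtle. The class and property names (ontolex:LexicalEntry, ontolex:canonicalForm, ontolex:writtenRep, ontolex:sense, ontolex:reference) come from the W3C Ontolex-Lemon vocabulary. The `http://example.org/pmki/` namespace, the helper `lexical_entry_turtle` and the sample words are hypothetical and are not PMKI data.

```python
# Minimal sketch: one Ontolex-Lemon lexical entry rendered as Turtle.
# The ex: namespace and all sample data are hypothetical.

PREFIXES = """@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix ex: <http://example.org/pmki/> .
"""


def lexical_entry_turtle(entry_id: str, written_rep: str, lang: str, concept_id: str) -> str:
    """Link a word (lexical entry) to a thesaurus concept through a lexical sense."""
    # Escape backslashes and double quotes so the literal stays valid Turtle.
    escaped = written_rep.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"ex:{entry_id} a ontolex:LexicalEntry ;\n"
        f"    ontolex:canonicalForm ex:{entry_id}_form ;\n"
        f"    ontolex:sense ex:{entry_id}_sense .\n"
        f"ex:{entry_id}_form a ontolex:Form ;\n"
        f'    ontolex:writtenRep "{escaped}"@{lang} .\n'
        f"ex:{entry_id}_sense a ontolex:LexicalSense ;\n"
        f"    ontolex:reference ex:{concept_id} .\n"
    )


if __name__ == "__main__":
    # Two entries in different languages pointing at the same concept.
    print(PREFIXES)
    print(lexical_entry_turtle("fishery_en", "fishery", "en", "concept_fishery"))
    print(lexical_entry_turtle("peche_fr", "pêche", "fr", "concept_fishery"))
```

Both entries reference the same concept, which is how a lexicon in several languages can be attached to a single semantic resource.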

Semantic Models for Multilingual Resources

• Ontolex-Lemon: W3C evolution of Lemon
  o Lemon focused on the ontology-lexicon interface
  o Ontolex-Lemon focused on defining a Lexicon model

• LIME (“LInguistic MEtadata”): Ontolex module describing the lexical characterization of the dataset
  o Metadata related to Linguistic Resources (LRs): support discovery and identification of LRs for a given task
  o Metadata describing the linguistic expressivity of Datasets: information on how to compare (and align) a Dataset with others

Ontolex-Lemon Model

Integration Framework and Activities

• Integration Framework
  o Semantic and Lexical Integration of Multilingual Resources

• Activities
  1. Semantic resources alignment (e.g. thesaurus mapping)
  2. Lexical vs. semantic resources alignment (lexicalization of semantic resources)
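
To make the second activity concrete, the sketch below shows one naive way of lexicalizing a semantic resource: lexicon entries are attached to a thesaurus concept when their normalized lemma equals one of the concept's labels. This is only an assumption made for illustration, not the method used in PMKI. The function names and the sample data are hypothetical.

```python
# Naive lexicalization sketch: attach lexicon entries to thesaurus concepts
# when their normalized labels coincide. All data below is made up.
import unicodedata


def normalize(label: str) -> str:
    """Lowercase, strip accents and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def lexicalize(concept_labels: dict, lexicon: dict) -> dict:
    """Map each concept ID to the lexicon entry IDs whose lemma matches one of its labels."""
    # Index the lexicon by normalized lemma for direct lookup.
    by_lemma = {}
    for entry_id, lemma in lexicon.items():
        by_lemma.setdefault(normalize(lemma), []).append(entry_id)

    links = {}
    for concept_id, labels in concept_labels.items():
        matched = []
        for label in labels:
            for entry_id in by_lemma.get(normalize(label), []):
                if entry_id not in matched:
                    matched.append(entry_id)
        links[concept_id] = matched
    return links


# Hypothetical concepts (ID -> labels) and lexicon (entry ID -> lemma).
concepts = {"c1": ["Fishery", "fishing industry"], "c2": ["Économie"]}
lexicon = {"e1": "fishery", "e2": "economie", "e3": "forest"}
links = lexicalize(concepts, lexicon)
```

Exact matching after normalization misses inflected or multi-word variants, which is one reason a similarity-based approach may be preferable in practice.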

Our Approach: Information Retrieval Framework

[Figure: Source Thesaurus (e.g. Eurovoc) and Target Thesaurus (e.g. ECLAS) compared by similarity over their terms (w1, w2, w3, w5)]

Aligned datasets

• Semantic Alignments
  o Eurovoc ↔ STW (Thesaurus for Economics of the German National Library of Economics)
  o Eurovoc ↔ ECLAS (European Commission Central Libraries thesaurus)
  o Eurovoc ↔ Teseo (Italian thesaurus)

• Lexical vs. Semantic Resources (lexicalization)
  o Eurovoc ↔ WordNet
  o Eurovoc ↔ IATE (European Parliament)
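
The information retrieval framing behind such alignments can be sketched as follows: each target concept is treated as a small document, a source concept is used as the query, and candidates are ranked by similarity over shared words. The slides do not say which similarity measure is used, so TF-IDF weighting with cosine similarity is an assumption here. The labels and identifiers are invented and are not taken from Eurovoc or ECLAS.

```python
# IR-style alignment sketch: rank target concepts by TF-IDF cosine similarity
# to a source concept. Labels and IDs are invented for illustration.
import math
from collections import Counter


def tokenize(text: str) -> list:
    return text.lower().split()


def tfidf_vectors(docs: dict) -> tuple:
    """Return one TF-IDF vector per document plus the IDF table."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    # Document frequency: in how many documents each word occurs.
    df = Counter(word for words in tokens.values() for word in set(words))
    n = len(docs)
    idf = {word: math.log((1 + n) / (1 + count)) + 1.0 for word, count in df.items()}
    vectors = {
        doc_id: {word: tf * idf[word] for word, tf in Counter(words).items()}
        for doc_id, words in tokens.items()
    }
    return vectors, idf


def cosine(a: dict, b: dict) -> float:
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def align(source_label: str, targets: dict, top_k: int = 3) -> list:
    """Rank target concepts for one source concept, best match first."""
    vectors, idf = tfidf_vectors(targets)
    # Words unknown to the target thesaurus cannot contribute to a match.
    query = {w: tf * idf[w] for w, tf in Counter(tokenize(source_label)).items() if w in idf}
    scored = [(target_id, cosine(query, vec)) for target_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


targets = {
    "t1": "fisheries policy",
    "t2": "agricultural policy",
    "t3": "forest management",
}
ranking = align("common fisheries policy", targets)
```

In a semi-automatic setting, the top-ranked candidates would be shown to a human validator rather than accepted directly.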

VocBench (Univ. of )

Communication: VocBench-PMKI Workshop

• Purpose: Presentation of the two closely related ISA² actions (PMKI, VocBench)
• Target audience
  o Scientific community
  o Public Administrations
  o Language Technology Industries
• Agenda
  o VocBench Workshop (1 day)
  o PMKI and Ontolex-Lemon VocBench extension discussion (1 day)
• Planning
  o Preparation: Q4/2018 – Q1/2019
  o Event: Q2/2019 (to be confirmed)

Conclusions

• PMKI is an ongoing project within the ISA² programme

• It aims to
  o show the benefit of Semantic Web technologies in providing services that help overcome language barriers
  o be a "one-stop-shop" for multilingual language resources at European level

• Forthcoming developments
  1. Automatic/semi-automatic mapping approach
  2. Dissemination platform

Thanks for your attention!

enrico.francesconi@publications..eu