ISA²: 2016.16 - Public Multilingual Knowledge Management Infrastructure for the Digital Single Market (PMKI)
Semantic Interoperability for Multilingual DSIs 18/10/2018
Peter Schmitz, Enrico Francesconi, Najeh Hajlaoui, Brahim Batouche
Publications Office of the EU Unit A2 – Common Data Repository Section A2.003 – Repository
Summary
• PMKI - Public Multilingual Knowledge Management Infrastructure
• Context and Motivations
• Activities
o Semantic Alignment and Lexicalization Strategies of Semantic Resources
• Conclusions
Context
• Digital Single Market for Europe (priority of the Juncker Commission)
o Bringing down barriers, including language barriers
o Unlocking online, cross-border opportunities
• Situation
o EU cross-border online services represent only 4% of the global digital market
o Only 7% of SMEs in the EU sell cross-border
• Actions
o PMKI, for interoperability between multilingual knowledge organization systems (KOS) through:
- Semantic Web technologies
- KOS semantic alignments
PMKI in short
Type of activity: Common services, common frameworks (creation of a Public Multilingual Knowledge Infrastructure, PMKI)
Service in charge: Publications Office of the European Union, OP.A2
Associated services:
• European Commission: DG CNECT.G3, DIGIT.D2, DGT.R3
• European Parliament: DG Translation, Terminology Coordination Unit
First approval of the proposal by the ISA² committee: March 2nd 2016, in the scope of the general presentation of the ISA² programme
Timeframe: 2016–2019
PMKI Objectives
• PMKI is an ISA² pilot project aiming to:
o Create a knowledge management infrastructure for multilingual thesauri, vocabularies, etc.
o Harmonize their technical formats
o Align concepts to facilitate interoperability and extensions
o Set up a community and a governance structure allowing the integration of datasets
• The PMKI platform may become a public "one-stop-shop" for interoperability assets at European level.
Architecture for the DSM
PMKI in the common architecture for the multilingual DSM promoted by the EU Language Technology industry
Possible use-cases
PMKI Architecture
Users
ACCESS: SPARQL endpoint, RESTful API
KNOWLEDGE BASE: concepts, relations
KNOWLEDGE MODEL: core data model (Ontolex-Lemon)
SERVICES: ingestion, alignment, data and knowledge model management, administration of the platform
Solutions
• Knowledge model: the Ontolex-Lemon standard representation and core data model have been adopted (done).
• Knowledge base: gold standard dataset (done).
• Services: VocBench, an open-source collaborative platform for thesaurus management (adaptation ongoing).
• Users, access: dissemination platform (planned).
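The ACCESS layer of the architecture exposes a SPARQL endpoint over an Ontolex-Lemon knowledge base. A minimal sketch of how a client might query it, assuming a hypothetical endpoint URL (`pmki.example.org` is a placeholder, not PMKI's real address) and the standard SPARQL 1.1 Protocol "query via GET" binding. The helper names `build_label_query` and `build_request` are illustrative only.

```python
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request

# Hypothetical endpoint: the slides do not give PMKI's actual SPARQL URL.
ENDPOINT = "https://pmki.example.org/sparql"


def build_label_query(label: str, lang: str) -> str:
    """Find lexical entries with a given written form and the concepts their senses refer to."""
    # Escape backslashes and double quotes so the label is a valid SPARQL string literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>\n"
        "SELECT ?entry ?concept WHERE {\n"
        f'  ?entry ontolex:canonicalForm/ontolex:writtenRep "{escaped}"@{lang} ;\n'
        "         ontolex:sense/ontolex:reference ?concept .\n"
        "}"
    )


def build_request(endpoint: str, query: str) -> Request:
    """SPARQL 1.1 Protocol 'query via GET': the query goes in the 'query' URL parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    req = build_request(ENDPOINT, build_label_query("energy policy", "en"))
    # Decode the URL again to check that the query survives percent-encoding unchanged.
    print(parse_qs(urlparse(req.full_url).query)["query"][0])
```

Against a real endpoint the request would be sent with `urllib.request.urlopen(req)`, and the rows read from `json.load(response)["results"]["bindings"]`, which is where the SPARQL JSON results format places them.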
Semantic Models for Multilingual Resources
• Ontolex-Lemon: W3C evolution of Lemon
o Lemon focused on the ontology-lexicon interface
o Ontolex-Lemon focuses on defining a lexicon model
• LIME (“LInguistic MEtadata”): Ontolex module describing the lexical characterization of a dataset
o Metadata related to linguistic resources (LRs): support the discovery and identification of LRs for a given task
o Metadata describing the linguistic expressivity of datasets: information on how to compare (map, align) a dataset with others
Ontolex-Lemon Model
Integration Framework and Activities
• Integration Framework
o Semantic and lexical integration of multilingual resources
• Activities
1. Semantic resources alignment (e.g. thesaurus mapping)
2. Lexical vs. semantic resources alignment (lexicalization of semantic resources)
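To make the Ontolex-Lemon and LIME models concrete, here is a minimal sketch that serializes a tiny lexicon to Turtle and computes LIME-style counts for a lexicalization set, which is the kind of output activity 2 (lexicalization) produces. It assumes each entry has one form and one sense. The `Entry` class and helper names are hypothetical. The statistics follow our reading of the LIME properties `lime:references`, `lime:lexicalEntries`, `lime:lexicalizations`, `lime:percentage` and `lime:avgNumOfLexicalizations`, with percentage expressed as a ratio and both averages taken over the whole reference dataset. This is not PMKI's implementation.

```python
from dataclasses import dataclass

ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
LIME = "http://www.w3.org/ns/lemon/lime#"


@dataclass
class Entry:
    local_id: str     # local name of the lexical entry (illustrative)
    written_rep: str  # canonical written form
    concept: str      # IRI of the concept the entry's sense refers to


def literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_turtle(base: str, lang: str, entries: list[Entry]) -> str:
    """Serialize a lime:Lexicon whose entries each have one form and one sense."""
    lines = [
        f"@prefix ontolex: <{ONTOLEX}> .",
        f"@prefix lime: <{LIME}> .",
        f"@prefix : <{base}> .",
        "",
        ":lexicon a lime:Lexicon ;",
        f"    lime:language {literal(lang)} ;",
        "    lime:entry " + ", ".join(f":{e.local_id}" for e in entries) + " .",
    ]
    for e in entries:
        lines += [
            "",
            f":{e.local_id} a ontolex:LexicalEntry ;",
            f"    ontolex:canonicalForm [ ontolex:writtenRep {literal(e.written_rep)}@{lang} ] ;",
            f"    ontolex:sense [ ontolex:reference <{e.concept}> ] .",
        ]
    return "\n".join(lines)


def lexicalization_stats(concepts: set[str], entries: list[Entry]) -> dict[str, float]:
    """LIME-style counts for a lexicalization set over a reference dataset (e.g. a thesaurus)."""
    # A lexicalization is an (entry, concept) pair whose concept is in the reference dataset.
    pairs = {(e.local_id, e.concept) for e in entries if e.concept in concepts}
    lexicalized = {c for _, c in pairs}
    return {
        "references": len(lexicalized),                 # concepts with at least one lexicalization
        "lexicalEntries": len({e for e, _ in pairs}),
        "lexicalizations": len(pairs),
        "percentage": len(lexicalized) / len(concepts),  # share of lexicalized concepts
        "avgNumOfLexicalizations": len(pairs) / len(concepts),
    }


if __name__ == "__main__":
    # Toy data with hypothetical IRIs and labels, not real Eurovoc content.
    base = "http://example.org/thesaurus/c"
    concepts = {base + str(i) for i in range(1, 5)}
    entries = [
        Entry("e1", "energy policy", base + "1"),
        Entry("e2", "energy strategy", base + "1"),
        Entry("e3", "renewable energy", base + "2"),
        Entry("e4", "fish stock", "http://example.org/other/x1"),
    ]
    print(to_turtle("http://example.org/lexicon/", "en", entries))
    print(lexicalization_stats(concepts, entries))
```

Entry `e4` belongs to the lexicon but is not counted in the lexicalization set, because its concept is outside the reference dataset.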
Our Approach: Information Retrieval Framework
Figure: Source Thesaurus (e.g. Eurovoc) and Target Thesaurus (e.g. ECLAS), with similarity computed between their terms (w1, w2, w3, ...)
Aligned datasets
• Semantic alignments
o Eurovoc ↔ STW (Thesaurus for Economics of the German National Library of Economics)
o Eurovoc ↔ ECLAS (European Commission Central Libraries thesaurus)
o Eurovoc ↔ Teseo (Italian Senate thesaurus)
• Lexical vs. semantic resources (lexicalization)
o Eurovoc ↔ WordNet
o Eurovoc ↔ IATE (European Parliament)
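The slides describe the alignment approach only as an information retrieval framework that scores the similarity between source and target thesaurus terms. One plausible instance is TF-IDF cosine similarity over concept labels. The sketch below assumes that technique, a single label per concept and a hypothetical acceptance `threshold`. It is an illustration, not the method actually used for the Eurovoc alignments listed above.

```python
import math
import re
from collections import Counter


def tokenize(label: str) -> list[str]:
    """Lower-cased word tokens of a concept label."""
    return re.findall(r"\w+", label.lower())


def idf_table(labels: list[str]) -> dict[str, float]:
    """Smoothed inverse document frequency, treating each label as one document."""
    n = len(labels)
    df = Counter(tok for label in labels for tok in set(tokenize(label)))
    return {tok: math.log((1 + n) / (1 + d)) + 1.0 for tok, d in df.items()}


def vectorize(label: str, idf: dict[str, float]) -> dict[str, float]:
    """L2-normalized sparse TF-IDF vector of one label."""
    tf = Counter(tokenize(label))
    vec = {tok: count * idf.get(tok, 0.0) for tok, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {tok: w / norm for tok, w in vec.items()} if norm else {}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two already-normalized sparse vectors (their dot product)."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def align(source: dict[str, str], target: dict[str, str],
          threshold: float = 0.5) -> list[tuple[str, str, float]]:
    """For each source concept, propose its most similar target concept if score >= threshold."""
    if not target:
        return []
    # IDF over both thesauri so that source and target vectors share one weighting.
    idf = idf_table(list(source.values()) + list(target.values()))
    target_vecs = {t_id: vectorize(label, idf) for t_id, label in target.items()}
    proposals = []
    for s_id, label in source.items():
        s_vec = vectorize(label, idf)
        score, best = max((cosine(s_vec, vec), cand) for cand, vec in target_vecs.items())
        if score >= threshold:
            proposals.append((s_id, best, score))
    return proposals


if __name__ == "__main__":
    # Toy labels with hypothetical identifiers, not actual Eurovoc or ECLAS data.
    source = {"s1": "energy policy", "s2": "renewable energy", "s3": "fisheries"}
    target = {"t1": "policy on energy", "t2": "renewable energy sources", "t3": "agriculture"}
    for s_id, t_id, score in align(source, target):
        print(f"{s_id} -> {t_id} ({score:.2f})")
```

Label matching alone ignores thesaurus structure such as broader and narrower relations. In a semi-automatic setting, the proposed pairs would more likely be reviewed by a curator than accepted as final mappings.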
VocBench (Univ. of Rome)
Communication: VocBench-PMKI Workshop
• Purpose: presentation of the two closely related ISA² actions (PMKI, VocBench)
• Target audience
o Scientific community
o Public administrations
o Language technology industries
• Agenda
o VocBench workshop (1 day)
o Discussion of PMKI and the Ontolex-Lemon extension of VocBench (1 day)
• Planning
o Preparation: Q4/2018 – Q1/2019
o Event: Q2/2019 (to be confirmed)
Conclusions
• PMKI is an ongoing project within the ISA² initiative
• It aims to:
o show the benefits of Semantic Web technologies for providing services that help overcome language barriers
o be a "one-stop-shop" for multilingual language resources at European level
• Forthcoming developments
1. Automatic/semi-automatic mapping approach
2. Dissemination platform
Thanks for your attention!
enrico.francesconi@publications.europa.eu