ISA2: 2016.16 - Public Multilingual Knowledge Management Infrastructure for the Digital Single Market (PMKI)
Semantic Interoperability for Multilingual DSIs
18/10/2018
Peter Schmitz, Enrico Francesconi, Najeh Hajlaoui, Brahim Batouche
Publications Office of the EU
Unit A2 – Common Data Repository
Section A2.003 – Repository

Summary
• PMKI - Public Multilingual Knowledge Management Infrastructure
• Context and Motivations
• Activities
  o Semantic Alignment and Lexicalization Strategies of Semantic Resources
• Conclusions

Context
• Digital Single Market for Europe (priority of the Juncker Commission)
  o Bringing down barriers, including language barriers
  o Unlocking on-line, cross-border opportunities
• Situation
  o EU cross-border on-line services represent only 4% of the global Digital Market
  o Only 7% of SMEs in the EU are selling cross-border
• Actions
  o PMKI for interoperability between multilingual knowledge organization systems (KOS) through
    - Semantic Web technologies
    - KOS semantic alignments

PMKI in short
Type of activity: Common services, common frameworks (Creation of a Public Multilingual Knowledge Infrastructure (PMKI))
Service in charge: Publications Office of the European Union, OP.A2
Associated services:
• European Commission: DG CNECT.G3, DIGIT.D2, DGT.R3
• European Parliament: DG Translation, Terminology Coordination Unit
First approval of the proposal: March 2nd 2016, in the scope of the general presentation of the ISA² programme by the ISA² committee
Timeframe: 2016-2019

PMKI Objectives
• PMKI is an ISA² pilot project aiming to:
  o Create a knowledge management infrastructure for multilingual thesauri, vocabularies, etc.
  o Harmonize their technical formats
  o Align concepts to facilitate interoperability and extensions
  o Set up a community and a governance structure allowing the integration of data sets
• The PMKI platform may represent a public "one-stop-shop" for interoperability assets at European level.
Architecture for the DSM
PMKI in the common architecture for the multilingual DSM promoted by the EU Language Technology industry

Possible use-cases

PMKI Architecture
Users
ACCESS: SPARQL endpoint, RESTful API
KNOWLEDGE BASE: Concepts, relations
KNOWLEDGE MODEL: Core data model (Ontolex-Lemon)
SERVICES: Ingestion, alignment, data and knowledge model management, administration of the platform

Solutions
• Knowledge model: a standard representation and core data model have been adopted (Ontolex-Lemon) (done)
• Knowledge base: gold standard dataset (done)
• Services: VocBench, an open source collaborative platform for thesauri management (ongoing adaptation)
• Users, Access: dissemination platform (planned)

Semantic Models for Multilingual Resources
• Ontolex-Lemon: W3C evolution of Lemon
  o Lemon focused on the ontology-lexicon interface
  o Ontolex-Lemon focused on defining a Lexicon model
• LIME (“LInguistic MEtadata”): Ontolex module describing the lexical characterization of the dataset
  o Metadata related to Linguistic Resources (LRs)
    - Support discovery and identification of LRs for a given task
  o Metadata describing the linguistic expressivity of Datasets
    - Information on how to compare (map, align) a Dataset with others

Ontolex-Lemon Model

Integration Framework and Activities
• Integration Framework
  o Semantic and Lexical Integration of Multilingual Resources
• Activities
  1. Semantic resources alignment (ex: thesaurus mapping)
  2. Lexical vs. semantic resources alignment (lexicalization of semantic resources)

Our Approach: Information Retrieval Framework
[Figure: Source Thesaurus (ex: Eurovoc) and Target Thesaurus (ex: ECLAS), with terms w1, w2, w3, ... linked by similarity]

Aligned datasets
• Semantic Alignments
  o Eurovoc - STW (Thesaurus for Economics of the German National Library of Economics)
  o Eurovoc - ECLAS (European Commission Central Libraries thesaurus)
  o Eurovoc - Teseo (Italian Senate thesaurus)
• Lexical vs. Semantic Resources (lexicalization)
  o Eurovoc - WordNet
  o Eurovoc - IATE (European Parliament)

VocBench (Univ. of Rome)

Communication: VocBench-PMKI Workshop
• Purpose: Presentation of the two closely related ISA² actions (PMKI, VocBench)
• Target audience
  o Scientific community
  o Public Administrations
  o Language Technology Industries
• Agenda
  o VocBench Workshop (1 day)
  o Discussion of PMKI and the Ontolex-Lemon extension of VocBench (1 day)
• Planning
  o Preparation: Q4/2018 - Q1/2019
  o Event: Q2/2019 (to be confirmed)

Conclusions
• PMKI is an ongoing project within the ISA² initiative
• It aims to
  o show the benefit of Semantic Web technologies in providing services that help overcome language barriers
  o be a "one-stop-shop" for multilingual language resources at European level
• Forthcoming developments
  1. Automatic/Semi-Automatic mapping approach
  2. Dissemination platform

Thanks for your attention!
[email protected]
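As an illustration of the information retrieval framework named under "Our Approach", the following is a minimal sketch and not the project's implementation. It assumes that each concept is reduced to a bag of words taken from its labels, that the target thesaurus plays the role of the indexed collection, and that TF-IDF weighting with cosine similarity is the similarity measure; the slides do not state which measure is used. The concept identifiers, labels, function names and the 0.5 threshold are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(label):
    """Lowercase a label and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", label.lower())


def idf_weights(thesaurus):
    """Return a smoothed IDF function computed over the concepts of a thesaurus."""
    n = len(thesaurus)
    doc_freq = Counter()
    for labels in thesaurus.values():
        # Count each term once per concept, however often it occurs in its labels.
        doc_freq.update({t for lab in labels for t in tokenize(lab)})
    return lambda term: math.log((1 + n) / (1 + doc_freq[term])) + 1.0


def vectorize(labels, idf):
    """Turn the labels of one concept into a sparse TF-IDF vector."""
    counts = Counter(t for lab in labels for t in tokenize(lab))
    return {term: tf * idf(term) for term, tf in counts.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * \
        math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def align(source, target, threshold=0.5):
    """For each source concept, keep the most similar target concept
    if its score reaches the (assumed) threshold."""
    idf = idf_weights(target)
    target_vecs = {tid: vectorize(labels, idf) for tid, labels in target.items()}
    alignments = {}
    for sid, labels in source.items():
        query = vectorize(labels, idf)
        best_id, best_score = None, 0.0
        for tid, vec in target_vecs.items():
            score = cosine(query, vec)
            if score > best_score:
                best_id, best_score = tid, score
        if best_id is not None and best_score >= threshold:
            alignments[sid] = (best_id, best_score)
    return alignments


# Hypothetical toy thesauri: concept id -> list of labels (not real Eurovoc/ECLAS data).
SOURCE = {"src:1": ["fiscal policy", "tax policy"], "src:2": ["marine fishing"]}
TARGET = {"tgt:a": ["tax policy"], "tgt:b": ["fisheries policy"], "tgt:c": ["air transport"]}

if __name__ == "__main__":
    for sid, (tid, score) in align(SOURCE, TARGET).items():
        print(f"{sid} -> {tid} (similarity {score:.2f})")
```

With these toy inputs, `src:1` is aligned to `tgt:a`, while `src:2` stays unaligned because "fishing" and "fisheries" share no token. A purely lexical match of this kind would therefore need stemming, multilingual labels or a human validation step before it could support the automatic/semi-automatic mapping listed under forthcoming developments.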