Building a Knowledge Base of Linguistic Resources for Latin

LiLa: Linking Latin Building a Knowledge Base of Linguistic Resources for Latin The LiLa Team [email protected] First LiLa Workshop: Linguistic Resources & NLP Tools for Latin Milan | 3-4 June 2019 This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme - Grant Agreement No. 769994. Outlook 1 Why, What & How (M. Passarotti) LiLa Architecture (F. Mambrini) Resources-1: Derivational Morphology & Valency Lexicon (E. Litta) Resources-2: Latin WordNet (G. Franzini & A. Peverelli) NLP-1: Part-of-speech Tagging & Lemmatisation (F. M. Cecchini) NLP-2: Upcoming Resources in LiLa & a New Initiative (R. Sprugnoli) The LiLa Team | Università Cattolica del Sacro Cuore, CIRCSE Table of Contents 2 Why, What & How (M. Passarotti) LiLa Architecture (F. Mambrini) Resources-1: Derivational Morphology & Valency Lexicon (E. Litta) Resources-2: Latin WordNet (G. Franzini & A. Peverelli) NLP-1: Part-of-speech Tagging & Lemmatisation (F. M. Cecchini) NLP-2: Upcoming Resources in LiLa & a New Initiative (R. Sprugnoli) The LiLa Team | Università Cattolica del Sacro Cuore, CIRCSE Scattered and unconnected Research question State of affairs 3 We have built and collected (for Latin and other languages): I Textual Resources I Lexical Resources I NLP Tools The LiLa Team | Università Cattolica del Sacro Cuore, CIRCSE Research question State of affairs 3 We have built and collected (for Latin and other languages): I Textual Resources I Lexical Resources I NLP Tools Scattered and unconnected The LiLa Team | Università Cattolica del Sacro Cuore, CIRCSE From Information to Knowledge I to extract maximum benefit from our research investments I to impact and improve the life of Classicists through exploitable computational resources and tools Research need Making sense 4 To make sense of this quantity of empirical data: The LiLa Team | Università Cattolica del Sacro Cuore, CIRCSE From Information to Knowledge I to impact and improve the life of Classicists through exploitable computational resources and tools Research need Making sense 4 To make sense of this quantity of empirical data: I to extract maximum benefit from our research investments The LiLa Team | Università Cattolica del Sacro Cuore, CIRCSE From Information to Knowledge Research need Making sense 4 To make sense of this quantity of empirical data: I to extract maximum benefit from our research investments I to impact and improve the life of Classicists through exploitable computational resources and tools The LiLa Team | Università Cattolica del Sacro Cuore, CIRCSE Research need Making sense 4 To make sense of this quantity of empirical data: I to extract maximum benefit from our research investments I to impact and improve the life of Classicists through exploitable computational resources and tools From Information to Knowledge The LiLa Team | Università Cattolica del Sacro Cuore, CIRCSE LiLa Knowledge Base Approach: Linked Data paradigm 5 2018-2023 A collection of interoperable linguistics resources (and NLP tools) described with the same vocabulary for knowledge description Interlinking as a Form of Interaction The LiLa Team | Università Cattolica del Sacro Cuore, CIRCSE I Individuals: instances of objects (one specific token, lemma etc.) I Classes: types of objects/concepts (token, lemma, PoS etc.) I Data properties: attributes that objects can/must have (morphological features for lemmas/tokens) I Object properties: ways in which classes and individuals can be related to one another: RDF triples. Labels from a restricted vocabulary of knowledge description: hasLemma, hasPoS Each component of the ontology is uniquely identified through a URI. LiLa Knowledge Base Conceptual and structural interoperability 6 LiLa is based on an ontology made of: The LiLa Team | Università Cattolica del Sacro Cuore, CIRCSE I Classes: types of objects/concepts (token, lemma, PoS etc.) I Data properties: attributes that objects can/must have (morphological features for lemmas/tokens) I Object properties: ways in which classes and individuals can be related to one another: RDF triples. Labels from a restricted vocabulary of knowledge description: hasLemma, hasPoS Each component of the ontology is uniquely identified through a URI. LiLa Knowledge Base Conceptual and structural interoperability 6 LiLa is based on an ontology made of: I Individuals: instances of objects (one specific token, lemma etc.) The LiLa Team | Università Cattolica del Sacro Cuore, CIRCSE I Data properties: attributes that objects can/must have (morphological features for lemmas/tokens) I Object properties: ways in which classes and individuals can be related to one another: RDF triples. Labels from a restricted vocabulary of knowledge description: hasLemma, hasPoS Each component of the ontology is uniquely identified through a URI. LiLa Knowledge Base Conceptual and structural interoperability 6 LiLa is based on an ontology made of: I Individuals: instances of objects (one specific token, lemma etc.) I Classes: types of objects/concepts (token, lemma, PoS etc.) The LiLa Team | Università Cattolica del Sacro Cuore, CIRCSE I Object properties: ways in which classes and individuals can be related to one another: RDF triples. Labels from a restricted vocabulary of knowledge description: hasLemma, hasPoS Each component of the ontology is uniquely identified through a URI. LiLa Knowledge Base Conceptual and structural interoperability 6 LiLa is based on an ontology made of: I Individuals: instances of objects (one specific token, lemma etc.) I Classes: types of objects/concepts (token, lemma, PoS etc.) I Data properties: attributes that objects can/must have (morphological features for lemmas/tokens) The LiLa Team | Università Cattolica del Sacro Cuore, CIRCSE Each component of the ontology is uniquely identified through a URI. LiLa Knowledge Base Conceptual and structural interoperability 6 LiLa is based on an ontology made of: I Individuals: instances of objects (one specific token, lemma etc.) I Classes: types of objects/concepts (token, lemma, PoS etc.) I Data properties: attributes that objects can/must have (morphological features for lemmas/tokens) I Object properties: ways in which classes and individuals can be related to one another: RDF triples. Labels from a restricted vocabulary of knowledge description: hasLemma, hasPoS The LiLa Team | Università Cattolica del Sacro Cuore, CIRCSE LiLa Knowledge Base Conceptual and structural interoperability 6 LiLa is based on an ontology made of: I Individuals: instances of objects (one specific token, lemma etc.) I Classes: types of objects/concepts (token, lemma, PoS etc.) I Data properties: attributes that objects can/must have (morphological features for lemmas/tokens) I Object properties: ways in which classes and individuals can be related to one another: RDF triples. Labels from a restricted vocabulary of knowledge description: hasLemma, hasPoS Each component of the ontology is uniquely identified through a URI. The LiLa Team | Università Cattolica del Sacro Cuore, CIRCSE LiLa Knowledge Base Lexically-based architecture and (meta)data sources 7 NLP_Tools Form/Lemma Lexical_Ress Textual_Ress Token Morpho_Feats The LiLa Team | Università Cattolica del Sacro Cuore, CIRCSE Table of Contents 8 Why, What & How (M. Passarotti) LiLa Architecture (F. Mambrini) Resources-1: Derivational Morphology & Valency Lexicon (E. Litta) Resources-2: Latin WordNet (G. Franzini & A. Peverelli) NLP-1: Part-of-speech Tagging & Lemmatisation (F. M. Cecchini) NLP-2: Upcoming Resources in LiLa & a New Initiative (R. Sprugnoli) The LiLa Team | Università Cattolica del Sacro Cuore, CIRCSE LiLa is based on: I the Ontolex family, for lexical information I the OLiA bundle, for PoS tagging I NIF (and POWLA?) for corpus annotation General principles “Reuse standards, reuse standards, reuse standards”... 9 The golden rule: Reuse as many standards as you can. The LiLa Team | Università Cattolica del Sacro Cuore, CIRCSE LiLa is based on: I the Ontolex family, for lexical information I the OLiA bundle, for PoS tagging I NIF (and POWLA?) for corpus annotation General principles “Reuse standards, reuse standards, reuse standards”... 9 The golden rule: Reuse as many standards as you can. Extend, when you need to. The LiLa Team | Università Cattolica del Sacro Cuore, CIRCSE LiLa is based on: I the Ontolex family, for lexical information I the OLiA bundle, for PoS tagging I NIF (and POWLA?) for corpus annotation General principles “Reuse standards, reuse standards, reuse standards”... 9 The golden rule: Reuse as many standards as you can. Extend, when you need to. Create from scratch, if you really must. The LiLa Team | Università Cattolica del Sacro Cuore, CIRCSE General principles “Reuse standards, reuse standards, reuse standards”... 9 The golden rule: Reuse as many standards as you can. Extend, when you need to. Create from scratch, if you really must. LiLa is based on: I the Ontolex family, for lexical information I the OLiA bundle, for PoS tagging I NIF (and POWLA?) for corpus annotation The LiLa Team | Università Cattolica del Sacro Cuore, CIRCSE “In the beginning was... the Lemma!" The lemma as gateway to linguistic resources 10 Lemmas Lexical Entries Tokens NLP Output Lexical Resources Textual Resources NLP Tools - Latin Wordnet - Digital libraries - Tokenizers - Valency Lexicon - Treebanks - Taggers/parsers - Dictionaries... - Textual corpora... - Lemmatizers... The LiLa Team | Università Cattolica del Sacro Cuore, CIRCSE LEMLAT http://www.lemlat3.eu/

Building a Knowledge Base of Linguistic Resources for Latin

Hittite Etymologies and Notes*

An Amerind Etymological Dictionary

The Art of Lexicography - Niladri Sekhar Dash

The Importance of Morphology, Etymology, and Phonology

LDL-2014 3Rd Workshop on Linked Data in Linguistics

The Oxford Dictionary of English Etymology Edited by C. T. ONIONS

Etymological Wordnet: Tracing the History of Words

Comparative Study on Currently Available Wordnets

This Etymological Dictionary Groups Related Words Into

Digital Transformation of the Etymological Dictionary of Geographical Names

Methodological Aspects of Developing and Managing an Etymological Lexical Resource: Introducing Etymdb 2.0 Clémentine Fourrier, Benoît Sagot

The Case of Two Greek Dictionaries Language Problems and Language Planning, 26(3): 219-252