Towards OntoLex-Lemon Editing in VocBench 3
Recommended publications
-
Towards a Sense-Based Access to Related Online Lexical Resources
Thierry Declerck, Karlheinz Mörth
DFKI GmbH, Language Technology Lab, Austrian Centre for Digital Humanities
Abstract: We present an approach aiming at a method to support sense-based cross-dialectal – and cross-lexicon – access to dictionaries of spoken Arabic varieties. The original lexical data consists of three TEI P5 encoded dictionaries describing varieties spoken in Cairo, Damascus and Tunis. This data is included in the Vienna Corpus of Arabic Varieties (VICAV). We briefly present this data, before summarizing the TEI approach for encoding senses in lexical resources. We discuss certain issues related to the TEI representation when it comes to the possibility to provide for a sense-based access to the lexical data. We investigate the use of the recent W3C Ontology-Lexicon Community Group (OntoLex) modeling work and present first results of the mapping of the TEI encoded data onto its ontolex model and show how this new representation format can efficiently support cross-dictionary sense-based access.
Keywords: Senses; ontolex; TEI; SKOS; Arabic Dialects
1 Introduction: We present an approach for ensuring sense-based access to distinct but related online lexical resources. The basis of our work consists of three dictionaries of Arabic varieties developed at different times by different colleagues (Mörth et al. 2015; Procházka & Mörth 2013; Mörth et al. 2014) in the context of the Vienna Corpus of Arabic Varieties (VICAV) project. This data has been encoded in TEI, following the approach described in (Budin et al.
-
Conversion of the English-Xhosa Dictionary for Nurses to a Linguistic Linked Data Framework †
Information, Article
Frances Gillis-Webber
Library and Information Studies Centre, University of Cape Town, Woolsack Drive, Rondebosch, Cape Town 7701, South Africa
† This paper is an extended version of the paper presented at the 6th Workshop on Linked Data in Linguistics (LDL-2018), 11th Edition of the Language Resources and Evaluation Conference (LREC 2018).
Received: 15 September 2018; Accepted: 2 November 2018; Published: 6 November 2018
Abstract: The English-Xhosa Dictionary for Nurses (EXDN) is a bilingual, unidirectional printed dictionary in the public domain, with English and isiXhosa as the language pair. By extending the digitisation efforts of EXDN from a human-readable digital object to a machine-readable state, using Resource Description Framework (RDF) as the data model, semantically interoperable structured data can be created, thus enabling EXDN’s data to be reused, aggregated and integrated with other language resources, where it can serve as a potential aid in the development of future language resources for isiXhosa, an under-resourced language in South Africa. The methodological guidelines for the construction of a Linguistic Linked Data framework (LLDF) for a lexicographic resource, as applied to EXDN, are described, where an LLDF can be defined as a framework: (1) which describes data in RDF, (2) using a model designed for the representation of linguistic information, (3) which adheres to Linked Data principles, and (4) which supports versioning, allowing for change. The result is a bidirectional lexicographic resource, previously bounded and static, now unbounded and evolving, with the ability to extend to multilingualism.
-
When Linguistics Meets Web Technologies. Recent Advances in Modelling Linguistic Linked Open Data
Semantic Web 0 (0) 1, IOS Press
Anas Fahad Khan (a), Christian Chiarcos (b), Thierry Declerck (c), Daniela Gifu (d), Elena González-Blanco García (e), Jorge Gracia (f), Maxim Ionov (g), Penny Labropoulou (h), Francesco Mambrini (i), John P. McCrae (j), Émilie Pagé-Perron (k), Marco Passarotti (l), Salvador Ros Muñoz (m), Ciprian-Octavian Truică (n)
a Istituto di Linguistica Computazionale «A. Zampolli», Consiglio Nazionale delle Ricerche, Italy
b Applied Computational Linguistics Lab, Goethe-Universität Frankfurt am Main, Germany
c DFKI GmbH, Multilinguality and Language Technology, Saarbrücken, Germany
d Faculty of Computer Science, Alexandru Ioan Cuza University of Iasi, Romania
e Laboratory of Innovation on Digital Humanities, IE University, Spain
f Aragon Institute of Engineering Research, University of Zaragoza, Spain
g Applied Computational Linguistics Lab, Goethe-Universität Frankfurt am Main, Germany
h Institute for Language and Speech Processing, Athena Research Center, Greece
i CIRCSE Research Centre, Università
-
Models to Represent Linguistic Linked Data∗
Natural Language Engineering: page 1 of 49. © Cambridge University Press 2018. doi:10.1017/S1351324918000347
J. BOSQUE-GIL (1), J. GRACIA (1,2), E. MONTIEL-PONSODA (1) and A. GÓMEZ-PÉREZ (1)
(1) Ontology Engineering Group, Escuela Técnica Superior de Ingenieros Informáticos, Universidad Politécnica de Madrid, Spain
(2) Aragon Institute of Engineering Research (I3A), University of Zaragoza, Spain
(Received 13 June 2017; revised 6 August 2018; accepted 10 August 2018)
Abstract: As the interest of the Semantic Web and computational linguistics communities in linguistic linked data (LLD) keeps increasing and the number of contributions that dwell on LLD rapidly grows, scholars (and linguists in particular) interested in the development of LLD resources sometimes find it difficult to determine which mechanism is suitable for their needs and which challenges have already been addressed. This review seeks to present the state of the art on the models, ontologies and their extensions to represent language resources as LLD by focusing on the nature of the linguistic content they aim to encode. Four basic groups of models are distinguished in this work: models to represent the main elements of lexical resources (group 1), vocabularies developed as extensions to models in group 1 and ontologies that provide more granularity on specific levels of linguistic analysis (group 2), catalogues of linguistic data categories (group 3) and other models such as corpora models or service-oriented ones (group 4).
-