When Linguistics Meets Web Technologies. Recent Advances in Modelling Linguistic Linked Open Data

Semantic Web 0 (0) 1, IOS Press

Anas Fahad Khan (a), Christian Chiarcos (b), Thierry Declerck (c), Daniela Gifu (d), Elena González-Blanco García (e), Jorge Gracia (f), Maxim Ionov (g), Penny Labropoulou (h), Francesco Mambrini (i), John P. McCrae (j), Émilie Pagé-Perron (k), Marco Passarotti (l), Salvador Ros Muñoz (m), Ciprian-Octavian Truică (n)

(a) Istituto di Linguistica Computazionale «A. Zampolli», Consiglio Nazionale delle Ricerche, Italy
(b) Applied Computational Linguistics Lab, Goethe-Universität Frankfurt am Main, Germany
(c) DFKI GmbH, Multilinguality and Language Technology, Saarbrücken, Germany
(d) Faculty of Computer Science, Alexandru Ioan Cuza University of Iasi, Romania
(e) Laboratory of Innovation on Digital Humanities, IE University, Spain
(f) Aragon Institute of Engineering Research, University of Zaragoza, Spain
(g) Applied Computational Linguistics Lab, Goethe-Universität Frankfurt am Main, Germany
(h) Institute for Language and Speech Processing, Athena Research Center, Greece
(i) CIRCSE Research Centre, Università Cattolica del Sacro Cuore, Milan, Italy
(j) Insight SFI Research Centre for Data Analytics, Data Science Institute, National University of Ireland Galway, Ireland
(k) Wolfson College, University of Oxford, United Kingdom
(l) CIRCSE Research Centre, Università Cattolica del Sacro Cuore, Milan, Italy
(m) Laboratory of Innovation on Digital Humanities, National Distance Education University UNED, Spain
(n) Department of Computer Science and Engineering, Faculty of Automatic Control and Computers, University Politehnica of Bucharest, Romania

1570-0844/0-1900/$35.00 © 0 – IOS Press and the authors. All rights reserved

Abstract. This article provides an up-to-date and comprehensive survey of models (including vocabularies, taxonomies and ontologies) used for representing linguistic linked data (LLD). It focuses on the latest developments and both builds upon and complements previous works covering similar territory. The article begins with an overview of recent trends which have had an impact on linked data models and vocabularies, such as the growing influence of the FAIR guidelines, the funding of several major projects in which LLD is a key component, and the increasing importance of the relationship of the Digital Humanities with LLD. Next, we give an overview of some of the most well-known vocabularies and models in LLD. After this we look at some of the latest developments in community standards and initiatives such as OntoLex-Lemon, as well as recent work which has been carried out in corpora and annotation and LLD, including a discussion of the LLD metadata vocabularies METASHARE and lime and of language identifiers. Following this we look at work which has been realised in a number of recent projects and which has a significant impact on LLD vocabularies and models.

Keywords: linguistic linked data, FAIR, corpora, annotation, language resources, OntoLex-Lemon, Digital Humanities, metadata, models

1. Introduction

The growing popularity of linked data, and especially of linked open data (that is, linked data with an open license), as a means of publishing language resources (lexica, corpora, data categories, etc.) has led to the need for a greater focus on models for linguistic linked data (LLD), since these are key to what makes linked data resources so reusable and interoperable. The purpose of this article is to provide an up-to-date and comprehensive survey of models (including vocabularies, taxonomies and ontologies) used for representing linguistic linked data. It will focus on the latest developments and both build upon and complement previous works covering similar territory, avoiding too much repetition and overlap with the latter.

In the following Section 2, we give an overview of a number of trends from the last few years which have had, or which are likely to have, a significant impact on the definition and/or use of LLD models. We relate these trends to the rest of the article by highlighting relevant sections of the article (in bold). This overview of trends will help to locate the present work within a wider research context, something that is extremely useful in an area as active as linguistic linked data, as well as assisting readers in navigating the rest of the article. Next, in Section 2.4, we compare the present article with other related work, including an earlier survey of LLD models, in order to help clarify the topics and approach of the present work. Section 3 gives an overview of the most widely used models in LLD. Then in Section 4, we look at recent developments in community standards and initiatives. These include the latest extensions of the OntoLex-Lemon model in Section 4.1, a discussion of relevant work in corpora and annotations in Section 4.2, and a section on metadata in Section 4.3. Finally there is a section discussing projects, Section 5, and the conclusion, Section 6.

2. Setting the Scene: An Overview of Relevant Trends for LLD

The trends we have decided to focus on in this overview are the FAIRification of data in Section 2.1, the importance of projects to LLD models in Section 2.2, and finally the increasing importance of Digital Humanities use cases in Section 2.3.

2.1. FAIR New World

With the growing importance of Open Science initiatives, and especially those promoting the FAIR guidelines (where FAIR stands for Findable, Accessible, Interoperable and Reusable) [1] – and the consequent emphasis on the modelling, creation and publication of language resources as FAIR digital resources – shared models and vocabularies have begun to take on an increasingly prominent role. Although the linguistic linked data community has been active in promoting shared RDF vocabularies and models for years, this new emphasis on FAIR is likely to have a considerable impact in several ways, not least in terms of the necessity for these models to demonstrate a greater coverage, and to be more interoperable with one another. We will look at one series of FAIR-related recommendations for models in Section 3 and see how they might be applied to the case of LLD. However, in the rest of this subsection we will take a closer look at the FAIR principles themselves and show why their widespread adoption is likely to lead to a greater role for LLD models and vocabularies in the future.

In The FAIR Guiding Principles for scientific data management and stewardship [1], the article which first articulated the well-known FAIR principles, the authors clearly state that the criteria proposed by these principles are intended both "for machines and people" and that they provide "‘steps along a path’ to machine actionability", where the latter is understood to describe structured data that would allow a "computational data explorer" to determine:

– The type of a "digital research object"
– Its usefulness with respect to tasks to be carried out
– Its usability especially with respect to licensing issues, represented in a way that would allow the agent to take "appropriate action".

[...]

– F2. data are described with rich metadata.
– I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
– I2. (meta)data use vocabularies that follow FAIR principles.

It is important to note that the emphasis placed on machine actionability in FAIR resources (that is, with respect to allowing computational agents to take "appropriate action" with respect to a dataset or resource) gives Semantic Web vocabularies/registries a substantial advantage over other (non-Semantic Web native) standards like the Text Encoding Initiative (TEI) guidelines [2], the Lexical Markup Frame-
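As a rough illustration of the machine actionability discussed in this subsection, the following Python sketch shows how an agent might use shared vocabulary terms to answer the questions about type and licensing for a resource. It is a sketch under stated assumptions and not a method proposed in the article: the resource IRI, the `decide_action` helper and the set of licenses treated as open are hypothetical, and metadata is held as plain (subject, predicate, object) tuples rather than in an RDF library. Only the rdf:type, Dublin Core Terms and lime IRIs are existing vocabulary terms.

```python
# Minimal sketch: deciding on an "appropriate action" from resource metadata.
# Metadata is modelled as plain (subject, predicate, object) tuples.

# Existing vocabulary terms (RDF, Dublin Core Terms, OntoLex lime).
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT_LICENSE = "http://purl.org/dc/terms/license"
DCT_LANGUAGE = "http://purl.org/dc/terms/language"
LIME_LEXICON = "http://www.w3.org/ns/lemon/lime#Lexicon"

# Hypothetical resource IRI, used only for illustration.
LEXICON = "http://example.org/lexicon/demo"

TRIPLES = [
    (LEXICON, RDF_TYPE, LIME_LEXICON),
    (LEXICON, DCT_LANGUAGE, "http://lexvo.org/id/iso639-3/eng"),
    (LEXICON, DCT_LICENSE, "https://creativecommons.org/licenses/by/4.0/"),
]

# Assumption: the licenses this particular agent is willing to treat as open.
OPEN_LICENSES = {
    "https://creativecommons.org/licenses/by/4.0/",
    "https://creativecommons.org/publicdomain/zero/1.0/",
}


def objects(triples, subject, predicate):
    """Return all objects stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def decide_action(triples, subject, wanted_type):
    """Choose an action from the declared type and license of a resource."""
    if wanted_type not in objects(triples, subject, RDF_TYPE):
        return "skip: not the required type"
    licenses = objects(triples, subject, DCT_LICENSE)
    if not licenses:
        return "ask: no license stated"
    if any(lic in OPEN_LICENSES for lic in licenses):
        return "reuse"
    return "ask: license not recognised as open"


if __name__ == "__main__":
    print(decide_action(TRIPLES, LEXICON, LIME_LEXICON))
```

The decision depends only on IRIs drawn from shared vocabularies, so the same check could be applied to any resource described with those terms, which is the kind of reuse that principles I1 and I2 are meant to enable.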
