When Linguistics Meets Web Technologies. Recent Advances in Modelling Linguistic Linked Open Data
Total Page:16
File Type:pdf, Size:1020Kb
Semantic Web 0 (0) 1 1 IOS Press 1 1 2 2 3 3 4When Linguistics Meets Web Technologies. 4 5 5 6 6 7Recent advances in Modelling Linguistic 7 8 8 9Linked Open Data 9 10 10 11a b c d 11 12Anas Fahad Khan , Christian Chiarcos , Thierry Declerck , Daniela Gifu , 12 e f b h 13Elena González-Blanco García , Jorge Gracia , Maxim Ionov , Penny Labropoulou , 13 i j k i 14Francesco Mambrini , John P. McCrae , Émilie Pagé-Perron , Marco Passarotti , 14 l m 15Salvador Ros Muñoz , Ciprian-Octavian Truica˘ 15 16a Istituto di Linguistica Computazionale «A. Zampolli», Consiglio Nazionale delle Ricerche, Italy 16 17E-mail: [email protected] 17 18b Applied Computational Linguistics Lab, Goethe-Universität Frankfurt am Main, Germany 18 19E-mail: [email protected], 19 20E-mail: [email protected] 20 21c DFKI GmbH, Multilinguality and Language Technology, Saarbrücken, Germany 21 22E-mail: [email protected] 22 23d Faculty of Computer Science, Alexandru Ioan Cuza University of Iasi, Romania 23 24E-mail: [email protected] 24 25e Laboratory of Innovation on Digital Humanities, IE University, Spain 25 26E-mail: [email protected] 26 27f Aragon Institute of Engineering Research, University of Zaragoza, Spain 27 28E-mail: [email protected] 28 29g Applied Computational Linguistics Lab, Goethe-Universität Frankfurt am Main, Germany 29 30E-mail: [email protected] 30 31h Institute for Language and Speech Processing, Athena Research Center, Greece 31 32E-mail: [email protected] 32 33i CIRCSE Research Centre, Università Cattolica del Sacro Cuore, Milan, Italy 33 34E-mail: [email protected], 34 35E-mail: [email protected] 35 36j Insight SFI Research Centre for Data Analytics, Data Science Institute, National University of Ireland Galway, 36 37Ireland 37 38E-mail: [email protected] 38 39k Wolfson College, University of Oxford, United Kingdom 39 40E-mail: [email protected] 40 41l Laboratory of Innovation on Digital Humanities, National Distance Education University UNED, Spain 41 42E-mail: [email protected] 42 43m Computer Science and Engineering Department, Faculty of Automatic Control and Computers, University 43 44Politehnica of Bucharest, Romania 44 45E-mail: [email protected] 45 46 46 47 47 48 48 49 49 50Abstract. This article provides an up-to-date and comprehensive survey of models (including vocabularies, taxonomies and 50 ontologies) used for representing linguistic linked data (LLD). It focuses on the latest developments in the area and both builds 51 51 1570-0844/0-1900/$35.00 © 0 – IOS Press and the authors. All rights reserved 2 AF. Khan et al. / When Linguistics Meets Web Technologies. Recent advances in Modelling Linguistic Linked Open Data 1upon and complements previous works covering similar territory. The article begins with an overview of recent trends which 1 2have had an impact on linked data models and vocabularies, such as the growing influence of the FAIR guidelines, the funding 2 3of several major projects in which LLD is a key component, and the increasing importance of the relationship of the digital 3 4humanities with LLD. Next, we give an overview of some of the most well known vocabularies and models in LLD. After this 4 5we look at some of the latest developments in community standards and initiatives such as OntoLex-Lemon as well as recent 5 work which has been in carried out in corpora and annotation and LLD including a discussion of the LLD metadata vocabularies 6 6 META-SHARE and lime and language identifiers. In the following part of the paper we look at work which has been realised in 7 7 a number of recent projects and which has a significant impact on LLD vocabularies and models. 8 8 9Keywords: linguistic linked data, FAIR, corpora, annotation, language resources, OntoLex-Lemon, Digital Humanities, metadata, 9 10models 10 11 11 12 12 13 13 141. Introduction tion discussing projects, Section 5, and the conclusion, 14 15Section 6. 15 16The growing popularity of linked data, and espe- 16 17cially of linked open data (that is, linked data with an 17 18open license), as a means of publishing language re- 2. Setting the Scene: An Overview of Relevant 18 19sources (lexica, corpora, data categories, etc.) necessi- Trends for LLD 19 20tates a greater emphasis on models for linguistic linked 20 21data (LLD) since these are key to what makes linked The trends we have decided to focus on in this 21 22data resources so reusable and so interoperable (at a overview are the FAIRification of data in Section 2.1, 22 23semantic level). The purpose of this article is to pro- the importance of projects to LLD models in Section 23 24vide a comprehensive and up-to-date survey of mod- 2.2, and finally the increasing importance of Digital 24 25els used for representing linguistic linked data. It will Humanities use cases in Section 2.3. 25 26focus on the latest developments and will both build 26 27upon as well as trying to complement previous works 2.1. FAIR New World 27 28covering similar territory by avoiding too much repeti- 28 29With the growing importance of Open Science ini- 29 tion and overlap with the latter. 30tiatives, and especially those promoting the FAIR 30 In the following section, Section 2, we give an 31guidelines (where FAIR stands for Findable, Accessi- 31 overview of a number of trends from the last few years 32ble, Interoperable and Reusable) [1] – and the conse- 32 which have had, or which are likely to have, a signifi- 33quent emphasis on the modelling, creation and publi- 33 cant impact on the definition and/or use of LLD mod- 34cation of language resources as FAIR digital resources 34 els. We relate these trends to the rest of the article by 35– shared models and vocabularies have begun to take 35 highlighting relevant sections of the article (in bold). 36on an increasingly prominent role. Although the lin- 36 This overview of trends will help to locate the present 37guistic linked data community has been active in pro- 37 work within a wider research context, something that 38moting shared RDF vocabularies and models for years, 38 39is extremely useful in an area as active as linguistic this new emphasis on FAIR is likely to have a con- 39 40linked data, as well as assisting readers in navigating siderable impact in several ways, not least in terms of 40 41the rest of the article. Next, in Section 2.4, we com- the necessity for these models to demonstrate a greater 41 42pare the present article with other related work, includ- coverage, and to be more interoperable one with an- 42 43ing an earlier survey of LLD models, in order to help other. We will look at one series of FAIR related rec- 43 44clarify the topics and approach of the present work. ommendations for models in Section 3 and see how 44 45Section 3 gives an overview of the most widely used they might be applied to the case of LLD. However in 45 46models in LLD. Then in Section 4, we look at recent the rest of the subsection we will take a closer look 46 47developments in community standards and initiatives. at the FAIR principles themselves and show why their 47 48These include the latest extensions of the OntoLex- widespread adoption is likely to lead to a greater role 48 49Lemon model in Section 4.1, a discussion of relevant for LLD models and vocabularies in the future. 49 50work in copora and annotations in Section 4.2, and a In The FAIR Guiding Principles for scientific data 50 51section on metadata Section 4.3. Finally there is a sec- management and stewardship [1], the article which 51 AF. Khan et al. / When Linguistics Meets Web Technologies. Recent advances in Modelling Linguistic Linked Open Data 3 1first articulated the well known FAIR principles, the – I1. (meta)data use a formal, accessible, shared, 1 2authors clearly state that the criteria proposed by these and broadly applicable language for knowledge 2 3principles are intended both "for machines and peo- representation. 3 4ple" and that they provide "‘steps along a path’ to ma- – I2. (meta)data use vocabularies that follow FAIR 4 5chine actionability", where the latter is understood to principles. 5 6describe structured data that would allow a "computa- 6 7tional data explorer" to determine: It is important to note that the emphasis placed 7 8on machine actionability in FAIR resources (that 8 9– The type of a "digital research object" is, on enabling computational agents to find rele- 9 10– Its usefulness with respect to tasks to be carried vant datasets and resources and to take "appropriate 10 11out action" when they find them) gives Semantic Web 11 12– Its usability especially with respect to licensing vocabularies/registries a substantial advantage over 12 13issues, represented in a way that would allow the other (non-Semantic Web native) standards in the 13 14agent to take "appropriate action". field of linguistics like the Text Encoding Initiative 14 3 15The current popularity of the FAIR principles and, (TEI) guidelines [2], the Lexical Markup Framework 15 16in particular, their promotion by governments and re- (LMF) [3] or the Morpho-syntactic Annotation Frame- 16 17search funding bodies, such as the European Commis- work (MAF) [4]. 17 18sion,1 through several national and international ini- For a start, none of these other standards possess 18 19tiatives reflects a wider recognition of the potential of a ‘native’, widely-used, widely technically supported 19 20structured and machine actionable data in changing knowledge representation language for describing the 20 21how research is carried out, and especially in helping semantics of vocabulary terms in a machine readable 21 22to support open science practices.