The POSTDATA Network of Ontologies for European Poetry
María Luisa Diez Platasᵃ, Salvador Rosᵃ,*, Elena González-Blancoᵇ, Helena Bermúdezᶜ, Oscar Corchoᵈ, Javier de la Rosaᵃ, Alvaro Pérezᵃ
ᵃ POSTDATA Project, SSCC, Escuela Técnica Superior de Informática, UNED, Madrid, Spain
ᵇ Coverwallet, Madrid, Spain
ᶜ Section des Sciences du Langage et de l’Information, Université de Lausanne, Switzerland
ᵈ Ontology Engineering Group, Facultad de Informática, Universidad Politécnica, Madrid, Spain
* Corresponding author.
Abstract. One of the lines of work in Digital Humanities is concerned with standardization processes that describe traditional concepts using computer-readable languages. Within literary studies, poetry is a particularly complex domain because of, among other aspects, its special use of language. This paper presents a network of ontologies that captures knowledge of the poetry domain and describes its most significant ontologies, which cover the poetic work and its structural and prosodic components. A date ontology that addresses the special needs of literary works is presented as well. This work is part of the results of the POSTDATA ERC (Poetry Standardization and Linked Open Data) project, which aims to provide a means for poetry researchers to publish their semantically enriched data as Linked Open Data (LOD), in the context of European poetry.
Keywords: European Poetry, Standardization, Network of Ontologies, Interoperability, Linked Open Data
1. Introduction
The need for standardization, as a common way of understanding and exchanging information, has increased significantly in many research fields. Many scientific disciplines have established formal protocols and languages, which have been quickly adopted and adapted to their particular problems. Some humanities and cultural disciplines, however, have followed an independent path in which creativity and tradition play an essential role. Literature, and especially poetry, is a clear reflection of this idiosyncrasy.
From the philological point of view, there is no uniform academic approach to analyzing, classifying or studying the different poetic manifestations, and the divergence of theories is even greater when comparing poetry schools from different languages and periods. One of the most significant conceptual and terminological problems is that, even when a set of poetic works is formalized in a repertoire, each repertoire belongs to its own poetic tradition, and each tradition has developed its analytical terminology independently, in some cases over centuries [1]. The result of this uncoordinated evolution is a great variety of terminologies describing similar metrical phenomena across poetic systems whose correspondences have hardly been studied. For example, the same quatrain of dodecasyllables can be encoded in different ways depending on the philological tradition (e.g., 12A12A12A12A, 4x(7pp+7p) or 4aaaa), and a term can even be used with a different meaning: “alexandrine” means a 14-syllable line in Spanish but a 12-syllable line in French [2].
As a result, a researcher looking for quatrains of dodecasyllables in different traditions would have to visit each database independently and adapt the query to the conventions of each resource.
There is one additional drawback: research in this field is usually conducted in an individual and isolated manner, and there is a certain lack of communication with other areas of knowledge.
There are also significant technical issues, as these repertoires were created in different periods and most of them are driven by stand-alone databases [3–8]. Interoperability among all these collections would make it possible to perform comparative studies and to move beyond the current philological state of the art, for instance to explain phenomena such as the origins of vernacular poetry or the evolution from accentual to syllabic rhythmical patterns. Although current technical infrastructures are able to harvest such collections and provide access to them through a search engine, metadata and vocabularies must be standardized at a philological level in order to reach the semantic layer and link data across traditions [9–11].
In this context, applying these technologies to poetry is ground-breaking, as representing distributed literary collections as machine-readable repositories will open the door to new research questions and to comparative philological analysis across heterogeneous poetic corpora in different formats.
These difficulties in accessing poetic resources and, in short, the impossibility of processing this information completely and efficiently were the origin of, and the incentive for, the conception of a poetry ontology network [9,12]. To this end, we extracted from a set of repertoires of different poetic traditions and periods [13,14] the concepts and relationships needed to represent the poetry domain in a universal and complete way. From this study, we identified areas of knowledge that are complementary to the central core of poetry knowledge, and we modeled each of these areas as a complementary ontology. The result of the whole process is a network of ontologies for European poetry.
This paper presents the methodology followed to build a network of ontologies covering knowledge of the poetry domain, together with the most significant ontologies of this domain. This work is part of the results of the POSTDATA ERC (Poetry Standardization and Linked Open Data) project, which aims to provide a means for poetry researchers to publish their semantically enriched data as Linked Open Data (LOD), in the context of European poetry.
The document is structured as follows. §2 reviews previous results related to ontologies in literature, especially in the domain of poetry. §3 describes the methodology used for ontology development. §4 gives a detailed description of the most relevant ontologies developed. Finally, §5 outlines the conclusions and future work.
2. Related works
The progressive transformation of the Humanities into “Digital Humanities” has been accompanied by the creation of new standards, such as the Text Encoding Initiative TEI-XML¹, Dublin Core² or CIDOC-CRM³, among others, to describe traditional concepts with computer-readable languages. These standards are developing fast in several areas, such as digital text editions, libraries and archives, and a significant number of projects work with them, such as TextGrid⁴, OpenEdition⁵ or Scholarly Digital Editions (SDE)⁶.
Although semantic web technologies have had great success in libraries, archives and museums (a community known as LODLAM⁷), their application to poetry is still limited [15,16], and there is not yet a conceptual ontological model for metrics and poetry.
The first attempt to build a poetry ontology can be found in the ReMetCa project [5], which defined a conceptual model for poetry and contributed to the definition of the TEI Verse module. However, this model needs to be expanded and completed to reflect the different possibilities of poetic properties and relationships.
¹ https://tei-c.org/
² https://www.dublincore.org/specifications/dublin-core/
³ http://www.cidoc-crm.org/
⁴ https://textgrid.de/
⁵ https://www.openedition.org/
⁶ http://www.sd-editions.com/
⁷ http://lodlam.net/
The next closest work related to this topic is probably the CIDOC Conceptual Reference Model (CIDOC-CRM)⁸, an ontology that formally describes the concepts and relationships used to document cultural heritage. This model focuses mainly on representing museum heritage objects, although it also contains concepts for representing entities such as the people and places associated with those objects. Other related ontologies are Functional Requirements for Bibliographic Records (FRBR)⁹ and FRBRoo¹⁰. FRBR offers a perspective on the structure and relationships of bibliographic and authority records [17]. Its most significant entities are Work, Expression, Manifestation and Item, which represent the different ways of conceiving a literary work, as a text or as a physical resource. FRBRoo is an object-oriented version of FRBR combined with the CIDOC-CRM model, thus harmonizing information from museums, archives, […]
[…] Model for Ontologies (Lemon)¹², designed for modeling machine-readable dictionaries and lexicons. Lemon covers aspects of lexical decomposition, sentence structure, syntax, variation, morphology, and the mapping between lexicon and ontology. The GOLD¹³ ontology is a comprehensive ontology for descriptive linguistics that formally describes the most basic categories and relationships used in the scientific description of human language, and it aims to solve problems in linguistic data tagging. An important feature of this ontology is that it is applicable to all languages. The Rhetorical Annotation Ontology Project (RAOP)¹⁴, a domain-specific ontology, is built for annotating figures of speech and the rhetorical aspects of written and oral texts; thus, it can be mapped to represent the structures of a rhetorical system. This project is one of the possible approaches that have been taken into account for the digitization of figures of speech through the