Using an Ontology for Representing the Knowledge on Literary Texts: the Dante Alighieri Case Study
Semantic Web 1 (2015) 1, IOS Press

Editor(s): Eero Hyvönen, Aalto University, Finland
Solicited review(s): Øyvind Eide, University of Passau, Germany; Arianna Betti, University of Amsterdam, The Netherlands; one anonymous reviewer

Valentina Bartalesi a,* and Carlo Meghini a
a Istituto di Scienza e Tecnologie dell’Informazione “Alessandro Faedo” (ISTI) – CNR, via G. Moruzzi 1, 56127, Pisa, Italy
E-mail: {valentina.bartalesi,carlo.meghini}@isti.cnr.it

Abstract. This paper describes a digital library developed within the “Towards a Digital Dante Encyclopaedia” project, a three-year Italian National Research Project that aims at building services supporting scholars in creating, evolving and consulting a digital encyclopaedia of Dante Alighieri’s works. The digital library is based on a knowledge base storing knowledge on the primary sources that Dante refers to in his works, i.e. the works of other authors that Dante cites in his texts. At present, this information is scattered across many paper books, making it difficult to systematically overview the cultural background of Dante and to obtain a well-founded perception of how this background was gradually built up over time. The same applies to other authors, therefore the applicability of our work extends well beyond the specific author we are considering in our project. The digital library that we are building is based on an ontology for representing the knowledge on one author’s works and on the primary sources embedded in the commentaries to these works. Following this approach, a semantic network of Dante’s works and of references to the primary sources of these works was created. Furthermore, a web application was developed that allows users to explore the semantic network in various ways and to visualize statistical information about the references as charts and tables.
Keywords: Ontology, Semantic Web, Digital Libraries, Digital Humanities

1. Introduction

Literary works, especially those written centuries ago, hold information which, for several reasons, remains largely implicit to modern readers at best, and outright inaccessible at worst. It is the task of specialized scholars in the humanities to help readers understand the implicit knowledge contained in literary works by carefully analyzing the text of such works. This typically happens in the so-called commentaries, modern works of interpretive research that decode, often line by line, several kinds of knowledge encoded in literary texts, and report it explicitly (e.g. [6]). One important kind of knowledge concerns the primary sources literary texts refer to. In our case study, primary sources are the works of other authors (e.g. Aristotle) referred to in his texts by Dante Alighieri (1265-1321), the major Italian poet of the late Middle Ages. In their commentaries, scholars typically express knowledge about the primary sources referred to by Dante in natural language. Using natural language limits scholars in their advances insofar as it prevents automatic inferences of new information that may be useful for their studies. For instance, in the case of references to primary sources, these inferences may concern the total number of references to a certain work or to an author. Such knowledge may be useful to reconstruct the evolution of the author’s cultural background. The goal of our work is to overcome this problem by creating an ontology providing a formal representation of the knowledge on an author’s primary sources. In the digital humanities literature, there are many ontologies focusing on different aspects of textual information. Each of these ontologies represents a set of possible interpretations of the source text(s). Up to now, no ontology for representing knowledge about the primary sources of literary texts has been developed, and there are no projects focused on giving a logical representation of this knowledge. Furthermore, the applicability of our ontology extends well beyond the specific author we are considering in our project.

The work we present in this paper is part of “Towards a Digital Dante Encyclopaedia”, an Italian National Research project supporting scholars in formally expressing the knowledge about primary sources present in Dante’s works and, more generally, in literary texts. Moreover, the project aims at developing a system to automatically discover complex semantic relations among different parts of the text or between the text and its related resources. This work presents the ontology developed on the basis of a previous preliminary study [1] to represent knowledge on the references to primary sources included in literary texts and in their commentaries, with a specific focus on Dante’s works. In particular, following the in-depth analysis of related ontologies reported in [1], we report here the result of refining our ontology with respect to the preliminary study, and we describe the web application that we developed on the basis of the final version of this ontology. We report the process that we followed to extract the relevant knowledge from natural language commentaries to Dante’s works and to express this knowledge in a formal, logic-based semantic representation. We started from an Excel file provided by an authoritative Italian scholar, reporting commentaries and references to primary sources embedded in one of Dante’s works. On the basis of the analysis of the contents of the Excel file, we developed a conceptualization, i.e. a set of classes and properties underlying the knowledge reported in the spreadsheet. Next, we reviewed existing ontologies in the Digital Libraries domain in order to create a vocabulary to express the identified classes and properties. We adopted several terms (i.e. the names and the corresponding definitions of classes and properties) from these ontologies to maximize the interoperability of our representation. Finally, we added our own classes and properties to represent the terms that we did not find in the analyzed ontologies. The resulting ontology is expressed in RDF/S¹. Once the ontology had been defined, we populated it with the knowledge included in the Excel file and also with the knowledge included in other commentaries to Dante’s works. We stored the resulting RDF graph in a Virtuoso triple store [5]. On top of this graph, we developed a web application offering a number of data visualizations of Dante’s works and primary sources. Such visualizations are expected to provide useful knowledge to the scholars who work on the creation of a complete encyclopaedia of Dante’s works.

The paper is structured as follows: Section 2 reports the analysis and the evaluation of the initial representation requirements. Section 3 reviews the existing ontologies, while Section 4 presents the classes and properties of these ontologies that we re-used. It also introduces the classes and properties that we added to capture missing aspects. Section 5 presents the ontology population process. Section 6 describes the web application that we developed on the basis of our ontology. Finally, in Section 7 we report our concluding remarks.

* Corresponding author.
¹ http://www.w3.org/RDF/ and http://www.w3.org/TR/rdf-schema/
1570-0844/15/$27.50 © 2015 – IOS Press and the authors. All rights reserved

2. Requirements

The references to primary sources are contained in commentaries to Dante’s works written by authoritative scholars. Each commentary applies to a fragment of a work by Dante and asserts, among other things, that Dante’s fragment makes a reference to a work of another author. The reference may be to a specific fragment or to the totality of the primary source. Our first task was to understand how to formally represent the described knowledge in order to implement automatic services on top of it. To achieve this goal, we started with the study of a spreadsheet, provided by an authoritative Italian scholar (Mirko Tavoni, professor of History of the Italian Language and Italian Linguistics at the University of Pisa), reporting the relevant knowledge relative to Convivio [6], a philosophical essay written by Dante between 1304 and 1307. Each row of the spreadsheet reports (see Figure 1):

– the number of the book, of the chapter (e.g. 1.01 indicates the first book and the first chapter of Convivio) and of the paragraph to which the commentary is applied;
– the fragment of Dante’s text to which the commentary is applied (e.g., “Sì come dice lo filosofo nel principio della Prima Filosofia”, i.e. “As the Philosopher says at the beginning of the First Philosophy”);
– a fragment of the commentary to Dante’s text;
– primary source, structured as:
  ∗ author (e.g. Aristotle);
  ∗ title (e.g. Metaphysics);
  ∗ thematic area (e.g. Aristotelianism).

[…] on standard resources. Our aim was instead to reuse available and standard resources for interoperability, thus we investigated the possibility of using a more widely known vocabulary. The final choice was to draw the thematic areas from Nuovo Soggettario². The Nuovo Soggettario, edited by the National Central Library of Florence, is a subject indexing tool for various types of information resources. It has been developed in compliance with the International Federation of Library Associations and Institutions (IFLA) recommendations³ and represents an important asset for the Italian scholarly community. The fact that an RDF version of this resource is freely available online, expressed through the SKOS [10] vocabulary, also played a positive role.
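
To make the spreadsheet row layout described in Section 2 more concrete, the following minimal Python sketch models one row as a record and computes the kind of aggregate mentioned in the Introduction (the number of references per author). It is only an illustration under stated assumptions: the class name `Reference`, its field names, and the helpers `parse_location` and `count_by_author` are hypothetical and are not taken from the ontology or the spreadsheet. Only the first sample row uses the example values quoted in the text; its paragraph number and commentary text, and the other rows entirely, are placeholders.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Reference:
    """One spreadsheet row (hypothetical field names, not the paper's ontology)."""
    book: int
    chapter: int
    paragraph: int
    dante_fragment: str       # fragment of Dante's text being commented on
    commentary_fragment: str  # fragment of the scholar's commentary
    source_author: str        # primary source: author
    source_title: str         # primary source: title
    thematic_area: str        # primary source: thematic area


def parse_location(code: str) -> Tuple[int, int]:
    """Split a 'book.chapter' code such as '1.01' into (book, chapter)."""
    book, chapter = code.split(".")
    return int(book), int(chapter)


def count_by_author(refs: List[Reference]) -> Counter:
    """Count how many references point to each primary-source author."""
    return Counter(ref.source_author for ref in refs)


book, chapter = parse_location("1.01")

rows = [
    # Values quoted in the text; paragraph number and commentary are placeholders.
    Reference(book, chapter, 1,
              "Sì come dice lo filosofo nel principio della Prima Filosofia",
              "(commentary text placeholder)",
              "Aristotle", "Metaphysics", "Aristotelianism"),
    # Purely illustrative placeholder rows, not real data.
    Reference(1, 2, 1, "(fragment placeholder)", "(commentary text placeholder)",
              "Aristotle", "Work B (placeholder)", "Aristotelianism"),
    Reference(1, 3, 1, "(fragment placeholder)", "(commentary text placeholder)",
              "Author C (placeholder)", "Work C (placeholder)",
              "Area C (placeholder)"),
]

author_counts = count_by_author(rows)
for author, total in author_counts.most_common():
    print(author, total)
```

Once the same information is stored as an RDF graph, a count of this kind would normally be obtained with a query over the triple store rather than with in-memory code; the sketch only shows why a structured representation makes such aggregates straightforward.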