Storing and Processing Personal Narratives in the Context of Cultural Legacy Preservation
Total Page:16
File Type:pdf, Size:1020Kb
Storing and Processing Personal Narratives in the Context of Cultural Legacy Preservation Pierrick Bruneau, Olivier Parisot and Thomas Tamisier Luxembourg Institute of Science of Technology, L-4362 Esch-sur-Alzette, Belvaux, Luxembourg Keywords: Narrative Data Model, Narrative Knowledge Extraction, RESTful API. Abstract: An important, yet underestimated, aspect of cultural heritage preservation is the analysis of personal narratives told by citizens. In this paper, we present a data model and implementation towards facilitating narratives storage and sharing. The proposed solution aims at collecting textual narratives in raw form, processing them to extract and store structured content, and then exposing results through a RESTful interface. We apply it to a corpus related to the time of the European construction in Luxembourg. We disclose details about our conceptual model and implementation, as well as evidence supporting the interest of our approach. 1 INTRODUCTION store the collected stories. The rest of this article is organized as follows. An important aspect of cultural heritage preservation Firstly, related work about narrative structures and is the collection and collation of personal views and knowledge is discussed in Section 2. Then, in Sec- anecdotal stories of citizens. While the wide avail- tion 3, we define a data model appropriate for storing ability of social media will facilitate this work for fu- personal stories and narratives. Next, we describe a ture generations when they analyze our times, such software pipeline to map raw text into the proposed means are not available for e.g. the European con- data model. After describing the French data corpus struction period (roughly 1945-1975). Adapted tools alluded to earlier in the introduction, we disclose a are needed to collect such testimonies by elderly peo- RESTful architecture dedicated to exposing and shar- ple, as well as facilitate their collation and dissemina- ing the data stored in machine-readable format in Sec- tion. tion 5. After showing evidence of the interest of the In this work, we focus on the time of the European consumption of this data by knowledge extraction fa- construction in Luxembourg and the surrounding re- cilities, we conclude with numerous perspectives. gion. This work has been conducted in the context of a funded project in collaboration with elderly peo- ple organizations. In this context, witnesses of the time frame of interest (aged between 75 and 85 years 2 RELATED WORK old) have been interviewed, and their testimonies have been transcribed. In addition, this corpus has been The study of the structure of narratives and stories has enriched by extractions from online platforms - more been applied to a variety of domains, e.g. emergency details about this corpus may be found in Section 5. response (Scherp et al., 2009), situational awareness Eventually, the collected corpus is merely more (Van Hage et al., 2012), or collections of historical than a set of short texts. To make this narrative data documents (Segers et al., 2011). A major concern in actionable, e.g. allow effective indexing, browsing, or this domain is to bridge the gap between raw text (i.e. exploitation by web applications, knowledge extrac- the form under which testimonial stories are generally tion techniques are needed to extract relevant keys to acquired) and structured information with semantic these stories, such as people, places and time frames value, that would enable story linking and advanced involved. Hence in this paper we focus on the means queries. to store, process and access such narrative informa- Associating properties to entities and composing tion. More precisely, a dedicated data model and them is the core concern of ontology engineering. a back-end server are needed in order to model and Well known ontologies include YAGO (Suchanek 48 Bruneau, P., Parisot, O. and Tamisier, T. Storing and Processing Personal Narratives in the Context of Cultural Legacy Preservation. DOI: 10.5220/0006422800480055 In Proceedings of the 6th International Conference on Data Science, Technology and Applications (DATA 2017), pages 48-55 ISBN: 978-989-758-255-4 Copyright © 2017 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved Storing and Processing Personal Narratives in the Context of Cultural Legacy Preservation et al., 2008), the Google Knowledge Graph (Google, crowdsourcing (Bollacker et al., 2008). Wikipedia 2012), and DBpedia (Mendes et al., 2011). These on- page structure has also been exploited in (Suchanek tologies are often used in conjunction with controlled et al., 2008). Alternatively, a term-based heuristic is vocabularies such as Dublin Core (Weibel et al., used in (Gaeta et al., 2014) to determine links between 1998) or FOAF (Brickley and Miller, 2007), that facil- events, and the use of Natural Language Processing itate the interoperability of data sources, most notably (NLP) techniques such as Named Entity Recognition via the RDF representation language. (NER) to automatically extract facts and events has Rather than an ensemble of unrelated facts, nar- been evaluated in (Segers et al., 2011; Van Hooland ration implies relationships between atomic facts or et al., 2015). Entity types in event models such as events. (Scherp et al., 2009) define a taxonomy SEM (Van Hage et al., 2012) are closely related to of such links (compositionality, causality, correlation types extracted by standard NER methods such as and documentation). These links are relevant to our (Favre et al., 2005) (e.g. people, locations, dates). context, but they consider events at a coarse level with no controlled vocabulary of predicates. Similarly to (Van der Meij et al., 2010; Segers et al., 2011), they 3 NARRATIVE ENTITY MODEL are mostly concerned by interoperability between dif- ferent ontologies. Likewise, (Van Hage et al., 2012) To suit the needs of the project described in the intro- emphasize link types, with little consideration of spe- duction, and put people and spatio-temporal coordi- cific data structures for events. With close resem- nates at the center of narratives, we developed a sim- blance to the CIDOC CRM (Doerr, 2003), (Segers plified variant of NKRL (Zarri, 2009). In a nutshell, et al., 2011) define explicitly roles (also known as our model can be thought of as a database schema to facets in (Mulholland et al., 2012)) applying to events enable storage and facilitate indexation of narrative (e.g. actors, dates and locations), which are appro- data. priate for historical events in a broad sense (e.g. the The root object type is denoted as entity, as a ref- French Revolution in (Segers et al., 2011)), but not erence to the Drupal terminology, that supports our for events as constituents of a first-person narrative. implementation of the model, described to further ex- The objective is then to propose a standard metadata tent in Section 5. Except primitive types such as text description space for historical artifacts, rather than and numbers, all non-primitive types (e.g. story) are exploring the structure of narration. specializations of this root object type. Entity labels The contributions by (Zarri, 2009) are the most in Figure 1 are meant to be unique. Entity references, closely related to our work. In their Narrative Knowl- i.e. references to other entities in the database (e.g. edge Representation Language (NKRL), they define person referred to in a story) are underlined. Arrows data structures and controlled vocabularies of predi- denote typed dependencies (i.e. pointers depend on cates and links to support the analysis of non-fictional pointees), when other dependencies may refer to sev- and factual narratives. They avoid the use of the eral kinds of entities. To emphasize the story-centric term story as it has led to ambiguities in the litera- aspect of this model, most types, such as person and ture. Rather, They define a set of events and facts as location, directly store references to stories that refer the fabula. The plot level adds chronological, log- to them. This can be thought of as a kind of reverse ical and coherence links between events. The pre- index. sentation level is about the form in which plots are The data model in Figure 1 is heavily inspired by shown. Some related work in narrative analysis and the model underlying NKRL (Zarri, 2009), but ex- storytelling is concerned with mapping arbitrary sto- hibits decisive distinctions. The proposed structure ries to a classical narrative structure (Tilley, 1992; Ye- was designed with flexibility in mind. For example, ung et al., 2014). In our work, stories are potentially it easily supports partial specification - a typical nar- made of anecdotal testimonies, and as such cannot rative may occasionally omit spatial and or temporal be expected to match these structures. More abstract specifications. Similarly, the proposed custom date properties, such as sentiment attached to stories, were format supports loose specification. The approximate also extracted in (Min and Park, 2016) in order to an- flag indicates whether the precision of the temporal alyze the structure of books. bounds should not be accounted for, and all fields ex- The way arbitrary text is remapped automatically cept year are optional. Both points and intervals in to taxonomies of entity types, relationships and pred- time can be described with the same format, simply icates is seldom considered in the literature. Some by equaling from and to respective fields. authors explicitly assume that this mapping has to be Most entities in the model (e.g. artifacts, people, performed manually (Mulholland et al., 2012), or via places) may have alternative writings. This ambiguity 49 DATA 2017 - 6th International Conference on Data Science, Technology and Applications Figure 1: Simplified Narrative Knowledge Model. Entity types are denoted by boxes. The multiplicity of typed properties is parenthesized. References to entities are underlined, while arrows denote the dependencies implied by references whenever they are explicitly typed.