Automatic Identification of Narrative Diegesis and Point of View
Joshua D. Eisenberg and Mark A. Finlayson
School of Computing and Information Sciences
Florida International University
11200 S.W. 8th Street, ECS Building, Miami, FL 33141
{jeise003, markaf}@fiu.edu

Abstract
The style of narrative news affects how it is interpreted and received by readers. Two key stylistic characteristics of narrative text are point of view and diegesis: respectively, whether the narrative recounts events personally or impersonally, and whether the narrator is involved in the events of the story. Although central to the interpretation and reception of news, and of narratives more generally, there has been no prior work on automatically identifying these two characteristics in text. We develop automatic classifiers for point of view and diegesis, and compare the performance of different feature sets for both. We built a gold-standard corpus by double-annotating 270 English novels for point of view and diegesis, reaching substantial agreement (κ > 0.59). As might be expected, personal pronouns comprise the best features for point of view classification, achieving an average F1 of 0.928. For diegesis, the best features were personal pronouns and the occurrences of first person pronouns in the arguments of verbs, achieving an average F1 of 0.898. We apply the classifiers to nearly 40,000 news texts across five different corpora comprising multiple genres (including newswire, opinion, blog posts, and scientific press releases), and show that point of view and diegesis correlate largely as expected with the nominal genre of the texts. We release the training data and the classifiers for use by the community.

1 Introduction
Interpreting a text's veridicality, correctly identifying the implications of its events, and properly delimiting the scope of its references are all challenging and important problems that are critical to achieving complete automatic understanding of news stories and, indeed, text generally. There has been significant progress on some of these problems for certain sorts of texts, for example, recognizing implications in short, impersonal, factual text in the long-running Recognizing Textual Entailment challenge (RTE¹). On the other hand, narrative text (including much news writing) presents additional complications: to accomplish the tasks above, one must take into account the narrator's point of view (i.e., first person or third person), as well as the narrator's personal involvement in the story (a feature that narratologists call diegesis).

In news stories specifically, writers are encouraged to use the third person point of view when they wish to emphasize their objectivity regarding the news they are reporting (Davison, 1983). In opinion pieces or blog posts, on the other hand, first person is more common and implies a more personal (and perhaps more subjective) view (Aufderheide, 1997). News writers are also often in the position of reporting on events which they themselves have not directly observed, and in these cases can use an uninvolved style (known as heterodiegetic narration) to communicate their relative remove from the action. When writers observe or participate in events directly, however, or are reporting on their own lives (such as in blog posts), they can use an involved narrative style (i.e., homodiegetic narration) to emphasize their personal knowledge and subjective, perhaps biased, orientation.

¹ http://aclweb.org/aclwiki/index.php?title=RTE
Proceedings of 2nd Workshop on Computing News Storylines, pages 36–46, Austin, TX, November 5, 2016.
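The abstract reports that personal pronouns are the most informative features for point of view. As a rough illustration of that intuition, the Python sketch below computes the share of first- and third-person personal pronouns in a text and applies a toy comparison rule. This is not the authors' classifier (theirs are trained on annotated novels); the pronoun lists, the regex tokenizer, the tie-breaking choice, and the function names `pronoun_features` and `naive_pov` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical pronoun inventories; an assumption, not the paper's feature list.
# "it"/"its" are left out so that non-human references do not swamp the counts.
FIRST_PERSON = {"i", "me", "my", "mine", "myself",
                "we", "us", "our", "ours", "ourselves"}
THIRD_PERSON = {"he", "him", "his", "himself",
                "she", "her", "hers", "herself",
                "they", "them", "their", "theirs", "themselves"}


def pronoun_features(text: str) -> dict[str, float]:
    """Share of tokens that are first- or third-person personal pronouns."""
    # Crude tokenizer: lowercase alphabetic runs, so "I'm" yields "i" and "m".
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid division by zero on empty input
    first = sum(counts[p] for p in FIRST_PERSON)
    third = sum(counts[p] for p in THIRD_PERSON)
    return {"first": first / total, "third": third / total}


def naive_pov(text: str) -> str:
    """Toy rule: 'first' if first-person pronouns outnumber third-person ones.

    Ties (including texts with no pronouns at all) default to 'third'.
    """
    feats = pronoun_features(text)
    return "first" if feats["first"] > feats["third"] else "third"


if __name__ == "__main__":
    print(naive_pov("I walked home. My dog followed me."))      # first
    print(naive_pov("She walked home. Her dog followed her."))  # third
```

In a learned setup, these frequencies (or raw per-pronoun counts) would more plausibly serve as input features to a trained classifier than be compared directly; the hard-coded rule only makes the intuition visible. Note that forms such as "her" and "us" are ambiguous, which is one reason a simple count can misfire.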
© 2016 Association for Computational Linguistics

Before we can integrate knowledge of point of view (POV) or diegesis into text understanding, we must be able to identify them, but there are no systems which enable automatic classification of these features. In this paper we develop reliable classifiers for both POV and diegesis, apply the classifiers to texts drawn from five different news genres, demonstrate the accuracy of the classifiers on these news texts, and show that POV and diegesis correlate much as expected with the genre. We release the classifiers and the training data so the field may build on our work and integrate these features into other text processing systems.

Regarding the point of view of the narrator, narratologist Mieke Bal claimed “The different relationships of the narrative ‘I’ to the objects of narration are constant within each narrative text. This means that one can immediately, already on the first page, see which is the [point of view].” (Bal, 2009, p. 29) This assertion inspired the development of the classifiers presented here: we had annotators mark narrative POV and diegesis from the first 60 lines of each of 270 English novels, which is a generous simulation of “the first page”. This observation allowed us to transform the collection of data for supervised machine learning from an unmanageable burden (i.e., having annotators read every novel from start to finish) into a tractable task (reading only the first page). We chose novels for training, instead of news texts themselves, because of the novels' greater diversity of language and style.

Once we developed reliable classifiers trained and tested with this annotated data, we applied the classifiers to 39,653 news-related texts across five news genres: the Reuters corpus, containing standard newswire reporting; a corpus of scientific press releases scraped from EurekAlert; the CSC Islamist Extremist corpus, containing ideological storytelling, propaganda, and wartime press releases; a selection of opinion and editorial articles scraped from LexisNexis; and the Spinn3r web blog corpus. We checked a sample of the results, confirming that the classifiers performed highly accurately over these genres. The classifiers allowed us to quickly assess the POV and diegesis of the texts and show how expectations of objectivity or involvement differ across genres.

The paper proceeds as follows. In §2 we define point of view and diegesis, and discuss their different attributes. In §3 we describe the annotation of the training and testing corpus, and then in §4 describe the development of the classifiers. In §5 we detail the results of applying the classifiers to the news texts. In §6 we outline related work, and in §7 we discuss shortcomings of the work and how it might be improved. We summarize the contributions in §8. In short, this paper asks the question: can point of view and diegesis be automatically classified? The experimental results in this paper show that they can.

2 Definitions
2.1 Point of View
The point of view (POV) of a narrative is whether the narrator describes events in a personal or impersonal manner. There are, in theory, three possible points of view, corresponding to grammatical person: first, second, and third person. First person point of view involves a narrator referring to themself, and implies a direct, personal observation of events. In a third person narrative, by contrast, the narrator is outside the story's course of action, looking in. The narrator tells the reader what happens to the characters of the story without ever referring to the narrator's own thoughts or feelings.

In theory, second person POV is also possible, although exceedingly rare. In a second person narrative, the narrator tells the reader what he or she is feeling or doing, giving the impression that the narrator is speaking specifically to the reader and perhaps even controlling the reader's actions. Because this point of view is so rare (in our training corpus of English novels it occurred only once), we exclude it from consideration.

Knowing the point of view (first or third person) is important for understanding the implied veridicality as well as the scope of references within the text. Consider the following example:

(1) John made everyone feel bad. He is a jerk.

With regard to reference, if this is part of a first person narrative, the narrator is included in the scope of the pronoun everyone, implying that the narrator himself has been made to feel bad. In this case we might discount the objectivity of the second sentence if we know that the narrator himself feels bad on account of John. A third person narrator, by contrast, is excluded from the reference set; one can make no inference about his internal state, and thus it does not affect our judgment of the accuracy or objectivity of later statements.

With regard to veridicality, if the narration is third person, statements of fact can be taken at face value with a higher default assumption of truthfulness. A first person narrator, in contrast, is experiencing the events not from an external, objective point of view but from a personal point of view, and so assessment of the truth or accuracy of their statements is subject to the same questions as a second-hand report.

[…] with the beginning of the first chapter. This was done by hand, since automating this process was not a trivial task. Then, we automatically trimmed each file down to the first 60 lines, as defined by line breaks in the original files (which reflect Project Gutenberg's typesetting). These shortened texts were used by our annotators, and were the data on which the classifiers were trained and tested.

We wrote an annotation guide for point of view and diegesis, and trained two undergraduate students to perform the annotations. The first 20 books from the corpus were used to train the annotators, and the remaining 272 texts were annotated by both annota-