Annotating High-Level Structures of Short Stories and Personal Anecdotes

Boyang Li1, Beth Cardier2, Tong Wang3, and Florian Metze4 1 Disney Research, Pittsburgh, PA 2 Sirius-Beta, Virginia Beach VA. 3 University of Massachusetts, Boston Boston, MA. 4 Carnegie Mellon University, Pittsburgh, PA [email protected], [email protected], [email protected], [email protected]

Abstract Stories are a vital form of communication in human culture; they are employed daily to persuade, to elicit sympathy, or to convey a message. Computational understanding of human narratives, especially high-level narrative structures, however, remain limited to date. Multiple literary theories for narrative structures exist, but operationalization of the theories has remained a challenge. We developed an annotation scheme by consolidating and extending existing narratological theories, including Labov and Waletsky’s (1967) functional categorization scheme and Freytag’s (1863) pyramid of dramatic tension, present 360 annotated short stories collected from online sources. In the future, this research will support an approach that enables systems to intelligently sustain complex communications with humans.

Keywords: narrative structure, dramatic arc, story understanding

1. Introduction Story is a fundamental form of human communication, sometimes argued to be more powerful than logical arguments (Bruner 1986; Fisher 1987). Stories can be used, for example, to persuade, to encourage, to elicit sympathy, and convey a moral, message, value or lesson. It follows that a computational understanding of stories will help Figure 1. Freytag’s story structural functions. The line computer systems communicate better with users. Recent indicates how dramatic tension heightens and lowers years have witnessed growing interests in computational throughout the story. approaches for story understanding (Bamman et al. 2013; Ferraro and Van Durme 2016; Finlayson, M. A. 2016; Goyal et al. 2010; Ouyang and McKeown 2015; Pichotta another structure that starts with Abstract and Orientation, and Mooney 2016). Yet few attempts to understand high- goes through Complicating Actions, The Most Reportable level story structure. Event, Evaluation, to end with Resolution and Coda. In our What constitute story structure or story arc may be opinion, structural functions proposed by two theories both debatable since there is more than one facet to a story. As satisfy the four requirements laid out earlier. an operating definition, we consider a story structure to Although these theories seem reasonable by themselves, satisfy the following requirements: (1) it contains a small two important open questions remain: (1) Are these set of functions with typical orderings between them, theories compatible or mutually exclusive? If they are though atypical orderings are possible. (2) The functions compatible, do they describe the same story stages using are independent of content and genre; they describe different terms? (2) Can computational systems understand structures of stories with different content in any genre. (3) stories in terms of these categories? The functions carry significance on the dramatic arc, and (4) together they describe most of a story rather than a small In this paper, we attempt to answer these questions. First, part of it. we identify similar concepts that are described by both theories. Based on this understanding, we develop a new This definition rules out the functions proposed by Propp annotation scheme, which reconciles the two theories and (1928) for Russian folklores, components of the hero’s provides additional functions that we find useful in journey (Campbell 1949), and the like, because they are annotating casual stories online. Finally, we trained closely tied to one type or genre of stories and are not annotators to label sentences in stories acquired from domain-independent. Event-level representations, such as online sources and public datasets, yielding 360 unique plot units (Lehnert 1981), also do not fit the definition annotated stories. because not all events play important dramatic roles and the ordering between events can be rather arbitrary. To our best knowledge, this is the first effort aiming at creating an operational annotation scheme that unifies Instead, we investigate what was described by Aristotle different accounts of story macro-structures. Previous (335 BC) as the beginning, the middle and the end of a annotation schemes either focus on event-level story. Similar to Aristotle, Freytag (1863) proposed a representations that do not always have dramatic dramatic structure containing five parts, whose modern significance (Elson 2012; Lehnert 1981), or only part of the version includes Exposition, Rising Action, Climax, dramatic curve (Ouyang and McKeown 2015). Falling Action, and Dénouement. The parts are correlated with the rising and falling of dramatic tension, as illustrated in Figure 1. Labov and Waletzky’s theory on narrative analysis (1967; Labov 2013) (henceforth L&W) provides Freytag L&W Prince Todorov Our Annotation Exposition Orientation Starting State Old Equilibrium Orientation Complicating Complicating Rising Action Disruption Actions Actions Most Reportable State-changing Most Reportable Climax Efforts to repair the Event Event Event disruption Falling Action Resolution (Minor) Resolution Ending State Dénouement Coda New Equilibrium Aftermath Table 1. Correspondence between categories from different narrative theories and our annotation.

story structure and measured similarity between 2. Related Work trajectories. There have been several attempts at annotating story semantics and computationally predicting semantic labels To situate ourselves with regards to previous work, this from text. Ouyang and McKeown (2015) (henceforth work adopts the macro-structural view by L&W and O&M) identified the most reportable event (MRE) in L&W Freytag, instead of the event-based or the bottom-up view as the “nucleus” of the story. Consequently, they annotated of narrative structure. We propose a new functional schema the MRE in roughly 500 stories collected from Reddit and that reconciles Freytag’s theory with L&W, which we built a classifier for identifying MREs from text, but believe capture both dramatic tension and the social aspect omitted other categories from the theory. Rahimtoroghi et of online narratives. al. (2013) and Swanson et al. (2014) used subset of categories from L&W, including orientation, action, and 3. The Annotation Schema evaluation. In this section, we start by discussing several narrative theories, including Freytag (1863), L&W (1967; Labov At the event level, Elson (2012) designed an annotation 2013, 1997), Prince (1973), Todorov (1971), and O&M schema, Story Intention Graph (SIG), that captures timelines as well as beliefs, intentions and plans of story (2015). After that, we propose a new set of functional labels characters. We perceive similarities between this that capture fundamental ideas from these theories. annotation and approaches for generating stories and 3.1 Integrating Narrative Theories character behaviors, such as Belief-Desire-Intention agents Freytag’s five-stage theory of story development in theater (Rao and Georgeff 1995) and intention-based story (1863) mainly concerns the amount of dramatic tension. planning (Riedl and Young 2010). Lukin et al. (2016) Modern interpretations of the theory (e.g., Thursby 2017) annotated 108 personal stories using the SIG formalism. generalize stories across numerous genres and media. In the Finlayson (2016) produced extensive annotations for first stage, Exposition, introduces the narrative setting and Propp’s Russian folklores, ranging from co-reference and has the lowest tension. Tension then increases during a temporal ordering to semantic roles and word senses. process referred to as Rising Action, propelled by a crisis. Gervás et al. (2016) found Propp’s functions limited to a Freytag's tension peaks at the Climax, where the forces of single genre and created a new set of functions for tension are concentrated. After the climax, we draw from annotating 42 musicals. In contrast to L&W’s theory, we Thursby’s (2006) modern interpretation, where tension consider these annotations to be on the micro-structure of quickly falls towards a Resolution and then Dénouement.1 events rather than the macro-structure of the entire In both professional and everyday instances of modern narrative. narratives, we have observed a swift drop in tension during Sitting in the middle of the macro-micro spectrum, Lehnert this last stage, its fast resolution standing in contrast to the (1981) proposed plot units as a high-level summarization labor by which tension was built. schema. An event is classified according to its sentiment as This pyramid structure is reminiscent of Todorov’s analysis positive, negative, or a mental state with neutral sentiment. (1971) in which a story starts with an equilibrium, which is She further argued that graphs containing the events and later disrupted. Efforts to restore the equilibrium are made four types of causal links in-between can capture important and the new equilibrium is created in the end. In our narrative structures. Appling and Rield (2009) and Goyal interpretation, an equilibrium state has low tension. The et al. (2010) trained machine learning models to predict disruption leads to high tension and the restoration of plot unit structures. equilibrium lowers the tension. A bottom-up approach employs statistics from local In comparison, L&W’s story structure focuses on the social regions of text to represent story structure, instead of a relationship between the storyteller and the audience and predefined set of function labels. Elsner (2015) collected on the surface shares little with Freytag and Todorov. The frequency trajectories of character names combined with theory contains the following categories: Abstract, words expressing emotions and appearing in Latent Orientation, Complicating Actions, The Most Reportable Dirichlet Allocation (Blei et al. 2003) topics to represent Event, Evaluation, Resolution and Coda. Labov argued that the entire story’s purpose is to serve the MRE, which is “the event that is less common than any other in the narrative

1 In its second half, Freytag’s framework closely follows the nuances of tragic theatre, featuring several complicated turns of action that are difficult to generalize to other genres. and has the greatest effect upon the needs and desires of the introduce the inciting action and thus sits outside the story participants in the narrative (is evaluated most strongly)” frame. This label can also apply to a story title. (Labov 1997, p. 406). Orientation: This is the starting state of the story and thus, Ouyang and McKeown (2015) went a step further by like the other stages that successively follow, it sits within merging L&W with Prince’s (1973) three basic states: the the story frame. The orientation consists of a survey of the starting state, the ending state, and the transformational elements that set up the central action, which include “time, event in the middle. Hence, they provided a slightly place, persons and their activity or situation’ (Labov, 1972, modified definition for MRE as “the most unusual event p. 364). It may also include general tendencies of a person that has the greatest emotional impact on the narrator and or situation, such as “my brother is usually very healthy” or the audience”. O&M further note the Orientation is the “my house is always cold”. starting state and the Resolution is the ending state. Complicating Action: In general, a complicating action is However, we have not yet found a correspondence for a single event that increases the tension of the story. It is Rising and Falling Actions in L&W’s framework. In also causes a situation to turn away from normal and Labov’s scheme (1997, 2013) the Complicating Action is become remarkable. Finally, it has a causal component, in any event in a causal sequence, of which the MRE is one. that it propels the critical action of the story towards the Here we apply an additional requirement that a MRE. We use this label multiple times to indicate a series complicating action must cause the tension to rise, and of complicating actions that build tension with each must make the MRE causally possible. That is, it must occurrence. cause something to become more complicated, as the name The Most Reportable Event: This is an event that implies. introduces tension, in the same manner as a complicating We further deviate from O&M by aligning Falling Action action, but it also has some unique qualities that means with Resolution and Dénouement with Coda. Labov there can only be one in a tale. A sentence or sentences defines Coda in terms of its ability to resolve all further qualify as an MRE if two criteria are fulfilled: (1) it is an questions the audience may have, “so that the question: explicit event at the highest tension point of the story. (2) If ‘What happened then?’ is no longer appropriate” (Labov you only report one event as the summary of the story, it is 1997, p. 402). That is, a new equilibrium has been this one. established. Table 1 shows the correspondence between Minor Resolution: This is an explicit event that allows different narrative theories. tension to drop slightly during a series of complicating L&W’s Abstract and Evaluation do not have corresponding actions. It can occur in two ways: (1) by resolving a lesser functions in Freytag and others. We attribute this to mystery in a story, or part of it; (2) by resolving the tension difference in subject matter – L&W focused on oral stories, of part of a problem in the story, without resolving the which are usually short and less formal than professional issues of the entire narrative. productions; the relationship between the storyteller and Return of MRE: When the MRE theme comes back later the listener is usually close. L&W’s Abstract draws after the resolution in a new way, either in time or in action, attention from the listener and signals the following story. An example is when a friend calls and says, “I just had the we say it is a ‘Return of MRE’. This event is a new twist on the main theme. It must be at similar level of tension and most amazing experience at the park!” Evaluation usually importance as the MRE; it is also separated from the MRE provides a personal viewpoint from the storyteller, such as “That’s why I avoid that restaurant.” This type of message by time or other narrative functions (if not, it is simply the same event as the MRE). The Return of MRE allows the is rare in formal productions, except perhaps for children’s tension to rise again after the Resolution. stories and fables. 3.2 The Annotation Schema Resolution: This event on the main causal chain happens after the MRE and resolves the dramatic tension of the Based on the insight gained from the theoretical analysis, story. Hence, it is often a concluding action of the story, but we present an operationalized theory in terms of narrative can be followed by the Aftermath or the Evaluation. functions that we use to label stories. A key practical consideration is to reduce ambiguity in the definitions, so Aftermath: This event occurs when a significant temporal the schema can be easily communicated and the number of gap has elapsed after the main event sequence has ways that a story may be annotated is reduced. Here we concluded. It indicates the long-term effect or broader describe the 10 functional labels. implications of the recounted events – for example, how the story characters went on with their lives after the main A central idea throughout these 10 categories is the “story events are over. frame”. Events recounted as part of the story are within the frame. The narrator can also step outside the frame to Evaluation: This is a comment from the narrator about the reflect on the tale’s meaning or connect with the audience. significance or meaning of the story itself and is focused on Labov also observed two modes of engagement – one a moral, message, value or lesson. It could even be the socially-oriented and another in which the speaker is absence of a lesson, such as “I didn’t learn X”. The reader “reliving events of his past” (Labov, 1972, p. 354). stops recounting the process of events and “turn[s] to the reader and tell[s] him what the point is” (Labov, 1972, p. Abstract: An abstract is a summarizing account of the key 374). It aligns with Labov’s notion of ‘external evaluation’ ideas in the tale, and is almost always found at the (Labov, 1972, p. 371). This kind of comment usually beginning of the text. Although it contains information happens after resolution or aftermath and thus occurs about the story (including the gist of the MRE), it does not outside the story frame. respectively, indicating fair agreement among the annotators. We further analyze the disagreements made by the annotators. Figure 1 shows a detailed confusion matrix among narrative functions and an additional “unlabeled” category. The numbers in the matrix are computed as follows. If annotator A and annotator B agree that a sentence is in category 푖 , the count for the cell (푖, 푖) , denoted as 푐푖푖 , is incremented by 1. If one labels the sentence as category 푖 and the other labels it as category 푗, the count for cells (푗, 푖) and (푖, 푗) are both increased by 0.5. Finally, the cells are normalized as 2푐푖푗/(∑푘 푐푖푘 + ∑푘 푐푘푗). From Figure 1, we observe that substantial agreement is achieved around the major categories of story structure that appear in all three schemes (ours, Labov’s and that of Freytag/Thursby). These are the Orientation, Complicating Action, and MRE. The three categories close to the end of the story, Resolution, Evaluation, and Aftermath, tended to Figure 1. The confusion matrix for narrative functions. be mixed up by annotators. Early elements such as Abstract and Orientation also tended to get confused. This suggests

Direct comment to audience: A direct comment openly our annotation scheme is able to differentiate major addresses the audience outside the story frame, for components of the story structure, even though the example: “You’re not going to believe this.” It can also annotation gets less accurate on the categories that are more include the reason for telling the story, an apology for the specific to particular nuances of story structure. Infrequent way the story is presented, or concern that telling the story categories such as Return of MRE and Minor Resolution will get the writer into trouble. are difficult to annotate. After merging Resolution, Evaluation, and Aftermath into a single category, and 4. The Annotation Procedure treating Minor Resolution and Return of MRE as unlabeled, the three interrater agreement measures increase We selected stories from three sources: stories collected to 0.44, 0.49, and 0.47, respectively. We note the from Quora by Wang et al. (2017), stories collected from challenges in analyzing the structures of real stories and Reddit by O&W, and stories annotated by Lukin et al. consider it an achievement that we were able to (2016). For the Quora stories, two annotators determined if differentiate major categories across entire narratives. a text is a story and fits our purpose. The criteria include: More intensive training procedures for the annotators, the text must contain an MRE and is composed of only one along with a simplification of some categories of the story (not multiple); the text is shorter than 700 words, scheme, are likely to improve interrater agreement further. longer than 90 words and has less than 6 lines of dialogue; non-narrative elements, if any, must be less than 50% of the 6. Conclusions text. Stories that do not meet these criteria are rejected. Stories that contain offensive content are not annotated. Understanding the macro structures of a narrative, such as where dramatic tension rises and falls, is an important link The annotation process is as follows. The annotators, who in enabling computer systems to understand narrative. do not have backgrounds in linguistics or literature, went Existing work tends to focus on categories that are specific through three rounds of tutorials over five weeks, during to one genre and types of stories or a subset of the story which they were given 25 stories to annotate and then structure. compared their annotations with the gold standard provided by the authors. After that, the two annotators annotated the In this paper, we provide a first attempt at integrating same 71 stories in order to compute interrater agreement. multiple narratological accounts to capture more Subsequently, they annotated separate stories. One author fundamental structure. To do this, we propose a set of of this paper also annotated a small number of stories. narrative functions that capture dramatic tension and the social aspect of stories, as proposed by Labov and In total, 480 stories containing 8,908 sentences were Waletzky (1967). We annotated 360 unique stories from annotated. Excluding repeated stories, we obtain 360 three story sources in the literature. This achieved fair unique stories, including 167 from the Quora dataset, 73 interrater agreement on the annotations. Upon detailed from Lukin et al. and 120 from O&W. A story contains inspection, we note confusion in the annotations are 18.34 sentences on average. concentrated on a few fine-grained categories. The annotation results suggest the annotation scheme allows the 5. Validation and Discussion separation of major structural elements, despite the We computed interrater agreement using Cohen’s Kappa difficulty of the task. We believe this research will lead to between pairs of annotators separately, as not all annotators further progress towards an artificial intelligence that can worked on the same set of stories. The agreement is communicate with human users in the form of stories. computed at a sentence level. Among the two annotators and an author, the pair-wise kappa are 0.39, 0.41, and 0.42 7. Bibliographical References Proceedings of the 30th AAAI Conference on Artificial Appling, D. S., & Riedl, M. O. (2009). Representations for Intelligence. Learning to Summarize Plots. In Proceedings of the Prince, G. (1973). A Grammar of Stories: An Introduction. AAAI Spring Symposium on Intelligent Narrative Propp, V. Y. (1928). Morphology of the Folktale. Technologies II. Rahimtoroghi, E., Swanson, R., Walker, M. A., & Aristotle. (335AD). Poetics. Corcoran, T. (2013). Evaluation, Orientation, and Action Bamman, D., O’Connor, B., & Smith, N. A. (2013). in Interactive StoryTelling. In Proceedings of Intelligent Learning latent personas of film characters. In Narrative Technologies 6. Proceedings of the 51st Annual Meeting of the Rao, A. S., & Georgeff, M. P. (1995). BDI-agents: From Association for Computational Linguistics. Theory to Practice. In Proceedings of the First Blei, D., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet International Conference on Multiagent Systems. Allocation. Journal of Machine Learning Research, 3. Riedl, M. O., & Young, R. M. (2010). Narrative planning: Bruner, J. (1986). Actual minds, possible worlds. Balancing plot and character. Journal of Artificial Cambridge, MA: Harvard University Press. Intelligence Research. Campbell, J. (1949). The Hero with a Thousand Faces. Swanson, R., Rahimtoroghi, E., Corcoran, T., & Walker, Bollingen Foundation. M. (2014). Identifying Narrative Clause Types in Elson, D. K. (2012). DramaBank: Annotating Agency in Personal Stories. Presented at the Annual SIGdial Narrative Discourse. In Proceedings of the International Meeting on Discourse and Dialogue. Conference on Language Resources and Evaluation. Thursby, Jacqueline. (2006). Story: A Handbook. Ferraro, F., & Van Durme, B. (2016). A unified Bayesian Greenwood Publishing Group. model of scripts, frames and language. In Proceedings of Todorov, T. (1971). The Two Principles of Narrative. the 30th AAAI Conference on Artificial Intelligence. Diacritics, 1(1), 37–44. Finlayson, M. A. (2016). Inferring Propp’s Functions from Wang, T., Chen, P., & Li, B. (2017). Predicting the Quality Semantically Annotated Text. Journal of American of Short Narratives from Social Media. In The 26th Folklore, 129(511), 53–75. International Joint Conference on Artificial Intelligence. Fisher, W. (1987). Human communication as narration: Toward a philosophy of reason, value and action. Columbia, SC: University of South Carolina Press. Freytag, G. (1863). Die Technik des Dramas. Gervás, P., Hervás, R., León, C., & Gale, C. V. (2016). Annotating Musical Theatre Plots on Narrative Structure and Emotional Content. In Proceedings of the Seventh International Workshop on Computational Models of Narrative. Goyal, A., Riloff, E., & Daume III, H. (2010). Automatically Producing Plot Unit Representations for Narrative Text. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Labov, W. (1997). Some further steps in narrative analysis. Journal of narrative and life history, (7), 395–415. Labov, W. (2013). The Language of Life and Death. Cambridge University Press. Labov, W., & Waletzky, J. (1967). Narrative analysis. In Essays on the Verbal and Visual Arts. Seattle, WA: University of Washington Press. Labov, William. (1972). The Transformation of Experience in Narrative Syntax. In Language in the Inner City: Studies in the Black English Vernacular (pp. 354–397). Philadelphia: University of Philadelphia Press. Lehnert, W. (1981). Plot Units and Narrative Summarization. Cognitive Science, 4, 293–331. Lukin, S. M., Bowden, K., Barackman, C., & Walker, M. A. (2016). PersonaBank: A Corpus of Personal Narratives and their Story Intention Graphs. In Proceedings of the International Conference on Language Resources and Evaluation. Elsner, M. (2015). Abstract representations of plot structure. Linguistic Issues in Language Technology, 12. Ouyang, J., & McKeown, K. (2015). Modeling reportable events as turning points in narrative. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Pichotta, K., & Mooney, R. J. (2016). Learning statistical scripts with LSTM recurrent neural networks. In