Causal Inference of Script Knowledge

Noah Weber (1), Rachel Rudinger (2,3), Benjamin Van Durme (1)
(1) Johns Hopkins University  (2) University of Maryland, College Park  (3) Allen Institute for AI

Abstract

When does a sequence of events define an everyday scenario and how can this knowledge be induced from text? Prior works in inducing such scripts have relied on, in one form or another, measures of correlation between instances of events in a corpus. We argue from both a conceptual and practical sense that a purely correlation-based approach is insufficient, and instead propose an approach to script induction based on the causal effect between events, formally defined via interventions. Through both human and automatic evaluations, we show that the output of our method based on causal effects better matches the intuition of what a script represents.

Figure 1: The events of Watching a sad movie, Eating popcorn, and Crying may highly co-occur in a hypothetical corpus. What distinguishes valid event pair inferences (event pairs linked in a commonsense scenario; noted by checkmarks) versus invalid inferences (noted by an 'X')?

1 Introduction

Commonsense knowledge of everyday situations, as defined in terms of prototypical sequences of events, has long been held to play a major role in text comprehension and understanding (Minsky, 1974; Schank and Abelson, 1975, 1977; Bower et al., 1979; Abbott et al., 1985). Naturally, this has motivated a large body of work looking to learn such knowledge, or scripts,[1] from text corpora through data-driven approaches.

[1] For simplicity we will refer to these 'prototypical event sequences' as scripts throughout the paper, though it should be noted that scripts as originally proposed contain further structure not captured in this definition.

A minimal and often implicit requirement for any such approach is to resolve, for any pair of events e1 and e2, what quantitative measure should be used to determine whether e2 should "follow" e1 in a script. That is, documents may serve as descriptions of events that occur in the same situation as other events: what function may we compute over the raw presence or absence of events in documents that is most useful for script induction?

Chambers and Jurafsky (2008; 2009) adopted point-wise mutual information (PMI) (Church and Hanks, 1990) between event mentions. Others employed probabilities from a model over event sequences (Jans et al., 2012; Rudinger et al., 2015; Pichotta and Mooney, 2016; Peng and Roth, 2016; Weber et al., 2018b), or other measures of event co-occurrence (Balasubramanian et al., 2013; Modi and Titov, 2014).

In this work we ask: do measures rooted in co-occurrence best capture the notion of whether one event should follow another in a script? We posit that they do not: while observed correlations between events indicate relatedness, relatedness is not the only factor in determining whether events form a meaningful script.

Consider the example of Ge et al. (2016): hurricane events are prototypically connected with events of donations coming in. Likewise, hurricane events are connected to evacuation events. However, while donation and evacuation events are not conceptually connected in the same sense, there will exist strong statistical associations between the two. Figure 1 provides a second example: eating popcorn is not conceptually associated with crying, but they might co-occur in a hypothetical corpus describing situations of watching a sad movie.

What do strict co-occurrence measures miss? In both examples the 'invalid' inferences arise from the same issue: an event such as eating popcorn may raise the probability of the event crying, but it does so only through a shared association with a movie watching context: the increase in probability is not due to the eating popcorn itself. In other words, what is lacking is a direct causal effect between these events, a quantity that can be formally defined using tools from the causal inference literature (Hernan and Robins, 2019).

In this work we demonstrate how a measure based on causal effects can be derived, computed, and employed for the extraction of script knowledge. Using crowdsourced human evaluations and a variant of the automatic cloze evaluation, we show how this definition better captures the notion of scripts as compared to prior standard measures, PMI and event sequence language models. Code and data are available at github.com/weberna/causalchains.

2 Motivation

Does the fact that event e2 is often observed after e1 in the data (i.e., p(e2|e1) is "high") mean that e2 prototypically follows e1, in the sense of being part of a script? In this section we argue that observed associations are not sufficient for the purpose of extracting script knowledge from text. We argue from a conceptual standpoint that some notion of causal relevance is required. We then give examples showing the practical pitfalls that may arise from ignoring this component. Finally, we propose our intervention-based definition for script events, and show how it both explicitly defines a notion of 'causal relevance' and fixes the aforementioned practical pitfalls.

2.1 The Significance of Causal Relevance

The original works defining scripts are unequivocal about the importance of causal linkage between script events,[2] and other components of the original script definition (e.g., what-ifs, preconditions, postconditions, etc.) are arguably causal in nature. Early rule-based works on inducing scripts heavily used causal concepts in their schema representations (DeJong, 1983; Mooney and DeJong, 1985), as do related works in psychology (Black and Bower, 1980; Trabasso and Sperry, 1985).

[2] "...a script is not a simple list of events but rather a linked causal chain" (Schank and Abelson, 1975).

But any measure based solely on p(e2|e1) is agnostic to notions of causal relevance. Does this matter in practice? A high p(e2|e1) indicates either: (1) a causal influence of e1 on e2, or (2) a common cause e0 between them, meaning the relation between e1 and e2 is spurious. In the latter case, e0 acts as a confounder between e1 and e2.

Ge et al. (2016) acknowledges that the associations picked up by correlational measures may often be spurious. Their solution relies on using trends of words in a temporal stream of newswire data, and hence is fairly domain specific.

2.2 Defining Causal Relevance

Early works such as Schank and Abelson (1975) are vague with respect to the meaning of "causally chained." Can one say that watching a movie has causal influence on the subsequent event of eating popcorn happening? Furthermore, can this definition be operationalized in practice?

We argue that both of these questions may be elucidated by taking a manipulation-based view of causation. Roughly speaking, this view holds that a causal relationship is one that is "potentially exploitable for the purposes of manipulation and control" (Woodward, 2005). In other words, a causal relationship between A and B means that (in some cases) manipulating the value of A should result in a change in the value of B. A primary benefit of this view is that the meaning of a causal claim can be clarified by specifying what these 'manipulations' are exactly. We take this approach below to clarify what exactly is meant by 'causal relevance' between script events.

Imagine an agent reading a discourse. After reading a part of the discourse, the agent has some expectations for events that might happen next. Now imagine that, before the agent reads the next passage, we surreptitiously replace it with an alternate passage in which the event e1 happens. We then allow the agent to continue reading. If e1 is causally relevant to e2, then this replacement should, in some contexts, raise the agent's degree of belief in e2 happening next (contra the case where we didn't intervene to make e1 happen).

So, for example, if we replaced a passage such that e1 = watching a movie was true, we could expect on average that the agent's degree of belief that e2 = eating popcorn happens next will be higher. In this way, we say these events are causally relevant, and are, for our purposes, script events.
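To make the contrast between observed association and causal effect concrete, the toy simulation below encodes the popcorn example from Figure 1 as a three-variable model in which a latent movie-watching context drives both eating popcorn and crying. The variables, probabilities, and sampling scheme are illustrative inventions (they do not come from the paper's data); the point is only that p(cry | popcorn) can sit well above p(cry) even though p(cry | do(popcorn)) is unchanged.

```python
import random

random.seed(0)

def sample(do_popcorn=None):
    """Draw (movie, popcorn, cry) from a toy confounded model.

    `movie` is the common cause; passing `do_popcorn` simulates do(popcorn=.)
    by cutting the movie -> popcorn edge and fixing popcorn directly.
    """
    movie = random.random() < 0.3                       # latent scenario variable
    if do_popcorn is None:
        popcorn = random.random() < (0.7 if movie else 0.05)
    else:
        popcorn = do_popcorn                             # intervention ignores parents
    cry = random.random() < (0.4 if movie else 0.02)     # crying depends only on movie
    return movie, popcorn, cry

N = 200_000
obs = [sample() for _ in range(N)]
p_cry = sum(c for _, _, c in obs) / N
with_popcorn = [c for _, p, c in obs if p]
p_cry_given_popcorn = sum(with_popcorn) / len(with_popcorn)
p_cry_do_popcorn = sum(c for _, _, c in (sample(do_popcorn=True) for _ in range(N))) / N

print(f"p(cry)               = {p_cry:.3f}")
print(f"p(cry | popcorn)     = {p_cry_given_popcorn:.3f}  # inflated by the confounder")
print(f"p(cry | do(popcorn)) = {p_cry_do_popcorn:.3f}  # ~ p(cry): no direct causal effect")
```

Here the conditional probability roughly triples the base rate while the interventional probability does not move, which is exactly the gap the intervention-based definition of script relevance is meant to expose.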

With this little 'story,' we have clarified the conceptual notion of causal relevance in our problem. In the next section, we formalize this story and its notion of intervention into a causal model.

3 Method

We would like to compute the effect of forcing an event of a certain type to occur in the text. The event types that get the largest increase in probability due to this are held to be 'script' events. Computing these quantities falls within the domain of causal inference, and hence will require its tools be used. There are three fundamental steps in causal inference we will need to work through to accomplish this: (1) Define a Causal Model: Identify the variables of interest in the problem, and define causal assumptions regarding these variables; (2) Establish Identifiability: With the given model, determine whether the causal quantity can be computed as a function of observed data. If it can, derive this function and move to (3) Estimation: Estimate this function using observed data. We go through each step in the next three subsections.

To best contrast with prior work, we use the event representation of Chambers and Jurafsky (2008) and others (Jans et al., 2012; Rudinger et al., 2015). A description of this representation is provided in the Supplemental.

3.1 Step 1: Define a Causal Model

A causal model defines a set of causal assumptions on the variables of interest in a problem. While there exist several formalisms that accomplish this, in this paper we make use of causal Bayesian networks (CBNs) (Spirtes et al., 2000; Pearl, 2000). CBNs model dependencies between variables graphically in a manner similar to Bayesian networks; the key distinction is that the edges in a CBN posit a direction of causal influence between the variables.[3]

[3] See Pearl (2000); Bareinboim et al. (2012) for a comprehensive definition of CBNs and their properties.

Figure 2: The diagram for our causal model up to time step i. Intervening on e_{i-1} acts to remove the dotted edges. See 3.1 for a description of the variables.

We will define our causal model from a top down, data generating perspective in a way that aligns with our conceptual story from the previous section. Below we describe the four types of variables in our model, as well as their causal dependencies.

The World, U: The starting point for the generation of our data is the real world. This context is explicitly represented by the unmeasured variable U. This variable is unknowable and in general unmeasurable: we don't know how it is distributed, nor even what 'type' of variable it is. This variable is represented by the hexagonal node in Figure 2.

The Text, T: The next type of variable represents the text of the document. For indexing purposes, we segment the text into chunks T_1, ..., T_N, where N is the number of realis events explicitly mentioned in the text. The variable T_i is thus the text chunk corresponding to the ith event mentioned in text. These chunks may be overlapping, and may skip over certain parts of the original text.[4] The causal relationship between various text chunks is thus ambiguous. We denote this by placing bidirectional arrows between the square text nodes in Figure 2. The context of the world also causally influences the content of the text, hence we include an arrow from U to all text variables, T_i.

[4] Keeping with prior work, we use the textual span of the event predicate and its syntactic dependents as the textual content of an event. The ordering of variables T_i corresponds to the positions of the event predicates in the text.

Event Inferences, e: In our story in Section 2, an agent reads a chunk of text and infers the type of event that was mentioned in the piece of text. This inference is represented (for the ith event in text) in our model by the variable e_i ∈ E, where E is the set of possible atomic event types (described at the end of this section).[5]

[5] For this study we use the output of information extraction tools as a proxy for the variable e_i (see supplemental). As such, it is important to note that there will be bias in computations due to measurement error. Fortunately, there do exist methods in the causal inference literature that can adjust for this bias (Kuroki and Pearl, 2014; Miao et al., 2018). Wood-Doughty et al. (2018) derive equations in a case setting related to ours (i.e. with measurement bias on the variable being intervened on). Dealing with this issue will be an important next step for future work.

The textual content of T_i causally influences the inferred type e_i, hence the directional connecting arrows in Figure 2.

Discourse Representation, D: The variable e_i represents a high level abstraction of part of the semantic content found in T_i. Is this information about events used for later event inferences by an agent reading the text? Prior results in causal network/chain theories of discourse processing (Black and Bower, 1980; Trabasso and Sperry, 1985; Van den Broek, 1990) seem to strongly point to the affirmative. In brief, these theories hold that the identities of the events occurring in the text – and the causal relations among them – are a core part of how a discourse is represented in human memory while reading, and more-so, that this information significantly affects a reader's event based inferences (Trabasso and Van Den Broek, 1985; Van den Broek and Lorch Jr, 1993). Thus we introduce a discourse representation variable, D_i, itself a combination of two sub-variables, D^I_i and D^O_i.[6] The variable D^I_i ∈ E* is a sequence of events that were explicitly stated in the text, up to step i. After each step, the in-text event inferred at i (the variable e_i) is appended to D^I_{i+1}. The causal parents of D^I_{i+1} are thus e_i and D^I_i (which is simply copied over). We posit that the information in D^I_i provides information for the inference of e_i, and thus draw an arrow from D^I_i to e_i.

[6] We don't explicitly model the causal structure between events in D_i, the importance of which is a key finding in the above referenced literature. While this wouldn't change the structure of our causal model, it would impact the estimation stage, and would be an interesting line of future work.

Unstated events not found in the text but inferred by the reader also have an effect on event inferences (McKoon and Ratcliff, 1986, 1992; Graesser et al., 1994). We thus additionally take this into consideration in our causal model by including an out-of-text discourse representation variable, D^O_i ∈ 2^|E|. This variable is a bag of events that a reader may infer implicitly from the text chunk T_i using common sense. Its causal parents are thus both the text chunk T_i, as well as the world context U; its causal children are e_i. Obtaining this information is done via human annotation and discussed later. D_i is thus equal to (D^I_i, D^O_i), and inherits the incoming and outgoing arrows of both in Figure 2.

3.2 Step 2: Establishing Identifiability

Our goal is to compute the effect that intervening and setting the preceding event e_{i-1} to k ∈ E has on the distribution over the subsequent event e_i. Now that we have a causal model in the form of Fig. 2, we can define this effect. Using the notation of Pearl (2000), we write this as:

p(e_i | do(e_{i-1} = k))    (1)

The semantics of do(e_{i-1} = k) are defined as an 'arrow breaking' operation on Figure 2 which deletes the incoming arrows to e_{i-1} (the dotted arrows in Figure 2) and sets the variable to k. Before a causal query such as Eq. 1 can be estimated we must first establish identifiability (Shpitser and Pearl, 2008): can the causal query be written as a function of (only) the observed data?

Eq. 1 is identified by noting that the variables T_{i-1} and D_{i-1} meet the 'back-door criterion' of Pearl (1995), allowing us to write Eq. 1 as the following:

E_{T_{i-1}, D_{i-1}} [ p(e_i | e_{i-1} = k, D_{i-1}, T_{i-1}) ]    (2)

Our next step is to estimate the above equation. If one has an estimate for the conditional p(e_i | e_{i-1}, D_{i-1}, T_{i-1}), then one may "plug it into" Eq. 2 and use a Monte Carlo approximation of the expectation (using samples of (T, D)). This simple plug-in estimator is what we use here.

It is important to be aware of the fact that this estimator, specifically when plugging in machine learning methods, is quite naive (e.g. Chernozhukov et al. (2018)), and will suffer from an asymptotic (first order) bias,[7] which prevents one from constructing meaningful confidence intervals or performing certain hypothesis tests. That said, in practice these machine learning based plug-in estimators can achieve quite reasonable performance (see for example, the results in Shalit et al. (2017)), and since our current use case can be validated empirically, we save the usage of more sophisticated estimators (and proper statistical inference) for future work.[8]

[7] See Fisher and Kennedy (2018) for an introduction on how this bias manifests.

[8] Semiparametric estimation of equations such as Eq. 2 involving high dimensional variables (like text) is an open problem that we do not address here. See (D'Amour et al., 2020; Kennedy, 2016; Chernozhukov et al., 2018) for an analysis of some of the problems that arise in both high dimensional causal inference and semiparametric estimation (i.e. estimation without full parametric assumptions). See Keith et al. (2020) for an overview of problems that arise particularly when dealing with text.
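As a concrete illustration of the plug-in strategy described above, the sketch below computes the Eq. 2 expectation by Monte Carlo. It assumes a fitted conditional model is available as a Python callable `cond_model(prev_event, discourse, text)` returning a distribution over event types; the function and argument names are ours and stand in for whatever interface an actual implementation exposes.

```python
from collections import defaultdict

def estimate_intervention(cond_model, samples, k, event_vocab):
    """Monte Carlo plug-in estimate of p(e_i | do(e_{i-1} = k)) (Eq. 2).

    `samples` holds observed covariate pairs (D_{i-1}, T_{i-1}); the preceding
    event is forced to k for every sample, and the model's predicted
    distributions are averaged over the empirical covariates.
    """
    totals = defaultdict(float)
    for discourse, text in samples:
        dist = cond_model(prev_event=k, discourse=discourse, text=text)
        for event in event_vocab:
            totals[event] += dist.get(event, 0.0)
    n = max(1, len(samples))
    return {event: totals[event] / n for event in event_vocab}
```

The covariates (D_{i-1}, T_{i-1}) always come from the observed data; only e_{i-1} is overwritten with k, which is what distinguishes this back-door adjustment from simply restricting attention to documents in which k happened to occur.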

3.3 Step 3: Estimation

Eq. 2 depends on the conditional, p_{e_i} = p(e_i | e_{i-1}, D_{i-1}, T_{i-1}), which we estimate via standard ML techniques with a dataset of samples drawn from p(e_i, e_{i-1}, D_{i-1}, T_{i-1}). There are two issues: (1) How do we deal with out-of-text events in D_{i-1}? and (2) What form will p_{e_i} take?

Dealing with Out-of-Text Events: Recall that D_i is a combination of the variables D^I_i and D^O_i. To learn a model for p_{e_i} we require samples from the full joint. Out of the box, however, we only have access to p(e_i, e_{i-1}, D^I_{i-1}, T_{i-1}). If, for the samples in our current dataset, we could draw samples from p_D = p(D^O_{i-1} | e_i, e_{i-1}, D^I_{i-1}, T_{i-1}), then we would have access to a dataset with samples drawn from the full joint.

In order to 'draw' samples from p_D we employ human annotation. Annotators are presented with a human readable form of (e_i, e_{i-1}, D^I_{i-1}, T_{i-1})[9] and are asked to annotate for possible events belonging in D^O_{i-1}. Rather than opt for noisy annotations obtained via freeform elicitation, we instead provide users with a set of 6 candidate choices for members of D^O_{i-1}. The candidates are obtained from various knowledge sources: ConceptNet (Speer and Havasi, 2012), VerbOcean (Chklovski and Pantel, 2004), and high PMI events from the NYT Gigaword corpus (Graff et al., 2003). The top two candidates are selected from each source. In a scheme similar to Zhang et al. (2017), we ask users to rate candidates on an ordinal scale and consider candidates rated at or above a 3 (out of 4) to be within D^O_{i-1}. We found annotator agreement to be quite high, with a Krippendorff's α of 0.79. Under this scheme, we crowdsourced a dataset of 2000 fully annotated examples on the Mechanical Turk platform. An image of our annotation interface is provided in the Appendix.

[9] In the final annotation experiment, we found it easier for annotators to be provided only the text T_{i-1}, given that many events in D^I_{i-1} are irrelevant.

The Conditional Model: We use neural networks to model p_{e_i}. In order to deal with the small amount of fully annotated data available, we employ a finetuning paradigm. We first train a model on a large dataset that does not include annotations for D^O_{i-1}. This model consists of a single layer, 300 dimensional GRU encoder which encodes [D^I_{i-1}, e_{i-1}] into a vector v_e ∈ R^d and a CNN-based encoder which encodes T_{i-1} into a vector v_t ∈ R^d. The term p_{e_i} is modeled as:

p_{e_i} ∝ A v_e + B v_t

for matrices A and B of dimension |E| × d. We then finetune this model on the 2000 annotated examples including D^O_{i-1}. We add a new parameter matrix, C, to the previously trained model (allowing it to take D^O_{i-1} as input) and model p_{e_i} as:

p_{e_i} ∝ A v_e + B v_t + C v_o

The input v_o is the average of the embeddings for the events found in D^O_{i-1}. The parameter matrix C is thus the only set of parameters trained 'from scratch,' on the 2000 annotated examples. The rest of the parameters are initialized and finetuned from the previously trained model. See the Appendix for further training details.
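A schematic PyTorch version of this two-stage model is sketched below. The 300-dimensional GRU and CNN encoders follow the Appendix, but the class name, argument names, CNN filter configuration, and input packaging are simplifying assumptions of ours rather than the released implementation.

```python
import torch
import torch.nn as nn

class CausalConditional(nn.Module):
    """p(e_i | e_{i-1}, D_{i-1}, T_{i-1}) with logits A v_e + B v_t (+ C v_o in stage two)."""

    def __init__(self, n_events, n_words, d=300):
        super().__init__()
        self.event_emb = nn.Embedding(n_events, d)
        self.word_emb = nn.Embedding(n_words, d)
        self.gru = nn.GRU(d, d, batch_first=True)             # encodes [D^I_{i-1}, e_{i-1}]
        self.cnn = nn.Conv1d(d, d, kernel_size=3, padding=1)  # simplified encoder for T_{i-1}
        self.A = nn.Linear(d, n_events, bias=False)
        self.B = nn.Linear(d, n_events, bias=False)
        self.C = nn.Linear(d, n_events, bias=False)           # added and trained during finetuning
        self.use_C = False

    def forward(self, event_hist, text_ids, out_of_text=None):
        v_e = self.gru(self.event_emb(event_hist))[1].squeeze(0)                # (batch, d)
        v_t = self.cnn(self.word_emb(text_ids).transpose(1, 2)).max(dim=2).values
        logits = self.A(v_e) + self.B(v_t)
        if self.use_C and out_of_text is not None:
            v_o = self.event_emb(out_of_text).mean(dim=1)    # average of D^O_{i-1} embeddings
            logits = logits + self.C(v_o)
        return torch.log_softmax(logits, dim=-1)
```

Under this sketch, pretraining would optimize A, B, and the encoders with `use_C = False`; finetuning on the 2000 annotated examples would flip `use_C` to True and train C from scratch while the remaining parameters continue from the pretrained checkpoint.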

3.4 Extracting Script Knowledge

Provided a model of the conditional p_{e_i}, we can approximate Eq. 2 via Monte Carlo by taking our annotated dataset of N = 2000 examples and computing the following average:

P̂_k = (1/N) Σ_{j=1}^{N} p(e_i | e_{i-1} = k, D_j, T_j)    (3)

This gives us a length-|E| vector P̂_k whose lth component, P̂_{kl}, gives p(e_i = l | do(e_{i-1} = k)). We compute this vector for all values of k. Note that this computation only needs to be done once.

There are several ways one could extract script-like knowledge using this information. In this paper, we define a normalized score over intervened-on events such that the script compatibility score between two concurrent events is defined as:

S(e_{i-1} = k, e_i = l) = P̂_{kl} / Σ_{j=1}^{|E|} P̂_{jl}    (4)

We term this the 'Causal' score in the evaluations below.
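Given the fitted conditional and the N annotated covariate samples, Eqs. 3 and 4 reduce to one matrix of averaged predictions followed by a column normalization. The NumPy sketch below uses a hypothetical `cond_probs(k, discourse, text)` callable standing in for the model's predictive distribution; it paraphrases the procedure rather than reproducing the released code.

```python
import numpy as np

def build_P_hat(cond_probs, samples, n_events):
    """P_hat[k, l] ~ p(e_i = l | do(e_{i-1} = k)), the Eq. 3 Monte Carlo average.

    `cond_probs(k, discourse, text)` returns a length-|E| probability vector
    for p(e_i | e_{i-1} = k, D_{i-1}, T_{i-1}).
    """
    P_hat = np.zeros((n_events, n_events))
    for k in range(n_events):
        for discourse, text in samples:
            P_hat[k] += cond_probs(k, discourse, text)
        P_hat[k] /= len(samples)
    return P_hat

def causal_score(P_hat):
    """S[k, l] = P_hat[k, l] / sum_j P_hat[j, l]  (Eq. 4)."""
    return P_hat / P_hat.sum(axis=0, keepdims=True)

# Ranking previous events that best explain a target event l, as used in Eval I:
#   S = causal_score(build_P_hat(cond_probs, samples, n_events))
#   best_prev = np.argsort(-S[:, l])[:2]
```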

4 Experiments and Evaluation

Automatic evaluation of methods that extract script-like knowledge is an open problem that we do not attempt to tackle here,[10] relying foremost on crowdsourced human evaluations to validate our method. However, as we aim to provide a contrast to prior script-induction approaches, we perform an experiment looking at a variant of the popular automatic narrative cloze evaluation.

[10] See discussions by Rudinger et al. (2015) and Chambers (2017).

4.1 Dataset

For these experiments, we use the Toronto Books corpus (Zhu et al., 2015; Kiros et al., 2015), a collection of fiction novels spanning multiple genres. The original corpus contains 11,040 books by unpublished authors. We remove duplicate books from the corpus (by exact file match), leaving a total of 7,101 books. The books are assigned randomly to train, development, and test splits in 90%-5%-5% proportions. Each book is then run through a pipeline of tokenization with CoreNLP 3.8 (Manning et al., 2014), parsing with CoreNLP's universal dependency parser (Nivre et al., 2016) and coreference resolution (Clark and Manning, 2016b), before feeding the results into PredPatt (White et al., 2016). We additionally tag the events with factuality predictions from Rudinger et al. (2018b) (we only consider factual events). The end result is a large dataset of event chains centered around a single protagonist entity, similar to (Chambers and Jurafsky, 2008). We make this data public to facilitate further work in this area. See the Appendix for a full detailed overview of our pipeline.

4.2 Baselines

In this paper, we compare against the two dominant approaches for script induction (under an atomic event representation[11]): PMI (similar to Chambers and Jurafsky (2008, 2009)) and LMs over event sequences (Rudinger et al., 2015; Pichotta and Mooney, 2016). We defer definitions for these models to the cited papers; below we provide the relevant details for each baseline, with further training details provided in the Appendix.

[11] There is also a related class of methods based on creating compositional event embeddings (Modi, 2016; Weber et al., 2018a). Since the event representation used here is atomic it makes little sense to use them here.

For computing PMI we follow many of the details from (Jans et al., 2012). Due to the nature of the evaluations, we utilize their 'ordered' PMI variant. Also like Jans et al. (2012), we use skip-bigrams with a window of 2 to deal with count sparsity. Consistent with prior work we additionally employ the discount score of Pantel and Ravichandran (2004). For the LM, we use a standard, 2 layer, GRU-based neural network language model, with 512 dimensional hidden states, trained on a log-likelihood objective.
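For reference, an ordered, discounted skip-bigram PMI of the kind used for this baseline can be computed as sketched below. The discount follows the Pantel and Ravichandran (2004) formulation as we understand it, and the windowing details are filled in from the description above, so this should be read as an approximation of the baseline rather than its exact implementation.

```python
import math
from collections import Counter

def ordered_pmi_table(chains, window=2):
    """Discounted, ordered PMI over skip-bigrams (a precedes b within `window`)."""
    uni, bi = Counter(), Counter()
    for chain in chains:
        uni.update(chain)
        for i, a in enumerate(chain):
            for b in chain[i + 1:i + 1 + window]:
                bi[(a, b)] += 1  # ordered: a strictly before b
    n_uni, n_bi = sum(uni.values()), sum(bi.values())
    pmi = {}
    for (a, b), c in bi.items():
        raw = math.log((c / n_bi) / ((uni[a] / n_uni) * (uni[b] / n_uni)))
        m = min(uni[a], uni[b])
        pmi[(a, b)] = raw * (c / (c + 1)) * (m / (m + 1))  # Pantel & Ravichandran-style discount
    return pmi

# Toy usage over (predicate, dependency) events for a protagonist:
chains = [[("sit", "nsubj"), ("eat", "nsubj"), ("pay", "nsubj")],
          [("sit", "nsubj"), ("order", "nsubj"), ("eat", "nsubj"), ("pay", "nsubj")]]
scores = ordered_pmi_table(chains)
```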
4.3 Eval I: Pairwise Event Associations

Any system aimed at extracting script-like knowledge should be able to answer the following abductive question: given that an event e_i happened, what previous event e_{i-1} best explains why e_i is true? In other words, what e_{i-1}, if it were true, would maximize my belief that e_i was true. We evaluate each method's ability to do this via a human evaluation.

On each task, annotators are presented with six event pairs (e_{i-1}, e_i), where e_i is the same for all pairs, but e_{i-1} is generated by one of the three systems. Similar to the human evaluation in Pichotta and Mooney (2016), we filter out outputs in the top-20 most frequent events list for all systems. For each system, we pick the top two events that maximize S(·, e_i), PMI(·, e_i), and p_lm(·, e_i), for the Causal, PMI, and LM systems respectively, and present them in random order. For each pair, users are asked to provide a scalar annotation (from 0%-100%, via a slider bar) on the chance that e_i is true afterwards or happened as a result of e_{i-1}. The annotation scheme is modeled after the one presented in Sakaguchi and Van Durme (2018), and shown to be effective for paraphrase evaluation in Hu et al. (2019). Example outputs from the systems are provided for several choices of e_i in Table 2.

The evaluation is done for 150 randomly chosen instances of e_i, each with 6 candidate e_{i-1}.[12] We have two annotators provide annotations for each task, and similar to Hu et al. (2019), average these annotations together for a gold annotation.

[12] Note that we do manually filter out of the initial random list events which we judge as difficult to understand.

Method   Average Score   Average Rank (1-6)
Causal   49.71           4.10
LM       35.95           3.39
PMI      34.92           3.02

Table 1: Average annotator scores in the pairwise annotation experiment.

Causal    LM       PMI        Target
tripped   came     featured   fell
lit       sat      laboured   inhaled
aimed     came     alarmed    fired
poured    nodded   credited   refilled
radioed   made     fostered   ordered

Table 2: Examples from each system, each of which outputs a previous event that maximizes the score/likelihood that the Target event follows in text.

In Table 1 we provide the results of the experiment, providing both the average annotation score for the outputs of each system, as well as the average relative ranking (with a rank of 6 indicating the annotators ranked the output as the highest/best in the task, and a rank of 1 indicating the opposite). We find that annotators consistently rated the Causal system higher. The differences (in both Score and Rank) between the Causal system and the next best system are significant under a Wilcoxon signed-rank test (p < 0.01).

4.4 Eval II: Event Chain Completion

Of course, while pairwise knowledge between events is a minimum prerequisite, we would also like to generalize to handle chains containing multiple events. In this section, we look at each system's ability to provide an intuitive completion to an event chain. More specifically, the model is provided with a chain of three context events, (e_1, e_2, e_3), and is tasked with providing a suitable e_4 that might follow given the first three events. We evaluate each method's ability to do this via a human evaluation.

Since both PMI and the Causal model[13] work only as pairwise models, we adopt the method of Chambers and Jurafsky (2008) for chains. For both the PMI and Causal model, we pick the e_4 that maximizes (1/3) Σ_{i=1}^{3} Score(e_i, e_4), where Score is either PMI or Eq. 4. The LM model chooses an e_4 that maximizes the joint over all events.

[13] Generalizing the Causal model to multiple interventions, though out of scope here, is a clear next step for future work.

Our annotation task is similar to the one in 4.3, except the pairs provided consist of a context (e_1, e_2, e_3) and a system generated e_4. Each system generates its top choice for e_4, giving annotators 3 pairs[14] to annotate for each task (i.e. each context). On each task, human annotators are asked to provide a scalar annotation (from 0%-100%, via a slider) on the chance that e_4 is true afterwards or happened as a result of the chain of context events. The evaluation is done for 150 tasks, with two annotators on each task. As before, we average these annotations together for a gold annotation.

[14] We found providing six pairs per task to be overwhelming given the longer context.

Method   Average Score   Average Rank (1-3)
Causal   60.12           2.19
LM       57.40           2.12
PMI      44.26           1.68

Table 3: Average annotator scores in the chain annotation experiment.

In Table 3 we provide the results of the experiment. Note that the rankings are now from 1 to 3 (higher is better). We find annotators usually rated the Causal system higher, though the LM model is much closer in this case. The differences (in both Score/Rank) between the Causal and LM system outputs are not significant under a Wilcoxon signed-rank test, though the difference between the Causal and PMI systems is (p < 0.01). The fact that the pairwise Causal model is still able to (at minimum) match the full sequential model on a chain-wise evaluation speaks to the robustness of the event associations mined from it, and further motivates work in extending the method to the sequential case.

4.5 Diversity of System Outputs

But what type of event associations are found by the Causal model? As noted both in Rudinger et al. (2015) and in Chambers (2017), PMI based approaches can often extract intuitive event relationships, but may sometimes overweight low frequency events or suffer problems from count sparsity. LM based models, on the other hand, were noted for their preference towards boring, uninformative, high frequency events (like 'sat' or 'came'). So where does the Causal model lie on this scale?

We study this by looking at the percentage of unique words used by each system in the previous evaluations, presented in Table 5. Unsurprisingly, we find that PMI chooses a new word to output often (77%-84% of the time), while the LM model very rarely does (only 7%-13%). The Causal model, while not as adventurous as the PMI system, tends to produce very diverse output, generating a new output 60%-76% of the time. Both the PMI and Causal systems produce relatively less diverse output on the chain task, which is expected due to the averaging scheme used by each to select events.

Method   Pairwise        Chain
Causal   awoke (2%)      collided (4%)
         parried (1%)    pinched (3%)
LM       came (30%)      made (23%)
         sat (27%)       came (15%)
PMI      lurched (1%)    bribed (3%)
         patroled (1%)   swarmed (2%)

Table 4: Two most used output events (and % of times each is used) for each system, for each human evaluation.

Method   Pairwise   Chain
Causal   76.0%      60.1%
LM       7.30%      13.3%
PMI      84.0%      77.6%

Table 5: % of times a system outputs a new event it previously had not used before.

4.6 Infrequent Narrative Cloze

The narrative cloze task, or some variant of it, has remained a popular automatic test for systems aiming to extract 'script' knowledge. The task is usually formulated as follows: given a chain of events e_1, ..., e_{n-1} that occurs in the data, predict the held out next event that occurs in the data, e_n.

There exist various measures to calculate a model's ability to perform in this task, but arguably the most used one is the Recall@N measure introduced in Jans et al. (2012). Recall@N works as follows: for a cloze instance, a system will return the top N guesses for e_n. Recall@N is the percentage of times e_n is found anywhere in the top N list.

The automatic version of the cloze task has notable limitations. As noted in Rudinger et al. (2015), the cloze task is essentially a language modeling task; it measures how well a model fits the data. The question then becomes whether data fit implies valid script knowledge was learned. The work of Chambers (2017) casts serious doubts on this, with various experiments showing automatic cloze evaluations are biased towards high frequency, uninformative events, as opposed to informative, core, script events. They further posit human annotation as a necessary requirement for evaluation.

In this experiment, we provide another datapoint for the inadequacy of the automatic cloze, while simultaneously showing the relative robustness of the knowledge extracted from our Causal system. For the experiment, we make the following assumptions: (1) Highly frequent events tend to appear in many scenarios, and hence are less likely to be an informative 'core' event for a script, and (2) Less frequent events are more likely to appear only in specific scenarios, and are thus more likely to be informative events. If these are true, then a system that has extracted useful script knowledge should keep (or even improve) cloze performance when the correct answer for e_n is a less frequent event.

We thus propose an Infrequent Cloze task. In this task we create a variety of different cloze datasets (each with 2000 instances) from our test set. Each set is indexed by a value C, such that the indicated dataset does not include instances from the top C most frequent events (C = 0 is the normal cloze setting). We compute a Recall@100 cloze task on 7 sets of various C and report results in Table 6.

At C = 0, as expected, the LM model is vastly superior. The performance of the LM model drastically drops, however, as soon as C increases, indicating an overreliance on prior probability. The LM performance drops below 2% once C = 200, indicating almost no ability in predicting informative events such as drink or pay, both of which occur in this set in our case. The PMI and Causal models' performance, on the other hand, steadily improves as C increases, with the Causal model consistently outperforming PMI. This result, when combined with the results of the human evaluation, gives further evidence towards the relative robustness of the Causal model in extracting informative core events. The precipitous drop in performance of the LM further underscores problems that a naive automatic cloze evaluation may cover up.
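As a concrete reference for how the numbers in Table 6 are computed, the sketch below re-implements the Recall@N evaluation loop with the exclusion threshold described above. The function and variable names are ours, and it assumes each system exposes a ranked list of guesses for the held-out event.

```python
from collections import Counter

def recall_at_n(instances, guess_fn, event_counts, n=100, exclude_top_c=0):
    """Recall@N over cloze instances, skipping instances whose gold answer is
    among the `exclude_top_c` most frequent events (the <C setting)."""
    frequent = {e for e, _ in event_counts.most_common(exclude_top_c)}
    kept = [(ctx, gold) for ctx, gold in instances if gold not in frequent]
    if not kept:
        return 0.0
    hits = sum(1 for ctx, gold in kept if gold in guess_fn(ctx)[:n])
    return 100.0 * hits / len(kept)

# e.g. recall_at_n(test_instances, lm_top_guesses, Counter(train_events),
#                  n=100, exclude_top_c=200)
```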

                  Exclusion Threshold
Method   < 0     < 50    < 100   < 125   < 150   < 200   < 500
Causal   5.60    7.10    7.00    7.49    7.20    8.20    9.10
LM       65.3    28.1    9.70    6.30    3.60    1.70    0.25
PMI      1.80    3.30    3.36    4.10    4.00    4.90    7.00

Table 6: Recall@100 Narrative Cloze results. < C indicates that instances whose cloze answer is one of the top C most frequent events are not evaluated on.

5 Related Work

Our work looks at script-like associations between events in a manner similar to Chambers and Jurafsky (2008), and works along similar lines (Jans et al., 2012; Pichotta and Mooney, 2016). Related lines of work exist, such as work using generative models to induce probabilistic schemas (Chambers, 2013; Cheung et al., 2013; Ferraro and Van Durme, 2016), work showing how script knowledge may be mined from user-elicited event sequences (Regneri et al., 2010; Orr et al., 2014), and approaches that take advantage of hand-coded schematic knowledge (Mooney and DeJong, 1985; Raskin et al., 2003).

The literature is rich with work studying the role of causal semantics in linguistic constructions and argument structure (Talmy, 1988; Croft, 1991, 2012), as well as the causal semantics of lexical items themselves (Wolff and Song, 2003). Work in the NLP literature on extracting causal relations has benefited from this line of work, utilizing the systematic way in which causation is expressed in language to mine relations (Girju and Moldovan, 2002; Girju, 2003; Riaz and Girju, 2013; Blanco et al., 2008; Do et al., 2011; Bosselut et al., 2019). This line of work aims to extract causal relations between events that are in some way explicitly expressed in the text (e.g. through the use of particular constructions). Taking advantage of how causation is expressed in language may benefit our causal model, and is a potential path for future work.

6 Conclusions and Future Work

In this work we argued for a causal basis in script learning. We showed how this causal definition could be formalized and used in practice utilizing the tools of causal inference, and verified our method with human and automatic evaluations. In the current work, we showed a method for calculating the 'goodness' of a script in the simplest case: between pairwise events, which we showed to still be quite useful. A causal definition is in no way limited to this pairwise case, and future work may generalize it to the sequential case or to event representations that are compositional. Having a causal model shines a light on the assumptions made here, and indeed, future work may further refine or overhaul them, a process which may further shine a light on the nature of the knowledge we are after.

Acknowledgements

This work was supported by DARPA KAIROS and the Allen Institute for AI. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes. The views and conclusions contained in this publication are those of the authors and should not be interpreted as representing official policies or endorsement.

References

Valerie Abbott, John B Black, and Edward E Smith. 1985. The representation of scripts in memory. Journal of Memory and Language, 24(2):179–199.

Niranjan Balasubramanian, Stephen Soderland, Mausam, and Oren Etzioni. 2013. Generating coherent event schemas at scale. In Proceedings of EMNLP.

Elias Bareinboim, Carlos Brito, and Judea Pearl. 2012. Local characterizations of causal Bayesian networks. In Graph Structures for Knowledge Representation and Reasoning, pages 1–17. Springer.

John B Black and Gordon H Bower. 1980. Story understanding as problem-solving. Poetics, 9(1-3):223–250.

Eduardo Blanco, Nuria Castell, and Dan I Moldovan. 2008. Causal relation extraction. In LREC.

Antoine Bosselut, Hannah Rashkin, Maarten Sap, Chaitanya Malaviya, Asli Celikyilmaz, and Yejin Choi. 2019. COMET: Commonsense transformers for automatic knowledge graph construction. ACL 2019.

Gordon H Bower, John B Black, and Terrence J Turner. 1979. Scripts in memory for text. Cognitive Psychology, 11(2):177–220.

Paul Van den Broek. 1990. The causal inference maker: Towards a process model of inference generation in text comprehension. Comprehension Processes in Reading, pages 423–445.

Paul Van den Broek and Robert F Lorch Jr. 1993. Network representations of causal relations in memory for narrative texts: Evidence from primed recognition. Discourse Processes, 16(1-2):75–98.

Nathanael Chambers. 2013. Event schema induction with a probabilistic entity-driven model. In Proceedings of EMNLP, volume 13.

Nathanael Chambers. 2017. Behind the scenes of an evolving event cloze test. In Proceedings of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics, pages 41–45.

Nathanael Chambers and Dan Jurafsky. 2008. Unsupervised learning of narrative event chains. In Proceedings of the Association for Computational Linguistics (ACL), Hawaii, USA.

Nathanael Chambers and Dan Jurafsky. 2009. Unsupervised learning of narrative schemas and their participants. In Proceedings of the Association for Computational Linguistics (ACL), Singapore.

Danqi Chen and Christopher Manning. 2014. A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 740–750, Doha, Qatar. Association for Computational Linguistics.

Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. 2018. Double/debiased machine learning for treatment and structural parameters.

Jackie Chi Kit Cheung, Hoifung Poon, and Lucy Vanderwende. 2013. Probabilistic frame induction. In Proceedings of NAACL.

Timothy Chklovski and Patrick Pantel. 2004. VerbOcean: Mining the web for fine-grained semantic verb relations. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pages 33–40.

Kenneth Church and Patrick Hanks. 1990. Word association norms, mutual information, and lexicography. Computational Linguistics, 16(1).

Kevin Clark and Christopher D. Manning. 2016a. Deep reinforcement learning for mention-ranking coreference models. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2256–2262, Austin, Texas. Association for Computational Linguistics.

Kevin Clark and Christopher D. Manning. 2016b. Improving coreference resolution by learning entity-level distributed representations. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 643–653, Berlin, Germany. Association for Computational Linguistics.

William Croft. 1991. Syntactic Categories and Grammatical Relations: The Cognitive Organization of Information. University of Chicago Press.

William Croft. 2012. Verbs: Aspect and Causal Structure. OUP Oxford.

Gerald DeJong. 1983. Acquiring schemata through understanding and generalizing plans. In IJCAI.

Quang Xuan Do, Yee Seng Chan, and Dan Roth. 2011. Minimally supervised event causality identification. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 294–303. Association for Computational Linguistics.

Alexander D'Amour, Peng Ding, Avi Feller, Lihua Lei, and Jasjeet Sekhon. 2020. Overlap in observational studies with high-dimensional covariates. Journal of Econometrics.

Francis Ferraro and Benjamin Van Durme. 2016. A unified Bayesian model of scripts, frames and language. In Thirtieth AAAI Conference on Artificial Intelligence.

Aaron Fisher and Edward H Kennedy. 2018. Visually communicating and teaching intuition for influence functions. arXiv preprint arXiv:1810.03260.

Tao Ge, Lei Cui, Heng Ji, Baobao Chang, and Zhifang Sui. 2016. Discovering concept-level event associations from a text stream. In Natural Language Understanding and Intelligent Applications, pages 413–424. Springer.

Roxana Girju. 2003. Automatic detection of causal relations for question answering. In Proceedings of the ACL 2003 Workshop on Multilingual Summarization and Question Answering - Volume 12, pages 76–83. Association for Computational Linguistics.

Roxana Girju and Dan Moldovan. 2002. Mining answers for causation questions.

Arthur C Graesser, Murray Singer, and Tom Trabasso. 1994. Constructing inferences during narrative text comprehension. Psychological Review, 101(3):371.

David Graff, Junbo Kong, Ke Chen, and Kazuaki Maeda. 2003. English Gigaword. Linguistic Data Consortium, Philadelphia, 4(1):34.

Miguel A Hernan and James M Robins. 2019. Causal Inference: What If. CRC, Boca Raton, FL.

J Edward Hu, Rachel Rudinger, Matt Post, and Benjamin Van Durme. 2019. ParaBank: Monolingual bitext generation and sentential paraphrasing via lexically-constrained neural machine translation. arXiv preprint arXiv:1901.03644.

Bram Jans, Steven Bethard, Ivan Vulić, and Marie Francine Moens. 2012. Skip n-grams and ranking functions for predicting script events. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pages 336–344. Association for Computational Linguistics.

Katherine A Keith, David Jensen, and Brendan O'Connor. 2020. Text and causal inference: A review of using text to remove confounding from causal estimates. ACL 2020.

Edward H Kennedy. 2016. Semiparametric theory and empirical processes in causal inference. In Statistical Causal Inferences and Their Applications in Public Health Research, pages 141–167. Springer.

Ryan Kiros, Yukun Zhu, Ruslan R Salakhutdinov, Richard Zemel, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. Skip-thought vectors. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 3294–3302. Curran Associates, Inc.

Manabu Kuroki and Judea Pearl. 2014. Measurement bias and effect restoration in causal inference. Biometrika, 101(2):423–437.

Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 55–60.

Gail McKoon and Roger Ratcliff. 1986. Inferences about predictable events. Journal of Experimental Psychology: Learning, Memory, and Cognition, 12(1):82.

Gail McKoon and Roger Ratcliff. 1992. Inference during reading. Psychological Review, 99(3):440.

Wang Miao, Zhi Geng, and Eric J Tchetgen Tchetgen. 2018. Identifying causal effects with proxy variables of an unmeasured confounder. Biometrika, 105(4):987–993.

Marvin Minsky. 1974. A framework for representing knowledge. MIT Laboratory Memo 306.

Ashutosh Modi. 2016. Event embeddings for semantic script modeling. In Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning, pages 75–83.

Ashutosh Modi and Ivan Titov. 2014. Inducing neural models of script knowledge. In Proceedings of the Eighteenth Conference on Computational Natural Language Learning, pages 49–57, Ann Arbor, Michigan. Association for Computational Linguistics.

Raymond Mooney and Gerald DeJong. 1985. Learning schemata for natural language processing. In Ninth International Joint Conference on Artificial Intelligence (IJCAI), pages 681–687.

Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajic, Christopher D. Manning, Ryan McDonald, Slav Petrov, Sampo Pyysalo, Natalia Silveira, Reut Tsarfaty, and Daniel Zeman. 2016. Universal Dependencies v1: A multilingual treebank collection. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Paris, France. European Language Resources Association (ELRA).

John Walker Orr, Prasad Tadepalli, Janardhan Rao Doppa, Xiaoli Fern, and Thomas G Dietterich. 2014. Learning scripts as hidden Markov models. In Twenty-Eighth AAAI Conference on Artificial Intelligence.

Patrick Pantel and Deepak Ravichandran. 2004. Automatically labeling semantic classes. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004.

Judea Pearl. 1995. Causal diagrams for empirical research. Biometrika, 82(4):669–688.

Judea Pearl. 2000. Causality: Models, Reasoning and Inference, volume 29. Springer.

Haoruo Peng and Dan Roth. 2016. Two discourse driven language models for semantics. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, August 7-12, 2016, Berlin, Germany, Volume 1: Long Papers.

Karl Pichotta and Raymond J Mooney. 2016. Learning statistical scripts with LSTM recurrent neural networks. In AAAI, pages 2800–2806.

Sameer Pradhan, Alessandro Moschitti, Nianwen Xue, Olga Uryupina, and Yuchen Zhang. 2012. CoNLL-2012 shared task: Modeling multilingual unrestricted coreference in OntoNotes. In Joint Conference on EMNLP and CoNLL - Shared Task, pages 1–40. Association for Computational Linguistics.

Victor Raskin, Sergei Nirenburg, Christian F. Hempelmann, Inna Nirenburg, and Katrina E. Triezenberg. 2003. The genesis of a script for bankruptcy in ontological semantics. In Proceedings of the HLT-NAACL 2003 Workshop on Text Meaning, pages 30–37.

Michaela Regneri, Alexander Koller, and Manfred Pinkal. 2010. Learning script knowledge with web experiments. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 979–988. Association for Computational Linguistics.

Mehwish Riaz and Roxana Girju. 2013. Toward a better understanding of causality between verbal events: Extraction and analysis of the causal power of verb-verb associations. In Proceedings of the SIGDIAL 2013 Conference, pages 21–30.

Rachel Rudinger, Pushpendre Rastogi, Francis Ferraro, and Benjamin Van Durme. 2015. Script induction as language modeling. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1681–1686.

Rachel Rudinger, Adam Teichert, Ryan Culkin, Sheng Zhang, and Benjamin Van Durme. 2018a. Neural-Davidsonian semantic proto-role labeling. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 944–955, Brussels, Belgium. Association for Computational Linguistics.

Rachel Rudinger, Aaron Steven White, and Benjamin Van Durme. 2018b. Neural models of factuality. arXiv preprint arXiv:1804.02472.

Keisuke Sakaguchi and Benjamin Van Durme. 2018. Efficient online scalar annotation with bounded support. ACL.

Roger C Schank and Robert P Abelson. 1975. Scripts, plans, and knowledge. IJCAI.

Roger C Schank and Robert P Abelson. 1977. Scripts, Plans, Goals, and Understanding: An Inquiry into Human Knowledge Structures. Psychology Press.

Uri Shalit, Fredrik D Johansson, and David Sontag. 2017. Estimating individual treatment effect: generalization bounds and algorithms. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, pages 3076–3085. JMLR.org.

Ilya Shpitser and Judea Pearl. 2008. Complete identification methods for the causal hierarchy. Journal of Machine Learning Research, 9(Sep):1941–1979.

Robyn Speer and Catherine Havasi. 2012. Representing general relational knowledge in ConceptNet 5. In LREC.

Peter Spirtes, Clark N Glymour, Richard Scheines, David Heckerman, Christopher Meek, Gregory Cooper, and Thomas Richardson. 2000. Causation, Prediction, and Search. MIT Press.

Leonard Talmy. 1988. Force dynamics in language and cognition. Cognitive Science, 12(1):49–100.

Kristina Toutanova, Dan Klein, Christopher D Manning, and Yoram Singer. 2003. Feature-rich part-of-speech tagging with a cyclic dependency network. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1, pages 173–180. Association for Computational Linguistics.

Tom Trabasso and Linda L Sperry. 1985. Causal relatedness and importance of story events. Journal of Memory and Language, 24(5):595–611.

Tom Trabasso and Paul Van Den Broek. 1985. Causal thinking and the representation of narrative events. Journal of Memory and Language, 24(5):612–630.

Noah Weber, Niranjan Balasubramanian, and Nathanael Chambers. 2018a. Event representations with tensor-based compositions. In Thirty-Second AAAI Conference on Artificial Intelligence.

Noah Weber, Leena Shekhar, Niranjan Balasubramanian, and Nate Chambers. 2018b. Hierarchical quantized representations for script generation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3783–3792.

Aaron Steven White, D. Reisinger, Keisuke Sakaguchi, Tim Vieira, Sheng Zhang, Rachel Rudinger, Kyle Rawlins, and Benjamin Van Durme. 2016. Universal decompositional semantics on universal dependencies. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1713–1723, Austin, Texas. Association for Computational Linguistics.

Phillip Wolff and Grace Song. 2003. Models of causation and the semantics of causal verbs. Cognitive Psychology, 47(3):276–332.

Zach Wood-Doughty, Ilya Shpitser, and Mark Dredze. 2018. Challenges of using text classifiers for causal inference. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, volume 2018, page 4586. NIH Public Access.

James Woodward. 2005. Making Things Happen: A Theory of Causal Explanation. Oxford University Press.

Sheng Zhang, Rachel Rudinger, Kevin Duh, and Benjamin Van Durme. 2017. Ordinal common-sense inference. TACL.

Yukun Zhu, Ryan Kiros, Rich Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. In Proceedings of the IEEE International Conference on Computer Vision, pages 19–27.

A Appendix

A.1 Event Representation

To best contrast with prior work, we use the event representation of Chambers and Jurafsky (2008) and others (Jans et al., 2012; Rudinger et al., 2015). Each event is a pair (p, d), where p is the event predicate (e.g. hit), and d is the dependency relation (e.g. nsubj) between the predicate and the protagonist entity. The protagonist is the entity that participates in every event in the considered event chain, e.g., the 'Bob' in the chain 'Bob sits, Bob eats, Bob pays.'

A.2 Data Pre-Processing

For these experiments, we use the Toronto Books corpus (Zhu et al., 2015; Kiros et al., 2015), a collection of fiction novels spanning multiple genres. The original corpus contains 11,040 books by unpublished authors. We remove duplicate books from the corpus (by exact file match), leaving a total of 7,101 books; a distribution by genre is provided in Table 7. The books are assigned randomly to train, development, and test splits in 90%-5%-5% proportions (6,405 books in train, and 348 in development and test splits each). Each book is then sentence-split and tokenized with CoreNLP 3.8 (Manning et al., 2014); these sentence and token boundaries are observed in all downstream processing.

Adventure    390     Other             284
Fantasy      1,440   Romance           1,437
Historical   161     Science Fiction   425
Horror       347     Teen              281
Humor        237     Themes            32
Literature   289     Thriller          316
Mystery      512     Vampires          131
New Adult    702     Young Adult       117

Table 7: Distribution of books within each genre of the deduplicated Toronto Books corpus.

A.2.1 Narrative Chain Extraction Pipeline

In order to extract the narrative chains from the Toronto Books data, we implement the following pipeline.

First, we note that coreference resolution systems are trained on documents much smaller than full novels (Pradhan et al., 2012); to accommodate this limitation, we partition each novel into non-overlapping windows that are 100 sentences in length, yielding approximately 400,000 windows in total. We then run CoreNLP's universal dependency parser (Nivre et al., 2016; Chen and Manning, 2014), part of speech tagger (Toutanova et al., 2003), and neural coreference resolution system (Clark and Manning, 2016a,b) over each window of text. For each window, we select the longest coreference chain and call the entity in that chain the "protagonist," following Chambers and Jurafsky (2008).

We feed the resulting universal dependency (UD) parses into PredPatt (White et al., 2016), a rule-based predicate-argument extraction system that runs over universal dependency parses. From PredPatt output, we extract predicate-argument edges, i.e., a pair of token indices in a given sentence where the first index is the head of a predicate, and the second index is the head of an argument to that predicate. Edges with non-verbal predicates are discarded.

At this stage in the pipeline, we merge information from the coreference chain and predicate-argument edges to determine which events the protagonist is participating in. For each predicate-argument edge in every sentence, we discard it if the argument index does not match the head of a protagonist mention. Each of the remaining predicate-argument edges therefore represents an event that the protagonist participated in.

With a list of PredPatt-determined predicate-argument edges (and their corresponding sentences), we are now able to extract the narrative event representations, (p, d). For p, we take the lemma of the (verbal) predicate head. For d, we take the dependency relation type (e.g., nsubj) between the predicate head and argument head indices (as determined by the UD parse); if a direct arc relation does not exist, we instead take the unidirectional dependency path from predicate to argument; if a unidirectional path does not exist, we use a generic "arg" relation.

To extract a factuality feature for each narrative event (i.e. whether the event happened or not, according to the meaning of the text), we use the neural model of Rudinger et al. (2018a). As input to this model, we provide the full sentence in which the event appears, as well as the index of the event predicate's head token. The model returns a factuality score on a [-3, 3] scale, which is then discretized using the following intervals: [1, 3] is "positive" (+), (-1, 1) is "uncertain," and [-3, -1] is "negative" (-).

From this extraction pipeline, we yield one sequence of narrative events (i.e., narrative chain) per text window.
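The last two steps of this pipeline, forming (p, d) events from protagonist-linked predicate-argument edges and discretizing factuality scores, can be summarized as in the sketch below. The `Edge` container and attribute names are stand-ins for whatever PredPatt and the factuality model actually return, so this is a schematic of the described procedure, not the project's code.

```python
from dataclasses import dataclass

@dataclass
class Edge:
    pred_index: int   # token index of the (verbal) predicate head
    pred_lemma: str   # lemma of the predicate head -> p
    arg_index: int    # token index of the argument head
    dep_rel: str      # UD relation, dependency path, or generic "arg" -> d

def discretize_factuality(score):
    """Map a [-3, 3] factuality score to the labels used in the pipeline."""
    if score >= 1:
        return "+"
    if score <= -1:
        return "-"
    return "uncertain"

def narrative_chain(edges, protagonist_heads, factuality_scores):
    """Keep edges whose argument head is a protagonist mention, require a
    'positive' factuality prediction, and emit the (p, d) event representation."""
    chain = []
    for e in edges:
        if e.arg_index not in protagonist_heads:
            continue
        if discretize_factuality(factuality_scores[e.pred_index]) != "+":
            continue
        chain.append((e.pred_lemma, e.dep_rel))
    return chain
```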

Figure 3: The annotation interface for the out-of-text events annotation.

A.3 Training and Model Details - Causal Model

A.3.1 RNN Encoder

We use a single layer GRU based RNN encoder with a 300 dimensional hidden state and 300 dimensional input event embeddings to encode the previous events into a single 300 dimensional vector.

A.3.2 CNN Encoder

We use a CNN to encode the text into a 300 dimensional output vector. The CNN uses 4 filters with ngram windows of (2, 3, 4, 5) and max pooling.

A.3.3 Training Details - Pretraining

The conditional for the Causal model is trained using Adam with a learning rate of 0.001, gradient clipping at 10, and a batch size of 512. The model is trained to minimize cross entropy loss. We train the model until loss on the validation set does not go down after three epochs, after which we keep the model with the best validation performance, which in our case was epoch 4.

A.3.4 Training Details - Finetuning

The model is then finetuned on our dataset of 2000 annotated examples. We use the same objective as above, training using Adam with a learning rate of 0.00001, gradient clipping at 10, and a batch size of 512. We split our 2000 samples into a train set of 1800 examples and a dev set of 200 examples. We train the model in a way similar to above, keeping the best validation model (at epoch 28).

A.4 Training and Model Details - LM Baseline

We use a 2 layer GRU based RNN encoder with a 512 dimensional hidden state and 300 dimensional input event embeddings as our baseline event sequence LM model.

A.4.1 Training Details

The LM model is trained using Adam with a learning rate of 0.001, gradient clipping at 10, and a batch size of 64. We found using dropout at the embedding layer and the output layers to be helpful (with dropout probability of 0.1). The model is trained to minimize cross entropy loss. We train the model until loss on the validation set does not go down after three epochs, after which we keep the model with the best validation performance, which in our case was epoch 5.

Figure 4: The annotation interface for the pairwise human evaluation annotation experiment.

A.5 Annotation Interfaces

To give an idea of the annotation setups used here, we also provide screenshots of the annotation suites for all three annotation experiments. The out-of-text annotation experiment of Section 3.3 is shown in Figure 3. The pairwise annotation evaluation of Section 4.3 is shown in Figure 4. The chain completion annotation evaluation of Section 4.4 is shown in Figure 5.

Figure 5: The annotation interface for the chain completion human evaluation annotation experiment.
