Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17)
Improving Event Causality Recognition with Multiple Background Knowledge Sources Using Multi-Column Convolutional Neural Networks
Canasai Kruengkrai, Kentaro Torisawa, Chikara Hashimoto, Julien Kloetzer, Jong-Hoon Oh, Masahiro Tanaka National Institute of Information and Communications Technology, Kyoto, 619-0289, Japan {canasai, torisawa, ch, julien, rovellia, mtnk}@nict.go.jp
Abstract

We propose a method for recognizing such event causalities as "smoke cigarettes" → "die of lung cancer" using background knowledge taken from web texts as well as the original sentences from which the causality candidates were extracted. We retrieve texts related to our event causality candidates from four billion web pages by three distinct methods, including a why-question answering system, and feed them to our multi-column convolutional neural networks. This allows us to identify the useful background knowledge scattered in web texts and effectively exploit the identified knowledge to recognize event causalities. We empirically show that the combination of our neural network architecture and background knowledge significantly improves average precision, while the previous state-of-the-art method gains just a small benefit from such background knowledge.

keizaikankyou-ga akkasuru → shijoukinri-ga gerakusuru
  ("economic environment deteriorates" → "markets decline")
bukka-wa gerakusuru → defure-ni naru
  ("prices of commodities decline" → "become deflation")
kubi-ni naru → sitsugyouhoken-o morau
  ("get fired" → "get unemployment insurance")
chiyuryoku-o takameru → atopii-o kokufukusu
  ("enhance healing power" → "defeat atopic syndrome")
bitamin-ga fusokusuru → koukakuen-ni naru
  ("lack vitamin" → "cause angular cheilitis")

Table 1: Japanese examples of event causalities successfully recognized by our proposed method
1 Introduction

Event causality, such as "smoke cigarettes" → "die of lung cancer," is critical knowledge for many NLP applications, including machine reading and comprehension (Richardson, Burges, and Renshaw 2013; Berant et al. 2014), process extraction (Scaria et al. 2013), and future event/scenario prediction (Radinsky, Davidovich, and Markovitch 2012; Hashimoto et al. 2014). However, the state-of-the-art methods for event causality recognition still suffer from low precision and coverage because event causality is expressed in a wide range of forms that often lack explicit clues indicating its existence. Consider the following sentences:

1. Typhoons have strengthened because global warming has worsened.
2. Global warming worsened, and typhoons strengthened.

The first sentence includes "because," which explicitly indicates the event causality between the effect "typhoons strengthen" and the cause "global warming worsens." The second sentence, on the other hand, has no such clue. Nonetheless, many people would infer that it expresses the same event causality as the first sentence. This is possible because people have background knowledge about "typhoons" and "global warming."

Copyright © 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

This work develops a method that recognizes event causalities in sentences regardless of whether they contain such explicit clues as "because." The novelty of this work lies in exploiting a wide range of background knowledge (written in web texts) using convolutional neural networks. In other words, given an event causality candidate, our neural network analyzes descriptions in web texts that are somehow related to the candidate and judges whether it is a proper causality. The web texts are retrieved by three distinct methods from our 4-billion-page web archive. We target such event causalities as "global warming worsens" → "typhoons strengthen," in which the cause phrase ("global warming worsens") and the effect phrase ("typhoons strengthen") each consist of a noun ("global warming") and a verb ("worsens"). Our experimental results showed that our neural network-based method outperforms the state-of-the-art method based on SVMs (Hashimoto et al. 2014) and its variants augmented with the background knowledge sources introduced in this work. This suggests that our neural network architecture is more suitable for dealing with background knowledge. Table 1 shows examples of event causalities successfully recognized by our method but not by Hashimoto et al.'s method.

As one method to extract background knowledge from web archives, we use a why-question answering (why-QA) system following Oh et al. (2013) that retrieves seven-sentence passages as answers to a given why-type question. We automatically generate a question from the effect part of an event causality candidate and extract background knowledge from its answers. For recognizing the event causality candidate "global warming worsens" → "typhoons strengthen" as a proper causality, our method generates the question "Why do typhoons strengthen?" and retrieves such answers as one that includes the following two sentences: "Typhoons are becoming more powerful than before. This is probably an effect of recent global warming." Useful background knowledge here might be the causal relation between "Typhoons are becoming more powerful" and "global warming," which somehow resembles the event causality candidate. A problem is that the why-QA system's answers have a wide variety of forms, and recognizing such relations and their similarity to our target causality candidates is not a trivial task. Furthermore, not all of the answers necessarily express useful background knowledge. Even proper answers to a given question may provide no useful knowledge (e.g., "The paper reported that typhoons are becoming powerful due to high sea temperature. Another paper addressed the issues caused by global warming."). Extracting useful knowledge from such noisy texts is itself challenging.

To perform this difficult task, we use multi-column convolutional neural networks (MCNNs) (Ciresan, Meier, and Schmidhuber 2012), a variant of convolutional neural networks (LeCun et al. 1998) with several independent columns. Each column has its own convolutional and pooling layers, and the outputs of all the columns are combined in the last layer to provide a final prediction (Fig. 1). In our architecture, we use five columns to process the event causality candidates and their surrounding contexts in the original sentence, and three other columns to deal with web texts retrieved by different methods.

Figure 1: Our MCNN architecture (eight columns of convolution and pooling layers process the causality candidate "smoke cigarettes" → "die of lung cancer," its context in the original sentence "He smoked cigarettes too often and died of lung cancer," and patterns extracted from related web texts retrieved by three distinct methods; a softmax layer combines their outputs)

Our work was inspired by the research of Hashimoto et al. (2014). They improved the performance of event causality recognition using a carefully selected set of short binary patterns that connect pairs of nouns, like "A causes B" and "A prevents B." For instance, to judge whether the causality "smoke cigarettes" → "die of lung cancer" is proper, they checked whether "cigarettes" and "lung cancer" fill any of the binary patterns in a web archive. If such patterns exist, they are encoded in the features of their SVM classifier. Note that the set of short binary patterns they prepared includes not only CAUSE ("A causes B") relations but also such relations as MATERIAL ("A is made of B") and USE ("A is used for B"), which are not directly related to causality. They showed that such patterns actually improved the performance, suggesting that a wide range of texts can provide effective clues for judging causality.

We extend Hashimoto et al.'s method by introducing MCNNs to effectively use a wider range of background knowledge expressed in a wider range of web texts. By "background knowledge," we refer to any type of information that is useful for recognizing event causalities in general. Our assumption is that a wide range of dependency paths between nouns can be used as background knowledge, not just the short binary patterns that Hashimoto et al. proposed to exploit. In this work, we did not start from pre-specified patterns but designed our method so that it can automatically learn/identify a wide range of paths as background knowledge in extra texts, such as the why-QA system's answers. Our method is given those paths even if they are long and complex. We also extend the notion of dependency paths to inter-sentential ones so that they can capture relations between two nouns that appear in consecutive sentences.

In addition to the above why-QA system's answers, we tried the following two types of texts as sources of background knowledge:

A. A wider range of short binary patterns than those used in Hashimoto et al. (2014). Contrary to their work, we did not pose any semantic restrictions on the patterns.

B. One or two (consecutive) sentences that include both the two nouns of a target event causality and such clue terms for causality as "reason" and "because," like "Recently powerful typhoons have been reported. One reason is global warming." Note that we do not use any sophisticated mechanism such as a why-QA system to retrieve these sentences; we just retrieve all the texts that include the clue terms and the nouns.

Although our target language is Japanese, we believe that our method is extendable to other languages without much cost. Note that we use English examples throughout this paper for the sake of readability.
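Knowledge source B relies on nothing more than keyword matching. The following is a minimal sketch of that retrieval filter; `retrieve_clue_sentences` is a hypothetical helper of ours, and plain substring matching over a list of sentences stands in for the actual search over the web archive:

```python
def retrieve_clue_sentences(corpus, noun_pair, clue_terms):
    """Keep any one- or two-sentence window that contains both nouns
    of a causality candidate plus at least one clue term."""
    hits = []
    for i in range(len(corpus)):
        # Try the single sentence first, then the two-sentence window.
        for window in (corpus[i:i + 1], corpus[i:i + 2]):
            text = " ".join(window)
            if (all(n in text for n in noun_pair)
                    and any(c in text for c in clue_terms)):
                hits.append(text)
                break  # keep only the shorter matching window at position i
    return hits

corpus = ["Recently powerful typhoons have been reported.",
          "One reason is global warming.",
          "The weather was fine."]
hits = retrieve_clue_sentences(corpus,
                               ("typhoons", "global warming"),
                               ("reason", "because"))
# one hit: the two consecutive sentences joined together
```

In the paper's setting, the same idea is applied at web scale with 65 manually prepared clue terms rather than a toy corpus.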
2 Related work

For event causality recognition, researchers have exploited various clues, including discourse connectives and word sharing between cause and effect (Torisawa 2006; Abe, Inui, and Matsumoto 2008; Riaz and Girju 2010). Do, Chan, and Roth (2011) proposed cause-effect association (CEA) statistics. Radinsky, Davidovich, and Markovitch (2012) used existing ontologies, such as YAGO (Suchanek, Kasneci, and Weikum 2007). Hashimoto et al. (2012) introduced a new semantic orientation of predicates, and Hashimoto et al. (2014) exploited a handcrafted set of short binary patterns as background knowledge, as mentioned in the Introduction.

Following the seminal work of Collobert et al. (2011), convolutional neural networks (CNNs) have been applied to such NLP tasks as document classification (Kalchbrenner, Grefenstette, and Blunsom 2014; Kim 2014; Johnson and Zhang 2015), paraphrase identification (Yin and Schütze 2015), and relation extraction/classification (dos Santos, Xiang, and Zhou 2015; Nguyen and Grishman 2015). Multi-column CNNs (MCNNs) were first proposed by Ciresan, Meier, and Schmidhuber (2012) for image classification. In NLP, Dong et al. (2015) used MCNNs to capture the multiple aspects of candidate answers for question answering. Zeng et al. (2015) proposed an analogue of MCNNs called a piecewise max-pooling network for relation extraction. Our MCNN architecture was inspired by the Siamese architecture (Chopra, Hadsell, and LeCun 2005), which we extend to a multi-column network, replacing its similarity measure with a softmax function at the top. A similar MCNN architecture was successfully applied to zero-anaphora resolution (Iida et al. 2016).

Zeng et al. (2015) also tried to exploit external knowledge in the convolutional neural network framework. They obtained labeled training data by applying distant supervision (Mintz et al. 2009) to external knowledge, while our framework does not generate new labeled training samples. Since our additional texts are quite noisy, generating labeled samples from them would be difficult; instead, we attach those additional texts to existing labeled samples and use MCNNs to identify useful background knowledge in such noisy web texts. Oh et al. (2017) proposed the framework most similar to ours; they used a variant of a multi-column convolutional neural network for why-QA and gave additional texts to its columns as background knowledge.

Note that, in the previous attempts to apply MCNNs to NLP, columns were used to deal with several distinct word embeddings or distinct text fragments taken from the input texts. Our contribution is a novel way of using such MCNNs to deal with extra texts that serve as a source of background knowledge for the target task.

3 Proposed method

3.1 Task definition and primary input

The primary input of our method is such event causality candidates as "smoke cigarettes" → "die of lung cancer," and our task is to judge whether they express a proper event causality. We regard causality candidate A → B as proper iff "if A happens, the probability of B increases." In the candidates, the cause phrase ("smoke cigarettes") and the effect phrase ("die of lung cancer") consist of a predicate with argument position X (hereafter, a template), like "smoke X," and a noun, like "cigarettes," that fills X. The predicate of the cause phrase must also syntactically depend on the effect phrase, possibly through such connectives as "and" or "since," in the original sentence. The format of the event causality candidates is the same as in Hashimoto et al. (2014). We chose this compact format because it contains the essence of event causalities and is easy to use in applications (e.g., future event/scenario prediction (Radinsky, Davidovich, and Markovitch 2012; Hashimoto et al. 2014)).

3.2 Method overview

In our work, the event causality candidates are fed to one of the columns in our MCNNs, as shown in Fig. 1. More precisely, the word vectors of the cause and effect parts are given to that column. Text fragments surrounding the causality candidate in its original sentence are given as contexts to the other four columns. Following the SVM-based method of Hashimoto et al. (2014), we use the following text fragments as context: (a) the fragment between the noun and the predicate of the cause phrase, (b) the fragment between the noun and the predicate of the effect phrase, (c) the fragment between the cause noun and the effect predicate, and (d) all the words after the effect predicate. Each of the four fragments is given to a distinct column as a sequence of word vectors. We thus use a total of five columns to treat the causality candidates and their contexts. In addition to the above inputs, we use extra texts as sources of background knowledge, as described in the next section.

3.3 Sources of background knowledge

We use three types of additional texts retrieved from our 4-billion-page web archive as background knowledge, along with an event causality candidate and its contexts. In the following, we explain each type of text and how we feed it to our MCNNs.

Short binary patterns. As a source of background knowledge for event causality recognition, Hashimoto et al. (2014) used a set of 395,578 carefully selected binary patterns, such as "A causes B" and "A prevents B," which are somehow related to event causalities. [Footnote 1: Hashimoto et al. (2014) prepared the binary patterns in two ways: (1) by manually selecting a small number of seed patterns and expanding them with a pattern entailment database constructed by machine learning (Kloetzer et al. 2013; 2015), and (2) by adding additional predicate arguments to the unary patterns in an existing semantic predicate lexicon (Hashimoto et al. 2012; Sano et al. 2014).] The patterns here are dependency paths that connect two nouns, which are replaced with the variables A and B.

We also use such binary patterns as background knowledge. Hashimoto et al. (2014) selected their patterns from the set of all patterns whose variables A and B are filled with ten or more distinct noun pairs in 600 million web pages. This condition filters out long and uncommon binary patterns, so only relatively short patterns remain. We do not conduct such selection; instead, we use all the patterns that survived the ten-distinct-noun-pair filter. Given two nouns, the number of retrieved patterns varies from one to several hundred. We then heuristically select a maximum of 15 binary patterns that are most frequently observed with the noun pair, concatenate them with the delimiter "|" (e.g., "A causes B | A prevents B | ..."), and give the resulting word sequence to the sixth column of our MCNNs. The number of patterns (15) given to the MCNNs was determined by preliminary experiments on our development set, in which we tried various numbers up to 250 without observing any significant changes in performance. Since a larger number extends the training time, we chose 15.
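The top-15 selection and "|" concatenation described above can be sketched as follows. This is a simplified illustration, not the paper's implementation; `select_binary_patterns` is a hypothetical helper, and breaking frequency ties by preferring shorter patterns is our assumption (the paper states this tie-break only for the clue-term patterns):

```python
from collections import Counter

def select_binary_patterns(patterns, limit=15):
    """Rank the patterns observed with a noun pair by frequency
    (ties broken by length) and join the top `limit` patterns with
    the "|" delimiter used as the column input."""
    counts = Counter(patterns)
    ranked = sorted(counts, key=lambda p: (-counts[p], len(p)))
    return " | ".join(ranked[:limit])

# Patterns observed with a hypothetical noun pair in the archive.
observed = ["A causes B"] * 3 + ["A prevents B"] * 2 + ["A is made of B"]
column_input = select_binary_patterns(observed)
# "A causes B | A prevents B | A is made of B"
```

The resulting string is treated as one word sequence, so the column sees the patterns and delimiters exactly as concatenated here.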
Answers from a why-QA system. As another source of background knowledge, we use the answers of a why-QA system, our implementation of Oh et al. (2013). The system retrieves passages, each of which is seven consecutive sentences, from four billion web pages using the terms in a given why-question and ranks the retrieved passages with a supervised ranker (SVM). The system achieved accuracies of 59.3%, 83.3%, and 90.3% for the top 1, 3, and 5 answers, respectively, on our test data. [Footnote 2: The system was trained using 65,738 question-answer pairs for 6,109 questions. The evaluation was done against our test data, which consisted of 3,438 question-answer pairs for 400 questions. Three human annotators annotated our data, and the final decision was made by a majority vote. Fleiss' kappa was substantial (κ = 0.776).]

In our event causality recognition, we automatically generate a question from the effect part of a given event causality candidate, basically by attaching "why" ("naze" in Japanese) to the effect part. We then retrieve the top 200 answers ranked by the why-QA system and keep those that contain the two nouns of the target causality candidate.

Since helpful background knowledge is often expressed as relations between the two nouns of the causality candidate, we extract patterns that are likely to represent such relations from the answers as follows. If the two nouns of a causality candidate appear in a single sentence, we identify the dependency paths from the individual nouns to the root of the sentence and combine them, preserving their word order and retaining clue terms if they exist, regardless of whether they are on the dependency paths. For instance, for the question "Why (do people) die of lung cancer?" [Footnote 3: In Japanese, subjects (e.g., "people") are commonly omitted.], one of the answers included the sentence, "Many people cannot stop smoking cigarettes, and, as a result, they suffer from lung cancer." From this sentence, the pattern "cannot stop A and result suffer from B" was extracted, where A and B are the variables for the nouns of the cause and effect parts and "result" is a clue term. If the two nouns appear in consecutive sentences, we create an artificial dependency link from the root of the first sentence to that of the second sentence and extract the patterns as if the two sentences were one. For example, if an answer includes such sentences as "Cigarettes are harmful. They cause lung cancer, for instance," the resulting pattern is "A is harmful causes B" (see Fig. 2 for a Japanese example). Just like the short binary patterns, we concatenate the 15 patterns extracted from the most highly ranked answers with delimiters and give them to the seventh column of our MCNNs. Note that the patterns extracted in this way are quite long and can provide useful information that cannot be covered by short binary patterns.

Figure 2: Example of pattern extraction from two consecutive sentences. An artificial dependency link joins "tabako-wa yuugaida." (cigarette-TOP be-harmful; "Cigarettes are harmful.") and "tatoeba, sore-wa haigan-o hikiokosu." (for instance it-TOP lung cancer-OBJ cause; "They cause lung cancer, for instance."), yielding the pattern "A-wa yuugaida B-o hikiokosu" (variable-TOP be-harmful variable-OBJ cause).

One or two consecutive sentences with clue terms. As a third type of knowledge source, we use sentences retrieved from four billion web pages by searching for the two nouns of an event causality candidate and such clue terms as "because." [Footnote 4: We used 65 manually prepared clue terms.] More precisely, we retrieve one or two consecutive sentences in which the nouns and one of the clue terms appear. Patterns are then extracted from these sentences by the same method as the one used for the answers from our why-QA system. We estimate the frequency of each pattern among all the patterns extracted for our event causality training data and select the 15 most frequent patterns for an event causality candidate; when frequencies are identical, shorter patterns are preferred. The selected patterns are concatenated with a delimiter and fed to the eighth column of our MCNNs.
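The path-combination step shared by the why-QA and clue-term sources can be sketched roughly as follows. This is a toy illustration under our own simplifying assumptions: a dependency parse is represented as an invented word-to-head map rather than the parser output the paper uses, and `extract_pattern` is a hypothetical helper:

```python
def path_to_root(heads, word):
    """Walk from `word` up to the sentence root; `heads` maps each
    word to its head, and the root maps to None."""
    path = []
    while word is not None:
        path.append(word)
        word = heads[word]
    return path

def extract_pattern(heads, order, noun_a, noun_b, clue_terms=()):
    """Keep the words on either noun's path to the root plus any clue
    terms (even off-path), preserve surface word order, and replace
    the two nouns with the variables A and B."""
    keep = set(path_to_root(heads, noun_a)) | set(path_to_root(heads, noun_b))
    keep |= set(clue_terms) & set(order)   # clue terms survive even off-path
    return " ".join("A" if w == noun_a else "B" if w == noun_b else w
                    for w in order if w in keep)

# Toy parse for "stop cigarettes result suffer from cancer":
# "result" is a clue term that lies off both dependency paths.
heads = {"cigarettes": "stop", "stop": "suffer", "result": "suffer",
         "from": "suffer", "cancer": "from", "suffer": None}
order = ["stop", "cigarettes", "result", "suffer", "from", "cancer"]
pattern = extract_pattern(heads, order, "cigarettes", "cancer",
                          clue_terms=("result",))
# "stop A result suffer from B"
```

The inter-sentential case reduces to the same routine once the artificial link from the first sentence's root to the second's is added to the head map.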
3.4 MCNN architecture

Our MCNNs consist of eight columns, one of which is shown in Fig. 3. We represent each word in a text fragment t by a d-dimensional embedding vector x_i, and t by the matrix T = [x_1, ..., x_|t|]. [Footnote 5: We used zero padding to deal with variable-length text fragments (Kim 2014).] T is then wired to a set of M feature maps, where each feature map is a vector. Each element O of a feature map (i.e., a neuron) is computed by a filter f_j (1 ≤ j ≤ M) from the N-gram word sequences in t, for some fixed integer N, as O = ReLU(W_fj • x_i:i+N−1 + b_fj), where • denotes element-wise multiplication followed by summation of the resulting elements (i.e., the Frobenius inner product) and ReLU(x) = max(0, x). In other words, we construct a feature map by convolving a text fragment with a filter parameterized by the weight W_fj ∈ R^(d×N) and the bias b_fj ∈ R. Note that there can be several sets of feature maps, where each set covers N-grams for a different N. Since we build our MCNNs upon the Siamese architecture (Chopra, Hadsell, and LeCun 2005), all the columns share the same W_fj and b_fj for the same N.

As a whole, these feature maps are called a convolution layer. The next layer is a pooling layer, in which we use max-pooling, which simply selects the maximum value among the elements of the same feature map (Collobert et al. 2011). Our expectation is that the maximum value indicates the existence of a strong clue (i.e., an N-gram) for our final judgments. The selected maximum values from all M feature maps are concatenated, and the resulting M-dimensional vector is given to the final layer. Note that since each feature map has different weights and biases, max-pooling may select different N-gram sequences; this means that multiple N-gram sequences can serve as strong clues.

The final layer receives vectors from multiple feature maps in multiple columns. They are again concatenated and constitute a high-dimensional feature vector. The final layer applies a softmax function to this vector to produce the class probabilities of the causality labels, true and false. We train with mini-batch stochastic gradient descent (SGD) using the Adadelta update rule (Zeiler 2012). We randomly initialize the filter weights W_fj from a uniform distribution in the range [−0.01, 0.01] and set the remaining parameters to zero.

Figure 3: Column of our MCNNs (the word vectors x_1, ..., x_|t| of the input word sequence are convolved into feature maps f_1, ..., f_M for each N-gram size; the max-pooled values are concatenated with the outputs of the other columns and fed to the softmax final layer)
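The forward pass of one column and the final softmax can be sketched in a few lines of NumPy. This is our own illustration of the architecture just described, not the paper's Theano code; the toy dimensions, random inputs, and helper names are invented, and training (Adadelta, dropout) is omitted:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def column_output(T, filters, biases, N):
    """One column: slide each d x N filter over the word-vector
    matrix T (d x |t|), apply ReLU, and max-pool each feature map
    down to a single value, giving an M-dimensional vector."""
    d, t_len = T.shape
    pooled = []
    for W, b in zip(filters, biases):
        acts = [relu(np.sum(W * T[:, i:i + N]) + b)  # Frobenius inner product
                for i in range(t_len - N + 1)]
        pooled.append(max(acts))                     # max-pooling over positions
    return np.array(pooled)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d, N, M = 4, 2, 3
# Siamese-style sharing: all eight columns use the same filters.
filters = [rng.normal(size=(d, N)) for _ in range(M)]
biases = [0.0] * M
columns = [column_output(rng.normal(size=(d, 6)), filters, biases, N)
           for _ in range(8)]
feature = np.concatenate(columns)        # concatenated final-layer input
W_out = rng.normal(size=(2, feature.size))
probs = softmax(W_out @ feature)         # class probabilities: true, false
```

With several N-gram sizes, one would simply run this per N with a separate filter set and concatenate all the pooled vectors before the softmax.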
Base 4.1 Settings is our MCNNs that use only the cause/effect phrases of a Hashimoto et al. (2014) extracted 2,451,254 event causality causality candidate and the contexts in the original sentence candidates from 600 million web pages. We used samples without additional background knowledge. The following from them as our datasets. Three human annotators (not the are the acronyms of our knowledge sources: BP = short bi- authors) annotated the data, according to the following def- nary patterns, WH = why-QA system’s answers, and CL = inition of event causality: A → B is a proper causality iff sentences with clue terms. (a) “if A happens, the probability of B increases,” and (b) the causality is self-contained (i.e., comprehensible without 4.2 Results contextual information). They judged whether phrase pairs Table 4 shows the experimental results on the test data, in- constitute a causality without their contexts. The final deci- cluding those for our proposed methods and the other meth- sion was made by a majority vote, and Fleiss’ kappa showed ods for comparison. substantial agreement (κ = 0.67). Table 2 shows the statistics Hashimoto14 denotes Hashimoto et al. (2014)’s SVM of the training, development, and test data. The development classifier that integrates various features, such as carefully and test data were randomly sampled from all the extracted candidates, but not the training set. There were no dupli- 6All of the random vectors were sampled from a uniform distri- cate causality candidates (i.e., phrase pairs) among the three bution in the range of [−0.25, 0.25]. Method AP 1.0 Base (2,3,4)×200 46.85 Base+BP+WH+CL (Ours) × Base (Ours) Base+BP (2,3,4,5) 100 51.29 Hashimoto14+BP+WH Base+WH (3,4,5)×200 50.20 0.8 Hashimoto14 Base+CL (3,4,5)×200 49.78 CNN-SENT Base+BP+WH (2,3,4,5,6)×200 52.23 Base+BP+CL (4,5,6)×200 52.12 0.6 Base+WH+CL (3,4,5)×200 50.65 Base+BP+WH+CL (3,4,5)×100 52.88 Precision
0.4 Table 3: Best hyper-parameter setting for each source com- bination on our development data 0.2
selected binary patterns, contexts, and association measures. We used all of their features and fine-tuned their SVM classifier on the full development data.[7] We also conducted experiments that integrate our new knowledge sources into their framework to see how beneficial they are for the SVM. For instance, Hashimoto14+WH is a setting in which we give the patterns from the why-QA system’s answers to the SVM as binary features, using the same feature encoding as for the short binary patterns in their method. Note that in these methods we used all of the extracted patterns (not just the 15 patterns as in our method) for all the knowledge sources, since they used all of the patterns that they could find in web pages.

[7] We tried two types of kernels, linear and polynomial (degree = 2, 3), and varied C in {0.0001, 0.001, ..., 100, 1000}.

CNN-SENT denotes a CNN that contains a single column. It resembles a single-column version of our MCNNs; the difference is that it scans the original sentence, including our causality candidate. We chose its optimal hyperparameters and performed model averaging using the strategy described above.

Method                  AP
CNN-SENT               43.76
Hashimoto14            47.32
Hashimoto14+BP         47.39
Hashimoto14+WH         43.41
Hashimoto14+CL         40.11
Hashimoto14+BP+WH      47.52
Hashimoto14+BP+CL      45.96
Hashimoto14+WH+CL      41.75
Hashimoto14+BP+WH+CL   45.81
MCNN-based methods
Base                   49.34
Base+BP                54.32
Base+WH                52.03
Base+CL                52.27
Base+BP+WH             54.85
Base+BP+CL             54.36
Base+WH+CL             53.08
Base+BP+WH+CL          55.13

Table 4: Test data results

As seen in Table 4, our proposed methods achieved significantly better average precision than the other methods. The best average precision of the proposed methods (55.13%) was obtained when we used all of the types of background knowledge (i.e., Base+BP+WH+CL); it was 7.6% higher than the best of Hashimoto et al.’s methods (47.52%). Note that we obtained a 5.6% improvement by extending single CNNs to multi-column CNNs (CNN-SENT vs. Base). Integrating background knowledge gave a further 5.8% improvement (Base vs. Base+BP+WH+CL). Fig. 4 shows the precision-recall curves and that our proposed methods achieved better precision than the other methods at most recall levels. These results suggest that our MCNN architecture is effective for this task.

Figure 4: Precision-recall curves of our proposed methods and other methods

4.3 Discussion

We further validated the contributions of the WH and CL patterns. Note that our pattern extraction method for WH and CL might produce some short binary patterns identical to those of BP, but these short patterns might not be selected as inputs to the BP column due to the limitation on the number of selected patterns (i.e., 15). We removed from WH and CL any short binary patterns that would have appeared in the input of BP’s column had there been no such limitation, and evaluated the resulting method. Surprisingly, removing BP’s patterns from WH and CL yielded a further 0.29% improvement, as shown in Table 5. This confirms the effectiveness of the complex and relatively long patterns from WH and CL.

Method                   AP
Base+BP+WH+CL           55.13
Base+BP+WH\BP+CL\BP     55.42

Table 5: Test data results when removing BP’s patterns from WH and CL

In our proposed methods, all the knowledge sources improved the performance over Base (see Base, Base+BP, Base+WH, and Base+CL). This indicates that our MCNN architecture effectively exploited each type of knowledge source. The combination of different knowledge sources produced better results in all cases.
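The column structure can be made concrete with a short sketch. The following NumPy code is a minimal, illustrative forward pass of a multi-column convolutional network — not the authors’ implementation — with hypothetical dimensions (`EMB`, `FILTERS`, `NGRAM`) and one column per input type (the original sentence plus the BP, WH, and CL pattern inputs): each column applies an n-gram convolution over word embeddings followed by max pooling, and the pooled features from all columns are concatenated for classification.

```python
import numpy as np

rng = np.random.default_rng(0)

EMB = 50      # word-embedding dimensionality (hypothetical; the actual setting may differ)
FILTERS = 8   # feature maps per column (hypothetical)
NGRAM = 3     # convolution window width (n-gram size)

def column_forward(tokens, W):
    """One column: n-gram convolution over word embeddings, ReLU, then max pooling.
    tokens: (seq_len, EMB) matrix of word embeddings for one input text.
    W: (FILTERS, NGRAM * EMB) convolution filters."""
    seq_len = tokens.shape[0]
    # Slide an NGRAM-wide window over the sequence; each window is flattened
    windows = np.stack([tokens[i:i + NGRAM].ravel()
                        for i in range(seq_len - NGRAM + 1)])  # (n_windows, NGRAM*EMB)
    feature_maps = np.maximum(windows @ W.T, 0.0)              # ReLU, (n_windows, FILTERS)
    return feature_maps.max(axis=0)                            # max pooling -> (FILTERS,)

# One column per input type: the original sentence plus each background-knowledge source
columns = ["SENT", "BP", "WH", "CL"]
weights = {c: rng.normal(0, 0.1, (FILTERS, NGRAM * EMB)) for c in columns}

# Fake embedded inputs of different lengths, one per column
inputs = {c: rng.normal(0, 1, (n, EMB)) for c, n in zip(columns, [12, 7, 9, 10])}

# Concatenate pooled features from all columns: 4 columns x 8 filters = 32 features
pooled = np.concatenate([column_forward(inputs[c], weights[c]) for c in columns])

# Binary classification: causality vs. non-causality
W_out = rng.normal(0, 0.1, (2, pooled.size))
logits = W_out @ pooled
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(pooled.shape, probs)
```

Because all columns read the same word-embedding space, near-synonymous patterns produce nearby window vectors and therefore similar pooled features, which is the kind of generalization that binary pattern features cannot express.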
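All scores in Tables 4–6 are average precision (AP) over a ranked list of causality candidates. For reference, here is a minimal sketch of the standard ranked-retrieval definition of AP — the mean of precision@k over the ranks k at which correct candidates appear; the actual evaluation script may differ in details such as tie handling.

```python
def average_precision(scores, labels):
    """AP of a ranked list: mean of precision@k over the ranks k
    at which a correct (positive) candidate appears."""
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    hits, precisions = 0, []
    for k, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Toy example: classifier scores for five candidates and gold labels
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 0, 1, 1, 0]
print(round(average_precision(scores, labels), 4))  # → 0.8056
```

Under this definition, a method that ranks every correct candidate above every incorrect one attains an AP of 1.0, so the figures in Table 4 reflect how highly the correct causalities are ranked on the test data.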
The best performance of Hashimoto et al.’s SVM-based method (Hashimoto14+BP+WH) was much lower than that of our MCNN-based methods. Also, when we added our knowledge sources to Hashimoto et al.’s original method, the performance dropped significantly in most cases (e.g., Hashimoto14+WH). Such poor performance is probably due to the feature encoding scheme in their method. Their binary features for expressing patterns cannot explicitly encode semantic similarity or synonymy between distinct patterns (e.g., “A causes B” and “A is a cause of B”) because the features cannot indicate that two patterns share common or synonymous words. The SVMs therefore cannot generalize over semantically similar patterns. In contrast, we believe that the word embeddings in patterns and the (N-gram based) convolution/pooling operation on them in our method are likely to capture such similarity.

Another difference between Hashimoto et al.’s method and ours is the number of patterns (a maximum of 15 in our method and no such limitation in theirs). This may have caused overfitting in their method. We evaluated their method under the same limitation (Table 6). Even though the average precision slightly improved in some cases, the best performance improved by only 0.36% (Hashimoto14+BP+WH), suggesting that overfitting actually occurred but its effect was limited.

Method                  AP (15 ptns.)   AP (All ptns.)
Base+BP+WH+CL (Ours)        55.13             -
Hashimoto14+BP+WH           47.88           47.52
Hashimoto14+BP+WH+CL        46.07           45.81

Table 6: Test data results of our method and Hashimoto et al.’s method when using identical patterns

Finally, we examined the effects of word embedding vectors. First, we pre-trained another set of word embeddings using Wikipedia articles.[8] Second, we initialized all of the word embeddings randomly. Wikipedia’s word embeddings gave an average precision of 54.84%, while random initialization yielded a much worse average precision of 48.48%. Fig. 5 compares their precision-recall curves with those of the word embeddings trained on Hashimoto et al. (2014)’s 2.4M sentences. These results indicate that pre-trained word embeddings are one important component of the success of our proposed method. Even with word embeddings trained on a more general-domain corpus like Wikipedia, our MCNN architecture still maintained relatively high average precision.

Figure 5: Precision-recall curves of our proposed methods using different word embeddings (Base+BP+WH+CL, Base+BP+WH+CL w/ Wikipedia, and Base+BP+WH+CL w/ Random)

[8] Wikipedia articles contain 35M sentences taken from https://archive.org/details/jawiki-20150118. Base+BP+WH+CL’s datasets contain 176,851 unique words; 72.61% of them can be found in the vocabulary extracted from Wikipedia articles, while 75.40% can be found in Hashimoto et al. (2014)’s 2.4M sentences.

5 Conclusion

We presented a method for recognizing such event causalities as “smoke cigarettes” → “die of lung cancer.” Our method exploits background knowledge extracted from noisy texts (e.g., a why-QA system’s answers and texts retrieved by a simple keyword search). We empirically showed that the combination of MCNNs and such background knowledge can significantly improve performance over the previous state-of-the-art method. In future work, we plan to apply our proposed method to event causality hypothesis generation (Hashimoto et al. 2015).

6 Acknowledgments

We thank Haruo Kinoshita for developing the why-QA system’s API, Ryu Iida for helpful discussion, and the anonymous reviewers for their insightful comments.

References

Abe, S.; Inui, K.; and Matsumoto, Y. 2008. Two-phased event relation acquisition: Coupling the relation-oriented and argument-oriented approaches. In Proceedings of COLING, 1–8.
Bastien, F.; Lamblin, P.; Pascanu, R.; Bergstra, J.; Goodfellow, I. J.; Bergeron, A.; Bouchard, N.; and Bengio, Y. 2012. Theano: new features and speed improvements. In Proceedings of the NIPS Workshop on Deep Learning and Unsupervised Feature Learning.
Bengio, Y. 2012. Practical recommendations for gradient-based training of deep architectures. Neural Networks: Tricks of the Trade 7700:437–478.
Berant, J.; Srikumar, V.; Chen, P.-C.; Vander Linden, A.; Harding, B.; Huang, B.; Clark, P.; and Manning, C. D. 2014. Modeling biological processes for reading comprehension. In Proceedings of EMNLP, 1499–1510.
Chopra, S.; Hadsell, R.; and LeCun, Y. 2005. Learning a similarity metric discriminatively, with application to face verification. In Proceedings of CVPR, 539–546.
Ciresan, D. C.; Meier, U.; and Schmidhuber, J. 2012. Multi-column deep neural networks for image classification. In Proceedings of CVPR, 3642–3649.
Collobert, R.; Weston, J.; Bottou, L.; Karlen, M.; Kavukcuoglu, K.; and Kuksa, P. 2011. Natural language processing (almost) from scratch. Journal of Machine Learning Research 12:2493–2537.
Do, Q. X.; Chan, Y. S.; and Roth, D. 2011. Minimally supervised event causality identification. In Proceedings of EMNLP, 294–303.
Dong, L.; Wei, F.; Zhou, M.; and Xu, K. 2015. Question answering over Freebase with multi-column convolutional neural networks. In Proceedings of ACL, 260–269.
dos Santos, C. N.; Xiang, B.; and Zhou, B. 2015. Classifying relations by ranking with convolutional neural networks. In Proceedings of ACL, 626–634.
Graves, A. 2013. Generating sequences with recurrent neural networks. CoRR abs/1308.0850.
Hashimoto, C.; Torisawa, K.; Saeger, S. D.; Oh, J.-H.; and Kazama, J. 2012. Excitatory or inhibitory: A new semantic orientation extracts contradiction and causality from the web. In Proceedings of EMNLP-CoNLL, 619–630.
Hashimoto, C.; Torisawa, K.; Kloetzer, J.; Sano, M.; Varga, I.; Oh, J.-H.; and Kidawara, Y. 2014. Toward future scenario generation: Extracting event causality exploiting semantic relation, context, and association features. In Proceedings of ACL, 987–997.
Hashimoto, C.; Torisawa, K.; Kloetzer, J.; and Oh, J.-H. 2015. Generating event causality hypotheses through semantic relations. In Proceedings of AAAI.
Hinton, G. E.; Srivastava, N.; Krizhevsky, A.; Sutskever, I.; and Salakhutdinov, R. 2012. Improving neural networks by preventing co-adaptation of feature detectors. CoRR abs/1207.0580.
Iida, R.; Torisawa, K.; Oh, J.-H.; Kruengkrai, C.; and Kloetzer, J. 2016. Intra-sentential subject zero anaphora resolution using multi-column convolutional neural network. In Proceedings of EMNLP, 1244–1254.
Johnson, R., and Zhang, T. 2015. Effective use of word order for text categorization with convolutional neural networks. In Proceedings of NAACL-HLT, 103–112.
Kalchbrenner, N.; Grefenstette, E.; and Blunsom, P. 2014. A convolutional neural network for modelling sentences. In Proceedings of ACL, 655–665.
Kim, Y. 2014. Convolutional neural networks for sentence classification. In Proceedings of EMNLP, 1746–1751.
Kloetzer, J.; Saeger, S. D.; Torisawa, K.; Hashimoto, C.; Oh, J.-H.; and Ohtake, K. 2013. Two-stage method for large-scale acquisition of contradiction pattern pairs using entailment. In Proceedings of EMNLP, 693–703.
Kloetzer, J.; Torisawa, K.; Hashimoto, C.; and Oh, J.-H. 2015. Large-scale acquisition of entailment pattern pairs by exploiting transitivity. In Proceedings of EMNLP, 1649–1655.
LeCun, Y.; Bottou, L.; Bengio, Y.; and Haffner, P. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11):2278–2324.
Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G. S.; and Dean, J. 2013. Distributed representations of words and phrases and their compositionality. In Proceedings of NIPS, 3111–3119.
Mintz, M.; Bills, S.; Snow, R.; and Jurafsky, D. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of ACL.
Nguyen, T. H., and Grishman, R. 2015. Relation extraction: Perspective from convolutional neural networks. In Proceedings of the NAACL Workshop on Vector Space Modeling for NLP.
Oh, J.-H.; Torisawa, K.; Hashimoto, C.; Sano, M.; Saeger, S. D.; and Ohtake, K. 2013. Why-question answering using intra- and inter-sentential causal relations. In Proceedings of ACL, 1733–1743.
Oh, J.-H.; Torisawa, K.; Kruengkrai, C.; Iida, R.; and Kloetzer, J. 2017. Multi-column convolutional neural networks with causality-attention for why-question answering. In Proceedings of WSDM.
Radinsky, K.; Davidovich, S.; and Markovitch, S. 2012. Learning causality for news events prediction. In Proceedings of WWW, 909–918.
Riaz, M., and Girju, R. 2010. Another look at causality: Discovering scenario-specific contingency relationships with no supervision. In Proceedings of ICSC, 361–368.
Richardson, M.; Burges, C. J. C.; and Renshaw, E. 2013. MCTest: A challenge dataset for the open-domain machine comprehension of text. In Proceedings of EMNLP, 193–203.
Sano, M.; Torisawa, K.; Kloetzer, J.; Hashimoto, C.; Varga, I.; and Oh, J.-H. 2014. Million-scale derivation of semantic relations from a manually constructed predicate taxonomy. In Proceedings of COLING, 1423–1434.
Scaria, A. T.; Berant, J.; Wang, M.; Clark, P.; Lewis, J.; Harding, B.; and Manning, C. D. 2013. Learning biological processes with global constraints. In Proceedings of EMNLP, 1710–1720.
Suchanek, F. M.; Kasneci, G.; and Weikum, G. 2007. Yago: A core of semantic knowledge. In Proceedings of WWW, 697–706.
Torisawa, K. 2006. Acquiring inference rules with temporal constraints by using Japanese coordinated sentences and noun-verb co-occurrences. In Proceedings of HLT-NAACL, 57–64.
Yin, W., and Schütze, H. 2015. Convolutional neural network for paraphrase identification. In Proceedings of NAACL-HLT, 901–911.
Zeiler, M. D. 2012. ADADELTA: an adaptive learning rate method. CoRR abs/1212.5701.
Zeng, D.; Liu, K.; Chen, Y.; and Zhao, J. 2015. Distant supervision for relation extraction via piecewise convolutional neural networks. In Proceedings of EMNLP, 1753–1762.