
Persian Causality Corpus (PerCause) and the Causality Detection Benchmark

Zeinab Rahimi, Mehrnoush ShamsFard*1

NLP Research Lab., Shahid Beheshti University, Tehran, Iran {z_rahimi, m-shams}@sbu.ac.ir

ABSTRACT. Recognizing causal elements and causal relations in text is one of the challenging issues in natural language processing, especially in low-resource languages such as Persian. In this research we prepare a human-annotated causality corpus for Persian which consists of 4446 sentences and 5128 causal relations; for each relation, the three labels of cause, effect and causal mark are specified where possible. We have used this corpus to train a system for detecting the boundaries of causal elements. We also present a causality detection benchmark over this corpus for three machine learning methods and two deep learning systems. Performance evaluations indicate that our best total result is obtained by the CRF classifier with an F-measure of 0.76, and the best accuracy is obtained by the BiLSTM-CRF deep learning method with an accuracy of 91.4%. Keywords: PerCause, Causality annotated corpus, causality detection, deep learning, CRF

1. Introduction

Causal relation extraction is a challenging natural language processing (NLP) open problem that mainly involves semantic analysis and is used in many NLP tasks such as textual entailment recognition, question answering, event prediction and narrative extraction. Causation indicates that one event, state or action is the result of the occurrence of another event, state or action. Consider the following examples: (1) John is poisoned because he ate a poisonous apple. (2) It’s raining. The streets are slippery. (3) Narrow roads are dangerous. (4) The poisonous apple poisoned Snow White. (5) The kid playing with the match burned the house.

They all indicate a causal relation. A causal relation usually consists of the three elements of cause, effect and causal mark. For example, in (1), "John is poisoned" is the effect of "he ate a poisonous apple" and "because" is the causal mark. As we can see in the above examples, the

*corresponding author

causal elements may be words, phrases or predicates, and they may occur within one sentence or across two. Causal relations can be categorized from different points of view, e.g. according to the appearance of a causal mark in the sentence. The causal mark may or may not appear in the sentence, and when it appears it may be ambiguous. For example, in (3), "danger" is the effect of "narrow roads", but there is no explicit causal mark. Similarly, a causal mark such as "after" can be ambiguous, as it is not always an indicator of causality. On the other hand, causative constructions may be either explicit or implicit (Girju, 2003). Even the cause or effect may be expressed implicitly in a causal sentence; e.g. in (5), "the kid playing with the match" is not itself the reason the house burned, the fire is. So finding the exact boundaries of causal relations is important and challenging and needs world knowledge.

Recognition of causal elements can be done through rule-based methods or machine learning methods that need a tagged corpus. In general, corpus-based approaches seem well suited for such problems (Goyal et al., 2017). Corpus-based causal detection systems are meant to determine the borders of each causal element within a single sentence or a sentence pair and classify them into the predefined categories of cause, effect and causal mark. Causal training corpora already exist for languages such as English (some of them are introduced in section 2), but to the best of our knowledge no such corpus has been developed for Persian. In this paper, we present a causality annotated corpus for Persian (PerCause) and a benchmark for models trained on it. PerCause covers simple and general causation; complicated cases such as metaphysical causation, or weak, negated or nested causal relations, are out of our scope.

This paper is organized as follows. Section 2 reviews related work; in section 3 we introduce the Persian causality annotated corpus (PerCause) and explain the details of its production procedure. In section 4 we incorporate different machine learning and deep learning methods to extract causal relations based on PerCause, and in section 5 the details of the evaluations and the discussion are presented. Section 6 concludes the paper.

2. Related work

Here we review related research in two main categories: 1) causality detection methods that may lead to semi-automatic development of a corpus and 2) manual creation of causality annotated corpora.

2.1 Causality detection

Causal relation extraction can be done through two approaches: rule-based methods and machine learning methods. Some researchers use linguistic patterns to identify explicit causation relations. For example, Garcia (1997) employs a semantic model which classifies causative verbal patterns (as linguistic indicators) to detect causal relationships in French texts. Khoo and colleagues (2000) use predefined verbal linguistic patterns to extract cause-effect information from business and medical newspaper texts. Luo and colleagues (2016) propose a framework that automatically harvests a network of cause-effect terms from a large web corpus using specified patterns (such as "A leads to B" or "If A then B"). Backed by this network, they propose a metric to model the causality strength

between terms. In this method, causal relationships in short texts are identified using the created resource and the presented statistical criteria.

Causality extraction by machine learning needs a causality corpus which is prepared manually or semi-automatically based on particular patterns. For example, in (Girju, 2003) a training corpus containing sentences with 60 simple causal verbs is created from the LA TIMES section of the TREC 9 text. Using a syntactic parser, 6523 relations of the form NP1-Verb-NP2 are found, of which 2101 are manually tagged as causal relations and 4422 as non-causal. These make up the positive and negative examples used to train a decision tree classifier to classify causal and non-causal relations in text. Chang and Choi (2005) learn cue phrase and lexical pair probabilities from a raw, domain-independent corpus in an unsupervised manner and use these probabilities to extract inter-noun and inter-sentence causalities. Blanco and colleagues (2008) considered explicit causal relations expressed by the pattern VerbPhrase-Relator-Cause, where relator ∈ {because, since, after, as}; they used the semantically annotated SemCor 2.1 corpus to train a C4.5 decision tree binary classifier with bagging.

Mirza and Tonelli present CATENA, a system that performs temporal and causal relation extraction and classification from English text by exploiting the interaction between temporal and causal relations. They use a hybrid approach combining rule-based and supervised classifiers for the identification of causal relations and adopt the notion of causality proposed in the annotation guidelines of the CausalTimeBank (Mirza et al., 2014; Mirza and Tonelli, 2014). These guidelines account for CAUSE, ENABLE and PREVENT phenomena. The authors assign a causal link to pairs of events when the causal relation is expressed by affect, link or causative verbs (CAUSE-, ENABLE- and PREVENT-type verbs) or when the causal relation is marked by a causal mark. Mirza (2014) proposed annotation guidelines for causality between events, based on the TimeML definition of events, which considers all types of actions (punctual and durative) and states as events; a dedicated tag signifies a causal link and another tag marks causal signals.

Ning et al. (2018) claim that, because a cause must occur earlier than its effect, temporal and causal relations are closely related and one relation often dictates the value of the other. They therefore present a joint inference framework using constrained conditional models (CCMs); specifically, they formulate the joint problem as an integer linear programming (ILP) problem, enforcing constraints that are inherent in the nature of time and causality. In (Dasgupta et al., 2018) a linguistically informed recurrent neural network architecture (a bidirectional LSTM model enhanced with an additional linguistic layer) for automatic extraction of cause-effect relations from text is proposed; it uses word-level embeddings and other linguistic features to detect causal events and their effects mentioned within a sentence. Finally, in (Hashimoto, 2019) a weakly supervised method for extracting causality knowledge (such as protectionism -> trade war) from Wikipedia articles in multiple languages is presented; the key idea is to exploit the causality-describing sections and the multilingual redundancy of Wikipedia.

2.2 Manually annotated causality corpora

There are several manually annotated resources for causality. Verb resources such as VerbNet (Schuler, 2005) and PropBank (Palmer et al., 2005) include verbs of causation. Likewise,

preposition schemes (e.g., Schneider et al., 2015, 2016) include some purpose- and explanation-related senses. But these resources only cover specific classes of words. To the best of our knowledge there is no causality annotated corpus for the Persian language, but there are a few manually annotated causality datasets for English, such as BECAUSE (the Bank of Effects and Causes Stated Explicitly) (Dunietz et al., 2015), BioCause (Mihăilă et al., 2013), CaTeRS (Mostafazadeh et al., 2016) and (Dunietz, Levin and Carbonell, 2015).

BECAUSE distinguishes three types of causation, each of which has slightly different semantics: Consequence, in which the cause naturally leads to the effect (e.g. "We are in serious economic trouble because of inadequate regulation."); Motivation, in which some agent perceives the cause and therefore consciously thinks, feels, or chooses something (e.g. "we don’t have much time, so let’s move quickly"); and Purpose, in which an agent chooses the effect because they desire to make the cause true (e.g. "coach them in handling complaints, so that they can resolve problems"). CaTeRS focuses on nine causal relations combined with temporal relations, including ‘cause’ (before/overlaps), ‘enable’ (before/overlaps), ‘prevent’ (before/overlaps) and ‘cause-to-end’ (before/overlaps/during). In fact, CaTeRS incorporates the four causal relations of cause, enable, prevent, and cause-to-end while assuming, in ‘A cause/enable/prevent B’, that A’s start is before B’s start, with no restriction on their relative endings; this implies that a cause relation can have either of the two temporal relations before and overlaps. BioCause provides an annotation framework for causal relations in biomedical texts and marks the connective and argument spans and the direction of causality. (Dunietz, Levin and Carbonell, 2015) is a generalization of BioCause to broader domains and distinguishes four different types of causal relationships (Motivation, Inference, Purpose and Consequence).

Besides English corpora, there are similar corpora for other languages. Rehbein and Ruppenhofer (2020) present a new resource for German causal language. The annotation scheme is adapted from (Dunietz et al., 2015), but with an extended set of arguments. This dataset includes 4,390 annotated instances for more than 150 different triggers. The annotation scheme distinguishes three different types of causal events (Consequence, Motivation and Purpose). They also provide annotations for semantic roles, i.e. the cause and effect of the causal event as well as the actor and affected party. This research uses a BERT-based causal sequence tagger as the benchmark. Also, the Salford Arabic Causal Bank (SACB) corpus (Sadek and Meziane, 2018) is dedicated to Arabic causal relations. It includes a collection of annotated sentences, each of which contains an instance of a causal particle; the instances contain words prefixed with certain proclitics, along with cause and effect arguments. Statistics of these corpora are given in Table 1; the last row shows PerCause, introduced in this paper, for comparison with the available corpora.


Table 1. Statistics of causality corpora

| Corpus | No. of causal relations | Data source | Description / relation types |
|---|---|---|---|
| BECAUSE | 1803 | 5380 sentences of NYTimes, Penn Treebank and other news articles | consequence, motivation and purpose types for causal relations |
| BioCause | 851 | 19 open-access full-text biomedical journals | event trigger, type, theme and cause for each relation |
| CaTeRS | 488 | 320 stories (1,600 sentences) from the ROCStories corpus | four types of causal relations (cause, enable, prevent, cause-to-end), assuming temporal precedence in ‘A cause/enable/prevent B’ |
| Dunietz, Levin and Carbonell (2015) | 400 | 1200 sentences from the Washington section of the New York Times Corpus | Motivation, Inference, Purpose and Consequence types for causal relations |
| SemCor 2.1² | 1068 | 352 texts | causal/non-causal labels |
| German causal corpus | 4,390 | 150 texts from two sources: (i) newspaper text from the TiGer corpus and (ii) political speeches | three types of causal events (consequence, motivation and purpose) |
| SACB | 2162 | 69,573 tokens from arabiCorpus (Newspapers category, containing approximately 135 million words of articles published between 1996 and 2010 in different Arabic countries) | causal/non-causal labels |
| PerCause | 5128 | 4446 sentences (129,000 tokens) from Peykareh and general books | Cause, Effect and Mark for each causal relation |

There are also some domain-specific causal corpora in the biomedical domain, dedicated to diseases, symptoms and drugs, such as CADEC (Karimi et al., 2015). This corpus consists of 1253 posts from the AskaPatient medical forum (7398 sentences) and marks drug, adverse effect, disease, symptom and finding in the text. There are other similar corpora (Leaman et al., 2009, Gurulingappa et al., 2010 and Deleger et al., 2012) in this field which, due to their specific domains and tags, are not closely related to our work, so we only mention them. Other corpora related to causality are entailment datasets, which include sentence pairs with entailment, contradiction or neutral relations. Causality is a form of entailment, so some of the entailment relations in these datasets may be causal relations. For example, the sentences “John is not hungry anymore.” and “John just ate lunch.” have both entailment and causal relations.

2 http://lit.csci.unt.edu/~rada/downloads/semcor/semcor2.1.tar.gz

The RTE3 competition corpora include test and development sets from 2005 to 2010; these competitions have continued as TAC4 and SemEval competitions since 2010. The SICK5 corpus consists of 10,000 English sentence pairs, built from two sources (ImageFlickr and SemEval 2015 video descriptions) and manually labeled with the three labels entailment, contradiction and neutral. Finally, the Stanford SNLI6 corpus consists of 500,000 sentence pairs, manually annotated with the three categories entailment, contradiction and neutral.

There are some issues with the corpora discussed in most of the above research. The first is that the authors assume that a causal element is continuous and there is no gap between its words, but sometimes a causal element (e.g. a cause) may be split into two or more parts (words or tokens) which are not adjacent in the sentence; an example is presented in Table 3. Secondly, the considered patterns are limited to causal verbs as causal marks or to a limited number of grammatical patterns. These limitations lead to problems for free-word-order languages like Persian. For example, "سیگار احتمال سرطان را بالا می برد / smoking increases the possibility of cancer" is a causal sample that simple causal patterns cannot cover. In this paper, we discuss our methodology for preparing a causality corpus with two major properties: (1) there is no predefined limitation on grammatical patterns, meaning that any causal pattern may occur in PerCause, and (2) causal elements may have more than one part and the parts may be separated in the sentence, so there is no limitation on where they occur. The corpus is IOB-tagged (Ramshaw and Marcus, 1995) for causal elements.

3. The causality annotated corpus

PerCause, our developed causality annotated corpus, consists of almost 129,000 tokens and 5128 causal relations. The initial dataset and tagset used in the corpus are introduced in the following subsections.

3.1 Data source

The initial corpus is selected from two different corpora: 1) the Peykareh corpus and 2) a books corpus. The Peykareh corpus (Bijankhan, 2005) consists of almost 10 million POS-tagged tokens (we only used the raw corpus). Although Peykareh contains texts from different genres, news articles make up the majority of it, and as news articles often do not properly cover general causal elements, we gathered 10 Persian novels and general books and constructed a general corpus of 1.8 million tokens besides Peykareh. The book corpus is not POS tagged, and we used an automatic tagger (explained in section 4.2) when needed.

3.2 Tagset

As the concept of causality requires, the tagset contains the 3 tags "cause", "effect" and "causal mark". Due to the similarity of causality annotation to NER or chunking tasks, the IOB format is

3 Recognizing Textual Entailment  4 Text Analysis Conference  5 Sentences Involving Compositional Knowledge, available at: http://clic.cimec.unitn.it/composes/sick.html  6 http://nlp.stanford.edu/projects/snli/

selected for this task. Table 2 shows definitions and examples for the three tags. Some annotation examples are shown in Table 3 and Figure 1. Table 2. Tags definitions and examples

| Tag | Definition | Example |
|---|---|---|
| Cause | A person, event, action or thing that gives rise to an action, phenomenon, or condition | Smoking increases the risk of cancer. / سیگار کشیدن ریسک سرطان را بالا می برد. |
| Effect | A change which is a result or consequence of an action or other cause | Smoking increases the risk of cancer. / سیگار کشیدن ریسک سرطان را بالا می برد. |
| Causal Mark | A word or phrase which marks a causal relation | Smoking increases the risk of cancer. / سیگار کشیدن ریسک سرطان را بالا می برد. |

Table 3. Labeled data sample

کمبود هورمون کورتیزول باعث خشونت طلبی در برخی از پسران می شود، خصوصا در سنین نوجوانی.7

<entity="cause-1">کمبود هورمون کورتیزول</entity> <entity="Causal-mark-1">باعث</entity> <entity="effect-1">خشونت طلبی در برخی از پسران</entity> می شود، <entity="effect-1">خصوصا در سنین نوجوانی</entity>.

Cortisol hormone deficiency, especially in teenage years, causes violence in some boys.

<entity="cause-1">Cortisol hormone deficiency</entity>, <entity="effect-1">especially in teenage years</entity>, <entity="Causal-mark-1">causes</entity> <entity="effect-1">violence in some boys</entity>.

Figure 1. IOB format sample

7 The Persian language is written from right to left

3.3 Semi-automatic preparation of initial causality corpus

After preparing the initial raw corpus from the combination of Peykareh and the book corpora, we choose a subset of the corpus for manual tagging through an automatic method. In this section, the processes performed to construct this subset, called the initial causality corpus, are described. After applying these steps, the corpus is ready for IOB tagging and causal element annotation.

- Normalization: As some characters in Persian keyboards have more than one corresponding Unicode character (like 'ی' and 'ک'), characters are unified in this step.

- Sentence segmentation: After normalization, the text is segmented into sentences. We use punctuation marks such as ".", "!" and "?" as sentence delimiters while taking care of these symbols within special tokens (such as acronyms), numbers and URLs to avoid mis-segmentation.

- Tokenization: Tokenization converts each sentence into a sequence of tokens including words, punctuation marks, numbers, etc. As space is not a deterministic delimiter in Persian (it may occur within a word, or two separate words may appear without any space in between), tokenization is more challenging in this language compared to English. In this stage, we use the STeP-1 toolkit (Shamsfard et al., 2009) for tokenizing sentences.

- Length filtering: After segmenting the text into sentences and sentences into tokens, we remove very long and very short sentences and keep only sentences with between 5 and 100 tokens. Our observations show that sentences with fewer than 5 or more than 100 tokens either do not contain any causal relation or are probably the result of incorrectly merged or split constituents, and can therefore be ignored. The sentence loss rate is discussed at the end of this section. (A small sketch of these preprocessing steps is given after this list.)

- Causal candidate selection: Sentences that conform to predefined patterns of causality are called causal candidates, and the final set for manual labelling is selected from this candidate set. In this regard, we develop a list of 30 causal patterns (or cue phrases), some of which are shown in Table 4; in Table 4, # means that the pattern words should be seen in the defined order but may not necessarily form a phrase. To build this list, we first manually prepare two lists: 1) initial patterns (about 10 causal patterns) and 2) initial causal pairs (about 80 pairs of causes and effects). Then, we search for the initial causal pairs in the preprocessed corpora and extract sentences containing these causal pairs. On the other hand, we search for the initial patterns in the corpus and extract more causal pairs from the extracted sentences. We then repeat the first step with the newly extracted causal pairs and continue the iteration. The list of causal patterns was obtained by manually investigating, one by one, the structure of sentences containing causal patterns. The set of sentences that match the causal patterns is selected as the initial causal candidates. This procedure is presented in Figure 2.
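The following is a minimal Python sketch of the normalization, sentence segmentation and length-filtering steps described in the list above. The character map and the 5-100 token bounds come from the text, while the regular expression and function names are illustrative assumptions (the authors use the STeP-1 toolkit for tokenization, which is not reproduced here).

```python
import re

# Unify Persian characters that have more than one Unicode form
# (e.g. Arabic Yeh/Kaf vs. Persian Yeh 'ی' / Kaf 'ک'), as in the normalization step.
CHAR_MAP = {"\u064A": "\u06CC",   # Arabic Yeh -> Persian Yeh
            "\u0643": "\u06A9"}   # Arabic Kaf -> Persian Kaf

def normalize(text):
    for src, dst in CHAR_MAP.items():
        text = text.replace(src, dst)
    return text

def split_sentences(text):
    # Naive segmentation on ".", "!", "?" and the Persian question mark; the paper
    # additionally guards against splitting inside acronyms, numbers and URLs.
    return [s.strip() for s in re.split(r"(?<=[.!?؟])\s+", text) if s.strip()]

def length_filter(sentences, tokenize, lo=5, hi=100):
    # Keep only sentences with 5-100 tokens (length-filtering step);
    # `tokenize` stands in for the STeP-1 tokenizer.
    return [s for s in sentences if lo <= len(tokenize(s)) <= hi]
```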


1. Make the initial pattern list and the initial causal pairs list manually.
2. While any new causal pair or causal pattern is found:
   a. search the causal pairs list through the corpus and extract more causal patterns
   b. search the causal patterns list in the corpus and find new causal pairs
   c. update both lists
3. Select the set of sentences that match the causal patterns as initial causal candidates for manual labelling.

Figure 2. Pseudocode of causal candidate sentence selection for labelling
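Below is a minimal Python sketch of the bootstrapping loop of Figure 2, assuming that patterns and cause-effect pairs are matched by simple substring search; the helper callables `extract_patterns` and `extract_pairs` stand in for the manual inspection described above and are illustrative, not the authors' exact procedure.

```python
def bootstrap_candidates(sentences, seed_patterns, seed_pairs,
                         extract_patterns, extract_pairs):
    """Iteratively grow the pattern and cause-effect pair lists (Figure 2)."""
    patterns, pairs = set(seed_patterns), set(seed_pairs)
    grew = True
    while grew:                       # step 2: repeat while anything new is found
        grew = False
        for sent in sentences:
            # 2a: sentences containing a known cause-effect pair suggest new patterns
            for cause, effect in list(pairs):
                if cause in sent and effect in sent:
                    for p in extract_patterns(sent, (cause, effect)):
                        if p not in patterns:
                            patterns.add(p)
                            grew = True
            # 2b: sentences matching a known pattern suggest new cause-effect pairs
            for p in list(patterns):
                if p in sent:
                    for pair in extract_pairs(sent, p):
                        if pair not in pairs:
                            pairs.add(pair)
                            grew = True
    # step 3: sentences matching any pattern become the initial causal candidates
    candidates = [s for s in sentences if any(p in s for p in patterns)]
    return candidates, patterns, pairs
```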

- Final sentence set selection for tagging: After preparing the initial causal candidate set, the extracted pairs are revised and inappropriate ones are removed. In fact, some cause-effect pairs are not permanent pairs and just happen to be causal in a specific sentence, e.g. "He went to Paris because he was ill"; this kind of sentence is seen frequently in news articles. As we want the sentences to be general and accurate in all contexts, we mainly look for examples in which the stated cause is the direct cause of the effect in the sentence, so we inspect samples in the corpus, prune some patterns and select a final set for tagging.

There is no sentence loss during the first 3 steps of preprocessing (normalization, tokenization and sentence segmentation), but in the length-filtering step we lose about 10 percent of the sentences. After causal candidate selection (6% of the initial sentences contain causal patterns), we select about 50 percent of the candidate sentences as causal candidates whose causal relations always hold. Table 5 shows the specification of the initial corpus before causal labelling.

Table 4. Causal patterns examples

| Pattern category | Patterns | Percentage |
|---|---|---|
| Verbs | change, increase, lead to, raise, cause, produce (تولید کردن، افزایش دادن، منجر شدن، بالا بردن، موجب#شدن، تغییر دادن) | 23% |
| Prepositions | through, if#then, because, as a result, since (به وسیله، اگر#آنگاه، چون، درنتیجه، زیرا، از آنجایی که) | 5% |
| Nouns | result, reason, cause (حاصل، سبب، باعث، موجب) | 45% |
| Hybrid | reason of %, because of, caused by, result of this, reason#percent, using#for, for this reason, high#probability, important#reasons, because of this (علت#%، به این بابت، به علت، به دلیل، حاصل از، حاصل#این#آن، علت#درصد، از#برای#استفاده، ناشی از، به همین علت، به همین دلیل، احتمال#افزایش، عوامل#مهم، به این دلیل) | 27% |


Table 5. Specification of the corpus before manual causal labeling

| | # of sentences | # of tokens |
|---|---|---|
| Initial causal candidate corpus | 4446 | 129,293 |

3.4 Tagging the causality corpus

After these steps, the sentences in this set are ready to be tagged. For this task we developed an annotation tool to help linguists annotate the sentences manually. This tool allows the annotator to tag a multi-token element (even with gaps in between) and can be used by multiple annotators simultaneously. A page of its GUI is shown in Figure 3: annotators simply right-click on a phrase or word and select the corresponding tag, and they can tag a multi-part element using the entity number option. After annotation, the data is converted to IOB format internally for further use.

Figure 3. GUI of annotation tool
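To illustrate the conversion from the tool's entity annotations to IOB tags mentioned above, the sketch below assumes each annotation is available as a (start, end, label) token span; the internal format of the tool is not described in the paper, and whether the second part of a split element restarts with B- or continues with I- is a convention choice (here it restarts).

```python
def spans_to_iob(tokens, spans):
    """Convert entity spans to IOB tags.

    tokens: list of word tokens of one sentence.
    spans:  list of (start, end, label) with end exclusive, e.g. (0, 2, "cause");
            a split element is simply given as several spans with the same label.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags

# Hypothetical example with a two-part (split) effect, as in Table 3:
tokens = ["Cortisol", "deficiency", ",", "especially", "in", "teenage", "years",
          ",", "causes", "violence", "in", "some", "boys", "."]
spans = [(0, 2, "cause"), (3, 7, "effect"), (8, 9, "mark"), (9, 13, "effect")]
print(list(zip(tokens, spans_to_iob(tokens, spans))))
```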

Table 6 shows statistics of the causal tags (B-Cause, I-Cause, B-Effect, I-Effect, B-Mark and I-Mark) in the corpus. We have about 63,000 causal tokens (tokens carrying causal tags) out of 133,293 total tokens of the corpus, in the form of 5128 causal relations (causal triples or tuples) in 4446 sentences. Some candidate sentences may contain no causal relation.

3.5 Inter annotation agreement

An important factor in developing a corpus is the inter-annotator agreement. As discrepancy between different annotators is inevitable in manual annotation of a corpus, and it is even observed in practice that an annotator may tag a single text differently at two different times, we need to calculate inter-annotator agreement to be confident about the accuracy of the annotation results. On the other hand, experience has shown that a more uniform and precise corpus

has led to better results in machine learning based systems. In this regard, different criteria are used to measure the degree of annotator agreement. One of the famous and widely used metrics for this issue is Kappa (Cohen, 1960).

Table 6. Tag distribution in the corpus

| # | Tag | Number of occurrences | Percentage |
|---|---|---|---|
| 1 | B-cause | 4261 | |
| | I-cause | 19683 | |
| | Total cause | 23944 | 17.96% |
| 2 | B-Effect | 4597 | |
| | I-Effect | 27180 | |
| | Total effect | 31777 | 23.84% |
| 3 | B-Causal mark | 5128 | |
| | I-Causal mark | 2157 | |
| | Total causal mark | 7285 | 5.4% |
| 4 | Total tagged | 63006 | 47.27% |

This is a statistical criterion used to examine the degree of agreement between two annotators of a corpus; it attempts to reduce the effect of labels that agree merely by chance. For this purpose, kappa is calculated as follows.

\[ \kappa = \frac{\Pr(a) - \Pr(e)}{1 - \Pr(e)} \]

In this formula, Pr(a) is the observed level of agreement between the two annotators, and Pr(e) is the expected rate of accidental (chance) agreement between them. To use the kappa criterion with more than two annotators, this value is calculated for each pair of annotators and their mean is taken. In linguistics, kappa coefficients above 80% are considered appropriate for a corpus (Green, 1997). Here, to reduce ambiguities in the labelling procedure, we prepared a precise annotation guide and employed two annotators for the process. 1300 tokens of data were randomly selected and annotated twice. The inter-annotator agreement was calculated as 94.5%, so the created annotated corpus is admissible and is now ready to be used by causality detection systems.
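For reference, Cohen's kappa over the doubly annotated tokens can be computed directly from the two annotators' tag sequences, e.g. with scikit-learn; the tag sequences below are made up for illustration.

```python
from sklearn.metrics import cohen_kappa_score

# IOB tags assigned by the two annotators to the same (illustrative) tokens.
annotator_a = ["B-cause", "I-cause", "B-mark", "B-effect", "I-effect", "O"]
annotator_b = ["B-cause", "I-cause", "B-mark", "B-effect", "O", "O"]

# kappa = (Pr(a) - Pr(e)) / (1 - Pr(e)); the chance agreement Pr(e) is estimated
# from the marginal tag distributions of the two annotators.
kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.3f}")
```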

4. Causal element boundary detection system

In this research, the goal is to construct a causal corpus and to build a benchmark for detecting causal pairs and their boundaries using the constructed corpus. For this purpose, we use the created causal corpus to train machine learning baselines and deep learning systems and establish

a benchmark for the research community in this task. In this section we discuss the classification methods and features we employ to build the benchmark. In fact, we address the causal element boundary detection problem as a multi-class classification problem whose classes are the causal element tags. Hence this section explains the classification part and the features which are fed to the system.

4.1 Classification methods

To develop the benchmark, a training dataset is separated and several classifiers, namely Naïve Bayes, RBF (radial basis function) networks and CRF, are tested. We also implemented two deep learning systems with different representations, which are trained and tested on the same datasets. 1) A deep learning framework8 (bi-LSTM+CRF in TensorFlow, based on (Huang et al., 2015)). As indicated in Figure 4, the procedure is as follows:

Figure 4. BI-LSTM-CRF model (Huang et, al, 2015)

- Concatenate the final states of a bi-LSTM over character embeddings to get a character-based representation of each word.
- Concatenate this representation with a standard word vector representation.
- Run a bi-LSTM over each sentence to extract a contextual representation of each word.
- Decode with a linear-chain CRF.

For the word embeddings in this step, fastText CBOW vectors trained on the Persian Wikipedia corpus (more than 1 billion tokens) are used.

2) Due to the emergence of BERT9 models (Devlin et al., 2018) and their positive performance on similar tasks, we also fine-tune a pre-trained model on the produced data. Our results and implementation details for the general classifiers and deep systems are presented in section 5.2.
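The sketch below outlines the character-and-word BiLSTM-CRF architecture of Figure 4 in PyTorch; the authors' implementation is the TensorFlow code linked in the footnote, so this is only an illustrative reconstruction. The 100-dimensional character embeddings and 300-dimensional word vectors follow the paper, the hidden sizes are assumptions, and the CRF layer comes from the third-party pytorch-crf package.

```python
import torch
import torch.nn as nn
from torchcrf import CRF   # third-party package: pip install pytorch-crf


class BiLSTMCRFTagger(nn.Module):
    """Character + word BiLSTM with a linear-chain CRF decoder (cf. Figure 4)."""

    def __init__(self, n_chars, n_words, n_tags,
                 char_dim=100, word_dim=300, char_hidden=50, word_hidden=200):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.char_lstm = nn.LSTM(char_dim, char_hidden,
                                 batch_first=True, bidirectional=True)
        self.word_emb = nn.Embedding(n_words, word_dim, padding_idx=0)
        self.word_lstm = nn.LSTM(word_dim + 2 * char_hidden, word_hidden,
                                 batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * word_hidden, n_tags)
        self.crf = CRF(n_tags, batch_first=True)

    def _emissions(self, words, chars):
        # chars: (batch, sent_len, word_len) character ids of each word
        b, s, w = chars.shape
        _, (h_n, _) = self.char_lstm(self.char_emb(chars.view(b * s, w)))
        # concatenate the final forward/backward states as a char-based word vector
        char_repr = torch.cat([h_n[0], h_n[1]], dim=-1).view(b, s, -1)
        x = torch.cat([self.word_emb(words), char_repr], dim=-1)
        ctx, _ = self.word_lstm(x)          # contextual representation of each word
        return self.proj(ctx)               # per-token tag scores for the CRF

    def loss(self, words, chars, tags, mask):
        return -self.crf(self._emissions(words, chars), tags, mask=mask)

    def decode(self, words, chars, mask):
        return self.crf.decode(self._emissions(words, chars), mask=mask)
```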

4.2 Features

To train the causal element boundary recognition model, the major features of word, stem and part-of-speech tag have been used. We use the stem feature mainly to cover the various verb inflections

8 https://github.com/guillaumegenthial/sequence_tagging  9 BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

in sentences. Also, the placement of specific POS tags adjacent to a causal mark sometimes indicates a causal relation pattern, so we use the POS feature too. These features are fed into the classifiers as feature functions in the training stage. We use two POS tagging models, with 16 tags and 100 tags, trained on the Peykareh corpus with precisions of 94% and 90% respectively. For the stemming part, we use a light affix-removal stemming algorithm which simply removes prefixes and suffixes. The effect of considering POS tags and stemmed words is evaluated in section 5.4.
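The authors use PocketCRF; as an illustration only, the same feature design (word, stem and POS over a context window) can be expressed with the sklearn-crfsuite package, where each token is described by a dictionary of features drawn from a window around it. A window of 2 tokens on each side corresponds to the 5-token window discussed later; the stemmer and POS tags are assumed to be provided.

```python
import sklearn_crfsuite

def token_features(sent, i, window=2):
    """sent is a list of (word, stem, pos) triples; build features for token i."""
    feats = {"bias": 1.0}
    for offset in range(-window, window + 1):
        j = i + offset
        if 0 <= j < len(sent):
            word, stem, pos = sent[j]
            feats[f"word[{offset}]"] = word
            feats[f"stem[{offset}]"] = stem   # covers verb inflection variants
            feats[f"pos[{offset}]"] = pos
    return feats

def sent2features(sent):
    return [token_features(sent, i) for i in range(len(sent))]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
# X: sentences as (word, stem, pos) triples; y: the corresponding IOB tag sequences
# crf.fit([sent2features(s) for s in X], y)
```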

5. Experimental results

5.1 Data and metrics

To cover the effect of data size in our analysis, we performed our experiments on two datasets drawn from the available labelled data: 1) a first data pack with 47,000 tokens and 2) a second data pack with 129,000 tokens (including the first pack). For the first pack, we divide our tagged dataset into 3 parts of train, test and development with portions of 80, 10 and 10 percent respectively. For the second pack, from the non-overlapping part we just separate a development set and add the rest to the training set; we use the same test dataset, which contains 4750 tokens. We also used a 10-fold cross-validation procedure to ensure the robustness of the results. The standard metrics of accuracy (Acc), precision (P), recall (R) and F-measure (F), defined below, are used to evaluate system performance. We calculate these metrics for each tag and take the mean value over all 7 classes (B-Cause, I-Cause, B-Effect, I-Effect, B-Mark, I-Mark and O) for the overall comparison of systems.

\[ \text{Accuracy (Acc)} = \frac{\text{all correct system decisions for a specific tag}}{\text{all samples}} \]
\[ \text{Precision (P)} = \frac{\text{correct system decisions for a specific tag}}{\text{all system decisions}} \]
\[ \text{Recall (R)} = \frac{\text{correct system decisions for a specific tag}}{\text{what the system should have decided}} \]
\[ F\text{-measure} = \frac{2 \cdot P \cdot R}{P + R} \]
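A small sketch of how the per-tag metrics and their mean over the seven classes can be computed from flat gold and predicted tag sequences; the tag names follow the paper, everything else is illustrative.

```python
TAGS = ["B-Cause", "I-Cause", "B-Effect", "I-Effect", "B-Mark", "I-Mark", "O"]

def per_tag_scores(gold, pred):
    """gold, pred: flat lists of IOB tags; returns {tag: (P, R, F)} and the macro mean."""
    scores = {}
    for tag in TAGS:
        tp = sum(g == p == tag for g, p in zip(gold, pred))
        fp = sum(p == tag and g != tag for g, p in zip(gold, pred))
        fn = sum(g == tag and p != tag for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[tag] = (prec, rec, f1)
    macro = tuple(sum(s[k] for s in scores.values()) / len(TAGS) for k in range(3))
    return scores, macro
```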

5.2 Evaluation of different machine learning methods on the causality corpus (PerCause)

For the general classifiers (RBF and Naïve Bayes) we used the WEKA10 toolkit with its default parameters and converted tokens to word vectors. For the CRF classifier, we used PocketCRF, set f (the frequency threshold) to 4 according to the development procedure, and set m=1 and p=4 for performance (m=1 uses less memory and p determines the number of threads). We also tested different window sizes.

10 https://www.cs.waikato.ac.nz/ml/weka/

In the first deep learning implementation (bi-LSTM+CRF), we used 45 epochs, 100 dimensions for character embeddings and 300 dimensions for word vectors. For the second deep system (BERT-based), we used ParsBERT (Farahani et al., 2020) and fine-tuned the model on our causal data. For fine-tuning we used the (weizhepei, 2020) implementation, a PyTorch solution of the named entity recognition task with Google AI's BERT model; its reported F-measure for NER is 94.62 on Chinese MSRA and 96.4 on the English CoNLL-2003 dataset. Tables 7 and 8 show the performance of the different machine learning methods trained on the first and second data packs. Table 9 shows the best system results for different training datasets.
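For orientation, a condensed sketch of fine-tuning a pre-trained BERT model for the 7-class token classification task with the Hugging Face transformers library is given below; the checkpoint identifier and hyperparameters are illustrative assumptions, not the exact setup of the weizhepei implementation used by the authors.

```python
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

LABELS = ["O", "B-Cause", "I-Cause", "B-Effect", "I-Effect", "B-Mark", "I-Mark"]
MODEL_ID = "HooshvareLab/bert-base-parsbert-uncased"   # assumed ParsBERT checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_ID,
    num_labels=len(LABELS),
    id2label=dict(enumerate(LABELS)),
    label2id={label: i for i, label in enumerate(LABELS)},
)

# train_ds / dev_ds are assumed to be tokenized datasets whose word-level IOB labels
# have been aligned to sub-word pieces (ignored pieces labelled -100).
args = TrainingArguments(output_dir="percause-bert", num_train_epochs=5,
                         per_device_train_batch_size=16, learning_rate=3e-5)
# trainer = Trainer(model=model, args=args, train_dataset=train_ds,
#                   eval_dataset=dev_ds, tokenizer=tokenizer)
# trainer.train()
```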

Table 7. Comparison of ML systems performance for the first data pack (47000 tokens)

| System | Precision | Recall | F-measure | Acc |
|---|---|---|---|---|
| CRF | 0.77 | 0.75 | 0.755 | 0.747 |
| RBF Network | 0.464 | 0.50 | 0.48 | 0.499 |
| Naïve Bayes | 0.466 | 0.48 | 0.472 | 0.479 |
| Deep Learning (same train set) | 0.65 | 0.52 | 0.58 | 0.86 |
| DeepLearning-ParsBert | 0.64 | 0.71 | 0.67 | 0.88 |

Table 8. Comparison of ML systems performance for second data pack (129000 tokens)

| System | Precision | Recall | F-measure | Acc |
|---|---|---|---|---|
| CRF | 0.799 | 0.72 | 0.761 | 0.755 |
| RBF Network | 0.493 | 0.479 | 0.462 | 0.478 |
| Naïve Bayes | 0.517 | 0.506 | 0.503 | 0.506 |
| Deep Learning (same train set) | 0.79 | 0.65 | 0.713 | 0.914 |
| DeepLearning-ParsBert | 0.66 | 0.77 | 0.706 | 0.89 |

As Tables 7 and 8 indicate, in terms of total performance the CRF system outperforms the other implemented systems with an F-measure of 0.76, which is comparable to Girju (2003)'s 0.8, Chang and Choi (2005)'s 0.81 and the Arabic ML system of (Sadek and Meziane, 2018) with a best result of 0.76 on their own data. Details of the CRF evaluation are presented in the next section.

For deep systems, (Rehbein and Ruppenhofer, 2020) reached an average F-measure of 72.2 on determining causal labels in their German corpus, which is comparable to our deep learning systems. The (weizhepei, 2020) BERT-based implementation on a similar task (NER) reports F-measures of 94.62 on Chinese MSRA and 96.4 on English CoNLL-2003; these higher results are most likely due to the large and rich corpora, and possibly to the absence of distant dependencies and of entities split into several parts.

The main point indicated in the above tables is that increasing the data size does not produce any observable change in the CRF results, but it improves the deep learning results by 13%, which is remarkable. In the BERT-based system, the final results are very close to the other deep learning system; however, the system performance with the first and second data packs is nearly the same. This confirms that BERT functions well with little data, or that the amount of additional data in the second pack has not been enough to improve the BERT system performance. In addition, it seems that

the presence of the CRF layer in the other deep network has made it slightly superior in F-measure, at least at this volume of training data.

5.3 Studying the effect of automatically extending the corpus

In the previous section we saw that although deep learning has the best accuracy among the systems, its performance according to F-measure is lower than CRF. As deep learning methods fundamentally need large-scale data, we decided to produce data automatically using our most accurate system (the bi-LSTM+CRF architecture) to study the effect of automatically enlarging the corpus. The bi-LSTM+CRF was selected to generate the automatic data because accuracy is more important than recall here; we labeled additional data automatically with this deep learning system and used its output to create a corpus of 400,000 tokens with an accuracy of 0.91 (called the automatic data pack). This automatically generated dataset was in turn used to train a deep learning system. The data used for automatic annotation was selected from the Peykareh and news corpora, and all sentences are from the same domain as PerCause. Table 9 shows the results obtained on different training datasets; in the last three rows, the automatically generated corpus (91% accuracy) is used.

Table 9. Comparison of deep learning performance for different train data sets

| System | F-measure | Acc (%) |
|---|---|---|
| First data pack | 0.587 | 86 |
| Second data pack | 0.713 | 91.4 |
| Automatically created data pack | 0.561 | 84.47 |
| First pack + automatic data pack | 0.576 | 80.37 |
| Second pack + automatic data pack | 0.67 | 89.7 |

As Table 9 indicates, adding the automatically generated data did not improve system performance, probably because this data pack is not precise enough to be used as training data. To make use of this automatically generated data in future work, we will consider weakly supervised methods.

5.4 Best system evaluation

As shown in the previous sections, with the current size of data, CRF has the best total performance among all tested systems, although the deep architecture has better accuracy. This result is largely expected, since CRF has shown the best F-measure compared to other machine learning methods on such problems. In this section we study the effect of various features on the performance of the CRF causality detection baseline for each class. Table 10 presents the per-class results of the CRF system without any additional features (just words and labels) and with two simple features.


Table 10. CRF system performance for each label according to features and window size

| Label | Just words, 5-token window (P / R / F) | Words+POS+stem, 5-token window (P / R / F) | Words+POS+stem, 3-token window (P / R / F) |
|---|---|---|---|
| B-cause | 0.755 / 0.57 / 0.65 | 0.72 / 0.65 / 0.68 | 0.75 / 0.69 / 0.68 |
| I-cause | 0.598 / 0.59 / 0.59 | 0.61 / 0.71 / 0.66 | 0.62 / 0.65 / 0.63 |
| B-effect | 0.80 / 0.68 / 0.74 | 0.82 / 0.72 / 0.77 | 0.82 / 0.69 / 0.75 |
| I-effect | 0.79 / 0.73 / 0.76 | 0.84 / 0.74 / 0.79 | 0.79 / 0.71 / 0.75 |
| B-mark | 0.93 / 0.84 / 0.88 | 0.96 / 0.92 / 0.94 | 0.92 / 0.86 / 0.89 |
| I-mark | 0.82 / 0.69 / 0.75 | 0.71 / 0.77 / 0.74 | 0.66 / 0.77 / 0.714 |
| O | 0.69 / 0.76 / 0.73 | 0.71 / 0.75 / 0.73 | 0.72 / 0.78 / 0.75 |
| Average | 0.77 / 0.69 / 0.73 | 0.77 / 0.75 / 0.76 | 0.75 / 0.73 / 0.74 |

It seems that changing the size of the adjacency window in the CRF classification procedure can directly affect performance; our best result is obtained with a 5-token window. We repeated our experiments with and without the POS and stem features. As Table 10 shows, adding the POS and stem features improves system performance by 3 percentage points (the stem feature alone contributes a slight improvement of about 0.5 points), and our best performance is obtained with the 5-token window. For a clearer picture, Figures 5 and 6 present two charts comparing the CRF system results by window size and feature set.


Figure 5. CRF system performance; F-measure for each label in 5 tokens window and different features


[Figure 6 chart: F-measure per label for 2-, 3-, 5- and 7-token windows]

Figure 6. CRF system performance; F-measure for each label in different size of window

5.5 Discussion

As observed above, the CRF method has the best total result (F-measure) among the baselines. The CRF algorithm is based on conditional probability and graph theory; in contrast to many other methods, it considers each sample in the context of the tags of the previous words rather than the sample alone, and this may be its main strength here. The experimental results also indicate that, for the CRF system, window size and the basic features (POS and stem) are two effective parameters that can improve performance. On the other hand, we expected the deep learning method to have the best total performance, but probably due to the insufficient dataset it did not outperform the others, although it has the best accuracy of 91.4%. The experiments show that the size of the training data has an obvious effect on the deep learning results: after increasing the data size (we added nearly twice the initial size to the first pack), the deep system's F-measure increased by 13 percentage points, while the CRF performance was hardly affected. To sum up, it seems that

with a small amount of data it is better to use the CRF classifier, while at large data scales deep learning would be a better choice. Also, the automatically created training corpus apparently misled the deep network and did not lead to better results; moving towards weak supervision may handle this issue. Since our initial corpus is small for a learning procedure, we use a coarse-grained dataset in which all types of causality are merged into the general label "causal relation"; with a bigger dataset we may introduce fine-grained labels such as purpose and motivation in the future.

6. Conclusion

In this paper the production process of a Persian causality annotated corpus is presented. Furthermore, three machine learning methods are trained on this dataset to determine the boundaries of causal elements. Among the tested algorithms, the CRF method has the best total performance with an F-measure of 0.76, probably owing to its modeling of the context rather than only the current sample. We also tested two deep learning architectures (a bi-LSTM+CRF and a BERT-based model); the first has the best accuracy among all tested systems, but probably due to the insufficient size of the data its total result (F-measure) is not as high as expected, and the automatically generated data did not perform well either. As future work, post-processing and manual correction of the automatically generated dataset, or moving toward weakly supervised methods, may improve the results; adding other features may also improve the CRF system.

References

Bijankhan, M., 2005. The role of linguistic structures in grammar writing: the introduction of a computer software. Journal of Linguistics, nineteenth year, number two, pp. 48-67.

Blanco, E., Castell, N. and Moldovan, D.I., 2008, May. Causal relation extraction. In LREC (Vol. 66, p. 74).

Chang, D.S. and Choi, K.S., 2004, March. Causal relation extraction using cue phrase and lexical pair probabilities. In International Conference on Natural Language Processing (pp. 61-70). Springer, Berlin, Heidelberg.

Cohen, J., 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), pp. 37-46.

Dasgupta, T., Saha, R., Dey, L. and Naskar, A., 2018, July. Automatic extraction of causal relations from text using linguistically informed deep neural networks. In Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue (pp. 306-316).

Deleger, L., Li, Q., Lingren, T., Kaiser, M. and Molnar, K., 2012. Building gold standard corpora for medical natural language processing tasks. In AMIA Annual Symposium Proceedings (Vol. 2012, p. 144). American Medical Informatics Association.

Devlin, J., Chang, M.W., Lee, K. and Toutanova, K., 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Dunietz, J., Levin, L. and Carbonell, J.G., 2015, June. Annotating causal language using corpus lexicography of constructions. In Proceedings of The 9th Linguistic Annotation Workshop (pp. 188-196).

Dunietz, J., Levin, L. and Carbonell, J.G., 2017, April. The BECauSE corpus 2.0: Annotating causality and overlapping relations. In Proceedings of the 11th Linguistic Annotation Workshop (pp. 95-104).

Farahani, M., Gharachorloo, M., Farahani, M. and Manthouri, M., 2020. ParsBERT: Transformer-based model for Persian language understanding. arXiv preprint arXiv:2005.12515.

Garcia, D., 1997, October. COATIS, an NLP system to locate expressions of actions connected by causality links. In International Conference on Knowledge Engineering and Knowledge Management (pp. 347-352). Springer, Berlin, Heidelberg.

Girju, R., 2003, July. Automatic detection of causal relations for question answering. In Proceedings of the ACL 2003 Workshop on Multilingual Summarization and Question Answering - Volume 12 (pp. 76-83). Association for Computational Linguistics.

Goyal, A., Kumar, M. and Gupta, V., 2017. Named Entity Recognition: Applications, Approaches and Challenges.

Goyal, A., Gupta, V. and Kumar, M., 2018. Recent named entity recognition and classification techniques: A systematic review. Computer Science Review, 29, pp. 21-43.

Green, A.M., 1997, March. Kappa statistics for multiple raters using categorical classifications. In Proceedings of the 22nd Annual SAS User Group International Conference (Vol. 2, p. 4).

Gurulingappa, H., Rajput, A.M., Roberts, A., Fluck, J., Hofmann-Apitius, M. and Toldo, L., 2012. Development of a benchmark corpus to support the automatic extraction of drug-related adverse effects from medical case reports. Journal of Biomedical Informatics, 45(5), pp. 885-892.

Hashimoto, C., 2019, November. Weakly supervised multilingual causality extraction from Wikipedia. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (pp. 2979-2990).

Huang, Z., Xu, W. and Yu, K., 2015. Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991.

Karimi, S., Metke-Jimenez, A., Kemp, M. and Wang, C., 2015. Cadec: A corpus of adverse drug event annotations. Journal of Biomedical Informatics, 55, pp. 73-81.

Khoo, C.S., Chan, S. and Niu, Y., 2000, October. Extracting causal knowledge from a medical database using graphical patterns. In Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (pp. 336-343). Association for Computational Linguistics.

Leaman, R., Miller, C. and Gonzalez, G., 2009. Enabling recognition of diseases in biomedical text with machine learning: corpus and benchmark. In Proceedings of the 2009 Symposium on Languages in Biology and Medicine (Vol. 82, No. 9).

Luo, Z., Sha, Y., Zhu, K.Q., Hwang, S.W. and Wang, Z., 2016, April. Commonsense causal reasoning between short texts. In KR (pp. 421-431).

McCallum, A. and Li, W., 2003, May. Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons. In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003 - Volume 4 (pp. 188-191). Association for Computational Linguistics.

Mihăilă, C., Ohta, T., Pyysalo, S. and Ananiadou, S., 2013. BioCause: Annotating and analysing causality in the biomedical domain. BMC Bioinformatics, 14(1), p. 2.

Mirza, P., 2014. Extracting temporal and causal relations between events. In Proceedings of the ACL 2014 Student Research Workshop (pp. 10-17).

Mirza, P. and Tonelli, S., 2016, December. CATENA: Causal and temporal relation extraction from natural language texts. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers (pp. 64-75).

Mostafazadeh, N., Grealish, A., Chambers, N., Allen, J. and Vanderwende, L., 2016. CaTeRS: Causal and temporal relation scheme for semantic annotation of event structures. In Proceedings of the Fourth Workshop on Events (pp. 51-61).

Ning, Q., Feng, Z., Wu, H. and Roth, D., 2019. Joint reasoning for temporal and causal relations. arXiv preprint arXiv:1906.04941.

Rehbein, I. and Ruppenhofer, J., 2020, May. A new resource for German causal language. In Proceedings of the 12th Language Resources and Evaluation Conference (pp. 5968-5977).

Sadek, J. and Meziane, F., 2018. Building a causation annotated corpus: the Salford Arabic Causal Bank - proclitics. In 11th Edition of the Language Resources and Evaluation Conference.

Sadek, J. and Meziane, F., 2018. Learning causality for Arabic - proclitics. Procedia Computer Science, 142, pp. 141-149.

Schneider, N., Hwang, J.D., Srikumar, V., Green, M., Conger, K., O'Gorman, T. and Palmer, M., 2016. A corpus of preposition supersenses in English web reviews. arXiv preprint arXiv:1605.02257.

Schuler, K.K., 2005. VerbNet: A broad-coverage, comprehensive verb lexicon.

Shamsfard, M., Jafari, H.S. and Ilbeygi, M., 2010, May. STeP-1: A set of fundamental tools for Persian text processing. In LREC.