Annotating Event Mentions in Text with Modality, Focus, and Source Information
Total Page:16
File Type:pdf, Size:1020Kb
Annotating Event Mentions in Text with Modality, Focus, and Source Information Suguru Matsuyoshiy, Megumi Eguchiy, Chitose Saoy, Koji Murakamiy, Kentaro Inuiz;y, Yuji Matsumotoy yGraduate School of Information Science, zGraduate School of Information Science, Nara Institute of Science and Technology Tohoku University 8916-5 Takayama, Ikoma, Nara 630-0192, Japan 6-6-05, Aoba, Aramaki, Aoba-ku, Sendai, Miyagi, 980-8579, Japan fmatuyosi, megumi-e, chitose-s, kmurakami, [email protected], [email protected] Abstract Many natural language processing tasks, including information extraction, question answering and recognizing textual entailment, require analysis of the polarity, focus of polarity, tense, aspect, mood and source of the event mentions in a text in addition to its predicate- argument structure analysis. We refer to modality, polarity and other associated information as extended modality. In this paper, we propose a new annotation scheme for representing the extended modality of event mentions in a sentence. Our extended modality consists of the following seven components: Source, Time, Conditional, Primary modality type, Actuality, Evaluation and Focus. We reviewed the literature about extended modality in Linguistics and Natural Language Processing (NLP) and defined appropriate labels of each component. In the proposed annotation scheme, information of extended modality of an event mention is summarized at the core predicate of the event mention for immediate use in NLP applications. We also report on the current progress of our manual annotation of a Japanese corpus of about 50,000 event mentions, showing a reasonably high ratio of inter-annotator agreement. 1. Introduction the components of modality in a broad sense, as described In Natural Language Processing (NLP), development of in detail in section 3, and is important for interpreting a syntactic and semantic parsers is a major task, and recently writer’s attitude toward event mentions. we have seen the development of precise POS taggers, de- A sentence may have several event mentions, though it does pendency parsers, and predicate-argument structure analyz- not include any connectives such as “and” and “although”. ers. Identifying predicate-argument structure in a sentence For example, sentence (3) has three event mentions, where is important but insufficient for applications such as infor- the core predicates are “decide(d)”, “stop” and “buy(ing)”. mation extraction (IE), question answering (QA) and rec- (3) Jim decided to stop buying that weekly magazine. ognizing textual entailment (RTE). These applications re- quire analyzing the polarity, focus of polarity, tense, as- Modality in a narrow sense (hereafter referred as restricted pect, mood and source of the event mentions in a text in modality) and polarity can be assigned to each event men- addition to predicate-argument structure analysis. For ex- tion, even if the event mention is not the main proposition ample, the verb “submitted” in sentence (1) has three ar- in the sentence. For example, the restricted modality of an guments “Ann”, “the plan” and “the comittee”, which can event “Jim buying that weekly magazine” can be regarded be identified by predicate-argument structure analysis, and as volition due to indirect effect of “decided” and its polar- the verb and these arguments correspond to an event men- ity negative due to direct effect of “stop”. We would like to tion “Ann submitting the plan to the comittee” (underlined recognize extended modality of such event mentions as well in sentence (1)). The modality toward the event mention as the main event mention. It is because they are dependent shows John’s assertion that the event actually happened. on the main event in the sentence but can be regarded as The verb “cause” and its arguments “mercury-based vac- distinct events in applications such as IE and RTE. For in- cines” and “autism in children” underlined in sentence (2) terpreting a writer’s attitude toward such event mentions, correspond to an event mention, toward which the modal- their extended modality should be analyzed. ity indicates the doctor’s inference that the underlined event In this paper, as a first step toward constructing an analyzer does not happen. of extended modality, we propose a new annotation scheme (1) John claimed that Ann submitted the plan to the for representing extended modality of event mentions in a committee. sentence, and report on the current progress of manual an- notation of a Japanese corpus of about 50,000 event men- (2) The doctor speculated that mercury-based vaccines tions. did not cause autism in children. Distinguishing assertion and inference is essential for NLP 2. Related work applications such as IE and RTE, because information as- A writer’s attitude toward event mentions is mainly rep- serted is obviously much more reliable than information in- resented with restricted modality. In English, restricted ferred. We refer to the modality, polarity and other associ- modality has two basic categories; one is Propositional ated information of an event mention in a given sentence modality, which consists of Epistemic modality and Ev- as extended modality. Extended modality covers almost all idential modality, and the other is Event modality which certain- transition evalu- modality in a polar- focus source time condi- pty of certainty ation narrow sense ity tional (Light et al., 2004) p p p p (Rubin et al., 2005) p p p p p p (Saur´ı et al., 2006) p p p p p (Prasad et al., 2006) p p p (Saur´ı and Pustejovsky, 2007) p (Medlock and Briscoe, 2007) p p (Szarvas et al., 2008) p p p p p p p (Saur´ı, 2008; Saur´ı and Pustejovsky, 2009) p p p p p p p (Hara and Inui, 2008; Inui et al., 2008) p p p p (Kawazoe et al., 2009) p p p p p p (Im et al., 2009) p p p p p p p p p Our work Table 1: Components of extended modality considered in our work and related works. is divided into Deontic modality and Dynamic modality and Pustejovsky, 2009; Kawazoe et al., 2009; Im et al., (Palmer, 2001). Other important categories of modality are 2009) and the other is for automatically analyzing extended Future, Negative, Interrogative, Imperative (Jussive), Pre- modality in text (Rubin et al., 2005; Saur´ı and Pustejovsky, supposed, Conditional, Purposive, Resultative, Desidera- 2007; Hara and Inui, 2008; Inui et al., 2008). A pioneering tive, and Fears. Modal logic with possible worlds has been work for an annotation scheme is Saur´ı et al.’s FactBank studied to capture the logical properties of modal expres- (Saur´ı and Pustejovsky, 2009). In FactBank, an event men- sions strictly (Portner, 2009). By using well-defined acces- tion is annotated with its source, introduced by Wiebe et sibility relations or conversational backgrounds, the modal al. (2005), epistemic modality and polarity for represent- logic represents various modal expressions while differen- ing event factuality, along with attribute values for tense, tiating between ones that have almost the same meaning. aspect, modality and polarity provided by a “MAKEIN- Classifications of restricted modality in Linguistics and sys- STANCE” element in TimeML (Saur´ı et al., 2006). A fac- tems of modal logic are helpful for constructing a system of tuality value of FactBank is represented as a combination of extended modality in our research, but they are unable to be values at epistemic modality axis (“certain (CT)”, “proba- adopted directly. We give three reasons for this. The first ble (PR)”, “possible (PS)” and “underspecified (U)”) and one is that these works mainly focus on classification of polarity axis (“positive (+)”, “negative (¡)” and “under- modal expressions while we focus on classification of event specified (u)”). For example, an event in text is labeled mentions based on their extended modality. The second with “CT+” when it is certain that the event happened or reason is that the main targets in these works are modality will happen according to the source of the text. An event classes of the main proposition in a sentence, and we want in text is labeled with “PR¡” when it is propable that the to deal with the extended modality of both the main propo- event did not happen or will not happen according to the sition in a sentence and the event mentions embedded in source of the text. The factuality category from FactBank that proposition, as mentioned in Section 1. The third rea- is practical for applications such as IE, QA and RTE, be- son is that classifications in these works are too fine-grained cause it summarizes information of whether a target event to identify automatically. We will describe our annotation actually happens or not along with degree of certainty in scheme of extended modality in Section 3. a way that is easily accesible for applications. However, In recent years, there has been increasing attention paid the framework of FactBank is not sufficient for extended to annotating phrases and event mentions in text with ex- modality of event mentions in text because FactBank relies tended modality in the fields of bioinformatics and NLP. on a “MAKEINSTANCE” element in TimeML (Saur´ı et al., We show a list of related work with considered components 2006) for restricted modality except for epistemic modal- of extended modality in Table 1. ity. In TimeML, restricted modality is specified at “modal- Studies in bioinformatics (Light et al., 2004; Medlock and ity” attribute of the “MAKEINSTANCE” element with a Briscoe, 2007; Szarvas et al., 2008) mainly focus on cer- surface auxiliary verb, such as “SHOULD” and “MUST”. tainty toward event mentions in text, because they want This scheme cannot be directly applied to agglutinative lan- to distinguish between asserted propositions and proposi- guages such as Korean and Japanese, because they tend to tions with inferential expressions and hedges.