
Distinguishing Past, On-going, and Future Events: The EventStatus Corpus

Ruihong Huang (Texas A&M University), Ignacio Cases (Stanford University), Dan Jurafsky (Stanford University), Cleo Condoravdi (Stanford University), Ellen Riloff (University of Utah)

Abstract

Determining whether a major societal event has already happened, is still on-going, or may occur in the future is crucial for event prediction, timeline generation, and news summarization. We introduce a new task and a new corpus, EventStatus, which has 4500 English and Spanish articles about civil unrest events labeled as PAST, ON-GOING, or FUTURE. We show that the temporal status of these events is difficult to classify because local tense and aspect cues are often lacking, time expressions are insufficient, and the linguistic contexts have rich semantic compositionality. We explore two approaches for event status classification: (1) a feature-based SVM classifier augmented with a novel induced lexicon of future-oriented verbs, such as “threatened” and “planned”, and (2) a convolutional neural net. Both types of classifiers improve event status recognition over a state-of-the-art TempEval model, and our analysis offers linguistic insights into the semantic compositionality challenges for this new task.

1 Introduction

When a major societal event is mentioned in the news (e.g., civil unrest, terrorism, natural disaster), it is important to understand whether the event has already happened (PAST), is currently happening (ON-GOING), or may happen in the future (FUTURE). We introduce a new task and corpus for studying the temporal/aspectual properties of major events. The EventStatus corpus consists of 4500 English and Spanish news articles about civil unrest events, such as protests, demonstrations, marches, and strikes, in which each event is annotated as PAST, ON-GOING, or FUTURE (sublabeled as PLANNED, ALERT or POSSIBLE). This task bridges event extraction research and temporal research in the tradition of TIMEBANK (Pustejovsky et al., 2003) and TempEval (Verhagen et al., 2007; Verhagen et al., 2010; UzZaman et al., 2013). Previous corpora have begun this association: TIMEBANK, for example, includes temporal relations linking events with Document Creation Times (DCT). But the EventStatus task and corpus offer several new research directions.

First, major societal events are often discussed before they happen, or while they are still happening, because they have the potential to impact a large number of people. News outlets frequently report on impending natural disasters (e.g., hurricanes), anticipated disease outbreaks (e.g., Zika virus), threats of terrorism, and plans or warnings of potential civil unrest (e.g., strikes and protests). Traditional event extraction research has focused primarily on recognizing events that have already happened. Furthermore, the linguistic contexts of on-going and future events involve complex compositionality, and features like explicit time expressions are less useful. Our results demonstrate that a state-of-the-art TempEval system has difficulty identifying on-going and future events, mislabeling examples like these:

(1) The metro workers’ strike in Bucharest has entered the fifth day. (On-Going)
(2) BBC unions demand more talks amid threat of new strikes. (Future)
(3) Pro-reform groups have called for nationwide protests on polling day. (Future)

Second, we intentionally created the EventStatus corpus to concentrate on one particular event frame (class of events): civil unrest. In contrast, previous temporally annotated corpora focus on a wide variety of events. Focusing on one frame (semantic depth instead of breadth) makes this corpus analogous to domain-specific event extraction data sets, and therefore appropriate for evaluating rich tasks like event extraction and temporal question answering, which require more knowledge about event frames and schemata than might be represented in large broad corpora like TIMEBANK (UzZaman et al., 2012; Llorens et al., 2015).

Third, the EventStatus corpus focuses on specific instances of high-level events, in contrast to the low-level and often non-specific or generic events that dominate other temporal datasets.¹ Mentions of specific events are much more likely to be realized in non-finite form (as nouns or infinitives, such as “the strike” or “to protest”) than randomly selected event keywords. In breadth-based corpora like the EventCorefBank (ECB) corpus (Bejan and Harabagiu, 2008), 34% of the events have non-finite realization; in TIMEBANK, 45% of the events have non-finite realization. By contrast, in a frame-based corpus like ACE2005 (ACE, 2005), 59% of the events have non-finite forms. In the EventStatus corpus, 80% of the events have non-finite forms. Whether this is due to differences in labeling or to intrinsic properties of these events, the result is that they are much harder to label because tense and aspect are less available than for events realized as finite verbs.

Fourth, the EventStatus data set is multilingual: we collected data from both English and Spanish texts, allowing us to compare events representing the same event frame across two languages that are known to differ in their typological properties for describing events (Talmy, 1985).

Using the new EventStatus corpus, we investigate two approaches for recognizing the temporal status of events. We create an SVM classifier that incorporates features drawn from prior TempEval work (Bethard, 2013; Chambers et al., 2014; Llorens et al., 2010) as well as a new automatically induced lexicon of 411 English and 348 Spanish “future-oriented” matrix verbs: verbs like “threaten” and “fear” whose complement clause or nominal direct object argument is likely to describe a future event. We show that the SVM outperforms a state-of-the-art TempEval system and that the induced lexicon further improves performance for both English and Spanish. We also introduce a Convolutional Neural Network (CNN) to detect the temporal status of events. Our analysis shows that it successfully models semantic compositionality for some challenging temporal contexts. The CNN model again improves performance in both English and Spanish, providing strong initial results for this new task and corpus.

2 The EventStatus Corpus

For major societal events, it can be very important to know whether the event has ended or if it is still in progress (e.g., are people still rioting in the streets?). And sometimes events are anticipated before they actually happen, such as labor strikes, marches and parades, social demonstrations, political events (e.g., debates and elections), and acts of war. The EventStatus corpus represents the temporal status of an event as one of five categories:

Past: An event that has started and has ended. There should be no reason to believe that it may still be in progress.

On-going: An event that has started and is still in progress or likely to resume² in the immediate future. There should be no reason to believe that it has ended.

Future Planned: An event that has not yet started, but a person or group has planned for or explicitly committed to an instance of the event in the future. There should be near certainty it will happen.

Future Alert: An event that has not yet started, but a person or group has been threatening, warning, or advocating for a future instance of the event.

Future Possible: An event that has not yet started, but the context suggests that its occurrence is a live possibility (e.g., it is anticipated, feared, hinted at, or is mentioned conditionally).

Past
  [EN] Today’s demonstration ended without violence.
  [EN] An estimated 2,000 people protested against the government in Peru.
  [SP] Terminó la manifestación de los kurdos en la UNESCO de París.
On-going
  [EN] Negotiations continue with no end in sight for the 2 week old strike.
  [EN] Yesterday’s rallies have caused police to fear more today.
  [SP] Pacifistas latinoamericanos no cesan sus protestas contra guerra en Irak.
Future Planned
  [EN] 77 percent of German steelworkers voted to strike to raise their wages.
  [EN] Peace groups have already started organizing mass protests in Sydney.
  [SP] Miedo en la City en víspera de masivas protestas que la toman por blanco.
Future Alert
  [EN] Farmers have threatened to hold demonstrations on Monday.
  [EN] Nurses are warning they intend to walkout if conditions don’t improve.
  [SP] Indigenas hondureños amenazan con declararse en huelga de hambre.
Future Possible
  [EN] Residents fear riots if the policeman who killed the boy is acquitted.
  [EN] The military is preparing for possible protests at the G8 summit.
  [SP] Policía Militar analiza la posibilidad de decretar una huelga nacional.

Table 1: Examples of event status categories for civil unrest events, showing two examples in English [EN] and one in Spanish [SP].

The three subtypes of future events are important in marking not just temporal status but also what we might call predictive status. Events very likely to occur are distinguished from events whose occurrence depends on other contingencies (Future Planned vs. Alert/Possible). Warnings or mentions of a potential event by a likely actor are further distinguished from [...]

[...] English words³ and 13 Spanish words⁴ and phrases associated with civil unrest events, and added their morphological variants. We then randomly selected 2954 and 1491⁵ news stories from the English Gigaword 5th Ed. (Parker et al., 2011) and Spanish Gigaword 3rd Ed.

¹ For example in TIMEBANK almost half the annotated events (3720 of 7935) are hypothetical or generic, i.e., PERCEPTION, REPORTING, ASPECTUAL, I_ACTION, STATE or I_STATE rather than the specific OCCURRENCE.

² For example, demonstrators have gone home for the day but are expected to return in the morning.
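
As an illustration of the label scheme described in Section 2, the five categories and their collapse into the three coarse classes (PAST, ON-GOING, FUTURE) might be represented as below. This is a minimal sketch and not the authors' annotation format: the class names `EventStatus` and `CoarseStatus` and the `coarse` property are hypothetical, introduced here only for illustration. The example sentences are taken from Table 1.

```python
from enum import Enum


class CoarseStatus(Enum):
    """Three-way temporal status (hypothetical name, for illustration)."""
    PAST = "Past"
    ON_GOING = "On-going"
    FUTURE = "Future"


class EventStatus(Enum):
    """Five-way label set from Section 2 (hypothetical name, for illustration)."""
    PAST = "Past"
    ON_GOING = "On-going"
    FUTURE_PLANNED = "Future Planned"
    FUTURE_ALERT = "Future Alert"
    FUTURE_POSSIBLE = "Future Possible"

    @property
    def coarse(self) -> CoarseStatus:
        # Collapse the three future sublabels into a single FUTURE class;
        # PAST and ON_GOING map to the coarse label of the same name.
        if self.name.startswith("FUTURE_"):
            return CoarseStatus.FUTURE
        return CoarseStatus[self.name]


# Two labeled sentences copied from Table 1.
examples = [
    ("Today's demonstration ended without violence.", EventStatus.PAST),
    ("Farmers have threatened to hold demonstrations on Monday.",
     EventStatus.FUTURE_ALERT),
]

for sentence, status in examples:
    print(f"{status.coarse.value:8s} | {status.value:15s} | {sentence}")
```

Keeping the fine-grained label as the stored value and deriving the coarse class from it means a single annotation can serve both the three-way and the five-way view of the task.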