Automatic Sense Prediction for Implicit Discourse Relations in Text
Emily Pitler, Annie Louis, Ani Nenkova
Computer and Information Science, University of Pennsylvania
Philadelphia, PA 19104, USA
epitler,lannie,[email protected]

Abstract

We present a series of experiments on automatically identifying the sense of implicit discourse relations, i.e. relations that are not marked with a discourse connective such as “but” or “because”. We work with a corpus of implicit relations present in newspaper text and report results on a test set that is representative of the naturally occurring distribution of senses. We use several linguistically informed features, including polarity tags, Levin verb classes, length of verb phrases, modality, context, and lexical features. In addition, we revisit past approaches using lexical pairs from unannotated text as features, explain some of their shortcomings and propose modifications. Our best combination of features outperforms the baseline from data intensive approaches by 4% for comparison and 16% for contingency.

1 Introduction

Implicit discourse relations abound in text and readers easily recover the sense of such relations during semantic interpretation. But automatic sense prediction for implicit relations is an outstanding challenge in discourse processing.

Discourse relations, such as causal and contrast relations, are often marked by explicit discourse connectives (also called cue words) such as “because” or “but”. It is not uncommon, though, for a discourse relation to hold between two text spans without an explicit discourse connective, as the example below demonstrates:

(1) The 101-year-old magazine has never had to woo advertisers with quite so much fervor before. [because] It largely rested on its hard-to-fault demographics.

In this paper we address the problem of automatic sense prediction for discourse relations in newspaper text. For our experiments, we use the Penn Discourse Treebank, the largest existing corpus of discourse annotations for both implicit and explicit relations. Our work is also informed by the long tradition of data intensive methods that rely on huge amounts of unannotated text rather than on manually tagged corpora (Marcu and Echihabi, 2001; Blair-Goldensohn et al., 2007).

In our analysis, we focus only on implicit discourse relations and clearly separate these from explicits. Explicit relations are easy to identify: the most general senses (comparison, contingency, temporal and expansion) can be disambiguated in explicit relations with 93% accuracy based solely on the discourse connective used to signal the relation (Pitler et al., 2008). Reporting results on explicit and implicit relations separately will therefore allow for clearer tracking of progress.

In this paper we investigate the effectiveness of various features designed to capture lexical and semantic regularities for identifying the sense of implicit relations. Given two text spans, previous work has used the cross-product of the words in the spans as features. We examine the most informative word pair features and find that they are not the semantically related pairs that researchers had hoped for. We then introduce several other methods capturing the semantics of the spans (polarity features, semantic classes, tense, etc.) and evaluate their effectiveness. This is the first study which reports results on classifying naturally occurring implicit relations in text and uses the natural distribution of the various senses.

2 Related Work

Experiments on implicit and explicit relations. Previous work has dealt with the prediction of discourse relation sense, but often for explicits and at the sentence level.

Soricut and Marcu (2003) address the task of parsing discourse structures within the same sentence. They use the RST corpus (Carlson et al., 2001), which contains 385 Wall Street Journal articles annotated following Rhetorical Structure Theory (Mann and Thompson, 1988). Many of the useful features, syntax in particular, exploit the fact that both arguments of the connective are found in the same sentence. Such features would not be applicable to the analysis of implicit relations that occur intersententially.

Wellner et al. (2006) used GraphBank (Wolf and Gibson, 2005), which contains 105 Associated Press and 30 Wall Street Journal articles annotated with discourse relations. They achieve 81% accuracy in sense disambiguation on this corpus. However, GraphBank annotations do not differentiate between implicits and explicits, so it is difficult to verify success for implicit relations.

Experiments on artificial implicits. Marcu and Echihabi (2001) proposed a method for cheap acquisition of training data for discourse relation sense prediction. Their idea is to use unambiguous patterns such as [Arg1, but Arg2.] to create synthetic examples of implicit relations: the connective is deleted and [Arg1, Arg2] is used as an example of an implicit relation.

The approach is tested using binary classification between relations on balanced data, a setting very different from that of any realistic application. For example, a question-answering application that needs to identify causal relations (as in Girju (2003)) must differentiate causal relations not only from comparison relations, but also from expansions, temporal relations, and possibly no relation at all. In addition, using equal numbers of examples of each type can be misleading because the distribution of relations is known to be skewed, with expansions occurring most frequently. Causal and comparison relations, which are most useful for applications, are less frequent. Because of this, the recall of the classification should be the primary metric of success, while the Marcu and Echihabi (2001) experiments report only accuracy.

Later work (Blair-Goldensohn et al., 2007; Sporleder and Lascarides, 2008) has discovered that the models learned do not perform as well on implicit relations as one might expect from the test accuracies on synthetic data.
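The extraction step behind these synthetic examples is simple to sketch. The following is a minimal illustration, not the original system: the connective inventory, the sense mapping, and the regular expression are simplified placeholders for the unambiguous patterns Marcu and Echihabi (2001) actually used.

```python
import re

# Illustrative stand-ins for the unambiguous connective patterns;
# the original inventory is larger and more carefully curated.
CONNECTIVE_SENSE = {"but": "Comparison", "because": "Contingency"}

# Matches a sentence of the form "[Arg1, <connective> Arg2.]".
PATTERN = re.compile(r"^(?P<arg1>[^,]+), (?P<conn>but|because) (?P<arg2>.+)\.$")

def synthetic_implicits(sentences):
    """Yield (arg1, arg2, sense) triples with the connective deleted,
    mimicking the cheap-acquisition idea described above."""
    for sentence in sentences:
        match = PATTERN.match(sentence.strip())
        if match:
            # Deleting the connective turns [Arg1, but Arg2.]
            # into a synthetic "implicit" example [Arg1, Arg2].
            yield (match.group("arg1"), match.group("arg2"),
                   CONNECTIVE_SENSE[match.group("conn")])

for example in synthetic_implicits(
        ["Funds grew wildly popular, but they fell into oblivion."]):
    print(example)  # ('Funds grew wildly popular', 'they fell into oblivion', 'Comparison')
```

Note that such a sketch inherits exactly the weakness discussed above: it produces balanced, pattern-selected training pairs whose distribution need not match that of naturally occurring implicit relations.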
3 Penn Discourse Treebank

For our experiments, we use the Penn Discourse Treebank (PDTB; Prasad et al., 2008), the largest available annotated corpus of discourse relations. The PDTB contains discourse annotations over the same 2,312 Wall Street Journal (WSJ) articles as the Penn Treebank.

For each explicit discourse connective (such as “but” or “so”), annotators identified the two text spans between which the relation holds and the sense of the relation.

The PDTB also provides information about local implicit relations. For each pair of adjacent sentences within the same paragraph, annotators selected the explicit discourse connective which best expressed the relation between the sentences and then assigned a sense to the relation. In Example (1) above, the annotators identified “because” as the most appropriate connective between the sentences, and then labeled the implicit discourse relation Contingency.

In the PDTB, explicit and implicit relations are clearly distinguished, allowing us to concentrate solely on the implicit relations.

As mentioned above, each implicit and explicit relation is annotated with a sense. The senses are arranged in a hierarchy, allowing for annotations as specific as Contingency.Cause.reason. In our experiments, we use only the top level of the sense annotations: Comparison, Contingency, Expansion, and Temporal. Using just these four relations allows us to be theory-neutral: while different frameworks (Hobbs, 1979; McKeown, 1985; Mann and Thompson, 1988; Knott and Sanders, 1998; Asher and Lascarides, 2003) include different relations of varying specificities, all of them include these four core relations, sometimes under different names.

Each relation in the PDTB takes two arguments. Example (1) can be seen as the predicate Contingency, which takes the two sentences as arguments. For implicits, the span in the first sentence is called Arg1 and the span in the following sentence is called Arg2.
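Since PDTB sense labels are dot-delimited paths through the hierarchy, collapsing an annotation to one of the four top-level classes is a one-line operation. Below is a minimal sketch; the ImplicitRelation container and its field names are our own illustration, not the PDTB distribution format.

```python
from typing import NamedTuple

TOP_LEVEL_SENSES = {"Comparison", "Contingency", "Expansion", "Temporal"}

class ImplicitRelation(NamedTuple):
    arg1: str   # span from the first sentence
    arg2: str   # span from the following sentence
    sense: str  # full hierarchical PDTB sense label

def top_level_sense(sense: str) -> str:
    """Collapse a label such as 'Contingency.Cause.reason' to its top level."""
    top = sense.split(".")[0]
    assert top in TOP_LEVEL_SENSES, f"unexpected sense label: {sense}"
    return top

relation = ImplicitRelation(
    arg1="The 101-year-old magazine has never had to woo advertisers ...",
    arg2="It largely rested on its hard-to-fault demographics.",
    sense="Contingency.Cause.reason",
)
print(top_level_sense(relation.sense))  # -> Contingency
```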
4 Word pair features in prior work

Cross product of words. Discourse connectives are the most reliable predictors of the semantic sense of the relation (Marcu, 2000; Pitler et al., 2008). However, in the absence of explicit markers, the most easily accessible features are the words in the two text spans of the relation. Intuitively, one would expect that there is some relationship that holds between the words in the two arguments. Consider for example the following sentences:

The recent explosion of country funds mirrors the “closed-end fund mania” of the 1920s, Mr. Foot says, when narrowly focused funds grew wildly popular. They fell into oblivion after the 1929 crash.

The words “popular” and “oblivion” are almost antonyms, and one might hypothesize that their occurrence in the two text spans is what triggers the contrast relation between the sentences. Similarly, a pair of words such as (rain, rot) might be indicative of a causal relation. If this hypothesis is correct, pairs of words (w1, w2), such that w1 appears in the first sentence and w2 appears in the second sentence, would be good features for identifying contrast relations.

In a similar vein, Lapata and Lascarides (2004) used pairings of only verbs, nouns and adjectives to predict which temporal connective is most suitable to express the relation between two given text spans. Verb pairs turned out to be one of the best features, but no useful information was obtained from nouns and adjectives.

Blair-Goldensohn et al. (2007) proposed several refinements of the word pair model. They show that (i) stemming, (ii) using a small fixed vocabulary consisting of only the most frequent stems (which would tend to be dominated by function words), and (iii) a cutoff on the minimum frequency of a feature all result in improved performance. They also report that filtering stopwords has a negative impact on the results.

Given these findings, we expect that pairs of function words are informative features helpful in identifying the sense of a relation.
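Putting these pieces together, the word pair model with the refinements above can be sketched as follows. This is an illustrative implementation under simplifying assumptions, not the system of Blair-Goldensohn et al. (2007): tokenization is by whitespace, a toy suffix-stripper stands in for a real stemmer such as Porter's, and the vocabulary size and frequency cutoff are arbitrary values.

```python
from collections import Counter
from itertools import product

def stem(token: str) -> str:
    """Toy stand-in for a real stemmer."""
    token = token.lower().strip('.,;:"')
    for suffix in ("ing", "ed", "ly", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def word_pairs(arg1: str, arg2: str):
    """Cross product of stems: (w1, w2) with w1 from Arg1 and w2 from Arg2."""
    return list(product((stem(t) for t in arg1.split()),
                        (stem(t) for t in arg2.split())))

def select_features(relations, vocab_size=500, min_freq=5):
    """Select word pair features using the three refinements:
    (i) stemming, (ii) a small fixed vocabulary of the most frequent
    stems (mostly function words), (iii) a minimum frequency cutoff."""
    stem_counts, pair_counts = Counter(), Counter()
    for arg1, arg2 in relations:
        stem_counts.update(stem(t) for t in (arg1 + " " + arg2).split())
        pair_counts.update(word_pairs(arg1, arg2))
    vocab = {s for s, _ in stem_counts.most_common(vocab_size)}
    return {pair for pair, count in pair_counts.items()
            if count >= min_freq and pair[0] in vocab and pair[1] in vocab}
```

Each relation is then represented with binary indicators over the selected pairs; in line with the findings above, function words are deliberately not filtered out.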