Arxiv:2004.14979V2 [Cs.CL] 9 Oct 2020 Specific Paraphrases (E.G
Total Page:16
File Type:pdf, Size:1020Kb
Paraphrasing vs Coreferring: Two Sides of the Same Coin Yehudit Meged1 Avi Caciularu1;2 Vered Shwartz2;3 Ido Dagan1 1Computer Science Department, Bar-Ilan University, Ramat-Gan, Israel 2Allen Institute for Artificial Intelligence 3Paul G. Allen School of Computer Science & Engineering, University of Washington fyehuditmeged,[email protected], [email protected], [email protected] Abstract Tara Reid has checked into_ Promises Treatment Center. Actress Tara Reid entered_ well-known Malibu rehab center. We study the potential synergy between two Lindsay Lohan checked into× rehab in Malibu, California. different NLP tasks, both confronting predi- Director Chris Weitz is expected to direct_ New Moon. cate lexical variability: identifying predicate Chris Weitz will take on_ the sequel to “Twilight”. paraphrases, and event coreference resolution. Gary Ross is still in negotiations to direct× the sequel. First, we used annotations from an event coreference dataset as distant supervision to Table 1: Examples from ECB+ (a cross-document re-score heuristically-extracted predicate para- coreference dataset) that illustrate the context-sensitive phrases. The new scoring gained more than nature of event coreference. The illustrated predicates 18 points in average precision upon their rank- are co-referable, and hence may be used to refer to the ing by the original scoring method. Then, we same event in certain contexts, but obviously not all used the same re-ranking features as additional their mentions corefer. inputs to a state-of-the-art event coreference resolution model, which yielded modest but consistent improvements to the model’s perfor- phrases. The former identifies and clusters event mance. The results suggest a promising direc- mentions, across multiple documents, that refer tion to leverage data and models for each of to the same event within their respective contexts. the tasks to the benefit of the other. The latter task, on the other hand, collects pairs of event expressions that, at the generic lexical level, 1 Introduction may refer to the same event in certain contexts. Recognizing that mentions of different lexical Table1 illustrates this difference with examples predicates discuss the same event is challenging of co-referable predicate paraphrases, while their (Barhom et al., 2019). Lexical resources such as mentions obviously do not always co-refer. WordNet (Miller, 1995) capture such synonyms Cross-document event coreference resolution (say, tell) and hypernyms (whisper, talk), as well as systems are typically supervised, usually trained on antonyms, which can be used to refer to the same the ECB+ dataset, which contains clusters of news event when the arguments are reversed ([a]0 beat articles on different topics (Cybulska and Vossen, [a]1, [a]1 lose to [a]0). However, WordNet’s cov- 2014). Recent systems rely on neural representa- erage is insufficient, in particular, missing context- tions of the mentions and their contexts (Kenyon- arXiv:2004.14979v2 [cs.CL] 9 Oct 2020 specific paraphrases (e.g. (hide, launder), in the Dean et al., 2018; Barhom et al., 2019), while ear- context of money). Conversely, distributional meth- lier approaches leveraged WordNet and other lex- ods enjoy broader coverage, but their precision for ical resources to obtain a signal of whether a pair this purpose is limited because distributionally sim- of mentions may be coreferring (e.g. Bejan and ilar terms may often be mutually-exclusive (born, Harabagiu, 2010; Yang et al., 2015). die) or may refer to different event types which Approaches for acquiring predicate paraphrase, are only temporally or causally related (sentenced, in the form of a pair of paraphrastic predicates or convicted). predicate templates, were based mostly on unsuper- Two prominent lines of work pertaining to iden- vised signals. These included similarity between ar- tifying predicates whose meaning or referents can gument distributions (Lin and Pantel, 2001; Berant, be matched are cross-document (CD) event coref- 2012), backtranslation across languages (Barzilay erence resolution and recognizing predicate para- and McKeown, 2001; Ganitkevitch et al., 2013; Mallinson et al., 2017), or leveraging redundant The standard datasets used for CD event corefer- news reports on the same event, which are hence ence training and evaluation are ECB+ (Cybulska likely to refer to the same events and entities using and Vossen, 2014), and its predecessors, EECB different words (Shinyama et al., 2002; Shinyama (Lee et al., 2012) and ECB (Bejan and Harabagiu, and Sekine, 2006; Barzilay and Lee, 2003; Zhang 2010). ECB+ contains a set of topics, each contain- and Weld, 2013; Xu et al., 2014; Shwartz et al., ing a set of documents describing the same global 2017). In some cases, the paraphrase collection event. Both event and entity coreferences are anno- phase includes a step of validating a subset of the tated in ECB+, within and across documents. paraphrases and training a model on these gold Models for CD event coreference utilize a range paraphrases to re-rank the entire resource (Lan of features, including lexical overlap among men- et al., 2017). tion pairs and semantic knowledge from WordNet In this paper, we study the potential synergy (Bejan and Harabagiu, 2010, 2014; Yang et al., between predicate paraphrases and event corefer- 2015), distributional (Choubey and Huang, 2017) ence resolution. We show that the data and models and contextual representations (Kenyon-Dean et al., for one task can benefit the other. In one direc- 2018; Barhom et al., 2019). tion (Section3), we use event coreference anno- The current state-of-the-art model from Barhom tations from the ECB+ dataset as distant super- et al.(2019) iteratively and intermittently learns vision to learn an improved scoring of predicate to cluster events and entities. A mention repre- paraphrases in the unsupervised Chirps resource sentation mi consists of several components, rep- (Shwartz et al., 2017). The distantly supervised resenting both the mention span and its surround- scorer significantly improves upon ranking by the ing context. The interdependence between cluster- original Chirps scores, adding 18 points to average ing event vs. entity mentions is encoded into the precision over a test sample. mention representation, such that an event mention In the other direction (Section4), we incorpo- representation contains a component reflecting the rate data from Chirps, represented in the Chirps current entity clustering, and vice versa. Using this re-scorer feature vector, into a state-of-the-art event representation, the model trains a pairwise mention coreference system (Barhom et al., 2019). Chirps scoring function that predicts the probability that has a substantial coverage over the ECB+ corefer- two mentions refer to the same event. ring mention pairs, and consequently, the incorpo- ration yields a modest but consistent improvement 2.2 Paraphrase Identification and acquisition across the various coreference metrics.1 Paraphrases are differing textual realizations of the same meaning (Ganitkevitch et al., 2013), typically 2 Background and Motivation phrases or sentences (Dolan et al., 2005). A promi- nent approach for identifying and collecting para- In this section we provide some background about phrases, backtranslation, assumes that if two (say) the cross-document coreference resolution and English phrases translate to the same term in a for- paraphrase identification (acquisition) tasks, which eign language, across multiple foreign languages, is relevant for our approaches for synergizing these this indicates that these two phrases are paraphrases. two tasks. This approach was first suggested by Barzilay and McKeown(2001), later adapted to acquire the large 2.1 Event Coreference Resolution PPDB resource (Ganitkevitch et al., 2013), and was Event coreference resolution aims to identify and also shown to work well with neural machine trans- cluster event mentions, that, within their respective lation (Mallinson et al., 2017). contexts, refer to the same event. The task has Paraphrase Identification through Event Coref- two variants, one in which coreferring mentions erence. An alternative approach for paraphrase are within the same document (within document) identification, on which we focus in this paper, and another in which corefering mentions may be leverages multiple news documents discussing the in different documents (cross-document, CD), on same event. The underlying assumption is that which we focus in this paper. such redundant texts may refer to the same entities 1Code available at github.com/yehudit96/coreferrability, or events using lexically-divergent mentions. Co- github.com/yehudit96/event entity coref ecb plus referring mentions are identified heuristically and extracted as candidate paraphrases. When long doc- news headlines posted on Twitter on the same day. uments are used, the first step in this approach is It extracts binary predicate-argument tuples from to align each pair of documents by sentences. This each tweet and aligns pairs of predicate mentions was done by finding sentences with shared named whose arguments match, by some lexical match- entities (Shinyama et al., 2002) or lexical overlap ing criteria. The matched pairs of arguments are (Barzilay and Lee, 2003; Shinyama and Sekine, termed supporting pairs, e.g. (Chuck Berry, 90) for 2006), and by aligning pairs of predicates or ar- “[Chuck Berry]0 died at [90]1” and “[Chuck Berry]0 guments (Zhang and Weld, 2013; Recasens et al., lived