Journal of Artificial Intelligence Research 52 (2015) 445-475 Submitted 09/14; published 03/15 Modeling the Lifespan of Discourse Entities with Application to Coreference Resolution Marie-Catherine de Marneffe
[email protected] Linguistics Department The Ohio State University Columbus, OH 43210 USA Marta Recasens
[email protected] Google Inc. Mountain View, CA 94043 USA Christopher Potts
[email protected] Linguistics Department Stanford University Stanford, CA 94035 USA Abstract A discourse typically involves numerous entities, but few are mentioned more than once. Dis- tinguishing those that die out after just one mention (singleton) from those that lead longer lives (coreferent) would dramatically simplify the hypothesis space for coreference resolution models, leading to increased performance. To realize these gains, we build a classifier for predicting the singleton/coreferent distinction. The model’s feature representations synthesize linguistic insights about the factors affecting discourse entity lifespans (especially negation, modality, and attitude predication) with existing results about the benefits of “surface” (part-of-speech and n-gram-based) features for coreference resolution. The model is effective in its own right, and the feature represen- tations help to identify the anchor phrases in bridging anaphora as well. Furthermore, incorporating the model into two very different state-of-the-art coreference resolution systems, one rule-based and the other learning-based, yields significant performance improvements. 1. Introduction Karttunen imagined a text interpreting system designed to keep track of “all the individuals, that is, events, objects, etc., mentioned in the text and, for each individual, record whatever is said about it” (Karttunen, 1976, p. 364).