The Effects of Lexical Resource Quality on Preference Violation Detection

Jesse Dunietz
Computer Science Department
Carnegie Mellon University
Pittsburgh, PA 15213, USA
[email protected]

Lori Levin and Jaime Carbonell
Language Technologies Institute
Carnegie Mellon University
Pittsburgh, PA 15213, USA
{lsl,jgc}@cs.cmu.edu

Abstract

Lexical resources such as WordNet and VerbNet are widely used in a multitude of NLP tasks, as are annotated corpora such as PropBank. Often, the resources are used as-is, without question or examination. This practice risks missing significant performance gains and even entire techniques.

This paper addresses the importance of resource quality through the lens of a challenging NLP task: detecting selectional preference violations. We present DAVID, a simple, lexical resource-based preference violation detector. With as-is lexical resources, DAVID achieves an F1-measure of just 28.27%. When the resource entries and parser outputs for a small sample are corrected, however, the F1-measure on that sample jumps from 40% to 61.54%, and performance on other examples rises, suggesting that the algorithm becomes practical given refined resources. More broadly, this paper shows that resource quality matters tremendously, sometimes even more than algorithmic improvements.

1 Introduction

A variety of NLP tasks have been addressed using selectional preferences or restrictions, including word sense disambiguation (see Navigli (2009)), semantic parsing (e.g., Shi and Mihalcea (2005)), and metaphor processing (see Shutova (2010)). These semantic problems are quite challenging; metaphor analysis, for instance, has long been recognized as requiring considerable semantic knowledge (Wilks, 1978; Carbonell, 1980).

The advent of extensive lexical resources, annotated corpora, and a spectrum of NLP tools presents an opportunity to revisit such challenges from the perspective of selectional preference violations. Detecting these violations, however, constitutes a severe stress-test for resources designed for other tasks. As such, it can highlight shortcomings and allow quantifying the potential benefits of improving resources such as WordNet (Fellbaum, 1998) and VerbNet (Schuler, 2005).

In this paper, we present DAVID (Detector of Arguments of Verbs with Incompatible Denotations), a resource-based system for detecting preference violations. DAVID is one component of METAL (Metaphor Extraction via Targeted Analysis of Language), a new system for identifying, interpreting, and cataloguing metaphors. One purpose of DAVID was to explore how far lexical resource-based techniques can take us. Though our initial results suggested that the answer is "not very," further analysis revealed that the problem lies less in the technique than in the state of existing resources and tools.

Often, it is assumed that the frontier of performance on NLP tasks is shaped entirely by algorithms. Manning (2011) showed that this may not hold for POS tagging – that further improvements may require resource cleanup. In the same spirit, we argue that for some semantic tasks, exemplified by preference violation detection, resource quality may be at least as essential as algorithmic enhancements.

2 The Preference Violation Detection Task

DAVID builds on the insight of Wilks (1978) that the strongest indicator of metaphoricity is the violation of selectional preferences. For example, only plants can literally be pruned. If laws is the object of pruned, the verb is likely metaphorical. Flagging such semantic mismatches between verbs and arguments is the task of preference violation detection.

We base our definition of preferences on the Pragglejaz guidelines (Pragglejaz Group, 2007) for identifying the most basic sense of a word as the most concrete, embodied, or precise one. Similarly, we define selectional preferences as the semantic constraints imposed by a verb's most basic sense. WordNet may list figurative senses of prune, but we take the basic sense to be cutting plant growth.

Several types of verbs were excluded from the task because they have very lax preferences. These include verbs of becoming or seeming (e.g., transform, appear), light verbs, auxiliaries, and aspectual verbs. For the sake of simplifying implementation, phrasal verbs were also ignored.

"The politician pruned laws regulating plastic bags, and created new fees for inspecting dairy farms."

Verb        Arg0            Arg1
pruned      The politician  laws ... bags
regulating  laws            plastic bags
created     The politician  new fees
inspecting  -               dairy farms

Table 1: SENNA's SRL output for the example sentence above. Though this example demonstrates only two arguments, SENNA is capable of labeling up to six.

3 Algorithm Design

To identify violations, DAVID employs a simple algorithm based on several existing tools and resources: SENNA (Collobert et al., 2011), a semantic role labeling (SRL) system; VerbNet, a computational verb lexicon; SemLink (Loper et al., 2007), which includes mappings between PropBank (Palmer et al., 2005) and VerbNet; and WordNet. As one of METAL's several metaphor detection components, DAVID is designed to favor precision over recall. The algorithm is as follows:

1. Run the Stanford CoreNLP POS tagger (Toutanova et al., 2003) and the TurboParser dependency parser (Martins et al., 2011).
2. Run SENNA to identify the semantic arguments of each verb in the sentence using the PropBank argument annotation scheme (Arg0, Arg1, etc.). See Table 1 for example output.
3. For each verb V, find all VerbNet entries for V. Using SemLink, map each PropBank argument name to the corresponding VerbNet thematic roles in these entries (Agent, Patient, etc.). For example, the VerbNet class for prune is carve-21.2-2. SemLink maps Arg0 to the Agent of carve-21.2-2 and Arg1 to the Patient.
4. Retrieve from VerbNet the selectional restrictions of each thematic role. In our running example, VerbNet specifies +int_control and +concrete for the Agent and Patient of carve-21.2-2, respectively.
5. If the head of any argument cannot be interpreted to meet V's preferences, flag V as a violation.

Restriction     WordNet Synsets
animate         animate_being.n.01, people.n.01, person.n.01
concrete        physical_object.n.01, matter.n.03, substance.n.04
organization    social_group.n.01, district.n.01

Table 2: DAVID's mappings between some common VerbNet restriction types and WordNet synsets.

Each VerbNet restriction is interpreted as mandating or forbidding a set of WordNet hypernyms, defined by a custom mapping (see Table 2). For example, VerbNet requires both the Patient of a verb in carve-21.2-2 and the Theme of a verb in wipe_manner-10.4.1-1 to be concrete. By empirical inspection, concrete nouns are hyponyms of the WordNet synsets physical_object.n.01, matter.n.03, or substance.n.04. Laws (the Patient of prune) is a hyponym of none of these, so prune would be flagged as a violation.
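To make the restriction check in steps 3-5 concrete, the sketch below walks through the running prune example. It is an illustrative sketch rather than DAVID's actual implementation: it assumes NLTK's WordNet interface, hand-codes a fragment of the Table 2 mapping (synset names follow the text and may need adjusting to a particular WordNet version), and replaces the SemLink and VerbNet lookups with small hard-coded dictionaries.

```python
# Illustrative sketch of the restriction check (Section 3, steps 3-5); not DAVID's code.
# Assumes NLTK with the WordNet corpus installed; SemLink/VerbNet lookups are stubbed.
from nltk.corpus import wordnet as wn

# Fragment of the Table 2 mapping: restriction label -> licensing WordNet synsets.
RESTRICTION_SYNSETS = {
    "animate": ["animate_being.n.01", "people.n.01", "person.n.01"],
    "concrete": ["physical_object.n.01", "matter.n.03", "substance.n.04"],
    "organization": ["social_group.n.01", "district.n.01"],
}

def satisfies(noun, restriction):
    """True if ANY WordNet sense of `noun` falls under a synset licensed by
    `restriction` (every sense is treated as plausible, as in DAVID)."""
    allowed = {wn.synset(name) for name in RESTRICTION_SYNSETS[restriction]}
    for sense in wn.synsets(noun, pos=wn.NOUN):
        # Hypernym closure of this sense, including the sense itself.
        closure = set(sense.closure(lambda s: s.hypernyms())) | {sense}
        if closure & allowed:
            return True
    return False

def violates(verb_roles, restrictions, argument_heads):
    """Flag a violation if any argument head fails its role's restriction.
    `verb_roles` maps PropBank args to VerbNet roles (as SemLink would);
    `restrictions` maps VerbNet roles to restriction labels (as VerbNet would)."""
    for pb_arg, head in argument_heads.items():
        role = verb_roles.get(pb_arg)
        restriction = restrictions.get(role)
        if restriction and not satisfies(head, restriction):
            return True
    return False

# Running example for "prune" (VerbNet class carve-21.2-2):
semlink_roles = {"Arg0": "Agent", "Arg1": "Patient"}      # stand-in for SemLink
verbnet_restrictions = {"Patient": "concrete"}             # +concrete on the Patient
heads = {"Arg0": "politician", "Arg1": "law"}              # from SRL + head finding
print(violates(semlink_roles, verbnet_restrictions, heads))  # True: no concrete sense of "law"
```

Because every WordNet sense of the argument head is treated as plausible, a check of this kind errs toward not flagging, mirroring DAVID's preference for precision over recall.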
4 Corpus Annotation

To evaluate our system, we assembled a corpus of 715 sentences from the METAL project's corpus of sentences with and without metaphors. The corpus was annotated by two annotators following an annotation manual. Each verb was marked for whether its arguments violated the selectional preferences of the most basic, literal meaning of the verb. The annotators resolved conflicts by discussing until consensus.

5 Initial Results

As the first row of Table 4 shows, our initial evaluation left little hope for the technique. With such low precision and F1, it seemed a lexical resource-based preference violation detector was out. When we analyzed the errors in 90 randomly selected sentences, however, we found that most were not due to systemic problems with the approach; rather, they stemmed from SRL and parsing errors and missing or incorrect resource entries (see Table 3). Armed with this information, we decided to explore how viable our algorithm would be absent these problems.

Error source                     Frequency
Bad/missing VN entries           4.5 (14.1%)
Bad/missing VN restrictions      6 (18.8%)
Bad/missing SL mappings          2 (6.3%)
Parsing/head-finding errors      3.5 (10.9%)
SRL errors                       8.5 (26.6%)
VN restriction system too weak   4 (12.5%)
Confounding WordNet senses       3.5 (10.9%)
Endemic errors:                  7.5 (23.4%)
Resource errors:                 12.5 (39.1%)
Tool errors:                     12 (37.5%)
Total:                           32 (100%)

Table 3: Sources of error in 90 randomly selected sentences. For errors that were due to a combination of sources, 1/2 point was awarded to each source. (VN stands for VerbNet and SL for SemLink.)

6 Refining the Data

To evaluate the effects of correcting DAVID's inputs, we manually corrected the tool outputs and resource entries that affected the aforementioned 90 sentences. SRL output was corrected for every sentence, while SemLink and VerbNet entries were corrected only for each verb that produced an error.

6.1 Corrections to Tool Output (Parser/SRL)

Guided by the PropBank annotation guidelines, we corrected all errors in core role assignments from SENNA. These corrections included relabeling arguments, adding missed arguments, fixing argument spans, and deleting annotations for non-verbs. The only parser-related error we corrected was a mislabeled noun.

6.2 Correcting Corrupted Data in VerbNet

The VerbNet download is missing several subclasses that are referred to by SemLink or that have been updated on the VerbNet website. Some roles also have not been updated to the latest version, and some subclasses are listed with incorrect IDs. These problems, which caused SemLink mappings to fail, were corrected before reviewing errors from the corpus.

Six subclasses needed to be fixed, all of which were easily detected by a simple script that did not depend on the 90-sentence subcorpus. We therefore expect that few further changes of this type would be needed for a more complete resource refinement effort.
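The script itself is not reproduced here, but a check of this kind is straightforward to sketch. The sketch below is illustrative rather than the script we used: it assumes NLTK's VerbNet corpus reader, and it leaves the collection of SemLink-referenced class IDs as a stub, since that step depends on the file format of the particular SemLink release.

```python
# Illustrative sketch (not the original script) of the consistency check described in
# Section 6.2: find VerbNet class IDs that SemLink refers to but that are absent from
# the installed VerbNet download. Assumes NLTK's VerbNet corpus is installed.
from nltk.corpus import verbnet

def missing_verbnet_classes(semlink_class_ids):
    """Return SemLink-referenced class IDs with no matching (sub)class in VerbNet."""
    known = set(verbnet.classids())  # e.g., 'carve-21.2-2', 'defend-85', ...
    return sorted(set(semlink_class_ids) - known)

if __name__ == "__main__":
    # Stub: in practice these IDs would be scraped from the SemLink mapping files.
    referenced = ["carve-21.2-2", "wipe_manner-10.4.1-1", "defend-85"]
    for classid in missing_verbnet_classes(referenced):
        print("SemLink refers to missing VerbNet class:", classid)
```

Any ID reported by such a check points either to a subclass missing from the VerbNet download or to a mismatched class ID of the kind described above.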
6.3 Corpus-Based Updates to SemLink

Our modifications to SemLink's mappings included adding missing verbs, adding missing roles to mappings, and correcting mappings to more appropriate classes or roles. We also added null mappings in cases where a PropBank argument had no corresponding role in VerbNet. This makes the system's strategy for ruling out mappings more reliable.

No corrections were made purely based on the sample. Any time a verb's mappings were edited, VerbNet was scoured for plausible mappings for every verb sense in PropBank, and any nonsensical mappings were deleted. For example, when the phrase go dormant caused an error, we inspected the mappings for go. Arguments of all but 2 of the 7 available mappings were edited, either to add missing arguments or to correct nonsensical ones. These changes actually had a net negative impact on test set performance because the bad mappings had masked parsing and selectional preference problems.

Based on the 90-sentence subcorpus, we modified 20 of the existing verb entries in SemLink. These changes included correcting 8 role mappings, adding 13 missing role mappings to existing senses, deleting 2 incorrect senses, adding 11 verb senses, correcting 2 senses, deleting 1 superfluous role mapping, and adding 46 null role mappings. (Note that although null mappings represented the largest set of changes, they also had the least impact on system behavior.) One entirely new verb was added, as well.

6.4 Corpus-Based Updates to VerbNet

Nineteen VerbNet classes were modified, and one class had to be added. The modifications generally involved adding, correcting, or deleting selectional restrictions, often by introducing or rearranging subclasses. Other changes amounted to fixing clerical errors, such as incorrect role names or restrictions that had been ANDed instead of ORed.

An especially difficult problem was an inconsistency in the semantics of VerbNet's subclass system. In some cases, the restrictions specified on a verb in a subclass did not apply to subcategorization frames inherited from a superclass, but in other cases the restrictions clearly applied to all frames. The conflict was resolved by duplicating subclassed verbs in the top-level class whenever different selectional restrictions were needed for the two sets of frames.

As with SemLink, samples determined only which classes were modified, not what modifications were made. Any non-obvious changes to selectional restrictions were verified by examining dozens of verb instances from SketchEngine's (Kilgarriff et al., 2004) corpus. For example, the Agent of seek was restricted to +animate, but the corpus confirmed that organizations are commonly described non-metaphorically as seeking, so the restriction was updated to +animate | +organization.

7 Results After Resource Refinement

After making corrections for each set of 10 sentences, we incrementally recomputed F1 and precision, both on the subcorpus corrected so far and on a test set of all 625 sentences that were never corrected. (The manual nature of the correction effort made testing k-fold subsets impractical.) The results for 30-sentence increments are shown in Table 4.

Sent.  Tools  Rsrcs     P        F1
715    0      0         27.14%   28.27%
625    0      0         26.55%   27.98%
625    0      corr.     26.37%   28.15%
30     0      0         50.00%   40.00%
30     30     0         66.67%   44.44%
30     0      corr.+30  62.50%   50.00%
30     30     corr.+30  87.50%   70.00%
625    0      corr.+30  27.07%   28.82%
60     0      0         35.71%   31.25%
60     60     0         54.55%   31.38%
60     0      corr.+60  53.85%   45.16%
60     60     corr.+60  90.91%   68.97%
625    0      corr.+60  26.92%   28.74%
90     0      0         31.82%   30.43%
90     90     0         44.44%   38.10%
90     0      corr.+90  47.37%   41.86%
90     90     corr.+90  80.00%   61.54%
625    0      corr.+90  27.37%   28.99%

Table 4: Performance on the preference violation detection task. Column 1 shows the sentence count. Columns 2 and 3 show how many sentences' SRL/parsing and resource errors, respectively, had been fixed ("corr." indicates corrupted files).

The most striking feature of these figures is how much performance improves on corrected sentences: for the full 90 sentences, F1 rose from 30.43% to 61.54%, and precision rose even more dramatically from 31.82% to 80.00%. Interestingly, resource corrections alone generally made a larger difference than tool corrections alone, suggesting that resources may be the dominant factor in resource-intensive tasks such as this one. Even more compellingly, the improvement from correcting both the tools and the resources was nearly double the sum of the improvements from each alone: tool and resource improvements interact synergistically.

The effects on the test corpus are harder to interpret. Due to a combination of SRL problems and the small number of sentences corrected, the scores on the test set improved little with resource correction; in fact, they even dipped slightly between the 30- and 60-sentence increments. Nonetheless, we contend that our results testify to the generality of our corrections: after each iteration, every altered result was either an error fixed or an error that should have appeared before but had been masked by another. Note also that all results on the test set are without corrected tool output; presumably, these sentences would also have improved synergistically with more accurate SRL. How long corrections would continue to improve performance is a question that we did not have the resources to answer, but our results suggest that there is plenty of room to go.

Some errors, of course, are endemic to the approach and cannot be fixed either by improved resources or by better tools. For example, we consider every WordNet sense to be plausible, which produces false negatives. Additionally, the selectional restrictions specified by VerbNet are fairly loose; a more refined set of categories might capture the range of verbs' restrictions more accurately.

8 Implications for Future Refinement Efforts

Although improving resources is infamously labor-intensive, we believe that similarly refining the remainder of VerbNet and SemLink would be doable. In our study, it took about 25-35 person-hours to examine about 150 verbs and to modify 20 VerbNet classes and 25 SemLink verb entries (excluding time for SENNA corrections, fixing corrupt VerbNet data, and analysis of DAVID's errors). Extrapolating from our experience, we estimate that it would take roughly 6-8 person-weeks to systematically fix this particular set of issues with VerbNet.

Improving SemLink could be more complex, as its mappings are automatically generated from VerbNet annotations on top of the PropBank corpus. One possibility is to correct the generated mappings directly, as we did in our study, which we estimate would take about two person-months.

With the addition of some metadata from the generation process, it would then be possible to follow the corrected mappings back to annotations from which they were generated and fix those annotations. One downside of this approach is that if the mappings were ever regenerated from the annotated corpus, any mappings not encountered in the corpus would have to be added back afterwards.

Null role mappings would be particularly thorny to implement. To add a null mapping, we must know that a role definitely does not belong, and is not just incidentally missing from an example. For instance, VerbNet's defend-85 class truly has no equivalent to Arg2 in PropBank's defend.01, but Arg0 or Arg1 may be missing for other reasons (e.g., in a passive). It may be best to simply omit null mappings, as is currently done. Alternatively, full parses from the Penn Treebank, on which PropBank is based, might allow distinguishing phenomena such as passives, where arguments are predictably omitted.

The maintainers of VerbNet and PropBank are aware of many of the issues we have raised, and we have been in contact with them about possible approaches to fixing them. They are particularly aware of the inconsistent semantics of selectional restrictions on VerbNet subclasses, and they hope to fix this issue within a larger attempt at retooling VerbNet's selectional restrictions. In the meantime, we are sharing our VerbNet modifications with them for them to verify and incorporate. We are also sharing our SemLink changes so that they can, if they choose, continue manual correction efforts or trace SemLink problems back to the annotated corpus.

9 Conclusion

Our results argue for investing effort in developing and fixing resources, in addition to developing better NLP tools. Resource and tool improvements interact synergistically: better resources multiply the effect of algorithm enhancements. Gains from fixing resources may sometimes even exceed what the best possible algorithmic improvements can provide. We hope the NLP community will take up the challenge of investing in its resources to the extent that its tools demand.

Acknowledgments

Thanks to Eric Nyberg for suggesting building a system like DAVID, to Spencer Onuffer for his annotation efforts, and to Davida Fromm for curating METAL's corpus of English sentences.

This work was supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Defense US Army Research Laboratory contract number W911NF-12-C-0020. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoD/ARL, or the U.S. Government.

References

Jaime G. Carbonell. 1980. Metaphor: a key to extensible semantic analysis. In Proceedings of the 18th annual meeting on Association for Computational Linguistics, ACL '80, pages 17–21, Stroudsburg, PA, USA. Association for Computational Linguistics.
Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel P. Kuksa. 2011. Natural language processing (almost) from scratch. J. Mach. Learn. Res., 12:2493–2537, November.

Christiane Fellbaum, editor. 1998. WordNet: An Electronic Lexical Database. Bradford Books.

Adam Kilgarriff, Pavel Rychly, Pavel Smrz, and David Tugwell. 2004. The Sketch Engine. In Proceedings of EURALEX.

Edward Loper, Szu-ting Yi, and Martha Palmer. 2007. Combining lexical resources: Mapping between PropBank and VerbNet. In Proceedings of the 7th International Workshop on Computational Linguistics, Tilburg, the Netherlands.

Christopher D. Manning. 2011. Part-of-speech tagging from 97% to 100%: is it time for some linguistics? In Computational Linguistics and Intelligent Text Processing, pages 171–189. Springer.

André F. T. Martins, Noah A. Smith, Pedro M. Q. Aguiar, and Mário A. T. Figueiredo. 2011. Dual decomposition with many overlapping components. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP '11, pages 238–249, Stroudsburg, PA, USA. Association for Computational Linguistics.

Roberto Navigli. 2009. Word sense disambiguation: A survey. ACM Computing Surveys (CSUR), 41(2):10.

Martha Palmer, Daniel Gildea, and Paul Kingsbury. 2005. The Proposition Bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1):71–106.

Pragglejaz Group. 2007. MIP: A method for identifying metaphorically used words in discourse. Metaphor and Symbol, 22(1):1–39.

Karin K. Schuler. 2005. VerbNet: A Broad-Coverage, Comprehensive Verb Lexicon. Ph.D. thesis, University of Pennsylvania, Philadelphia, PA. AAI3179808.

Lei Shi and Rada Mihalcea. 2005. Putting pieces together: Combining FrameNet, VerbNet and WordNet for robust semantic parsing. In Alexander Gelbukh, editor, Computational Linguistics and Intelligent Text Processing, volume 3406 of Lecture Notes in Computer Science, pages 100–111. Springer Berlin Heidelberg.

Ekaterina Shutova. 2010. Models of metaphor in NLP. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL '10, pages 688–697, Stroudsburg, PA, USA. Association for Computational Linguistics.

Kristina Toutanova, Dan Klein, Christopher D. Manning, and Yoram Singer. 2003. Feature-rich part-of-speech tagging with a cyclic dependency network. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1, NAACL '03, pages 173–180, Stroudsburg, PA, USA. Association for Computational Linguistics.

Yorick Wilks. 1978. Making preferences more active. Artificial Intelligence, 11:197–223.
