Towards Automatic Verb Acquisition from Verbnet for Spoken Dialog Processing
Total Page:16
File Type:pdf, Size:1020Kb
Towards Automatic Verb Acquisition from VerbNet for Spoken Dialog Processing Mary Swift Department of Computer Science University of Rochester Rochester, NY 14607 USA [email protected] There are numerous approaches to verb Abstract classification. For example, Levin (1993) defines semantic verb classes that pattern according to This paper presents experiments on using syntactic alternations. The Levin classes are the VerbNet as a resource for understanding basis of the online lexical resource VerbNet unknown verbs encountered by a spoken (Kipper, Dang and Palmer, 2000; Kipper 2003). dialog system. Coverage of unknown verbs in However FrameNet (Baker, Fillmore and Lowe, a corpus of spoken dialogs about computer 1998), another hand-crafted lexical resource, purchasing is assessed, and two methods for classifies verbs using core semantic concepts, automatically integrating representations of rather than syntactic alternations (see Baker and verbs found in VerbNet are explored. The Ruppenhofer (2002) for an interesting comparison first identifies VerbNet classes containing of the two approaches). Machine learning verbs already defined in the system, and techniques have been used to induce classes from generates representations for unknown verbs distributional features extracted from annotated in those classes, modelled after the existing corpora (e.g., Merlo and Stevenson, 2001; Schulte system representation. The second method im Walde, 2000). generates representations based on VerbNet This paper reports experiments on using alone. The second method performs better, VerbNet as a resource for verbs not defined in but gaps in coverage and differences between TRIPS. VerbNet coverage of unknown verbs the two verb representation systems limit the occurring in a corpus of spoken dialogs about success of automatic acquisition. computer purchasing is evaluated. VerbNet coverage has been previously evaluated in 1 Introduction (Kipper et al, 2004b) by matching syntactic TRIPS (The Rochester Interactive Planning coverage for selected verbs in PropBank System) is a collaborative dialog assistant that (Kingsbury and Palmer, 2002). In the present performs full loop intelligent dialog processing, evaluation, TRIPS obtains representations from from speech understanding and semantic parsing VerbNet for use during parsing to automatically through intention recognition, task planning and generate semantic representations of utterances natural language generation. In recent years the that can be used by the system to reason about the system has undergone rapid expansion to several computer purchasing task. new domains. Traditionally the system has used a The experiments explore methods for hand-constructed lexicon, but increased demand automatically acquiring VerbNet representations for coverage of new domains in a short time in TRIPS. The verb representations in TRIPS and period together with the availability of online VerbNet were developed independently and for lexical resources has prompted investigation into different purposes, so successfully integrating the incorporating existing lexical resources. two presents some challenges. Verb classification The ability to handle spontaneous speech in TRIPS is organized along semantic lines demands broad coverage and flexibility. Verbs are similar to FrameNet (see section 2) instead of the a locus of information for overall sentence diathesis-based classification of VerbNet. structure and selectional restrictions on Dzikovska (2004) has noted that there is a good arguments, so their representation and deal of overlap between the two in terms of the organization is crucial for natural language representation of predicate argument structure and processing. fill-container (situation) parent: filling roles: agent (+ intentional) theme (+ phys-obj) goal (+ container) load type: fill-container templ: agent-theme-goal “load the oranges in the truck” agent theme goal subj obj pp-comp Figure 1: Schematic of the three main components of a TRIPS lexical definition: semantic type, lexical entry, and linking template for one sense of the verb load. associated thematic roles. The experiments The current semantic verb hierarchy takes reported here provide a more detailed comparison FrameNet frames (Baker, Fillmore and Lowe, between the two systems and show that in spite of 1998) as its starting point, but incorporates the similarities, there are enough differences to characteristics that streamline it for use in practical make the integration challenging. spoken dialog processing, such as hierarchical Two automatic acquisition methods are structure and a reduced set of role names explored. The first creates definitions for verbs in (Dzikovska, Swift and Allen, 2004). Each sense VerbNet classes containing verbs already defined definition also includes an example of usage and a in TRIPS, using the existing definition as a model. meta-data vector that records the origin and date of The second method generates lexical definitions entry, date of change, and comments. A based on VerbNet information alone. The methods (simplified) schematic representation for the are evaluated by integrating the new definitions definition for the verb load is shown in Figure 1. into the system and parsing a corpus of transcribed utterances containing the new verbs. Deriving verb The semantic hierarchy classifies verbs in terms definitions directly from VerbNet provides a of semantic types that describe the predicate- greater number of acceptable definitions than argument structure. Syntactic frames for licensed basing new definitions on existing representations constructions are not part of the class specification, in TRIPS, highlighting some of the difficulties in as they are in VerbNet. Rather, they are reconciling independently developed verb enumerated in the lexical entry itself, as a representation systems. component of a sense entry. 2 Verb representation in TRIPS At the time of this evaluation there are 522 verb lemmas in the TRIPS lexicon. Roughly half of A lexical representation in TRIPS consists of an these are also found in VerbNet, although the sense orthographic form, part of speech, morphological distribution for identical lemmas do not always specifications (if the standard paradigm does not correspond, as the evaluation in section 4 shows. apply), and sense definitions. A lexeme may have one or more sense definitions, which consist of a 3 VerbNet semantic type with associated thematic roles and semantic features (Dzikovska 2004), and a VerbNet is a hierarchical verb lexicon that uses the template that specifies the linking between the Levin verb classes to systematically group verbs thematic roles and syntactic arguments. into “semantically coherent” classes according to the alternations between the different syntactic frames in which they appear. VerbNet expands on the Levin classes by providing explicit syntactic TRIPS structures and the other included target verb and semantic information, including thematic roles definitions based on VerbNet data alone. annotated with semantic features, and syntactic When target verb representations were not based frames for each verb. VerbNet frames use the on a TRIPS class match, representations for 17 thematic role names to describe the syntactic additional verbs were generated: advance, exit, fax, constructions in which a verb can appear. For interest, package, page, price, rate, set, slow, split, example, the frame Agent V Patient describes a supply, support, train, transfer, wire, zip. These transitive construction for change of state verbs, as verb representations were evaluated on a separate in Floyd broke the vase. corpus of 32 transcribed target utterances. The experiments reported here are based on 4.1.1 Acquiring verbs based on TRIPS VerbNet v1.5,1 consisting of 191 main classes representations subsuming more than 4000 verb senses and approximately 3000 verb lemmas. The first method automatically generated verb definitions for the target words by identifying 4 Evaluation VerbNet classes that contained verbs for which definitions already existed. If a VerbNet class A new corpus in a computer purchasing domain is contained a verb already defined in TRIPS, the used for the evaluation. The corpus data consist of frames associated with the VerbNet class were human-human dialogs collected as a basis for compared to the linking templates for all senses development of a new computer purchasing defined for the TRIPS verb. If a match was found, domain. The interlocutors model a scenario in lexical entries based on the existing representations which users interact with an intelligent assistant to were generated for the target verb(s) in that purchase computer equipment. The corpus VerbNet class. The new verbs were defined using comprises 23 dialogs totalling approximately 6900 existing semantic types, their associated thematic utterances. At the time of the evaluation there are roles, and the linking template(s) corresponding to 139 verbs in the computer purchasing corpus that the matching sense entry. An example of a are not defined in TRIPS (henceforth ‘target successful match is target verb subtract found in verbs’). Of these, 66 have definitions in VerbNet. VerbNet class remove-10.1, which includes the Two methods (described in sections 4.1.1 and frames Agent V Theme and Agent V Theme (prep 4.1.2) were used to automatically acquire target src)2 Source. The verb remove is in this class, and verb definitions from VerbNet, which were then it is also defined in TRIPS as semantic