Deep Linguistic Processing for Spoken Dialogue Systems
James Allen, Department of Computer Science, University of Rochester, [email protected]
Myroslava Dzikovska, ICCS-HCRC, University of Edinburgh, [email protected]
Mehdi Manshadi, Department of Computer Science, University of Rochester, [email protected]
Mary Swift, Department of Computer Science, University of Rochester, [email protected]

Abstract

We describe a framework for deep linguistic processing for natural language understanding in task-oriented spoken dialogue systems. The goal is to create domain-general processing techniques that can be shared across all domains and dialogue tasks, combined with domain-specific optimization based on an ontology mapping from the generic LF to the application ontology. This framework has been tested in six domains that involve tasks such as interactive planning, coordination operations, tutoring, and learning.

1 Introduction

Deep linguistic processing is essential for spoken dialogue systems designed to collaborate with users on complex tasks. We describe the TRIPS natural language understanding system, which is designed for this purpose. As we develop the system, we are constantly balancing two competing needs: (1) deep semantic accuracy, the need to produce semantically and pragmatically deep interpretations for a specific application; and (2) portability, the need to reuse our grammar, lexicon and discourse interpretation processes across domains.

We accomplish portability by using a multi-level representation. The central components are all based on domain-general representations, including a linguistically based detailed semantic representation (the Logical Form, or LF), illocutionary acts, and a collaborative problem-solving model. Each application then involves a domain-specific ontology and reasoning components. The generic LF is linked to the domain-specific representations by a set of ontology mapping rules that must be defined for each domain. Once the ontology mapping is defined, we can then automatically specialize the generic grammar to use the stronger semantic restrictions that arise from the specific domain. In this paper we mainly focus on the generic components for deep processing; the work on ontology mapping and rapid grammar adaptation is described elsewhere (Dzikovska et al., 2003; forthcoming).

2 Parsing for deep linguistic processing

The parser uses a broad-coverage, domain-independent lexicon and grammar to produce the LF. The LF is a flat, unscoped representation that includes surface speech act analysis, dependency information, word senses (semantic types) with semantic roles derived from the domain-independent language ontology, tense, aspect, modality, and implicit pronouns. The LF supports fragment and ellipsis interpretation, discussed in Section 5.2.

2.1 Semantic Lexicon

The content of our semantic representation comes from a domain-independent ontology linked to a domain-independent lexicon. It relies on a frame-based design in the LF ontology, a common representation in semantic lexicons (Baker et al., 1998; Kipper et al., 2000). The LF type hierarchy is influenced by argument structure, but provides a more detailed level of semantic analysis than found in most broad-coverage parsers: it distinguishes senses even when the senses take the same argument structure, and may collapse lexical entries with different argument structures into the same sense.
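To make this design concrete, the sketch below encodes a few such sense entries in Python, anticipating the take senses of Figure 1. It is an illustrative reconstruction, not the TRIPS lexicon format; the class name, role labels, and feature values such as "ingestible" are assumptions.

from dataclasses import dataclass

@dataclass
class Sense:
    """One word sense: an LF ontology type, semantic roles carrying weak
    selectional restrictions, and a syntactic frame."""
    lf_type: str   # concept in the language ontology
    roles: dict    # role name -> (hypothetical) semantic-feature restriction
    frame: str     # argument structure

# Distinct senses of "take" can share the same NP-V-NP frame, so the
# argument structure alone cannot separate them; the LF type does.
TAKE_SENSES = [
    Sense("CONSUME", {"Agent": "living", "Theme": "ingestible"}, "np-v-np"),
    Sense("ACQUIRE", {"Agent": "agentive", "Theme": "phys-obj"}, "np-v-np"),
    Sense("MOVE", {"Agent": "agentive", "Theme": "movable", "Goal": "loc"},
          "np-v-np-pp"),
]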
As a very simple example, the generic lexicon includes the senses for the verb take shown in Figure 1. Our generic senses have been inspired by FrameNet (Baker et al., 1998).

Figure 1: Some generic senses of take in the lexicon
  CONSUME          Take an aspirin
  MOVE             Take it to the store
  ACQUIRE          Take a picture
  SELECT           I'll take that one
  COMPATIBLE-WITH  The projector takes 100 volts
  TAKE-TIME        It took three hours

In addition, types are augmented with semantic features derived from EuroWordNet (Vossen et al., 1997) and extended. These are used to provide selectional restrictions, similar to VerbNet (Kipper et al., 2000). The constraints are intentionally weak, excluding utterances unsuitable in most contexts (the idea slept) but not attempting to eliminate borderline combinations.

The generic selectional restrictions are effective in improving overall parsing accuracy while remaining valid across multiple domains. An evaluation with an earlier version of the grammar showed that when the generic selectional restrictions were removed, full-sentence semantic accuracy decreased from 77.8% to 62.6% in an emergency rescue domain, and from 67.9% to 52.5% in a medical domain (using the same versions of the grammar and lexicon) (Dzikovska, 2004).

The current version of our generic lexicon contains approximately 6400 entries (excluding morphological variants), and the current language ontology has 950 concepts. The lexicon can be supplemented by searching large-scale lexical resources such as WordNet (Fellbaum, 1998) and COMLEX (Grishman et al., 1994). If an unknown word is encountered, an underspecified entry is generated on the fly. The entry incorporates as much information from the resource as possible, such as part of speech and syntactic frame, and is assigned an underspecified semantic classification based on correspondences between our language ontology and WordNet synsets.

2.2 Grammar

The grammar is context-free, augmented with feature structures and feature unification. It is motivated by X-bar theory and draws on principles from GPSG (e.g., head and foot features) and HPSG; a detailed description of an early non-lexicalized version of the formalism is in (Allen, 1995). Like HPSG, our grammar is strongly lexicalized, with the lexical features defining argument and complement structures for head words. Unlike HPSG, however, the features are not typed, and rather than multiple inheritance the parser supports a set of orthogonal single-inheritance hierarchies that capture different syntactic and semantic properties. Structural variants such as passives, dative shifts, gerunds, and so on are captured in the context-free rule base. The grammar has broad coverage of spoken English, supporting a wide range of conversational constructs. It also directly encodes conventional conversational acts, including standard surface speech acts such as inform, request and question, as well as acknowledgments, acceptances, rejections, apologies, greetings, corrections, and other speech acts common in conversation.

To support both a broad domain-general grammar and the ability to produce deep domain-specific semantic representations, the semantic knowledge is captured in three distinct layers (Figure 2), which are compiled together before parsing to create efficient domain-specific interpretation. The first layer is primarily encoded in the grammar, and defines an interpretation of the utterance in terms of generic grammatical relations. The second is encoded in the lexicon and defines an interpretation in terms of a generic language-based ontology and generic roles. The third is encoded by a set of ontology-mapping rules, defined for each domain, and yields an interpretation in terms of the target application ontology. While these layers are defined separately, the parser can produce all three levels simultaneously, exploiting domain-specific semantic restrictions to improve semantic accuracy and parsing efficiency at the same time. In this paper we focus on the middle level, the generic LF.
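To illustrate how the third layer might operate, here is a minimal sketch of ontology mapping in Python. The rule format, the MEDICATE concept, and the role renamings are hypothetical stand-ins, not the actual TRIPS mapping rules.

# Hypothetical mapping rules for a medical domain: each generic LF type
# maps to a domain concept plus a renaming of its semantic roles.
ONTOLOGY_MAP = {
    "CONSUME": ("MEDICATE", {"Agent": "Patient", "Theme": "Drug"}),
}

def specialize(lf_type, args):
    """Rewrite one generic LF term into the application ontology,
    falling back to the generic term when no rule applies."""
    if lf_type not in ONTOLOGY_MAP:
        return lf_type, args
    domain_type, role_map = ONTOLOGY_MAP[lf_type]
    return domain_type, {role_map.get(role, role): v for role, v in args.items()}

# "Take an aspirin": the generic CONSUME term specializes to the
# domain concept MEDICATE with domain-specific roles.
print(specialize("CONSUME", {"Agent": "u1", "Theme": "a1"}))
# -> ('MEDICATE', {'Patient': 'u1', 'Drug': 'a1'})

Compiling such rules into the grammar and lexicon before parsing is what lets the parser apply the stronger domain restrictions during search rather than as a post-process.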
Figure 2: The Levels of Representation computed by the Parser

The rules in the grammar are weighted, and the weights are combined much as probabilities are computed in a PCFG. The weights, however, are not strictly probabilities (e.g., it is possible to have weights greater than 1); rather, they encode structural preferences. The parser operates in a best-first manner and, as long as no weight exceeds 1.0, is guaranteed to find the highest-weighted parse first. If weights are allowed to exceed 1.0, the parser becomes more "depth-first": it may "garden-path" and find globally suboptimal solutions first, although eventually all interpretations can still be found.

All our applications use these hand-tuned rule weights, which have proven to work relatively well across domains. We do not use a statistical parser trained on a corpus because in most dialogue-system projects sufficient amounts of training data are not available and would be too time-consuming to collect. In the one

2.3 Logical Form

The LF language is a variant of MRS (Copestake et al., 2006), but is expressed in a frame-like notation rather than predicate calculus. In addition, it has a relatively simple method of computing possible quantifier scopings, drawing on the approaches of Hobbs and Shieber (1987) and Alshawi (1990).

A logical form is a set of terms that can be viewed as a rooted graph, with each term being a node identified by a unique ID (the variable). There are three types of terms. The first corresponds to generalized quantifiers and is of the form (<quant> <id> <type> <modifiers>*). As a simple example, the NP Every dog would be captured by the term (Every d1 DOG). The second type of term is the propositional term, which is represented in a neo-Davidsonian style (e.g., Parsons, 1990) using reified events and properties. It has the form (F <id> <type> <arguments>*). The propositional terms produced from Every dog hates a cat would be (F h1 HATE :Experiencer d1 :Theme
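To make the term notation concrete, the sketch below assembles the terms this example would plausibly yield. The Python encoding is hypothetical, and completing the truncated propositional term with c1 (the variable for the indefinite a cat) is an assumption, not text from the paper.

from typing import NamedTuple

class QuantTerm(NamedTuple):
    """Generalized quantifier term: (<quant> <id> <type> <modifiers>*)."""
    quant: str
    id: str
    type: str

class PropTerm(NamedTuple):
    """Propositional term: (F <id> <type> <arguments>*)."""
    id: str
    type: str
    args: dict

# "Every dog hates a cat": two quantifier terms plus one propositional
# term whose role arguments point at the quantified variables, forming
# the rooted graph described above.
logical_form = [
    QuantTerm("Every", "d1", "DOG"),
    QuantTerm("A", "c1", "CAT"),                      # assumed term for "a cat"
    PropTerm("h1", "HATE",
             {"Experiencer": "d1", "Theme": "c1"}),   # :Theme c1 is assumed
]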