Appears in: Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99)

Automatic Construction of Semantic Lexicons for Learning Natural Language Interfaces

Cynthia A. Thompson
Center for the Study of Language and Information
Stanford University
Stanford, CA 94305-4115
[email protected]

Raymond J. Mooney
Department of Computer Sciences
University of Texas
Austin, TX 78712
[email protected]

Copyright © 1999, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

This paper describes a system, Wolfie (WOrd Learning From Interpreted Examples), that acquires a semantic lexicon from a corpus of sentences paired with semantic representations. The lexicon learned consists of words paired with meaning representations. Wolfie is part of an integrated system that learns to parse novel sentences into semantic representations, such as logical database queries. Experimental results are presented demonstrating Wolfie's ability to learn useful lexicons for a database interface in four different natural languages. The lexicons learned by Wolfie are compared to those acquired by a similar system developed by Siskind (1996).

Introduction & Overview

The application of learning methods to natural-language processing (NLP) has drawn increasing attention in recent years. Using machine learning to help automate the construction of NLP systems can eliminate much of the difficulty of manual construction. The semantic lexicon, or the mapping from words to meanings, is one component that is typically challenging and time consuming to construct and update by hand. This paper describes a system, Wolfie (WOrd Learning From Interpreted Examples), that acquires a semantic lexicon of word/meaning pairs from a corpus of sentences paired with semantic representations. The goal of this research is to automate lexicon construction for an integrated NLP system that acquires both semantic lexicons and parsers for natural-language interfaces from a single training set of annotated sentences.
Although a few others (Siskind 1996; Hastings & Lytinen 1994; Brent 1991) have presented systems for learning information about lexical semantics, this work is unique in combining several features. First, interaction with a system, Chill (Zelle & Mooney 1996), that learns to parse sentences into semantic representations is demonstrated. Second, it uses a fairly straightforward batch, greedy learning algorithm that is fast and accurate. Third, it is easily extendible to new representation formalisms. Fourth, no prior knowledge is required, although it can exploit an initial lexicon if provided.

We tested Wolfie's ability to acquire a semantic lexicon for an interface to a geographical database using a corpus of queries collected from human subjects and annotated with their logical form. In this test, Wolfie was integrated with Chill, which learns parsers but requires a semantic lexicon (previously built manually). The results demonstrate that the final acquired parser performs nearly as accurately at answering novel questions when using a learned lexicon as when using a hand-built lexicon. Wolfie is also compared to an alternative lexicon acquisition system developed by Siskind (1996), demonstrating superior performance on this task. Finally, the corpus was translated into Spanish, Japanese, and Turkish, and experiments were conducted demonstrating an ability to learn successful lexicons and parsers for a variety of languages. Overall, the results demonstrate a robust ability to acquire accurate lexicons directly usable for semantic parsing. With such an integrated system, the task of building a semantic parser for a new domain is simplified. A single representative corpus of sentence/representation pairs allows the acquisition of both a semantic lexicon and a parser that generalizes well to novel sentences.

Background

Chill uses inductive logic programming (Muggleton 1992; Lavrač & Džeroski 1994) to learn a deterministic shift-reduce parser written in Prolog. The input to Chill is a corpus of sentences paired with semantic representations, the same input required by Wolfie. The parser learned is capable of mapping the sentences into their correct representations, as well as generalizing well to novel sentences. In this paper, we limit our discussion to acquiring parsers that map natural-language questions directly into Prolog queries that can be executed to produce an answer (Zelle & Mooney 1996). Following are two sample queries for a database on U.S. geography, paired with their corresponding Prolog query:

What is the capital of the state with the biggest population?
answer(C, (capital(S,C), largest(P, (state(S), population(S,P))))).

What state is Texarkana located in?
answer(S, (state(S), eq(C,cityid(texarkana,_)), loc(C,S))).
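To make the input concrete, one such training pair could be stored as in the following minimal sketch; the Python nested-tuple encoding of the Prolog term is an illustrative assumption and is not taken from the Wolfie or Chill implementations.

    # One sentence/representation pair from the geography corpus, with the term
    # answer(C, (capital(S,C), largest(P, (state(S), population(S,P)))))
    # encoded as nested (functor, argument, ...) tuples for illustration only.
    training_pair = {
        "sentence": ["what", "is", "the", "capital", "of", "the", "state",
                     "with", "the", "biggest", "population", "?"],
        "representation":
            ("answer", "C",
             (",", ("capital", "S", "C"),
                   ("largest", "P",
                    (",", ("state", "S"), ("population", "S", "P"))))),
    }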

Figure 1: The Integrated System. (Training examples are given to Wolfie; the learned lexicon and the training examples are given to Chill; the final parser maps sentences into Prolog queries.)

For each phrase, p (of at most two words):
  1.1) Collect the training examples in which p appears
  1.2) Calculate LICS from (sampled) pairs of these examples' representations
  1.3) For each l in the LICS, add (p, l) to the set of candidate lexicon entries
Until the input representations are covered, or there are no remaining candidate lexicon entries, do:
  2.1) Add the best (phrase, meaning) pair to the lexicon
  2.2) Update candidate meanings of phrases occurring in the same sentences as the phrase just learned
Return the lexicon of learned (phrase, meaning) pairs.

Figure 2: Wolfie Algorithm Overview

Chill treats parser induction as a problem of learning rules to control the actions of a shift-reduce parser. During parsing, the current context is maintained in a stack and a buffer containing the remaining input. When parsing is complete, the stack contains the representation of the input sentence. There are three types of operators used to construct logical queries. One is the introduction onto the stack of a predicate needed in the sentence representation due to the appearance of a phrase at the front of the input buffer. A second type of operator unifies variables appearing in stack items. Finally, a stack item may be embedded as an argument of another stack item. The introduction operators require a semantic lexicon as background knowledge. By using Wolfie, the lexicon can be provided automatically. Figure 1 illustrates the complete system.

Problem Definition

A semantic lexicon learner is presented with a set of sentences, each consisting of an ordered list of words and annotated with a semantic representation in the form of a labeled tree; the goal is to find a semantic lexicon consistent with this data. Such a lexicon consists of (phrase, meaning) pairs (e.g., ([biggest], largest(_,_))), where the phrases and their meanings are extracted from the input sentences and their representations, respectively, such that each sentence's representation can be composed from a set of components each chosen from the possible meanings of a phrase appearing in the sentence. Such a lexicon is said to cover the corpus. We will also refer to the coverage of components of a representation (or sentence/representation pair) by a lexicon entry. Ideally, the goal is to minimize the ambiguity and size of the learned lexicon, since this should improve accuracy and ease parser acquisition. Note that this notion of semantic lexicon acquisition is distinct from learning selectional restrictions (Manning 1993; Brent 1991) or clusters of semantically similar words (Riloff & Sheperd 1997).

Note that we allow phrases to have multiple meanings (homonymy) and multiple phrases to have the same meaning (synonymy). Also, some phrases may have a null meaning. We make only a few fairly straightforward assumptions. First is compositionality: the meaning of a sentence is composed from the meanings of phrases in that sentence. Since we allow multi-word phrases in the lexicon (e.g., ([kick the bucket], die(_))), this assumption seems fairly unproblematic. Second, we assume each component of the representation is due to the meaning of exactly one word or phrase in the sentence, and not to more than one or to an external source such as noise. Third, we assume the meaning for each word in a sentence appears at most once in the sentence's representation. Finally, we assume that a phrase's meaning is a connected subgraph of a sentence's representation, not a more distributed representation. The second and third assumptions are preliminary, and we are exploring methods for relaxing them. If any of these assumptions are violated, Wolfie may not learn a covering lexicon; however, the system can still be run and produce a potentially useful result.

The Wolfie Algorithm and an Example

The Wolfie algorithm outlined in Figure 2 has been implemented to handle two kinds of semantic representations: a case-role form based on conceptual dependency (Schank 1975) and a logical query language illustrated above. The current paper will focus on the latter; the changes required for the former are minimal. In order to limit search, a form of greedy set covering is used to find a covering lexicon.
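The loop of Figure 2 can be summarized in a few lines. The sketch below is a skeletal rendering, not the actual implementation: candidate generation, the scoring heuristic, the coverage test, and the step 2.2 update are passed in as placeholder callables because they are described separately below, and all names here are illustrative.

    def greedy_cover(candidates, score, covers_corpus, update):
        """candidates: set of (phrase, meaning) pairs from step 1 (LICS).
        score(pair): heuristic value of a candidate (step 2.1 picks the maximum).
        covers_corpus(lexicon): True once every training representation is covered.
        update(candidates, pair): step 2.2 -- prune or generalize the remaining
        candidates in light of the pair just added."""
        lexicon = []
        candidates = set(candidates)
        while candidates and not covers_corpus(lexicon):
            best = max(candidates, key=score)      # step 2.1
            lexicon.append(best)
            candidates.discard(best)
            candidates = update(candidates, best)  # step 2.2
        return lexicon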
The first step is to derive an initial set of candidate meanings for each possible phrase. The current implementation is limited to one and two word phrases, but is easily extended to longer phrases with a linear increase in complexity. For example, consider the following corpus:

1. What is the capital of the state with the biggest population?
answer(C, (capital(S,C), largest(P, (state(S), population(S,P))))).
2. What is the highest point of the state with the biggest area?
answer(P, (high_point(S,P), largest(A, (state(S), area(S,A))))).
3. What state is Texarkana located in?
answer(S, (state(S), eq(C,cityid(texarkana,_)), loc(C,S))).
4. What capital is the biggest?
answer(A, largest(A, capital(A))).
5. What is the area of the United States?
answer(A, (area(C,A), eq(C,countryid(usa)))).
6. What is the population of a state bordering Minnesota?
answer(P, (population(S,P), state(S), next_to(S,M), eq(M,stateid(minnesota)))).
7. What is the highest point in the state with the capital Madison?
answer(C, (high_point(B,C), loc(C,B), state(B), capital(B,A), eq(A,cityid(madison,_)))).

Although not required, for simplification, assume sentences are stripped of phrases that we know have empty meanings ([what], [is], [with], [the]) and that it is known that some phrases refer directly to database constants (e.g., location names).

Initial candidate meanings are produced by computing the common substructures between pairs of representations of sentences that contain a given phrase. This is performed by computing their Largest Isomorphic Connected Subgraphs (LICS), taking labels into account in the isomorphism. The Largest Common Subgraph problem is solvable in polynomial time if, as we assume, both inputs are trees (Garey & Johnson 1979). The exact algorithm is complicated a bit by variables and conjunction. Therefore, we use LICS with an addition similar to computing the Least General Generalization (LGG) of first-order clauses (Plotkin 1970), i.e., the most specific clause subsuming two given clauses. Specifically, we find the LICS between two trees and then compute the LGG of the resulting subexpressions.
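The following toy sketch hints at this generalization step over the nested-tuple encoding used earlier; it collapses every mismatch to the anonymous variable '_' and ignores variable co-reference and the connected-subgraph (LICS) search over conjunctions, so it only approximates the actual computation.

    def lgg(t1, t2):
        """LGG-style anti-unification of two terms encoded as nested
        (functor, argument, ...) tuples; any mismatching subterms are
        generalized to the anonymous variable '_'."""
        if t1 == t2:
            return t1
        if (isinstance(t1, tuple) and isinstance(t2, tuple)
                and t1[0] == t2[0] and len(t1) == len(t2)):
            return (t1[0],) + tuple(lgg(a, b) for a, b in zip(t1[1:], t2[1:]))
        return "_"

    # The largest(...) subterms of sentences 1 and 4 generalize to largest(_,_),
    # the candidate meaning listed below for [capital] and [biggest] from 1,4.
    term1 = ("largest", "P", (",", ("state", "S"), ("population", "S", "P")))
    term4 = ("largest", "A", ("capital", "A"))
    print(lgg(term1, term4))   # -> ('largest', '_', '_')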
The sets of initial candidate meanings for some of the phrases in the sample corpus (after removing the mandatory answer predicate) are:

Phrase            LICS                            From Sentences
[capital]         largest(_,_)                    1,4
                  (capital(A,_), state(A))        1,7
[biggest]         largest(_,state(_))             1,2
                  largest(_,_)                    1,4; 2,4
[state]           largest(_,state(_))             1,2
                  state(_)                        1,3; 2,3
                  (capital(A,_), state(A))        1,7
                  (high_point(B,_), state(B))     2,7
                  (state(S), loc(_,S))            3,7
[highest point]   (high_point(B,_), state(B))     2,7
[located]         (state(S), loc(_,S))            3
[in]              (state(S), loc(_,S))            3,7

Note that [state] has five candidate meanings, each generated from a different pair of representations of sentences in which it appears. For phrases appearing in only one sentence (e.g., [located]), the entire sentence representation is used as an initial candidate meaning. Such candidates are typically generalized in step 2.2 to only the correct portion of the representation before they are added to the lexicon.

After deriving initial candidates, the greedy search begins. The heuristic used to evaluate candidates is the sum of two weighted components, where p is the phrase and m its candidate meaning:

1. P(m|p) × P(p|m) × P(m) = P(p) × P(m|p)²
2. The generality of m

The first component is analogous to the cluster evaluation heuristic used by Cobweb (Fisher 1987). The goal is to maximize the probability of predicting the correct meaning for a randomly sampled phrase. The equality holds by Bayes Theorem. Looking at the right side, P(m|p)² is the expected probability that meaning m is correctly guessed for a given phrase, p. This assumes a strategy of probability matching, in which a meaning m is chosen for p with probability P(m|p) and is correct with the same probability. The other term, P(p), biases the component by how common the phrase is. Interpreting the left side of the equation, the first term biases towards lexicons with low ambiguity, the second towards low synonymy, and the third towards frequent meanings. The probabilities are estimated from the training data and then updated as learning progresses to account for phrases and meanings already covered.

The second component, generality, is computed as the negation of the number of nodes in the meaning's tree structure, and helps prefer smaller, more general meanings. In this example and all experiments, we use a weight of 10 for the first component of the heuristic, and a weight of 1 for the second. The first component has smaller absolute values and is therefore given a higher weight. Results are not overly sensitive to the weights, and automatically setting them using cross-validation on the training set (Kohavi & John 1995) had little effect on overall performance. To break ties, less ambiguous (those with currently fewer meanings) and shorter phrases are preferred. Below we illustrate the calculation of the heuristic measure for some of the above twelve pairs, and the resulting value for all:

([capital], largest(_,_)): 10(2²/3) + 1(−1) = 12.33
([capital], (capital(A,_), state(A))): 11.33
([biggest], largest(_,_)): 29
([biggest], largest(_,state(_))): 11.3
([state], largest(_,state(_))): 8
([state], state(_)): 10(4²/4) + 1(−1) = 39
([state], (capital(A,_), state(A))): 8
([state], (high_point(B,_), state(B))): 8
([state], (state(S), loc(_,S))): 8
([highest point], (high_point(B,_), state(B))): 10(2²/2) + 1(−2) = 18
([located], (state(S), loc(_,S))): 10(1²/1) + 1(−2) = 8
([in], (state(S), loc(_,S))): 18

The highest scoring pair is ([state], state(_)), so it is added to the lexicon.
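One way to read the worked values above is that, up to the constant corpus size, P(p) × P(m|p)² reduces to count(p,m)²/count(p), where count(p) is the number of training sentences containing p and count(p,m) the number of those supporting meaning m; generality is the negated node count of m. The sketch below reflects this reading; it is not code from Wolfie.

    def score(count_pm, count_p, nodes_m, w_prob=10, w_gen=1):
        """Heuristic value of a candidate (phrase, meaning) pair.
        count_pm: training sentences supporting meaning m for phrase p.
        count_p:  training sentences containing p.
        nodes_m:  nodes in m's tree structure (generality = -nodes_m)."""
        return w_prob * (count_pm ** 2) / count_p + w_gen * (-nodes_m)

    print(round(score(2, 3, 1), 2))   # ([capital], largest(_,_))            -> 12.33
    print(score(4, 4, 1))             # ([state], state(_))                  -> 39.0
    print(score(2, 2, 2))             # ([highest point], high_point/state)  -> 18.0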

Next is the candidate generalization phase (step 2.2). One of the key ideas of the algorithm is that each (phrase, meaning) choice can constrain the candidate meanings of phrases yet to be learned. Given the assumption that each portion of the representation is due to at most one phrase in the sentence, once part of a representation is covered, no other phrase in the sentence can be paired with that meaning (at least for that sentence). Therefore, in this step the meaning of a new lexicon entry is potentially removed from the candidate meanings of other words occurring in the same sentences. In our example, the learned pair covers all occurrences of [state], so remaining meanings for it are removed from the candidate set. However, now that state(_) is covered, the candidates for several other words are generalized. For example, the meaning (capital(A,_), state(A)) for [capital] is generalized to capital(_,_), with a new heuristic value of 10(2²/3) + 1(−1) = 12.3. Subsequently, the greedy search continues until the resulting lexicon covers the training corpus, or until no candidate phrase meanings remain.

Experimental Results

This section describes results on a database query application. The first corpus discussed contains 250 questions about U.S. geography. This domain was originally chosen due to the availability of a hand-built natural language interface, Geobase, to a database containing about 800 facts. It was supplied with Turbo Prolog 2.0 (Borland International 1988), and designed specifically for this domain. The corpus was assembled by asking undergraduate students to generate English questions for this database. To broaden the test, we had the same 250 sentences translated into Spanish, Turkish, and Japanese. The Japanese translations are in word-segmented Roman orthography.

To evaluate the learned lexicons, we measured their utility as background knowledge for Chill. This is performed by choosing a random set of 25 test examples and then creating lexicons and parsers using increasingly larger subsets of the remaining 225 examples. The test examples are parsed using the learned parser, the resulting queries are submitted to the database, the answers are compared to those generated by submitting the correct representation, and the percentage of correct answers is recorded. By using the difficult "gold standard" of retrieving a correct answer, we avoid measures of partial accuracy which we believe do not adequately measure final utility. We repeated this process for ten different random training and test sets and evaluate performance differences using a two-tailed, paired t-test with a significance level of p ≤ 0.05.
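The protocol just described corresponds roughly to the following sketch; the learner, parser, and query-execution callables are placeholders (none of their names come from the released systems), and the intermediate training-set sizes are assumptions, since only the 25-example test sets and 225-example maximum are stated above.

    import random

    def evaluate(corpus, learn_lexicon, learn_parser, parse, execute,
                 runs=10, test_size=25, train_sizes=(25, 75, 125, 175, 225)):
        """Accuracy per training-set size over `runs` random splits.
        learn_lexicon(train) -> lexicon; learn_parser(train, lexicon) -> parser;
        parse(parser, sentence) -> query or None; execute(query) -> answer."""
        accuracy = {n: [] for n in train_sizes}
        for _ in range(runs):
            shuffled = random.sample(corpus, len(corpus))
            test, rest = shuffled[:test_size], shuffled[test_size:]
            for n in train_sizes:
                train = rest[:n]
                lexicon = learn_lexicon(train)           # Wolfie's role
                parser = learn_parser(train, lexicon)    # Chill's role
                correct = 0
                for sentence, representation in test:
                    query = parse(parser, sentence)      # may fail on novel input
                    if query is not None and execute(query) == execute(representation):
                        correct += 1
                accuracy[n].append(100.0 * correct / len(test))
        return accuracy  # compare systems at each size with a two-tailed paired t-test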
We compared our system to an incremental (on-line) lexicon learner developed by Siskind (1996), originally evaluated only on artificial data. To make a more equitable comparison to our batch algorithm, we ran his in a "simulated" batch mode, by repeatedly presenting the corpus 500 times, analogous to running 500 epochs to train a neural network. We also removed Wolfie's ability to learn phrases of more than one word, since the current version of Siskind's system does not have this ability. We also made comparisons to the parsers learned by Chill when using a hand-coded lexicon as background knowledge.

In this and similar applications, there are many terms, such as state and city names, whose meanings can be automatically extracted from the database. Therefore, all tests below were run with such names given to the learner as an initial lexicon; this is helpful but not required.

The first experiment was a comparison of the two systems on the original English corpus. Figure 3 shows learning curves for Chill when using the lexicons learned by Wolfie (CHILL+WOLFIE) and by Siskind's system (CHILL+Siskind). The uppermost curve (CHILL+handbuilt) shows Chill's performance when given the hand-built lexicon. CHILL-testlex shows the performance when words that never appear in the training data are deleted from the hand-built lexicon (since a learning algorithm has no chance of getting these). Finally, the horizontal line shows the performance of the Geobase benchmark.

Figure 3: Accuracy on English Geography Corpus

The results show that a lexicon learned by Wolfie led to parsers that were almost as accurate as those generated using a hand-built lexicon. The best accuracy is achieved by the hand-built lexicon, followed by the hand-built lexicon with words only in the test set removed, followed by Wolfie, followed by Siskind's system. All the systems do as well as or better than Geobase by 225 training examples. The differences between Wolfie and Siskind's system are statistically significant at all training example sizes except 125. These results show that Wolfie can learn lexicons that lead to successful parsers, and that are better from this perspective than those learned by a competing system.


Also, comparing to the CHILL-testlex curve, we see that most of the drop in accuracy from a hand-built lexicon is due to words in the test set that the system has not seen during training.

One of the implicit hypotheses of our problem definition is that coverage of the training data implies a good lexicon. The results show a coverage of 100% of the 225 training examples for Wolfie versus 94.4% for Siskind. In addition, the lexicons learned by Siskind's system were more ambiguous and larger than those learned by Wolfie. Wolfie's lexicons had an average of 1.1 meanings per word, and an average size of 56.5 words (after 225 training examples), versus 1.7 meanings per word and 154.8 entries in Siskind's lexicons. For comparison, the hand-built lexicon had 88 entries and 1.2 meanings per word on average. These differences undoubtedly contribute to the final performance differences.

Figure 4 shows the results on the Spanish version of the corpus. In these tests, we gave closed class words to the lexicon learners as background knowledge, since a similar addition for English improved performance slightly. Though the performance compared to a hand-built lexicon is not quite as close as in English, the accuracy of the parser using the learned lexicon is very similar.

Figure 4: Accuracy on Spanish

Figure 5 shows the results for all four languages without any information about closed-class words. The performance differences among the four languages are quite small, demonstrating that our methods are not language dependent.

Figure 5: Accuracy on All Four Languages

Finally, we present results on a larger, more diverse corpus from the geography domain, where the additional sentences were collected from computer science undergraduates in an introductory AI course. The set of questions in the previous experiments was collected from students in a German class, with no special instructions on the complexity of queries desired. The AI students tended to ask more complex queries: their task was to give 5 sentences and the associated logical query representation for a homework assignment. They were requested to give at least one sentence whose representation included a predicate containing embedded predicates, for example largest(S, state(S)), and we asked for variety in their sentences. There were 221 new sentences, for a total of 471 (including the original 250 sentences).

For these experiments, we split the data into 425 training sentences and 46 test sentences, for 10 random splits, then trained Wolfie and then Chill as before. Our goal was to see whether Wolfie was still effective for this more difficult corpus, since there were approximately 40 novel words in the new sentences. Therefore, we tested against the performance of Chill with an extended hand-built lexicon. For this test, we gave the system access to background knowledge about closed class words. We did not use phrases of more than one word, since these do not seem to make a significant difference in this domain.

Figure 6 shows the resulting learning curves. None of the differences between Chill and Wolfie are statistically significant, probably because the difficulty of parsing overshadows errors in word learning. Also, the improvement of machine learning methods over the Geobase hand-built interface is much more dramatic for this corpus.

Figure 6: Accuracy on the Larger Geography Corpus

Related Work

Work on automated lexicon and language acquisition dates back to Siklossy (1972), who demonstrated a system that learned transformation patterns from logic back to natural language.
More recently, Pedersen & Chen (1995) describe a method for acquiring syntactic and semantic features of an unknown word, assuming access to an initial concept hierarchy, but they give no experimental results. Manning (1993) and Brent (1991) acquire subcategorization information for verbs, which is different from the information required for mapping to semantic representation. Several systems (Knight 1996; Hastings & Lytinen 1994; Russell 1993) learn new words from context, assuming that a large initial lexicon and parsing system are already available. Tishby and Gorin (1994) learn associations between words and actions (as meanings of those words). Their system was tested on a corpus of sentences paired with representations, but they do not demonstrate the integration of learning a semantic parser using the learned lexicon.

The aforementioned work by Siskind is the closest. His approach is somewhat more general in that it handles noise and referential uncertainty (multiple possible meanings for a sentence), while ours is specialized for applications where a single meaning is available. The experimental results in the previous section demonstrate the advantage of our method for such an application. His system does not currently handle multiple-word phrases. Also, his system operates in an incremental or on-line fashion, discarding each sentence as it processes it, while ours is batch. While he argues for psychological plausibility, we do not. In addition, his search for word meanings is most analogous to a version space search, while ours is a greedy search. Finally, his system does not compute statistical correlations between words and their possible meanings, while ours does.

His system proceeds in two stages, first learning what symbols are part of a word's meaning, and then learning the structure of those symbols. For example, it might first learn that capital is part of the meaning of capital, then in the second stage learn that capital can have either one or two arguments. By using common substructures, we can combine these two stages in Wolfie.

This work also has ties to the work on automatic construction of translation lexicons (Wu & Xia 1995; Melamed 1995; Kumano & Hirakawa 1994; Catizone, Russell, & Warwick 1993; Gale & Church 1991). While most of these methods also compute association scores between pairs (in their case, word/word pairs) and use a greedy algorithm to choose the best translation(s) for each word, they do not take advantage of the constraints between pairs. One exception is Melamed (1996); however, his approach does not allow for phrases in the lexicon or for synonymy within one text segment, while ours does.

Future Work

Although the current greedy search method has performed quite well, a better search heuristic or alternative search strategy could result in improvements. A more important issue is lessening the burden of building a large annotated training corpus. We are exploring two options in this regard. One is to use active learning (Cohn, Atlas, & Ladner 1994), in which the system chooses which examples are most usefully annotated from a larger corpus of unannotated data. This approach can dramatically reduce the amount of annotated data required to achieve a desired accuracy (Engelson & Dagan 1996). Initial promising results for semantic parser acquisition are given in Thompson (1998).

A second avenue of exploration is to apply our approach to learning to parse into more popular SQL database queries. Such corpora should be easily constructible by recording queries submitted to existing SQL applications along with their original English forms, or by translating existing lists of SQL queries into English (presumably an easier direction to translate). The fact that the same training data can be used to learn both a semantic lexicon and a parser also helps limit the overall burden of constructing a complete natural language interface.

Conclusions

Acquiring a semantic lexicon from a corpus of sentences labeled with representations of their meaning is an important problem that has not been widely studied. Wolfie demonstrates that a fairly simple greedy symbolic learning algorithm performs fairly well on this task and obtains performance superior to a previous lexicon acquisition system on a corpus of geography queries. Our results also demonstrate that our methods extend to a variety of natural languages besides English.

Most experiments in corpus-based natural language learning have presented results on some subtask of natural language, and there are few results on whether the learned subsystems can be successfully integrated to build a complete NLP system. The experiments presented in this paper demonstrated how two learning systems, Wolfie and Chill, were successfully integrated to learn a complete NLP system for parsing database queries into executable logical form given only a single corpus of annotated queries.

Acknowledgements

We would like to thank Jeff Siskind for providing us with his software, and for all his help in adapting it for use with our corpus. Thanks also to Agapito Sustaita, Esra Erdem, and Marshall Mayberry for their translation efforts. This research was supported by the National Science Foundation under grants IRI-9310819 and IRI-9704943.

References

Borland International. 1988. Turbo Prolog 2.0 Reference Guide. Scotts Valley, CA: Borland International.

Brent, M. 1991. Automatic acquisition of subcategorization frames from untagged text. In Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics, 209–214.

Catizone, R.; Russell, G.; and Warwick, S. 1993. Deriving translation data from bilingual texts. In Proceedings of the First International Lexical Acquisition Workshop.

Cohn, D.; Atlas, L.; and Ladner, R. 1994. Improving generalization with active learning. Machine Learning 15(2):201–221.

Engelson, S., and Dagan, I. 1996. Minimizing manual annotation cost in supervised training from corpora. In Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics.

Fisher, D. H. 1987. Knowledge acquisition via incremental conceptual clustering. Machine Learning 2:139–172.

Gale, W., and Church, K. 1991. Identifying word correspondences in parallel texts. In Proceedings of the Fourth DARPA Speech and Natural Language Workshop.

Garey, M., and Johnson, D. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. New York, NY: Freeman.

Hastings, P., and Lytinen, S. 1994. The ups and downs of lexical acquisition. In Proceedings of the Twelfth National Conference on Artificial Intelligence, 754–759.

Knight, K. 1996. Learning word meanings by instruction. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, 447–454.

Kohavi, R., and John, G. 1995. Automatic parameter selection by minimizing estimated error. In Proceedings of the Twelfth International Conference on Machine Learning, 304–312.

Kumano, A., and Hirakawa, H. 1994. Building an MT dictionary from parallel texts based on linguistic and statistical information. In Proceedings of the Fifteenth International Conference on Computational Linguistics.

Lavrač, N., and Džeroski, S. 1994. Inductive Logic Programming: Techniques and Applications. Ellis Horwood.

Manning, C. D. 1993. Automatic acquisition of a large subcategorization dictionary from corpora. In Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, 235–242.

Melamed, I. 1995. Automatic evaluation and uniform filter cascades for inducing n-best translation lexicons. In Proceedings of the Third Workshop on Very Large Corpora.

Melamed, I. 1996. Automatic construction of clean broad-coverage translation lexicons. In Second Conference of the Association for Machine Translation in the Americas.

Muggleton, S. H., ed. 1992. Inductive Logic Programming. New York, NY: Academic Press.

Pedersen, T., and Chen, W. 1995. Lexical acquisition via constraint solving. In Papers from the 1995 AAAI Symposium on the Representation and Acquisition of Lexical Knowledge: Polysemy, Ambiguity, and Generativity, 118–122.

Plotkin, G. D. 1970. A note on inductive generalization. In Meltzer, B., and Michie, D., eds., Machine Intelligence (Vol. 5). New York: Elsevier North-Holland.

Riloff, E., and Sheperd, K. 1997. A corpus-based approach for building semantic lexicons. In Proceedings of the Second Conference on Empirical Methods in Natural Language Processing, 117–124.

Russell, D. 1993. Language Acquisition in a Unification-Based Grammar Processing System Using a Real World Knowledge Base. Ph.D. Dissertation, University of Illinois, Urbana, IL.

Schank, R. C. 1975. Conceptual Information Processing. Oxford: North-Holland.

Siklossy, L. 1972. Natural language learning by computer. In Simon, H. A., and Siklossy, L., eds., Representation and Meaning: Experiments with Information Processing Systems. Englewood Cliffs, NJ: Prentice Hall.

Siskind, J. M. 1996. A computational study of cross-situational techniques for learning word-to-meaning mappings. Cognition 61(1):39–91.

Thompson, C. A. 1998. Semantic Lexicon Acquisition for Learning Natural Language Interfaces. Ph.D. Dissertation, Department of Computer Sciences, University of Texas, Austin, TX. Also appears as Artificial Intelligence Laboratory Technical Report AI 99-278 (see http://www.cs.utexas.edu/users/ai-lab).

Tishby, N., and Gorin, A. 1994. Algebraic learning of statistical associations for language acquisition. Computer Speech and Language 8:51–78.

Wu, D., and Xia, X. 1995. Large-scale automatic extraction of an English-Chinese translation lexicon. Machine Translation 9(3-4):285–313.

Zelle, J. M., and Mooney, R. J. 1996. Learning to parse database queries using inductive logic programming. In Proceedings of the Thirteenth National Conference on Artificial Intelligence.