NL Assistant: A Toolkit for Developing Natural Language Applications

Deborah A. Dahl, Lewis M. Norton, Ahmed Bouzid, and Li Li

Unisys Corporation

Introduction

We will be demonstrating a toolkit for developing natural language-based applications, along with two applications. The goals of this toolkit are to reduce development time and cost for natural language-based applications by reducing the amount of linguistic and programming work needed. Linguistic work has been reduced by integrating large-scale linguistic resources: Comlex (Grishman et al., 1993) and WordNet (Miller, 1990). Programming work is reduced by automating some of the programming tasks. The toolkit is designed for both speech- and text-based interface applications. It runs in a Windows NT environment. Applications can run in either Windows NT or Unix.

System Components

The NL Assistant toolkit consists of

1. a natural language processing engine (Dahl, 1992)
2. lexical and semantic servers based on Comlex and WordNet (Grishman et al., 1993; Miller, 1990)
3. template files which serve as the basis for new applications
4. a graphical toolkit for entering linguistic and application-related information
5. algorithms for automatic rule generation based on developer input.

Natural Language Development

Two strategies address the goal of minimizing the amount of linguistics expertise required to develop applications with the NL Assistant toolkit. The first strategy reduces the amount of lexical information that the developer must add: large-scale lexical resources have been integrated with the toolkit. These include Comlex and WordNet as well as additional, internally developed resources. The second strategy is to provide easy-to-use editors for entering linguistic information.

Servers

Lexical information is supplied by four external servers which are accessed by the natural language engine during processing. Syntactic information is supplied by a lexical server based on the 50K Comlex [...] available from the Linguistic Data Consortium. Semantic information comes from two servers: a KB server based on the [...] portion of WordNet (70K concepts), and a semantics server containing case frame information for 2500 English verbs. A denotations server, using unique concept names generated for each WordNet synset at ISI, links the [...] in the [...] to the concepts in the KB. When the engine is connected to the servers, lexical information for a word that cannot be found in the local engine is requested from the servers. When the servers cannot supply lexical information for a particular word, various heuristics are used to hypothesize the missing information. When hypothesized information is wrong, it can be corrected by using the editors.

Editors

Although the servers minimize the amount of linguistic work that needs to be done to develop an application, they do not eliminate it. The main reason for this is that a particular application will make use of words that do not exist in the dictionary. For example, in our initial applications these have included words such as the verbs 'OEM', 'interface', 'download', and 'customize'. To improve the ease of use of the linguistic development environment, several special-purpose editors have also been implemented. These include editors for lexical, semantic, and knowledge-base work.

The figure below illustrates an editing session in the semantic rule editor for the verb advise.
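The lookup order described under Servers (local engine first, then the external servers, then heuristics, with editor corrections taking precedence afterwards) can be sketched roughly as follows. This is an illustrative sketch only, not the toolkit's implementation: the names `lookup`, `guess_entry`, `local_lexicon`, and `toy_server_data`, the dictionary-based entry format, and the suffix heuristic are all assumptions made for the example, since the paper does not say which heuristics are used.

```python
from typing import Callable, Dict, List, Optional

LexEntry = Dict[str, str]
# A "server" here is any callable that returns an entry or None (assumption).
Server = Callable[[str], Optional[LexEntry]]


def guess_entry(word: str) -> LexEntry:
    """Hypothetical last-resort heuristic: guess part of speech from the suffix."""
    if word.endswith(("ize", "ate")):
        pos = "verb"
    elif word.endswith("ly"):
        pos = "adverb"
    else:
        pos = "noun"
    return {"pos": pos, "source": "heuristic"}


def lookup(word: str, local: Dict[str, LexEntry], servers: List[Server]) -> LexEntry:
    """Try the local lexicon, then each server in turn, then the heuristic."""
    if word in local:
        return local[word]
    for server in servers:
        entry = server(word)
        if entry is not None:
            return entry
    return guess_entry(word)


# Toy data standing in for the engine's local lexicon and one external server.
local_lexicon: Dict[str, LexEntry] = {"mortgage": {"pos": "noun", "source": "local"}}
toy_server_data: Dict[str, LexEntry] = {"advise": {"pos": "verb", "source": "server"}}
servers: List[Server] = [toy_server_data.get]

print(lookup("mortgage", local_lexicon, servers)["source"])  # found locally
print(lookup("advise", local_lexicon, servers)["source"])    # found on the server
print(lookup("OEM", local_lexicon, servers))                 # falls back to the guess

# The guess for 'OEM' is wrong (the paper lists it as a verb), so a developer
# correction, as an editor might record it, overrides the heuristic.
local_lexicon["OEM"] = {"pos": "verb", "source": "editor"}
print(lookup("OEM", local_lexicon, servers)["pos"])
```

Storing the correction in the local lexicon means later lookups never reach the servers or the heuristic for that word, which mirrors the role the paper gives to the editors.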

Application Development

Linking the application-independent semantic representation to the back-end application is the task of the application module. To reduce the amount of time required to develop an application and the amount of expertise required, we have structured the application module into a set of several different types of rules (Norton et al., 1996). The tasks these rules have to perform are:

(1) map the user's utterance into an utterance class, which consists of pragmatically equivalent utterances;
(2) determine what to do next, based on the user's utterance, the state of the dialog, and any other information relevant to the application (such as the state of a [...]);
(3) perform the next action or set of actions.

Actions include saying something to the user, retrieving information from a database, and resetting the dialog to a new state. These rules are written in [...]. The toolkit provides an editor for editing and managing these rules. In addition, the toolkit provides tools for automatically generating rules in the special case of applications which do not control a dialog.

Applications

We will demonstrate a web-based text application on the topic of the NL Assistant product. We will also demonstrate a speech recognition application for mortgage quotations, our Mortgage Assistant product.

Metrics

The largest application we have developed to date has 37 answer classes on the topic of the NL Assistant product. It used 683 training sentences and, on a live test with 144 queries, returned the correct answer first for 83% of queries and as the first or second answer for 88%. It required approximately two person-months to develop.

References

Dahl, Deborah A. "Pundit--Natural language interfaces". In G. Comyn, N. E. Fuchs, and M. J. Ratcliffe, eds. Logic programming in action. Springer-Verlag, 1992.

Grishman, Ralph, Catherine Macleod, and Susanne Wolf. "The Comlex syntax project". ARPA Human Language Technology Workshop. Morgan Kaufmann, 1993.

Linebarger, Marcia C., Lewis M. Norton, and Deborah A. Dahl. "A portable approach to last resort parsing and interpretation". ARPA Human Language Technology Workshop, 1993.

Miller, George A. "Five papers on WordNet". International Journal of Lexicography. 1990.

Norton, Lewis M., Carl E. Weir, Ahmed Bouzid, Deborah A. Dahl, and K. W. Scholz. "A methodology for application development for spoken language systems". Proceedings of ICSLP96. Philadelphia, PA, 1996.
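As an illustration of the three rule tasks listed under Application Development (classify the utterance, decide what to do next from the class and the dialog state, perform the action), here is a minimal sketch. It is written in Python purely for readability and makes no claim about the toolkit's actual rule language or rule format. The keyword-based classifier, the rule tables, the state names, and the placeholder database value are all hypothetical.

```python
from typing import Callable, Dict, Set, Tuple

# Task 1: map an utterance to an utterance class.
# Keyword matching is an assumption; the toolkit works from a semantic representation.
CLASS_KEYWORDS: Dict[str, Set[str]] = {
    "ask_rate": {"rate", "interest"},
    "greeting": {"hello", "hi"},
}


def classify(utterance: str) -> str:
    words = set(utterance.lower().replace("?", " ").split())
    for utterance_class, keywords in CLASS_KEYWORDS.items():
        if words & keywords:
            return utterance_class
    return "unknown"


# Task 2: decide the next action from (utterance class, dialog state).
# Each rule yields (action name, new dialog state). Hypothetical rules.
DECISION_RULES: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("greeting", "start"): ("say_welcome", "start"),
    ("ask_rate", "start"): ("quote_rate", "quoted"),
}


def decide(utterance_class: str, state: str) -> Tuple[str, str]:
    # Unmatched combinations ask the user to rephrase and keep the state.
    return DECISION_RULES.get((utterance_class, state), ("say_rephrase", state))


# Task 3: perform the action. The "database" is a placeholder dictionary.
TOY_DB: Dict[str, str] = {"rate": "RATE_PLACEHOLDER"}

ACTIONS: Dict[str, Callable[[], str]] = {
    "say_welcome": lambda: "Welcome. How can I help?",
    "quote_rate": lambda: f"The current rate is {TOY_DB['rate']}.",
    "say_rephrase": lambda: "Sorry, could you rephrase that?",
}


def respond(utterance: str, state: str) -> Tuple[str, str]:
    """Run the three tasks in order and return (reply, new dialog state)."""
    action, new_state = decide(classify(utterance), state)
    return ACTIONS[action](), new_state


print(respond("Hello there", "start"))
print(respond("What is the interest rate?", "start"))
print(respond("Tell me a joke", "quoted"))
```

Keeping the three tasks in separate tables means a new utterance class or a new dialog state can be added without touching the other two stages, which is one plausible reading of why the paper splits the application module into several rule types.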
