The Natural Language Sourcebook
National Center for Research on Evaluation, Standards, and Student Testing (CRESST) Technical Report

UCLA Center for the Study of Evaluation
in collaboration with:
University of Colorado
NORC, University of Chicago
LRDC, University of Pittsburgh
The RAND Corporation

NATURAL LANGUAGE SOURCEBOOK

Walter Read, Michael Dyer, Eva Baker, Patricia Mutch, Frances Butler, Alex Quilici, John Reeves

Center for Technology Assessment
Center for the Study of Evaluation
Computer Science Department
UCLA
1990

This overall project was directed by Eva L. Baker, funded by DARPA and administered by ONR, Contract Number N00014-86-K-0395. We wish to thank the following people for their assistance: Craig Fields, MCC; Susan Chipman, ONR; Steve Kaisler, formerly of DARPA; Bob Simpson and Charles Wayne of DARPA; and, at UCLA, Sherry Miranda and Richard Seligman of the Office of Contracts and Grants Administration. In addition, the authors would like to thank the following people for their contributions at various stages in the development of the Natural Language Sourcebook: Kathleen Brennan, Rory Constancio, Cheryl Fantuzzi, Jana Good, Dayle Hartnett, Jill Fain Lehman, Elaine Lindheim, Mark Neder, and Jean Turner. Finally, Jaime Carbonell, Martha Evens, Evelyn Hatch, David Kieras, Carol Lord, and Merlin Wittrock served as consultants for the Sourcebook. Their suggestions were invaluable to the authors.

© 1990 by the Regents of the University of California

Table of Contents

Introduction
Sourcebook classification
Group I: Single-utterance issues
    Identification of syntactic units
    Ambiguity
    Modifier attachment
    Reference
    Metaphor and novel language
    Other
Group II: Connected utterance issues
    Anaphora
    Ellipsis
    Integrating complex information
    Reasoning, argumentation, story understanding
    Metaphor
Group III: True-dialogue issues
    User goals and plans
    Logical presuppositions
    Speech acts
    Meta-linguistic discourse
Group IV: Ill-formed input
    Mistakes
    Non-standard input
Cross-indexing of Sourcebook exemplars
    Linguistic classification
    Cognitive-psychological classification
Glossary
References
Author Index

Introduction

This document, the Natural Language Sourcebook, represents an attempt to provide researchers and users of natural language systems with a classification scheme to describe problems addressed by such systems.
It is part of a much larger research effort conducted at the UCLA Center for Technology Assessment to explore alternative methods for evaluating high technology systems. In this project, the Artificial Intelligence Measurement System (AIMS), the strategy was to explore the extent to which multidisciplinary methods, from the fields of artificial intelligence, education, linguistics, psychology, anthropology, and psychometrics, could be combined to provide better characterizations of the scope, quality, and effectiveness of AI systems. The areas of focus across the entire project have included expert system shells, expert systems, natural language interfaces (including front-ends to databases and expert systems), text understanding systems, and vision systems.

The overall strategy of the project was to determine first the scope of implementations in the field, either through a classification system or a compendium describing the types of efforts in a field, and then to determine strategies for assessing the effectiveness of particular implementations. The goal of the project was first to test the concept that such systems can be evaluated and second to determine alternative approaches for establishing benchmarks of performance. This approach is a rather radical departure from typical "evaluation" approaches in the AI field, which are more focused on the assessment of a particular strategy in terms of its theory and less focused on attempting to develop standard metrics against which systems can be measured. Clearly, the matter of standardization was a critical issue in our studies, especially since most efforts emphasize fairly well specified domains. The Sourcebook compilation represents one strategy for attempting to "standardize" discussion of natural language systems, a strategy that attempts as well to represent the diversity of particular systems. The Sourcebook should help researchers and users sharpen their understanding of claims made on behalf of existing systems and identify gaps for future research and development.

The focus of the Sourcebook is the types of problems purportedly addressed, or more colloquially, "handled" by natural language systems. For example, a system may purport to "handle ellipsis" or "resolve anaphoric reference," but what does this claim actually mean? Does it mean the system can deal with all cases or just certain types? What classification of "types" of ellipsis or anaphora is the author of the system using? Without a common classification of the problems in natural language understanding, authors have no easy way to specify clearly what their systems do; potential users have trouble comparing different systems; and researchers struggle to judge the advantages or disadvantages of different approaches to developing systems. While these problems have been noted over the last ten years (Woods, 1977; Tennant, 1979), research developing specific criteria for evaluation of natural language systems has appeared only recently (Guida and Mauri, 1984, 1986). The Sourcebook presents a classification and identification by example of problems in natural language understanding. Identification by example removes confusion over general claims, such as the claim that a system has the ability to "handle ellipsis." Although the Sourcebook in its present form does not exhaustively cover the range of problems in natural language understanding, it could be extended to do so.
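To make the idea of identification by example concrete, the sketch below (not part of the original Sourcebook) shows one hypothetical way a vague claim such as "handles ellipsis" could be restated as a set of named subclasses, each anchored to an exemplar utterance. The class names, subclass labels, and example sentences here are purely illustrative assumptions, not the Sourcebook's actual classification.

```python
# Minimal sketch: a coverage claim expressed as phenomenon -> subclasses -> exemplars,
# rather than as a bare assertion like "handles ellipsis."
from dataclasses import dataclass, field


@dataclass
class Exemplar:
    subclass: str   # name of a subclass of the phenomenon
    example: str    # concrete utterance illustrating that subclass


@dataclass
class Phenomenon:
    name: str
    exemplars: list = field(default_factory=list)


# Illustrative entries only; the real Sourcebook groups and exemplars differ.
ellipsis = Phenomenon(
    name="Ellipsis",
    exemplars=[
        Exemplar("verb-phrase ellipsis", "John can swim, and Mary can too."),
        Exemplar("fragmentary follow-up question",
                 "Which ships are in port? -- The fastest ones?"),
    ],
)


def coverage_claim(system_name, covered):
    """Render a precise coverage claim: phenomenon name -> list of covered subclasses."""
    lines = [f"{system_name} claims coverage of:"]
    for phenomenon, subclasses in covered.items():
        lines.append(f"  {phenomenon}: " + ", ".join(subclasses))
    return "\n".join(lines)


# A hypothetical system's claim, limited to the subclasses it actually demonstrates.
print(coverage_claim("SYSTEM-X", {"Ellipsis": ["verb-phrase ellipsis"]}))
```

Stated this way, a claim is falsifiable: a reader can check each listed subclass against its exemplar, and the subclasses not listed mark the gaps that remain.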
Previous Work on Natural Language Evaluation

What the field of natural language calls evaluation largely focuses on descriptive characteristics of systems rather than on methods to determine the effectiveness of system performance. Such a perspective certainly conflicts with social science approaches to evaluation, which call for relatively complex indicators of context, goals, processes, and outcomes in evaluating innovations of various types. The evaluation strategies described by writers working from within the AI perspective emphasize evaluating the quality of the research rather than the performance of particular implementations.

Within that constraint, Woods (1977) discusses a number of dimensions along which the complexity and responsiveness of natural language systems can be measured. In particular, three approaches he considers are the following: (a) a "taxonomy of linguistic phenomena," (b) the convenience and perspicuity of the model used, and (c) processing time and size of code. He draws attention to the great effort involved in doing evaluation by any of these methods and to the importance of a "detailed case-by-case analysis." As Woods points out, the difficulty of a taxonomic approach is that the taxonomy will always be incomplete. Any particular phenomenon will have many subclasses, and it is often the case that the published examples cover only a small part of the problem. A system might claim to resolve pronoun reference in general; however, the examples might be limited to pronoun reference in parallel constructions. To make any taxonomy useful, as many clear subclasses as possible must be identified. The taxonomy should serve not only as a description of what has been achieved but also as a guide to what still needs to be accomplished.

Tennant and others (Tennant, 1979; Finin, Goodman and Tennant, 1979) make a distinction between the conceptual coverage and the linguistic coverage of a natural language system and argue that systems have to be measured on each of these dimensions. Conceptual coverage refers to the range of concepts properly confronted by the system, and linguistic coverage to the range of language used to discuss the concepts. Tennant (1979) suggests a possible experimental separation between conceptual and linguistic coverage. The distinction these authors make is important and useful, in part for emphasizing the significance of the knowledge base for usability of a natural language system. But the examples that Tennant (1979) gives for conceptual completeness (presupposition, reference to discourse objects) seem to be part of a continuum with