
DOMAIN-SPECIFIC KNOWLEDGE ACQUISITION FOR CONCEPTUAL SENTENCE ANALYSIS

A Dissertation Presented

by

CLAIRE CARDIE

Submitted to the Graduate School of the University of Massachusetts Amherst in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

September 1994

Department of Computer Science

© Copyright by CLAIRE CARDIE 1994

All Rights Reserved

DOMAIN-SPECIFIC KNOWLEDGE ACQUISITION FOR CONCEPTUAL SENTENCE ANALYSIS

A Dissertation Presented

by

CLAIRE CARDIE

Approved as to style and content by:

Wendy G. Lehnert, Chair

Edwina L. Rissland, Member

Paul E. Utgoff, Member

Lyn Frazier, Member

W. Richards Adrion, Department Chair, Computer Science

ACKNOWLEDGMENTS

There are many people to whom I am grateful and without whom the thesis would have been almost impossible to write (much less finish): First, Wendy Lehnert provided much inspiration both through her astounding energy and her insightful and informed views of natural language processing and artificial intelligence. I would also like to thank the other members of my dissertation committee for their help and guidance. Edwina Rissland helped me to view my work from a different perspective and (possibly unknowingly) offered encouragement and unbiased advice at critical points in my graduate school career. Paul Utgoff asked probing questions regarding some of the underlying assumptions of the work and left no "which" or "that" unturned. Illuminating discussions with Lyn Frazier improved the thesis tremendously. Special thanks also to Priscilla Coe (who actually volunteered to be on the committee).

I am also indebted to Bob Futrelle (Northeastern University). His energetic and enthusiastic support of my work helped to increase my confidence in my own abilities.

Thanks too to my officemates in A249 for putting up with the humor vortex that hovered over Ren and that threatened the entire office: Jody Daniels, Fang-fang Feng, David Fisher, Timur Friedman, Joe McCarthy, Jonathan Peterson, and Stephen Soderland. Ellen "Betty" Riloff is missing from the above list only because she deserves a line of her own. Someday we'll really have to put together that slide show...

For countless softball games and practices, bicycle rides from hell, ice hockey on Cranberry pond, year-round volleyball, food snob dinners, fireworks, and all kinds of fun, I thank: Westy, Teri, Brian, Josh, Alan, Hildum, Jack, Keith, Bart, Dann, Jody, Ruth, Zack, Alice, Steve, Glo, Molly, Tommy, Kevin, Christy, Clarence, and Cleo. I hope I haven't forgotten anyone.

I would also like to acknowledge my parents for their support; my brothers and sisters — Bobby, David, Paul, Joanne, Meg, Peter, and Suzanne — for their lighthearted encouragement; and especially Aunt Claire, for always being so nice. Most of all, I would like to thank David Bingham Skalak.

This research was supported by the National Science Foundation, the Advanced Research Projects Agency of the Department of Defense, and the Office of Naval Research.

ABSTRACT

DOMAIN-SPECIFIC KNOWLEDGE ACQUISITION FOR CONCEPTUAL SENTENCE ANALYSIS

SEPTEMBER 1994

CLAIRE CARDIE, B.S., YALE UNIVERSITY

M.S., UNIVERSITY OF MASSACHUSETTS AMHERST

Ph.D., UNIVERSITY OF MASSACHUSETTS AMHERST

Directed by: Professor Wendy G. Lehnert

The availability of on-line corpora is rapidly changing the field of natural language processing (NLP) from one dominated by theoretical models of often very specific linguistic phenomena to one guided by computational models that simultaneously account for a wide variety of phenomena that occur in real-world text. Thus far, among the best-performing and most robust systems for reading and summarizing large amounts of real-world text are knowledge-based natural language systems. These systems rely heavily on domain-specific, handcrafted knowledge to handle the myriad syntactic, semantic, and pragmatic ambiguities that pervade virtually all aspects of sentence analysis. Not surprisingly, however, generating this knowledge for new domains is time-consuming, difficult, and error-prone, and requires the expertise of computational linguists familiar with the underlying NLP system. This thesis presents Kenmore, a general framework for domain-specific knowledge acquisition for conceptual sentence analysis. To ease the acquisition of knowledge in new domains, Kenmore exploits an on-line corpus using symbolic machine learning techniques and robust sentence analysis while requiring only minimal human intervention. Unlike most approaches to knowledge acquisition for natural language systems, the framework uniformly addresses a range of subproblems in sentence analysis, each of which traditionally had required a separate computational mechanism. The thesis presents the results of using Kenmore with corpora from two real-world domains: (1) to perform part-of-speech tagging, semantic feature tagging, and concept tagging of all open-class words in the corpus; (2) to acquire heuristics for part-of-speech disambiguation, semantic feature disambiguation, and concept activation; and (3) to find the antecedents of relative pronouns.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES

CHAPTER

1. INTRODUCTION
   1.1 Ambiguity Resolution in Language Understanding
       1.1.1 Part-of-Speech Ambiguity
       1.1.2 Word Sense Ambiguity
       1.1.3 Dealing with Unknown Words
       1.1.4 Prepositional Phrase Attachment
       1.1.5 Pronoun Resolution
       1.1.6 Understanding Conjunctions and Appositives
       1.1.7 Uniform Treatment of Ambiguity
   1.2 A Framework for Knowledge Acquisition
   1.3 An Example
   1.4 Advantages of the Framework
   1.5 Claims and Contributions of the Thesis
   1.6 What's to Follow
2. RELATED WORK
   2.1 Hand-crafted Knowledge Acquisition
   2.2 Intelligent Interfaces
   2.3 Statistical Approaches
       2.3.1 Acquisition of lexical knowledge
       2.3.2 Acquisition of structural knowledge
       2.3.3 Comprehensive approaches to ambiguity resolution
   2.4 Knowledge-Based Approaches
   2.5 Connectionist Approaches
       2.5.1 Distributed Approaches
       2.5.2 Localist Approaches
   2.6 Machine Learning Approaches
       2.6.1 A Comparison with SOAR
   2.7 Case-Based Approaches
   2.8 Summary
3. KNOWLEDGE ACQUISITION FOR CONCEPTUAL SENTENCE ANALYSIS
   3.1 The CIRCUS Conceptual Sentence Analyzer
       3.1.1 Syntactic Processing in CIRCUS
       3.1.2 Predictive Semantics in CIRCUS
       3.1.3 Handling Embedded Clauses in CIRCUS
             3.1.3.1 Understanding Wh-Constructions
             3.1.3.2 LICK Caveats and Conclusions
   3.2 Examples of "Context" in CIRCUS
   3.3 Knowledge Needed for Conceptual Analysis
       3.3.1 Syntactic Processing
       3.3.2 Predictive Semantics Module
       3.3.3 LICK Processing
       3.3.4 Knowledge Acquisition Tasks for Kenmore
4. DOMAIN-SPECIFIC KNOWLEDGE ACQUISITION
   4.1 The Task
   4.2 Constraints on the Knowledge Acquisition Process
   4.3 The MUC and TIPSTER Performance Evaluations
       4.3.1 MUC Corpus of Latin American Terrorism
       4.3.2 TIPSTER Corpus of Business Joint Ventures
5. LEARNING LEXICAL KNOWLEDGE
   5.1 The Problem
   5.2 Instantiating the Kenmore Framework for MayTag
   5.3 MayTag's Training Phase
       5.3.1 The Taxonomies
       5.3.2 Case Representation
       5.3.3 Case Base Construction
   5.4 MayTag's Application Phase
       5.4.1 Recognizing Incomplete Word Definitions
       5.4.2 Modified Case Retrieval Algorithm
   5.5 Evaluation of MayTag
       5.5.1 Handling Occasional Unknown Words
       5.5.2 Tagging All Open-Class Words From Scratch — Acquiring a Domain-Specific Lexicon
             5.5.2.1 MayTag in the MUC-5/TIPSTER Evaluation
       5.5.3 Creating Explicit System Lexicons
       5.5.4 Extending MayTag for Domain-Independent Lexical Disambiguation
   5.6 Summary
6. USING DECISION TREES TO DISCARD IRRELEVANT FEATURES
   6.1 A Decision Tree Approach to Feature Set Selection
   6.2 Evaluation of the Approach
       6.2.1 Case Retrieval Using All Context Features
       6.2.2 Case Retrieval Using Human-Generated Relevant Features
       6.2.3 Hybrid Case Retrieval Algorithm
   6.3 Analysis of Approach
       6.3.1 Decision Trees for Part-of-Speech Prediction
       6.3.2 Decision Trees for Semantic Feature Prediction
       6.3.3 Decision Trees for Concept Type Prediction
   6.4 Problems with the Hybrid Case Retrieval Algorithm
   6.5 Related Work in Automated Feature Set Selection
7. MAYTAG PERFORMANCE ANALYSIS
   7.1 Part-of-Speech Disambiguation
       7.1.1 Types of Errors
       7.1.2 Reducing False Hits on Majority Classes
   7.2 Semantic Feature Disambiguation
       7.2.1 Types of Errors
       7.2.2 Domain-Dependent vs. Domain-Independent Features
   7.3 Domain-Specific Concept Disambiguation
       7.3.1 Types of Errors
       7.3.2 Expanding the Concept Type Taxonomy
   7.4 Generalization Issues
   7.5 Summary
8. FINDING ANTECEDENTS OF RELATIVE PRONOUNS
   8.1 The Problem
       8.1.1 Current Approaches to Relative Pronoun Disambiguation
   8.2 Instantiating the Kenmore Framework for WHirlpool
   8.3 WHirlpool's Training Phase
       8.3.1 Case Representation
       8.3.2 Case Base Construction
   8.4 WHirlpool's Application Phase
       8.4.1 Case Retrieval in WHirlpool
   8.5 Evaluation of WHirlpool Using the Baseline Case Representation
   8.6 Using Cognitive Biases to Improve a Baseline Case Representation
       8.6.1 Incorporating the Recency Bias
       8.6.2 Incorporating the Restricted Memory Bias
       8.6.3 Incorporating the Subject Accessibility Bias
       8.6.4 Discussion
   8.7 Summary
9. ESTABLISHING CONSTRAINTS ON THE KENMORE FRAMEWORK
   9.1 The Corpus
   9.2 The Sentence Analyzer
   9.3 The Taxonomies
   9.4 The Parsing Task To Be Learned
   9.5 The Case Representation: Representing Context
       9.5.1 Using a Uniform Case Representation
       9.5.2 Righthand Context
       9.5.3 Improving the Case Representation Automatically
             9.5.3.1 Making Finer Distinctions
             9.5.3.2 Derived Features and the Issue of Transfer
   9.6 The Inductive Learning Component
   9.7 Training Issues
   9.8 Putting It All Together
10. CONCLUSIONS
    10.1 Contributions of the Research
    10.2 Future Directions
        10.2.1 Broader Domains
        10.2.2 Unsupervised Training Methods
        10.2.3 Smarter Supervised Training
        10.2.4 Problems That Cross Clause and Sentence Boundaries
        10.2.5 Dynamic Modification of Case Representation
        10.2.6 Environment for Natural Language System Development
        10.2.7 Higher Level Knowledge Structures

BIBLIOGRAPHY

LIST OF TABLES

5.1 Semantic Feature Taxonomy (Part 1).
5.2 Semantic Feature Taxonomy (Part 2).
5.3 Concept Type Taxonomy.
5.4 Part-of-Speech Taxonomy.
5.5 Handling Occasional Unknown Words (% correct for prediction of word definition features).
5.6 Tagging from Scratch (% correct for prediction of word definition features).
5.7 Using MayTag with Larger Case Base for the Tagging-From-Scratch Task (% correct).
5.8 Creating an Explicit Lexicon.
6.1 Case Retrieval Using All Context Features (% correct).
6.2 Case Retrieval Using Human-Generated Relevant Features (% correct for k = 10; ** indicates significance with respect to the "all features" variation, p = .01).
6.3 Case Retrieval Using C4.5-Generated Relevant Features (% correct for k = 10; * and ** indicate the significance of the decision tree features variation with respect to all other variations, ** for p = .01 and * for p = .05; see accompanying text for explanation of **/*).
7.1 Part-of-Speech Frequencies for Unknown Words (2056 cases).
7.2 Part-of-Speech Error Matrix for Occasional-Unknown-Word Tagging (*'s indicate verb/non-verb errors).
7.3 Unknown Words and Parts of Speech for Retrieved Cases.
7.4 Using Smaller Values of k for Part-of-Speech Prediction in the Occasional-Unknown-Word Task (results shown are for open-class words only, using the 2056-case case base).
7.5 General Semantic Feature Frequencies for Unknown Words (2056 cases).
7.6 Specific Semantic Feature Frequencies for Unknown Words (2056 cases).
7.7 General Semantic Feature Error Matrix for Occasional-Unknown-Word Tagging.
7.8 Specific Semantic Feature Error Matrix for Occasional-Unknown-Word Tagging.
7.9 Domain-Dependent and Domain-Independent Semantic Features (general semantic features are shown in UPPER CASE; specific semantic features are shown in lower-case).
7.10 Results for Domain-Independent vs. Domain-Dependent General Semantic Features (% correct).
7.11 Results for Domain-Independent vs. Domain-Dependent Specific Semantic Features (% correct; values in parentheses indicate percentage correct if nil included).
7.12 Concept Type Frequencies for Unknown Words (2056 cases).
7.13 Concept Type Error Matrix for Occasional-Unknown-Word Tagging.
7.14 Effect of Smaller Values of k for Prediction of Non-nil Concept Types (results shown are for open-class words using the 2056-case case base for the occasional-unknown-word task).
7.15 Expanded Concept Type Taxonomy.
7.16 Prediction of Concept Types from the Expanded Taxonomy (occasional-unknown-word task, k = 10).
7.17 Concept Type Frequencies for Unknown Words Using the Expanded Concept Taxonomy (3060 cases).
8.1 WHirlpool Results (% correct).
8.2 Incorporating the Recency Bias by Modifying the Weight Vector.
8.3 Results for the Recency Bias Representations (% correct).
8.4 Results for the Restricted Memory Bias Representation (% correct; *'s indicate significance with respect to the original baseline result shown in boldface, p = 0.05).
8.5 Results for the Subject Accessibility Bias Representation (% correct).
8.6 Additional Results for the Subject Accessibility Bias Representation (% correct; *'s indicate significance with respect to the original baseline result shown in boldface, * for p = 0.05 and ** for p = 0.01; RM refers to the memory limit).
8.7 Cognitive Bias Modifications.
8.8 Summary of Cognitive Bias Results.

LIST OF FIGURES

1.1 Simple Summarization Task for Natural Language Processing Systems.
1.2 More Complicated Tasks for Natural Language Processing Systems.
1.3 Prepositional Phrase Attachment Ambiguities.
1.4 Kenmore Training/Acquisition Phase.
1.5 Kenmore Application Phase.
1.6 Training Case for "Japan".
1.7 Problem Case for "tote".
1.8 Retrieved Case for "tote" and the Training Sentence that Spawned It.
3.1 CIRCUS Status After "John brought..."
3.2 PSM Status After "John brought..."
3.3 Instantiated Semantic Case Frame.
3.4 State of CIRCUS after "The policeman saw the boy who..."
3.5 Child LICK Status after "The policeman saw the boy who the crowd at the party accused..."
3.6 Child LICK Status after "The policeman saw the boy who the crowd at the party accused of the crime."
4.1 Terrorism Corpus Answer Key (Part 1).
4.2 Terrorism Corpus Answer Key (Part 2).
4.3 Simplified JV Answer Key.
4.4 Actual JV Answer Key.
5.1 MURDER Semantic Case Frame.
5.2 Instantiating Kenmore for MayTag.
5.3 Components of MayTag's Case-Based Inductive Learning Algorithm.
5.4 MayTag Training Phase.
5.5 Case for "venture."
5.6 MayTag Application Phase.
5.7 Black Box View of Feature Selection Algorithm.
6.1 Simple Decision Tree for Classifying Balls.
6.2 Decision Tree Feature Selection.
6.3 Training Case for "venture" as it is Used in the Sentence "Toyota Motor Corp. has set up a joint venture firm with Yokagawa Electric Corp..."
6.4 Selecting Relevant Features for the Lexical Acquisition Task.
6.5 Hybrid Case Retrieval.
6.6 Sample Decision Tree for Part-of-Speech Prediction.
6.7 Histogram of Features From rel_pos Lists (part-of-speech prediction).
6.8 Sample Decision Tree for General Semantic Feature Prediction (partial tree).
6.9 Sample Decision Tree for Specific Semantic Feature Prediction (partial tree, part 1).
6.10 Sample Decision Tree for Specific Semantic Feature Prediction (partial tree, part 2).
6.11 Histogram of Features From rel_gen-sem Lists (general semantic feature prediction).
6.12 Histogram of Features From rel_spec-sem Lists (specific semantic feature prediction).
6.13 Sample Decision Tree for Concept Type Prediction (part 1).
6.14 Sample Decision Tree for Concept Type Prediction (part 2).
6.15 Histogram of Features From rel_concept Lists (domain-specific concept prediction).
7.1 Part-of-Speech Prediction (% correct).
7.2 Performance on Individual Parts of Speech for Occasional-Unknown-Word Tagging (x's indicate % correct for predicting part of speech p; o's indicate % of instances that should have been assigned p; line is best linear approximation to x's).
7.3 General Semantic Feature Prediction (% correct).
7.4 Specific Semantic Feature Prediction (% correct).
7.5 Performance on Individual General Semantic Features for Occasional-Unknown-Word Tagging (x's indicate % correct for predicting general semantic feature g; o's indicate % of instances that should have been assigned g; line is best linear approximation to x's).
7.6 Performance on Individual Specific Semantic Features for Occasional-Unknown-Word Tagging (x's indicate % correct for predicting specific semantic feature s; o's indicate % of instances that should have been assigned s; line is best linear approximation to x's).
7.7 Concept Type Prediction (% correct).
7.8 Performance on Individual Concept Types for Occasional-Unknown-Word Tagging (x's indicate % correct for predicting concept type c; o's indicate % of instances that should have been assigned c; line is best linear approximation to x's).
8.1 Understanding Relative Clauses.
8.2 Relative Pronoun Antecedents.
8.3 Instantiating Kenmore for WHirlpool.
8.4 Components of WHirlpool's Case-Based Inductive Learning Algorithm.
8.5 WHirlpool Training Phase.
8.6 Relative Pronoun Disambiguation Training Cases.
8.7 WHirlpool Application Phase.
8.8 WHirlpool Case Retrieval.
8.9 Incorporating the Recency Bias Using a Right-to-Left Labeling.
9.1 Modified Kenmore Framework.

CHAPTER 1

INTRODUCTION

It is a particularly exciting time for the field of natural language processing (NLP). The availability of on-line corpora is rapidly changing the field from one dominated by theoretical models of often very specific linguistic phenomena to one guided by computational models that simultaneously account for a wide variety of phenomena that occur in real-world text. As a result, NLP systems that process only a handful of sentences are no longer taken seriously, and there is increasing emphasis on building programs that can handle large amounts of real-world text. Although current natural language processing systems cannot yet perform in-depth text understanding, they can read an arbitrary text and summarize its major events provided that those events fall within a particular domain of interest (e.g., stories about natural disasters or terrorist events) [Chinchor et al., 1993]. This scenario is illustrated in Figure 1.1.

Figure 1.1: Simple Summarization Task for Natural Language Processing Systems. (The NLP system reads an input text and produces an event summary: Event: attack; Target: school; Location: Honduras; Injuries: 14; Damage: $400000.)

Thus far, among the very best-performing and most robust language processing systems for this type of limited summarization task have been knowledge-based natural language systems — NLP systems that understand an input text by relying heavily on handcrafted knowledge about the domain and about the world in general. Not surprisingly, however, generating this background knowledge for new domains is time-consuming, difficult, and error-prone, and requires the expertise of computational linguists familiar with the underlying NLP system. This is an example of the knowledge-engineering bottleneck for natural language processing systems. It is one of the biggest problems in designing and building natural language systems and promises only to become worse as natural language systems attempt to understand a wider variety of texts, to produce more complex summaries of the text, and to derive knowledge structures directly from text (see Figure 1.2). On the other hand, much of human knowledge is described in written documents and, as mentioned above, NLP systems can now perform limited understanding of complicated texts. Machine learning techniques for inductive learning have also become increasingly available and offer powerful mechanisms for simplifying the knowledge acquisition process.

Figure 1.2: More Complicated Tasks for Natural Language Processing Systems. (Beyond the event summary of Figure 1.1, the NLP system is also asked to produce richer knowledge structures such as a semantic network or a mental model.)

As a result of these advances, we believe that it is feasible for NLP systems to begin to bootstrap their own knowledge bases, and this thesis presents a framework within which this bootstrapping process can occur. More specifically, the thesis presents a general framework for tackling the knowledge-engineering bottleneck for natural language processing systems at the level of sentence analysis. The knowledge acquisition framework, called Kenmore1, exploits an on-line corpus using a robust parsing strategy and symbolic machine learning techniques, and requires minimal human intervention. Moreover, the framework uniformly addresses a range of subproblems in sentence analysis, each of which traditionally had required a separate computational mechanism. In particular, it supports the acquisition and application of heuristics for semantic and syntactic ambiguity resolution at both the lexical and structural levels.

1ken (ken), vi. 1 [Scot.] to know

1.1 Ambiguity Resolution in Language Understanding

Ascertaining what is intended in a text when more than one interpretation is possible has always been a central issue in natural language processing: ambiguity resolution is required whenever the system must choose among two or more distinct representations of the input. Ambiguity pervades virtually all aspects of language analysis, and sentence analysis in particular exhibits a large number of syntactic, semantic, and pragmatic ambiguities that demand adequate resolution before the sentence can be understood. The Kenmore knowledge acquisition framework is designed to acquire solutions to virtually all lexical and structural ambiguity problems encountered during sentence analysis. The sections below describe just a few. Unless otherwise noted, all examples in this section are taken directly or derived from sentences in the TIPSTER/MUC business joint ventures corpus.2

1.1.1 Part-of-Speech Ambiguity

Knowing the part of speech of a word in a particular context, for example, often supplies important hints for determining the word's function in the sentence. Consider the word "plans" in the sentences below:

1. BMW plans to build a plant for large machine tools in Eisenach.

2. Ambitious BMW plans to build a plant for large machine tools in Eisenach were not approved by management.

Despite nearly identical local contexts, "plans" is a verb in sentence 1, but a noun in sentence 2, and making this distinction is crucial to determining the meaning of each sentence. The main event in the first sentence is a planning event in which BMW is the actor. Sentence 2, on the other hand, describes a state change — the canceling of a set of plans.

1.1.2 Word Sense Ambiguity

Even if a word's part of speech is known, the intended meaning of the word in a particular context often requires disambiguation. The word "vehicles," for example, is a noun in both sentences below. In each sentence, however, the word takes on a very different meaning:

1. Suzuki Motor Co. will produce 240,000 passenger and commercial vehicles annually at a new factory in South Korea.

2. Eight industries have emerged as vehicles to transform Indonesia into a nation of technology.

In sentence 1, "vehicles" refers to the product of a company while in the second sentence it is used metaphorically to mean "instruments" or "means." Even when it seems as if a word is unquestionably unambiguous, it can be used in contexts that confer a novel meaning. In general, one would probably say that the word "Japan" is unambiguous, for example. It refers to a country in eastern Asia. Consider the following sentences, however:

2This corpus will be described in detail in Chapter 4.

1. The company reported additional purchases of implanters by unnamed chip makers in Japan.

2. Pearle has formed a joint venture with Japan Optical.

3. The Soviet Union has proposed a joint venture with Japan to build a salmon hatchery in the Soviet Union.

The “country in eastern Asia” meaning of “Japan” is, in fact, the one intended in sentence 1. But in the second sentence, “Japan” is part of a company name and that company may or may not be located in Japan. In the last sentence, “Japan” refers more precisely to the government of Japan rather than the body of land that is Japan.

1.1.3 Dealing with Unknown Words

The problem of ambiguity in sentence analysis is also clearly pronounced in the case of unknown words. When encountering a word for which it has no definition, a robust NLP system makes a series of decisions that together shape the meaning of the word as it functions in the current context. Each decision is essentially a separate, but related, ambiguity resolution task. What is the word’s part of speech in the current context? What is its meaning? How is it related to other items in the sentence or paragraph or text? Is the word of special importance with respect to the goals of the text processor? A related problem for natural language processing systems is knowing when the system’s knowledge of a word is incomplete. Assume, for example, that an NLP system had a definition for the word “market” that was syntactically and semantically compatible with its use in sentence 1 below, but then encountered “market” in sentence 2:

1. The Asian beef market generally is starting to open up to exporters.

2. Two companies plan to market a new chip with ceramic circuits.

A robust system should (1) note that its current definition is inadequate, (2) infer the appropriate syntactic and semantic features of “market,” (3) incorporate the new definition into the system’s lexicon, and (4) determine how the system will distinguish the two uses of “market” in the future.
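A schematic sketch of those four steps appears below. It is illustrative only: it assumes a lexicon represented as a map from words to lists of feature sets, and the predicted_features function stands in for the context-based prediction that Kenmore actually obtains from its case base (Section 1.2 and Chapter 5); none of the names are taken from the system itself.

```python
# Toy lexicon: each word maps to a list of known definitions, where a
# definition is a set of syntactic/semantic features.
lexicon = {
    "market": [{"part-of-speech": "noun", "semantic-feature": "industry"}],
}

def predicted_features(word, context):
    """Stand-in for the context-sensitive prediction of the word's features;
    for sentence 2 above ("plan to market a new chip"), the surrounding
    context points to a verb reading."""
    return {"part-of-speech": "verb", "semantic-feature": "nil"}

def handle_word(word, context):
    prediction = predicted_features(word, context)
    known = lexicon.get(word, [])
    # (1) Note whether any stored definition matches the current use.
    if prediction in known:
        return prediction
    # (2) Infer the appropriate features for this use (the prediction),
    # (3) incorporate the new definition into the lexicon, and
    # (4) retain the old definitions so the two uses of the word can be
    #     distinguished in the future.
    lexicon.setdefault(word, []).append(prediction)
    return prediction

handle_word("market", "Two companies plan to market a new chip ...")
print(lexicon["market"])   # now holds both the noun and the verb definition
```

Chapter 5 describes how MayTag, the instantiation of Kenmore for lexical knowledge, recognizes incomplete word definitions in practice.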

1.1.4 Prepositional Phrase Attachment

In addition to instances of lexical ambiguity (i.e., ambiguity at the word level), ambiguity resolution is required at the constituent, or structural, level as well. Consider the prepositional phrases in the following very similar sentences:

1. Taiyo Oil Co. said Wednesday it plans to open the oil refinery in Bintulu on the island of Kalimantan in Malaysia.

2. Taiyo Oil Co. said Wednesday it plans to open an oil refinery in a joint venture on the island of Kalimantan in June.

Although the low-level syntactic structure of the sentences is identical, the prepositional phrase attachment decisions vary substantially. As shown in Figure 1.3, the "in" prepositional phrase that follows "refinery" (in1-pp) modifies "refinery" in the first sentence (i.e., the refinery will be in Bintulu), but modifies the verb in the second sentence. In addition, the second prepositional phrase that begins with "in" in sentence 1 (in2-pp) modifies the preceding noun (i.e., Kalimantan is in Malaysia), while the prepositional phrase in the same position in sentence 2 modifies the verb (i.e., the refinery will open in June). Again, each prepositional phrase attachment decision will affect the semantic representation derived for the sentence by the natural language processing system.

1: Taiyo Oil Co. said Wednesday it plans to open [the oil refinery] [in Bintulu](in1-pp) [on the island](on-pp) [of Kalimantan](of-pp) [in Malaysia](in2-pp).

2: Taiyo Oil Co. said Wednesday it plans to open [an oil refinery] [in a joint venture](in1-pp) [on the island](on-pp) [of Kalimantan](of-pp) [in June](in2-pp).

Figure 1.3: Prepositional Phrase Attachment Ambiguities.

1.1.5 Pronoun Resolution

A natural language system also requires a mechanism for accurately determining the phrase referenced by a pronoun. Many times, the gender or number of the pronoun can limit the number of possible referents, but often the ambiguity must be resolved by other means. Consider the following sentences:

1. Merck & Co. formed a joint venture with Ache Group, of Brazil. It will be called Prodome Ltd.

2. Merck & Co. formed a joint venture with Ache Group, of Brazil. It will own 50% of the new company to be called Prodome Ltd.

3. Merck & Co. formed a joint venture with Ache Group, of Brazil. It had previously teamed up with Merck in two unsuccessful pharmaceutical ventures.

In each sentence, the pronoun is resolved differently by the righthand context that follows. In sentence 1, "it" refers to the joint venture company; in sentence 2, "it" refers to Merck & Co.; and in sentence 3, "it" refers to Ache Group.

1.1.6 Understanding Conjunctions and Appositives

Understanding complicated noun phrases that involve conjunctions and appositives is also a notoriously difficult structural ambiguity resolution task.

- The decision to abandon production of [[Honda's Legend model at Rover's Cowley Plant] and [its Accord model at Strathmere]] was made 18 months after the project began.

- The decision to abandon production of [[Honda's Legend model at Rover's [Cowley and Birmingham] Plants] and [its Accord model at Strathmere]] was made 18 months after the project began.

- [[The decision to abandon production of Honda's Legend model at Rover's Cowley Plant] and [the subsequent legal suit]] were both items of interest in yesterday's Guardian.

As seen in the above examples, even slight changes in the wording of a sentence can produce radical changes in the attachment decisions for conjunctions. The problem only becomes worse when appositives appear within the constructions:

- The decision to abandon production of [[Honda's Legend, its high-end luxury model] and [its Accord model, the top-selling import in the U.S.]], was made 18 months after the project began.

1.1.7 Uniform Treatment of Ambiguity

The sections above identified a number of ambiguities faced by natural language systems during sentence analysis. Typically, NLP systems treat each of these ambiguities separately along a number of dimensions. First, an NLP system views each ambiguity as a completely different problem. Second, each type of ambiguity is handled by a separate module of the system. There is often a part-of-speech tagger, for example, that is solely responsible for handling part-of-speech ambiguities. Third, each ambiguity resolution module has limited access to the system's knowledge. The part-of-speech tagger, for example, may only have access to part-of-speech information for each word in the sentence. Fourth, computationally, the NLP system handles each ambiguity using a different mechanism (e.g., rule-based processing, context-free parsers, regular expression recognizers, Markov models, constraint satisfaction, relaxation networks), each one of which may require very different sorts of background knowledge. This dissertation presents a knowledge acquisition framework that instead uniformly addresses the problem of ambiguity in sentence analysis. In Kenmore,

- All types of ambiguity are viewed as instances of the same general problem.

- As a result, a single ambiguity resolution module handles all types of ambiguity.

- That module has access to all of the system's knowledge for resolving each type of ambiguity.

- Computationally, a single mechanism is responsible for handling each type of ambiguity. Moreover, the same mechanism is responsible both for the acquisition and the application of the ambiguity resolution heuristics.

As a result of its uniform view of ambiguity, Kenmore can handle lexical, structural, syntactic, and semantic ambiguity problems within a single architecture. As described below, solutions to these problems can be acquired with minimal human intervention, thus avoiding the knowledge acquisition bottleneck in natural language system design.

1.2 A Framework for Knowledge Acquisition

The Kenmore knowledge acquisition framework for natural language processing systems is motivated by the following observations:

1. The problem of ambiguity resolution can be recast as a classification problem. In classification problems, the system is presented with the description of an object, situation, or set of observations, and must label it with one of a number of classes or categories.3 Given a description of the context of a lexical item, for example, a classification system will tag the word with the appropriate syntactic class (e.g., noun) and semantic class (e.g., a human); given a prepositional phrase and a description of the surrounding context, a classification system will choose among available attachment points. By treating ambiguity resolution as a classification problem, we can immediately apply any of a number of inductive machine learning techniques to learn solutions to these problems rather than handcrafting a set of disambiguation rules. We expect the inductive learning algorithms to capture automatically those regularities in the use of language that permit the resolution of ambiguity in sentence analysis.

2. Some aspects of language processing can be viewed as memory-based. By this we mean that a sentence analyzer can resolve an ambiguity by "remembering" how it resolved a similar ambiguity in the past. Here, the necessary system behavior is best provided by case-based, or instance-based, learning algorithms that store individual episodes in their entirety rather than incorporating them into a global concept description. In addition, case-based reasoning is useful in domains for which no strong domain theory exists. This is the case in natural language processing where the mechanisms that underlie language understanding and language acquisition are not well understood. Finally, there is evidence from psychology that some language acquisition tasks, like learning word meanings, proceed through instance-based stages (e.g., Keil and Kelly [1987]).

3. Knowing the context in which an ambiguity occurs is crucial for resolving it. Text understanding in general, and sentence analysis in particular, require a series of context-sensitive mappings from one representation into another — the system often maps the words of a sentence into parts of speech, the part-of-speech sequences into low-level constituents, and the low-level constituents into predicate-argument relations, for example. As a result, the solutions acquired by Kenmore are context-sensitive solutions.

3Assume, for example, that there are four classes of balls — baseballs, volleyballs, kickballs, and tennis balls — that will be described in terms of their color, type of covering, and diameter. Given a description of a ball that is fluorescent green in color, with a fuzzy covering, and three inches in diameter, a classification system presumably should label the ball a tennis ball.

4. People are still by far the very best language processors. In spite of the difficulties it creates for machines, ambiguity resolution is generally a trivial problem for people. For this reason, a human supplies supervision during Kenmore's training phase.

Kenmore relies on three major components. First, it requires a corpus of texts — a collection of on-line documents. Second, it requires a robust sentence analyzer, or parser. The specific parser used throughout is the CIRCUS conceptual sentence analyzer [Lehnert, 1990]. This is not a requirement of the framework, however, and a variety of sentence analyzers could be used in its place. Finally, the framework requires a case-based reasoning (CBR) module [Riesbeck and Schank, 1989, Kolodner, 1993]. Very generally, CBR systems solve problems by first creating a case base of previous problem-solving episodes. Then, when a new problem enters the system, the "most similar" case is retrieved from the case base and used to solve the novel problem. The retrieved case can either be used directly or after one or more modifications to adapt it to the current problem-solving situation.

There are two phases to the framework: (1) a partially automated training phase, or acquisition phase, in which the solution to a particular problem in sentence analysis is learned, and (2) an application phase, in which the learned solution can be applied in novel situations. More specifically, the goal of Kenmore's training phase (Figure 1.4) is to create a case base, or memory, of ambiguity resolution episodes for a particular type of ambiguity (e.g., prepositional phrase attachment or word sense disambiguation). To do this, a small set of sentences is first selected randomly from the corpus. Next, the sentence analyzer processes the training sentences and, with a human supervisor, creates a case every time an instance of the ambiguity occurs. To learn a set of heuristics to handle prepositional phrase attachment, for example, the parser would create a case whenever it recognizes a prepositional phrase. As shown in Figure 1.4, each case has two parts. The context portion of the case encodes the context in which the ambiguity was encountered — this is essentially a representation of the state of the parser at the point of the ambiguity. The context part of the case is supplied automatically by the sentence analyzer in both the training and application phases. The solution portion of the case describes how the ambiguity was resolved in the current example. In the training phase, a person supplies this part of the case using a menu-driven interface. For prepositional phrase attachment, for example, the human supervisor simply chooses the phrase that the current prepositional phrase modifies. Together, the context and solution portions of the case represent a single ambiguity resolution episode. Once the case is created, it is stored in the case base. In addition, the solution to the ambiguity is forwarded to the parser so that it can resolve the current ambiguity, update its state, and continue processing the training sentences. At the end of the training phase, the case base will contain one case for every instance of the ambiguity that appears in the training sentences. When acquiring heuristics for prepositional phrase attachment, for example, there will be one case for every prepositional phrase attachment decision in the training sentences.

Figure 1.4: Kenmore Training/Acquisition Phase. (The sentence analyzer processes selected sentences from the corpus and supplies the context portion of a training case at each ambiguity; the human supervisor supplies the solution portion; the completed ambiguity resolution episode is stored in the case base of the case-based reasoning component, and the solution is returned to the sentence analyzer.)

After training, we can use the case base without human intervention to resolve occurrences of the ambiguity in novel sentences from the corpus (Figure 1.5). Whenever the sentence analyzer encounters an instance of the ambiguity, it creates a problem case, automatically filling in its context portion based on the state of the system at the point of the ambiguity. The structure of a problem case is identical to that of a training case except that the "solution" part of the case is missing. To resolve the current ambiguity, Kenmore next compares the problem case to each case in the case base, retrieves the most similar training case, and sends the solution stored there back to the parser to be used as the solution in the current situation.

Figure 1.5: Kenmore Application Phase. (The sentence analyzer builds a probe case containing only the context of the ambiguity; case retrieval selects the most similar case from the case base, and the solution stored in the retrieved case is returned to the sentence analyzer.)
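To make the division of labor in Figures 1.4 and 1.5 concrete, here is a minimal sketch of the training/application cycle. It assumes a flat attribute-value representation for the context portion of a case and a similarity metric that simply counts matching feature values; the case representations, feature weights, and k-nearest-neighbor retrieval actually used by Kenmore are described in later chapters, and the names below (Case, CaseBase, retrieve) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Case:
    """One ambiguity resolution episode: the context portion is supplied by the
    sentence analyzer; the solution is supplied by the human supervisor during
    training and is absent in a problem (probe) case."""
    context: Dict[str, str]           # attribute-value pairs describing parser state
    solution: Optional[str] = None    # class label, e.g., a word sense or attachment site

class CaseBase:
    def __init__(self):
        self.cases = []

    def add(self, case: Case) -> None:
        # Training phase: store the completed episode.
        self.cases.append(case)

    def retrieve(self, probe: Case) -> Case:
        # Application phase: return the stored case whose context shares the
        # most feature values with the probe (a simple overlap metric).
        def overlap(stored: Case) -> int:
            return sum(1 for f, v in probe.context.items()
                       if stored.context.get(f) == v)
        return max(self.cases, key=overlap)

# Training: the parser supplies the context; a person supplies the solution.
case_base = CaseBase()
case_base.add(Case(context={"current-word": "Japan", "preceding-word": "in",
                            "current-part-of-speech": "noun",
                            "subject-semantic-feature": "company-name"},
                   solution="country"))

# Application: the probe has a context but no solution; the solution stored in
# the most similar case is handed back to the parser.
probe = Case(context={"current-word": "Japan", "preceding-word": "with",
                      "current-part-of-speech": "noun",
                      "subject-semantic-feature": "company-name"})
print(case_base.retrieve(probe).solution)   # -> country
```

In the actual framework, retrieval is k-nearest-neighbor over weighted features rather than a single best match; the sketch only shows how the parser, the supervisor, and the case base interact.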

Of course, many issues must be addressed in order to apply the Kenmore framework to a particular problem in sentence analysis. These include: the choice of an appropriate learning algorithm and similarity metric, the choice of a corpus and sentence analyzer, and the representation of the “context” and “solution” in a case. We will address these and other issues in detail in later chapters and proceed next with an example of Kenmore in action in an attempt to make the workings of the framework more concrete.

1.3 An Example

This section shows how the Kenmore framework for knowledge acquisition can be used to learn word sense disambiguation heuristics. To start, we will assume that the goal of the NLP system is to process texts from a particular corpus — the TIPSTER business joint ventures corpus (henceforth, the JV corpus). Briefly, this corpus contains over 1000 actual newswire accounts of world-wide activity in the area of joint ventures or "tie-ups" between businesses. We also assume that word meanings will be represented in terms of one or more semantic features from a predefined taxonomy designed for use in the joint ventures domain.4 Finally, we assume that Kenmore has been instantiated with the CIRCUS [Lehnert, 1990] sentence analyzer. To generate word sense disambiguation heuristics for use with the JV corpus, Kenmore first randomly selects a subset of sentences from the corpus and presents them as input to CIRCUS. Without describing its details, CIRCUS processes the training sentences one word at a time, looking up each word in its domain-specific lexicon for part-of-speech and word sense information. Whenever more than one word sense is available, Kenmore consults CIRCUS and a human supervisor to create a word sense disambiguation training case for the ambiguous word. For example, because "Japan" can refer to either a country, a company name, or a government in this domain (see Section 1.1.2 above for examples), Kenmore creates a training case when it reaches "Japan" in the sentence:

IBM opened a factory in Japan.

Cases in Kenmore are represented as a list of attribute-value pairs, also called features. A portion of the training case for “Japan” is shown in Figure 1.6. All but one of the displayed features describe the context in which the ambiguity occurred. These are supplied automatically by CIRCUS and represent CIRCUS’s state when the ambiguous word was recognized. More generally, Kenmore’s context features are meant to encode any information available to the sentence analyzer that may be of help in disambiguating the current word. The exact type and form of knowledge included in the context features, however, is parser-dependent. In addition, a human supervisor uses a menu-driven interface to indicate which of the three plausible semantic features is appropriate in the given context (i.e., country). The selection is encoded as the solution portion of the case and represents the class information to associate with the training case. Next, Kenmore stores the training case in the case base and notifies CIRCUS of the correct word sense. Kenmore creates training cases like the one shown in Figure 1.6 every time a word sense ambiguity occurs in the training sentences. After training, the natural language system can use the resulting case base to resolve semantic feature ambiguities in novel sentences. Consider, for example, the word “tote” in the following sentence:

AAAI has set up a joint venture with ACL for production of canvas tote bags for distribution at annual conferences.5

In the joint ventures domain, "tote" has three plausible meanings associated with the noun form of the word — it can refer to a company name, a product, or a generic thing. (The thing semantic feature denotes a word meaning that is unimportant in the joint ventures domain.) To determine which of the three is appropriate in the current context, Kenmore creates a problem case (Figure 1.7). Like the training cases, the problem case includes a set of context features that describe the context in which the ambiguous word occurred. Unlike the training cases, however, the problem case contains no solution (i.e., it contains no class information). To resolve the ambiguity, the problem case is compared to all of the cases in the case base and the most similar one is retrieved. For now, we will ignore the important issue of how to measure similarity and select the best case. In short, the case retrieved in response to the "tote" case after training on 120 sentences from the JV corpus (2056 cases) is shown in Figure 1.8 along with the training sentence that generated it. The retrieved case indicates (correctly) that "tote" refers to a product in the current context.

4Details regarding the taxonomies are not important for the following example; they will be described in greater detail in subsequent chapters. In addition, the relationship between word senses and semantic features will be discussed in Chapter 7.2.

5This sentence is not from the JV corpus.

Figure 1.6: Training Case for "Japan" (training sentence: "IBM opened a factory in Japan.").
    Subject (semantic feature: company-name)
    Verb (semantic feature: open-a-business)
    Direct Object (semantic feature: facility)
    Last Constituent (type: noun-phrase; semantic feature: facility)
    2nd Preceding Word (word: factory; part of speech: noun; semantic feature: facility)
    Preceding Word (word: in; part of speech: preposition)
    Current Word (word: Japan; part of speech: noun; semantic feature: country [solution feature / class, supplied by the human supervisor during training])
    Following Word (word: "."; part of speech: period)
    All features except the solution are context features, generated automatically by the parser.
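The attribute-value structure of these cases is easy to see if the figures are transcribed into a small data structure. The sketch below gives simplified transcriptions of Figures 1.6 through 1.8 as dictionaries (an editorial illustration, not the system's actual data structures) and uses a bare count of matching feature values in place of the similarity metric discussed in later chapters.

```python
# Simplified transcription of Figure 1.6: the training case for "Japan".
japan_case = {
    "subject-semantic-feature": "company-name",
    "verb-semantic-feature": "open-a-business",
    "direct-object-semantic-feature": "facility",
    "last-constituent-type": "noun-phrase",
    "last-constituent-semantic-feature": "facility",
    "2nd-preceding-word": "factory",
    "preceding-word": "in",
    "current-word": "Japan",
    "current-part-of-speech": "noun",
    "following-word": ".",
}
japan_solution = "country"           # supplied by the human supervisor

# Simplified transcription of Figure 1.8: the retrieved case for "parts"
# (from "... in production of auto parts and engines ...").
parts_case = {
    "subject-semantic-feature": "company-name",
    "last-constituent-type": "prepositional-phrase",
    "last-constituent-semantic-feature": "industry",
    "2nd-preceding-word": "of",
    "preceding-word": "auto",
    "current-word": "parts",
    "current-part-of-speech": "noun",
    "following-word": "and",
}
parts_solution = "product"

# Simplified transcription of Figure 1.7: the problem case for "tote"
# (context features only; no solution).
tote_probe = {
    "subject-semantic-feature": "company-name",
    "direct-object-semantic-feature": "joint-venture-entity",
    "last-constituent-type": "prepositional-phrase",
    "last-constituent-semantic-feature": "industry",
    "2nd-preceding-word": "of",
    "preceding-word": "canvas",
    "current-word": "tote",
    "current-part-of-speech": "noun modifier",
    "following-word": "bags",
}

def matching_features(probe, stored):
    """Number of features on which the two cases agree."""
    return sum(1 for f, v in probe.items() if stored.get(f) == v)

print(matching_features(tote_probe, japan_case))   # 1: only the subject matches
print(matching_features(tote_probe, parts_case))   # 4: the "parts" case wins, so its
                                                   # solution ("product") is returned
```

Even with this crude metric, the "parts" case outscores the "Japan" case on the "tote" probe, which is the behavior illustrated by the retrieved case in Figure 1.8.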

1.4 Advantages of the Framework

Kenmore's approach to knowledge engineering for natural language processing systems has a number of advantages over traditional methods for acquiring the background knowledge needed for sentence analysis. Although more detailed comparisons to existing approaches will be discussed in Chapter 2, the major advantages are summarized here:

- In Kenmore, the same case-based method is used to acquire solutions to syntactic and semantic problems at both the lexical and structural levels. This simplifies the system design. As mentioned earlier, different methods are traditionally used to handle each class of problem encountered during sentence analysis.

- The case base and case retrieval algorithm together implicitly define a set of disambiguation heuristics. This obviates the need for generating and maintaining explicit, hand-coded disambiguation heuristics.

- No hand-coded heuristics are required to drive the acquisition process. Again, this greatly simplifies system design and maintenance.

- Kenmore's approach requires relatively little training. In addition to noting the presence of individual words, cases in Kenmore encode relatively high-level information describing the state of the parser as well as the syntactic and semantic classes associated with both words and constituents. As a result, we obtain promising results with Kenmore after training on a relatively small number of sentences. This makes the method especially appropriate for use with small corpora where word-based statistical approaches fail due to lack of data.

- Training does not require the expertise of computational linguists. With the possible exception of part-of-speech tagging, supervisory information can be provided by anyone who can understand texts from the training corpus.

- The case-based reasoning paradigm allows for incremental learning of disambiguation heuristics. This feature allows Kenmore's training phase to become progressively easier for the human supervisor because Kenmore can access the existing case base to suggest solutions to each ambiguity and rely on the supervisor only to override incorrect predictions.

- The machine learning approach to knowledge acquisition provides a flexible control structure for combining multiple sources of knowledge. Thus, Kenmore fosters exploration of the usefulness of different types of knowledge for solving a particular problem in sentence analysis.

Figure 1.7: Problem Case for "tote" (test sentence: "AAAI has set up a joint venture with ACL for production of canvas tote bags for distribution...").
    Subject (semantic feature: company-name)
    Direct Object (semantic feature: joint-venture-entity; concept: joint-venture-tie-up)
    Last Constituent (type: prepositional-phrase; concept: production; semantic feature: industry)
    2nd Preceding Word (word: of; part of speech: preposition)
    Preceding Word (word: canvas; part of speech: noun modifier; semantic feature: product)
    Current Word (word: tote; part of speech: noun modifier)
    Following Word (word: bags)
    2nd Following Word (word: for; part of speech: preposition)
    All features are parser-generated context features; the solution is missing.

Figure 1.8: Retrieved Case for "tote" and the Training Sentence that Spawned It (training sentence: "Daihatsu... has so far been in alliance w/ Astra Motor in production of auto parts and engines...").
    Subject (semantic feature: company-name)
    Last Constituent (type: prepositional-phrase; concept: production; semantic feature: industry)
    2nd Preceding Word (word: of; part of speech: preposition)
    Preceding Word (word: auto; part of speech: noun modifier; semantic feature: product)
    Current Word (word: parts; part of speech: noun; semantic feature: product [solution feature])
    Following Word (word: and; part of speech: conjunction)
    2nd Following Word (word: engines; part of speech: noun; semantic feature: product)

1.5 Claims and Contributions of the Thesis

The goal of this thesis is to address the knowledge-engineering bottleneck for natural language processing systems. To this end, it will present the Kenmore knowledge acquisition framework and make the following claims:

1. The Kenmore framework is able to address uniformly a range of problems in sentence analysis, each of which traditionally has required a separate computational mechanism. In particular, a single architecture:

• handles both syntactic and semantic ambiguities,

• handles ambiguity at both the lexical and structural levels, and

• accommodates both the acquisition and application of disambiguation heuristics.

2. Kenmore demonstrates that symbolic inductive machine learning and case-based techniques can be used to learn solutions to significant problems in sentence analysis and can be incorporated into a working NLP system that has demonstrated success in processing real-world text. As such, it demonstrates the feasibility of truly trainable, portable, customized sentence analyzers.

To demonstrate support for these claims, we have used the Kenmore knowledge acquisition framework to learn lexical and structural parsing knowledge for corpora from two real-world domains. At the lexical level, Kenmore has been used to acquire part-of-speech, semantic feature, and concept activation knowledge for all open-class words (e.g., nouns, adjectives, verbs) in the JV corpus. More specifically, the system learns for this domain:

• the parts of speech, semantic features, and concepts that should be associated with each word, and

• a set of lexical disambiguation heuristics that discern which part of speech, semantic feature, and concept are appropriate for a word in a given context.

Kenmore's performance in these lexical acquisition tasks has been evaluated in a series of experiments. In particular, we first assume the existence of a nearly complete domain-specific dictionary and use Kenmore to infer the features of occasional unknown words that occur in input texts. Next, we present Kenmore with a much more ambitious task. We assume only a small dictionary of function words (e.g., prepositions, auxiliaries, determiners) and use Kenmore to determine the definition of all non-function words in an incoming text from scratch.

At the structural level, Kenmore has been used to learn heuristics for locating the antecedents of relative pronouns. Here we used texts that describe Latin American terrorist events and found that the learned heuristics outperform a set of hand-coded heuristics that had been developed for use in the terrorism domain.

Finally, because Kenmore relies heavily on machine learning components, the thesis also addresses the issue of knowledge representation for machine learning systems. In particular, it is well known that the performance of any machine learning system depends critically on the representation of the training cases it receives as input. Unfortunately, finding a good case representation is a notoriously difficult and time-consuming task and presents another potential knowledge-engineering bottleneck during system development [Quinlan, 1983]. As a result, this thesis also presents two automated techniques for improving a baseline case representation. The first approach uses decision trees [Quinlan, 1986] to discard irrelevant features from a case representation. The second technique explicitly encodes any of a handful of well-known cognitive biases into the case representation.
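As a concrete illustration of the first technique (a sketch only — the thesis's own method is described in Chapter 6), a learned decision tree can be used to identify which features it actually tests, and the untested features can then be dropped from the case representation. The example below uses the scikit-learn library and synthetic data:

    # Sketch: train a decision tree, keep only the features the tree found useful,
    # and discard the rest from the case representation. Data here is synthetic.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X = rng.integers(0, 3, size=(200, 6))        # 200 cases, 6 candidate features
    y = (X[:, 0] + X[:, 2]) % 3                  # only features 0 and 2 matter

    tree = DecisionTreeClassifier(random_state=0).fit(X, y)
    relevant = [i for i, imp in enumerate(tree.feature_importances_) if imp > 0]
    X_reduced = X[:, relevant]                   # reduced case representation
    print("features kept:", relevant)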

1.6 What's to Follow

Related Work (Chapter 2) This chapter discusses prior work in the area of knowledge acquisition for natural language processing and distinguishes it from the work presented here.

Knowledge Acquisition for Conceptual Sentence Analysis (Chapter 3) The Kenmore framework for knowledge acquisition is parser-independent — the sentence analyzer component of the system can be replaced by any language processing system that provides an on-line model of sentence analysis and that can examine its state at any point in the parse.6 Only the specifics of the case representation (i.e., the context and solution features) need to change in response to the types and forms of knowledge available to the new sentence analyzer. This thesis, however, emphasizes the acquisition of knowledge for the task of conceptual sentence analysis7 [Riesbeck, 1975]. This chapter describes CIRCUS [Lehnert, 1990], the conceptual sentence analyzer used throughout this work.

6Section 9.2 discusses the requirements of the sentence analyzer in more detail.

7Conceptual sentence analysis focuses on building a representation of the meaning of a sentence rather than a representation of the syntactic structure of a sentence.

The chapter also specifies the state information maintained by CIRCUS during sentence analysis; this information ultimately comprises the context portion of Kenmore's case representation. Finally, the chapter discusses a number of knowledge acquisition tasks that are important for conceptual sentence analysis in general, and for CIRCUS in particular.

Domain-Specific Knowledge Acquisition (Chapter 4) Although the Kenmore framework for knowledge acquisition for natural language systems is also domain-independent, it requires a corpus from which training sentences are selected. As a result, the case base created during the acquisition phase implicitly encodes heuristics tuned to the training corpus. In short, Kenmore automates the acquisition of domain-specific ambiguity resolution heuristics. This chapter first discusses the notion of domain-specific knowledge acquisition. It then describes the domains of the two corpora used in all Kenmore experiments as well as the information extraction tasks for which the corpora were designed.

Learning Lexical Knowledge (Chapter 5) This chapter describes and evaluates MayTag, an instantiation of Kenmore for the acquisition of lexical knowledge.

Using Decision Trees to Discard Irrelevant Features (Chapter 6) This chapter presents and evaluates MayTag's decision tree approach to improving a baseline case representation. The technique uses decision trees [Quinlan, 1986] to locate the relevant features in a representation so that the irrelevant ones can be discarded.

MayTag Performance Analysis (Chapter 7) This chapter provides a deeper analysis of MayTag’s ability to handle lexical ambiguity.

Finding Antecedents of Relative Pronouns (Chapter 8) This chapter describes WHirlpool, an instantiation of Kenmore that learns heuristics for finding relative pronoun antecedents. It also presents and evaluates WHirlpool's approach to finding an appropriate case representation — an automated method that explicitly encodes cognitive biases into a baseline case representation.

Establishing Constraints on the Kenmore Framework (Chapter 9) This chapter discusses constraints on the Kenmore framework that must be satisfied before the framework can be successfully instantiated for a new problem.

Conclusions (Chapter 10) This chapter summarizes the general conclusions of the work and outlines directions for future work.

The chapters have been organized and written so that different paths through the thesis are possible, depending on the reader's interests:

• For an overview of the work, read Chapters 1, 3, and one of 5 or 8.1-8.5.

• To focus on lexical ambiguity tasks, read Chapters 1, 3, 5, and 7.

• To focus on structural ambiguity tasks, read Chapters 1, 3, and 8.

• To concentrate on the thesis's contributions in machine learning, read Chapters 1, 6, and 8.6.

CHAPTER 2

RELATED WORK

To address the knowledge engineering bottleneck for NLP systems, this thesis presents Kenmore, a knowledge acquisition framework within which solutions to problems in sentence analysis can be learned. More specifically, however, Kenmore addresses the problem of domain-specific knowledge acquisition for conceptual sentence analysis. These constraints on the knowledge acquisition task lead to a number of implicit design goals that may not apply to the general language acquisition task. Namely, (1) semantic disambiguation at both the lexical and structural levels will be of particular importance, (2) the domain-specific nature of the task precludes the use of general lexicons,1 (3) any heuristics acquired by the system should be tailored for use with the corpus of interest, (4) it is better if domain experts can train the system rather than computational linguists or system designers, (5) the training phase should become easier for the human supervisor over time, and (6) the acquired knowledge bases and heuristics should have some capability for explaining their decisions. Many of these design goals were mentioned briefly in Section 1.4, a summary of Kenmore's advantages over existing approaches to the knowledge engineering bottleneck for NLP systems.

The purpose of this chapter is to provide a more detailed comparison of the Kenmore framework with existing work in knowledge acquisition for natural language systems. In particular, we have divided related work into seven separate areas: (1) hand-crafted knowledge acquisition, (2) intelligent interfaces to natural language systems, (3) statistical approaches, (4) knowledge-based approaches, (5) connectionist approaches, (6) machine learning approaches, and (7) case-based approaches. We treat each area in turn in the following sections.

2.1 Hand-crafted Knowledge Acquisition

Generating lexical knowledge, lexical disambiguation heuristics, and structural disambiguation heuristics by hand is clearly a time-consuming and tedious task. The ability to acquire knowledge about words and to encode this knowledge in the lexicon has been labeled by some as the major bottleneck for building natural language processing systems (e.g., Zernik [1991a]).

1On-line sources of linguistic and commonsense knowledge usually will not have detailed enough coverage in the area of interest.

In addition, the manual encoding of lexical and structural disambiguation rules — especially those that merge syntactic and semantic constraints — is particularly difficult and prone to error. Maintenance of rule sets becomes increasingly difficult over time. Moreover, hand-coded heuristics and lexicons are often incomplete and perform poorly in new domains comprised of specialized vocabularies or a different genre of text.

Until fairly recently, natural language processing systems relied exclusively on hand-crafted lexicons and knowledge bases. Early systems for conceptual sentence analysis (e.g., ELI [Riesbeck and Schank, 1978] and QA [Lehnert, 1978]) embedded much of their background knowledge in hand-coded lexicons. Other text analyzers relied on detailed, hand-coded knowledge of scripts, plans, goals, and even political beliefs, in addition to hand-coded lexicons (e.g., SAM [Schank and Riesbeck, 1981], FRUMP [DeJong, 1979], PAM [Wilensky, 1978], and POLITICS [Carbonell, 1978]). The Boris system [Dyer, 1983] included all of these knowledge structures (and more) and integrated them directly into the parser. Martin's DMAP system [Riesbeck and Martin, 1985], which processed texts using a memory-based approach to language understanding, relied on a hand-coded representation of event memory. In general, reliance on hand-crafted background knowledge was possible in these systems because the domains and tasks for which they were designed were very narrow in scope, and the texts processed by the systems for the most part contained short, simple sentences of limited syntactic structure. Conversely, the need for extensive hand-coded background knowledge demanded that the domains and tasks of these systems remain limited and narrowly focused. More recently, NLP systems have begun to tackle relatively ambitious language-processing tasks, for which reliance on hand-coded lexicons and hand-coded background knowledge is impractical.

It is partly in response to such scalability issues that Kenmore was created. Kenmore provides an automated mechanism for acquiring knowledge for sentence analysis and tunes that knowledge to a particular domain. It requires human intervention only during the training phase and thus avoids many problems associated with human-generated heuristics. There is no explicit rule base to maintain. In addition, algorithms used within Kenmore's architecture provide access to a simple knowledge representation and control structure that together naturally handle the assimilation of syntactic and semantic knowledge. Most importantly, the same basic framework accommodates the acquisition of a wide variety of knowledge needed for conceptual analysis of texts.

2.2 Intelligent Interfaces

One approach to the problem of lexical knowledge acquisition is to design intelligent interfaces to help computational linguists assign syntactic and semantic tags to individual words in a corpus. Interfaces increase the speed with which knowledge can be acquired and also help avoid some problems associated with the manual encoding of such knowledge. The Kenmore framework for knowledge acquisition, however, requires much less human intervention than interfaces that demand human perusal of possibly an entire corpus — Kenmore is able to generalize from the relatively small set of examples it is given during training. In addition, both structural and lexical ambiguity resolution are handled within the Kenmore framework. Smart interfaces, on the other hand, assist the user in tagging or bracketing a corpus (see Marcus [1991]), but do not generally assist in the acquisition and maintenance of disambiguation heuristics. Notable exceptions are interfaces that help the user define semantic case frames and their associated selectional restrictions [Ayuso et al., 1987, Grishman et al., 1986] — structures that implicitly perform some lexical and structural disambiguation. Traditionally, this type of knowledge is crafted exclusively by hand.

An intelligent interface with some similarities to Kenmore is the Simmons and Yu system [Simmons and Yu, 1992]. They present a simple user interface for acquiring context-dependent part-of-speech tags and context-dependent grammars as well as an algorithm for applying the acquired rules. Like the Simmons and Yu system, Kenmore also acquires context-dependent parsing knowledge; however, Kenmore is designed more for the acquisition of disambiguation heuristics than for the acquisition of grammars. In addition, we rely on machine learning methods for retrieval of similar training cases during the application phase, while the Simmons and Yu system indexes all acquired rules based on the top two items of the current stack of a shift-reduce parser. A nearest-neighbor algorithm is then used to locate the best of the retrieved cases. Finally, Kenmore allows the use of semantic as well as syntactic knowledge (if that knowledge is available to the parser), while Simmons and Yu allow only syntactic knowledge in the acquired rules.

Like Kenmore, the Simmons and Yu system uses relatively few training examples. In their evaluation, a case base of over 8000 cases was created in response to the manual tagging of 345 sentences. The system correctly predicts the part of speech 99.52% of the time when testing on the training set; choosing the most frequent part of speech achieves an accuracy of 97.52%. No results were reported for novel test sets.

2.3 Statistical Approaches

2.3.1 Acquisition of lexical knowledge

Statistical methods that acquire lexical knowledge have enjoyed recent success. For the most part, work in this area has concentrated on automated part-of-speech tagging using Markov model-based stochastic taggers (e.g., Jelinek [1985], Church [1988], DeMarcken [1990], Charniak [1993]). These systems predict the part of speech of the current word, given the part of speech assigned to the preceding n words.2 They essentially choose the tag sequence that maximizes the following quantity across all words of the sentence:

Probability(word | tag) × Probability(tag | preceding n tags)

2If n = 2, then the model is a bigram model; if n = 3, then a trigram model, etc.
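For concreteness, the sketch below scores candidate tag sequences with a bigram (n = 2) version of this product and keeps the best one; the tiny probability tables are invented purely for illustration:

    # Sketch of bigram (n = 2) tagging: choose the tag sequence that maximizes the
    # product of P(word | tag) * P(tag | previous tag). Probabilities are invented.
    from itertools import product

    p_word_given_tag = {("bird", "NOUN"): 0.01, ("bird", "VERB"): 0.0001,
                        ("the", "DET"): 0.5}
    p_tag_given_prev = {("DET", "<s>"): 0.6, ("NOUN", "DET"): 0.5, ("VERB", "DET"): 0.1}

    def score(words, tags):
        prob, prev = 1.0, "<s>"
        for w, t in zip(words, tags):
            prob *= p_word_given_tag.get((w, t), 1e-8) * p_tag_given_prev.get((t, prev), 1e-8)
            prev = t
        return prob

    words = ["the", "bird"]
    tagset = ["DET", "NOUN", "VERB"]
    best = max(product(tagset, repeat=len(words)), key=lambda tags: score(words, tags))
    print(best)   # ('DET', 'NOUN') — the lexical preference for "bird" as a noun wins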

In general, Markov-based tagging algorithms achieve accuracies in the 95% correct range for part-of-speech tagging of unrestricted text [Church and Mercer, 1993, Charniak, 1993]. It has been postulated that these methods succeed where traditional rule-based approaches fail because they manage to account for lexical preferences (e.g., "bird" is generally used as a noun rather than a verb). Unfortunately, statistical methods require the existence of extremely large, often hand-tagged training corpora. For example, one corpus that is often used to construct statistical models for natural language acquisition tasks is the 4.5 million-word Penn Treebank [Marcus et al., 1993]; the smaller Tagged Brown Corpus (one million words) is considered too small for many tagging tasks [Church et al., 1991]. However, corpora of even 150 million words have been found to be inadequate for some statistical approaches to ambiguity resolution [Dagan and Itai, 1991].

Lexical ambiguity resolution in Kenmore, on the other hand, incorporates lexical preferences into the learned heuristics (whenever word-level information is included as part of a case) and also includes higher-level information regarding the semantic classes of surrounding words and the (syntactic and semantic) state of the parser. We believe that it is this access to higher-level knowledge sources that allows Kenmore to succeed with relatively little training.3 As a result, Kenmore is particularly well-suited to learning lexical knowledge in domains where large amounts of annotated text are unavailable. In addition, Kenmore allows evidence from multiple knowledge sources to be combined without the complex modeling of dependencies that is often required in statistical approaches.

Another advantage of Kenmore over statistical approaches to lexical ambiguity resolution is its explanation capabilities: Kenmore captures information about lexical ambiguity resolution in the individual cases rather than in very large tables of statistics. This allows the system to provide a more compelling explanation of its decisions than purely statistical methods.

The rule-based part-of-speech tagger presented in Brill [1992] also has this advantage. His system acquires a set of ordered, non-stochastic rules using transformation-based error-driven learning [Brill, 1993]. To learn a set of rules, the system requires an annotated corpus, a method for assigning a word its most likely tag, and a set of allowable transformations.4 The system tries every possible transformation, counts the errors caused by each, chooses the transformation that best reduces the overall error rate on the corpus, and adds it to the end of the ordered list. This cycle continues until the error rate falls below some preset threshold. To tag a new sentence, the most likely tags are assigned to each word and all rules are applied to each word, in turn.
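A minimal sketch of that transformation-based loop follows. The toy corpus, the initial most-likely-tag guesses, and the single transformation template ("if the previous tag is X, change tag Y to Z") are invented for illustration and are not Brill's actual templates:

    # Sketch of transformation-based error-driven learning: greedily add the rule
    # that most reduces tagging errors on the corpus. Data and template are invented.
    corpus = [("the", "DET"), ("can", "NOUN"), ("can", "VERB"), ("rust", "VERB")]
    initial = {"the": "DET", "can": "NOUN", "rust": "NOUN"}    # most-likely-tag guesses

    def apply_rules(words, rules):
        tags = [initial[w] for w in words]
        for prev_tag, old, new in rules:          # rule: if the previous tag is prev_tag,
            for i in range(1, len(tags)):         # change tag old to new
                if tags[i] == old and tags[i - 1] == prev_tag:
                    tags[i] = new
        return tags

    def errors(rules):
        words, gold = zip(*corpus)
        return sum(t != g for t, g in zip(apply_rules(words, rules), gold))

    tagset = ["DET", "NOUN", "VERB"]
    candidates = [(p, o, n) for p in tagset for o in tagset for n in tagset if o != n]

    rules = []
    while True:
        best = min(candidates, key=lambda r: errors(rules + [r]))
        if errors(rules + [best]) >= errors(rules):
            break                                 # stop: no transformation helps anymore
        rules.append(best)
    print(rules)   # e.g. [('NOUN', 'NOUN', 'VERB'), ('VERB', 'NOUN', 'VERB')]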

3Evidence that incorporation of such knowledge can reduce sparse data problems can be found in Fisher and Riloff [1992].

4Examples of transformations are: (1) If the current word has tag a and the preceding (following) word has tag b, then change a to z; (2) If the current word has tag a and the preceding word has tag b and the following word has tag c, then change a to z.

Unlike most corpus-based approaches to part-of-speech tagging, the acquired rule set directly and explicitly captures relevant linguistic knowledge while performing as well as stochastic taggers.

Like Kenmore, Brill's approach has also been applied to a number of language learning tasks — part-of-speech tagging, grammar acquisition, and prepositional phrase attachment. There are a number of differences in the two systems, however. Brill assumes that a new set of transformations will be specified for each new ambiguity task. The equivalent task in Kenmore is the specification of the attributes that should be part of each case. For Kenmore, these are parser-specific features rather than task-specific and need not be redefined for each new type of ambiguity. As a result, Kenmore's uniform view of ambiguity resolution gives the learning component access to the same knowledge sources for all types of ambiguity. In addition, the rules acquired using Brill's transformation-based approach access only lexical information for one or two words in a six-word window surrounding the unknown word. Because Kenmore's learning component is embedded within the parser, heuristics learned by the system can be more varied and flexible — they can access lexical information for any or all words in a small window centered on the unknown word and can also test features of the global state of sentence analysis. Finally, modifications to Brill's original system were required to incorporate lexical preferences and to handle unknown words or words that did not appear during training [Brill, 1994]. These situations are handled naturally within the Kenmore framework for knowledge acquisition.

Most statistical approaches to lexical ambiguity resolution have focused primarily on the acquisition of syntactic knowledge. However, a few systems have been designed to learn semantic knowledge for sentence analysis — or at least knowledge that is on the boundary of syntax and semantics. Brent has focused on the acquisition of semantic knowledge for verbs: (1) distinguishing stative from non-stative verbs5 [Brent, 1990], and (2) detecting verbs with one of five subcategorization frames [Brent, 1991]. His systems are particularly interesting because they avoid the need for human supervision by searching for a predefined set of unambiguous syntactic contexts. Other work on verbs includes a method for inferring predicate argument relations using mutual information statistics [Church and Hanks, 1990]. Some progress has also been made in the area of word sense disambiguation. Brown et al. present a statistical method for labeling the senses of a word using the context in which the word appears. In their work, the "context" consists of the words in a window surrounding the ambiguous word, but includes no higher level knowledge. In addition, their system assigns at most two senses to each word. Yarowsky [1992] presents a class-based variation of a Bayesian classifier for word sense disambiguation. Resnik [1993] uses large corpora to discover lexical relationships for use in selectional constraints.

Another general approach to knowledge acquisition for lexical ambiguity tasks that bears some similarity to Kenmore is a method for acquiring decision lists for lexical ambiguity resolution [Yarowsky, 1994].

5Stative verbs are verbs whose actions are assumed to hold at all times after their assertion (e.g., know, believe, love); the actions of non-stative verbs are not always assumed to hold after assertion (e.g., fix, walk).

Yarowsky's decision lists are ordered lists of single-antecedent rules — each entry in the list tests one feature and produces a classification. Unlike Brill's transformation-based rule set, only the first applicable pattern in the list is applied to a novel instance (see the sketch at the end of this subsection). The decision lists are task-specific and are created by (1) measuring collocational distributions across a corpus for a predefined set of hand-crafted, lexically-based collocations (e.g., the word to the right (left) of the current word or the pair of words at offsets -2 and -1), and (2) sorting them by their log-likelihoods. As in most statistical approaches, the method has been tested on a very large annotated corpus, but not on the smaller, domain-specific corpora that Kenmore handles. In addition, the approach has been applied only to fairly low-level lexical ambiguity tasks (e.g., restoring accents in Spanish and French texts), although the approach could be extended to handle structural ambiguity tasks as well. The output of the system is a set of heuristics, each of which classifies a new instance by checking at most one piece of the available evidence. A surprising result is that this approach performs as well as more complex approaches that try to combine all available evidence. Kenmore's heuristics, on the other hand, fall somewhere in between, testing some, but not necessarily all, sources of syntactic and semantic knowledge.

All of the above approaches to lexical acquisition require supervision of some sort. However, a number of systems have begun to look toward unsupervised methods for finding clusters of related words (e.g., Brown et al. [1992], Grefenstette [1992]). While preliminary results are very promising, problems with ambiguous words and inconsistent clusters (see Brown et al. [1992]) make it difficult to incorporate the learned lexical knowledge into existing NLP systems, especially those that use specialized vocabularies and require domain-specific distinctions.

In summary, Kenmore differs from statistical approaches to lexical knowledge acquisition in a number of ways. First, syntactic and semantic lexical knowledge can be learned simultaneously within the same framework. Next, Kenmore requires a very small training corpus by allowing access to class-level and structural knowledge in addition to individual words. This makes Kenmore especially attractive for acquiring lexical knowledge for texts in specialized domains where only small, untagged corpora are available. Finally, Kenmore learns knowledge in a form directly usable by the sentence analyzer.
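Applying such a decision list is straightforward; the sketch below uses invented rules and an invented log-likelihood ordering purely for illustration:

    # Sketch of applying a Yarowsky-style decision list: rules are ordered by
    # log-likelihood and only the first matching rule fires. Rules are invented.
    decision_list = [   # (context feature, value, sense), best evidence first
        ("word_to_right", "crop", "field-sense"),
        ("word_to_left", "river", "riverbank-sense"),
        ("word_to_right", "account", "money-sense"),
    ]

    def classify(context, default="money-sense"):
        for feature, value, sense in decision_list:
            if context.get(feature) == value:
                return sense                      # first applicable pattern wins
        return default

    print(classify({"word_to_left": "river", "word_to_right": "erosion"}))
    # -> riverbank-sense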

2.3.2 Acquisition of structural knowledge

Statistical approaches have been used for at least one structural disambiguation task — prepositional phrase attachment [Hindle and Rooth, 1993, Resnik and Hearst, 1993, Brill, 1993]. Like the statistical methods employed for lexical disambiguation, the systems that handle structural ambiguity also use very large corpora for training. Kenmore handles structural ambiguity using much less text by creating training cases based on parser output rather than only lexical cooccurrences. Kenmore also provides a better explanation of its ambiguity resolution decisions than most statistical approaches because it knows which training case is responsible for each ambiguity decision. This provides useful feedback for the system designer and, as discussed in Section 10.2.3, could be used to make Kenmore's training phase an active, rather than a relatively passive, one. Statistical approaches to knowledge acquisition generally offer no such credit assignment mechanism.

2.3.3 Comprehensive approaches to ambiguity resolution

In addition to Brill's transformation-based system for learning solutions to a variety of problems in natural language understanding (see above), BBN's PLUM system offers another statistically-based, comprehensive approach to knowledge acquisition for sentence analysis [Weischedel et al., 1993]. Like Kenmore, PLUM handles lexical ambiguity, unknown words, and structural ambiguity for real-world text. It also espouses a hybrid approach to knowledge acquisition and text processing by integrating probabilistic models with knowledge-based parsing methods.

Nevertheless, the two systems treat ambiguity resolution very differently. At the lexical level, PLUM finds all part-of-speech tags for an entire sequence of words simultaneously.6 At the structural level, the system deals with ambiguity after the parser and semantic interpreter have computed all plausible parses for the sentence. Only then is a context-free probabilistic model applied to choose among the available options. Kenmore, on the other hand, employs an on-line model of ambiguity resolution — it resolves both lexical and structural ambiguity dynamically as the parse proceeds. This on-line model makes Kenmore better suited to incorporate results from psycholinguistic experiments directly into its sentence processing strategies.

Both PLUM and Kenmore include a non-trivial supervised training phase. However, Kenmore's reliance on symbolic machine learning methods, and case-based methods in particular, allows learning to proceed incrementally — each decision made by the human supervisor is incorporated directly into the case base and, hence, is available for retrieval immediately. This means that supervision becomes progressively easier over time because the system learns with each training instance. Incremental learning is not feasible within PLUM's current architecture or, for that matter, within many of the statistical approaches described above. The incremental nature of Kenmore's learning algorithms also provides a built-in mechanism for knowing when to stop training — training can end when the current case base has begun to provide supervisory information at an acceptable level. Weischedel et al. [1993] presents guidelines for when to stop training based on corpus statistics, rather than overall system performance.

Like the statistical approaches described above, PLUM also requires the derivation of different probability models for dealing with each new class of ambiguity as well as mechanisms for efficiently and accurately estimating the probabilities for those equations. Kenmore, on the other hand, handles all ambiguity uniformly without similar modifications.

6Although PLUM could be extended to handle lexical semantic ambiguity, it currently ignores that class of ambiguity.

2.4 Knowledge-Based Approaches

There exist a number of knowledge-intensive methods for the acquisition of syntactic and semantic lexical knowledge. These systems rely heavily on either hand-coded background knowledge (e.g., Berwick [1983], Granger [1977], Hastings et al. [1991], Lytinen and Roberts [1989], Martin [1992], Selfridge [1986]) or detailed hand-coded heuristics that describe how and when to acquire new word definitions (e.g., Jacobs and Zernik [1988], Wilensky [1991]). Unfortunately, generating either world knowledge or acquisition heuristics is more difficult and time-consuming than generating the lexical knowledge itself. In addition, it is very likely that the background knowledge and rules for acquisition will be domain- or corpus-dependent and will have to be regenerated or at least customized for use with texts in new domains. Finally, we know of no knowledge-based approaches for the automatic acquisition of structural disambiguation heuristics. Kenmore, on the other hand, acquires lexical knowledge without the use of either world knowledge or explicit lexical acquisition heuristics and also handles the acquisition of structural disambiguation heuristics.

2.5 Connectionist Approaches

2.5.1 Distributed Approaches

Connectionist techniques have been applied to natural language processing for a number of years without much success in terms of real-world progress. Many have addressed the problem of full sentence analysis or grammar learning using distributed connectionist methods (e.g., Elman [1990], Jain [1989], McClelland and Kawamoto [1986], Miikkulainen and Dyer [1987], Miikkulainen and Dyer [1989], St. John [1992]), but each of these systems only processes sentences of limited syntactic structure in very narrow, toy domains, and scaling the systems to real-world domains does not seem imminent. Like the statistical approaches to knowledge acquisition and the localist approaches described below, these connectionist systems tend to have poor explanation capabilities.

A number of systems that combine traditional parsing techniques and distributed connectionist modules have also been developed for sentence analysis (e.g., Wermter [1990], Wermter and Lehnert [1989], Chen et al. [1993]). Like Kenmore, these systems integrate a traditional parser with separate learning components, each of which specializes in the resolution of a single class of ambiguity. In theory, these systems could be extended to handle real-world text and additional structural ambiguities, but they currently tackle only noun phrase analysis within limited syntactic constructs. Work in this area has also focused purely on learning heuristics for structural rather than lexical disambiguation.

In addition, none of the connectionist systems described above provides a capability for incremental learning. Incorporating a single training instance requires retraining the entire network, which can take hours or even days. In comparison, the symbolic machine learning systems used in Kenmore, particularly the case-based and instance-based algorithms, can incorporate new instances with very little computation.

2.5.2 Localist Approaches

Like Kenmore, localist connectionist approaches to word sense disambiguation (e.g., Cottrell [1986]), sentence analysis (e.g., Waltz and Pollack [1985]), and high-level inferencing (e.g., Lange and Dyer [1989]) provide a unified approach to problems of disambiguation. Again, however, these approaches either handle only extremely simple sentence structures or assume large amounts of world knowledge (or both). More importantly, they provide no mechanisms for learning this background knowledge, for acquiring knowledge about how to structure the underlying networks, or for remembering decisions from one run to the next. Kenmore provides a framework for both learning and applying knowledge required for sentence analysis.

2.6 Machine Learning Approaches

There have been relatively few systems that use standard symbolic machine learning techniques to learn knowledge for natural language processing systems. Prior work in this area has concentrated mainly on the application of explanation-based learning (EBL) methods to grammar learning (e.g., Samuelsson and Rayner [1991], Samuelsson [1991]). The work presented here, on the other hand, focuses on learning knowledge for natural language systems that process text without the use of formal grammars. Although EBL has also been used for low-level language tasks like learning phonological representations [Stetham, 1991] and higher level tasks like learning causal relationships [Bozsahin and Findler, 1992, Pazzani, 1991], it has not been used to learn solutions to the lexical and structural disambiguation tasks handled within the Kenmore architecture. In addition, the EBL systems mentioned above generally require extensive background knowledge in the form of commonsense knowledge, detailed semantic hierarchies, or causal theories. The Kenmore framework presumes no such background knowledge.

Inductive logic programming (ILP) techniques [Muggleton, 1992] have only recently been employed to learn case-role mappings (e.g., Zelle and Mooney [1993]) and to learn rules from parsed text (e.g., Delannoy et al. [1993] and Matwin and Szpakowicz [1993]). While learning within the realm of logic programming is appropriate for certain NLP systems, few conceptual sentence analyzers translate text into the formal logic format required for some ILP techniques. This translation often precludes the kind of robust parsing needed to process real-world text. In very recent work, however, Zelle and Mooney [1994] have used ILP techniques to generate Prolog parsers from real-world text, making ILP an exciting new area of research in machine learning and natural language processing.

Finally, Riloff [1993] has used one-shot learning in her AutoSlog system to learn one type of knowledge structure for conceptual sentence analysis — semantic case frame definitions. AutoSlog differs from all of the above systems in that it relies on a small set of hand-coded rules that are, for the most part, domain-independent, to propose semantic case frame definitions for perusal by a human supervisor, who then decides which definitions to keep and which to discard.

2.6.1 A Comparison with SOAR

Kenmore has been presented as an architecture in which solutions to a number of problems in sentence analysis can be acquired. As such, it may seem similar to SOAR [Laird, 1987, Newell, 1990] — a general architecture for building intelligent systems. Both systems, for example, are able to handle seemingly very different problems within a single framework; both systems incorporate a learning component. Nevertheless, the two systems vary tremendously. First, SOAR provides a general problem-solving architecture. Kenmore is not nearly as general an architecture — it was designed solely for the acquisition of knowledge needed for natural language processing. Natural language understanding, for instance, is just one of the tasks that can be handled within SOAR [Lehman et al., 1993]. Second, all aspects of the SOAR architecture are based on a set of assumptions regarding the cognitive nature of problem solving; cognitive biases and cognitive limitations are built directly into the system architecture. Kenmore, on the other hand, incorporates all cognitive biases and constraints into its case representation rather than in the case-based architecture itself (see Chapter 8). Third, Kenmore assumes that a person provides supervisory information. When needed, SOAR receives supervisory information through interaction with the world (or the system's model of it). Finally, SOAR learns solely via chunking, or remembering useful sequences of rule firings and combining them into a single rule. Kenmore learns by storing new cases in its case memory — it remembers individual parsing actions rather than a sequence of actions. Also, SOAR always generalizes chunks before adding them to long-term memory, but Kenmore's cases are never generalized prior to storing them in the case base.

2.7 Case-Based Approaches

Kenmore views knowledge acquisition for natural language processing systems from a case-based reasoning perspective. As a result, Kenmore can be compared to existing CBR systems along a number of dimensions, including:

Case Representation: Like HYPO [Ashley, 1990, Rissland and Ashley, 1986], Kenmore uses a simple attribute-value pair case representation rather than a more structured case representation (e.g., the semantic network case representation of GREBE [Branting and Porter, 1991, Branting, 1991]).

Case Indexing: In handling lexical ambiguities, Kenmore indexes cases using a decision tree subsystem. In particular, this aspect of the system is related to HYPO's dimensions [Rissland et al., 1984], ReMind's prototypes and inductive indexing [Cognitive Systems, 1992], and CYRUS's E-MOPs [Kolodner, 1983]. (See Chapter 6.)

Case Retrieval: Kenmore's case retrieval system is based on a nearest-neighbor similarity metric and, hence, is much simpler than the domain-dependent and knowledge-based case retrieval algorithms typically employed in CBR systems (e.g., CHEF [Hammond, 1989]). (A minimal sketch of this style of retrieval and voting follows this list.)

Best Case Selection: Kenmore uses a simple voting scheme to derive a solution from the retrieved cases rather than using more sophisticated case selection methods (e.g., HYPO [Ashley, 1990, Rissland and Ashley, 1986] and GREBE [Branting and Porter, 1991, Branting, 1991]).

Case Adaptation: Unlike most CBR systems, Kenmore uses only a very weak form of case adaptation to modify the retrieved cases to better fit the context of the problem case. (See Section 8.4.)
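As promised above, here is a minimal sketch of nearest-neighbor retrieval followed by a simple vote over the retrieved cases; the case base and its features are invented for illustration and this is not Kenmore's implementation:

    # Sketch of nearest-neighbor case retrieval plus a simple vote over the k
    # closest cases. The case base and its features are invented for illustration.
    from collections import Counter

    case_base = [   # (context features, solution)
        ({"pos": "noun", "prev_pos": "adj",  "subject_sem": "company"}, "product"),
        ({"pos": "noun", "prev_pos": "noun", "subject_sem": "company"}, "product"),
        ({"pos": "noun", "prev_pos": "prep", "subject_sem": "person"},  "location"),
    ]

    def similarity(a, b):
        return sum(a.get(f) == b.get(f) for f in a)      # feature-overlap metric

    def solve(problem, k=3):
        ranked = sorted(case_base, key=lambda c: similarity(problem, c[0]), reverse=True)
        votes = Counter(solution for _, solution in ranked[:k])
        return votes.most_common(1)[0][0]                # majority vote wins

    print(solve({"pos": "noun", "prev_pos": "noun", "subject_sem": "company"}))
    # -> product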

The components of Kenmore's case-based inductive learning component reflect the goals of our approach — to acquire the knowledge needed for sentence analysis without relying heavily on handcrafted heuristics and handcrafted background knowledge: (1) Kenmore's simple case representation facilitates the automatic generation of cases by the parser; (2) case indexes are learned rather than handcoded; (3) no domain-specific knowledge is used (or available) for case selection or case adaptation.

In general, Kenmore's knowledge acquisition task is one of interpretation: Given an ambiguous word or phrase, Kenmore must provide a syntactic and semantic interpretation of the word or phrase that is consistent with the current context. This makes Kenmore an interpretive CBR system — a system that analyzes a new case by comparing and analogizing it with previous interpretation episodes. Interpretive CBR systems exist in a number of domains: (1) The HYPO system [Ashley, 1990, Rissland and Ashley, 1986], for example, creates legal arguments for a problem case in the domain of trade secret disputes; (2) GREBE [Branting and Porter, 1991, Branting, 1991] and CABARET [Rissland and Skalak, 1991] analyze problems in the domain of legal reasoning; (3) ANAPRON's [Golding and Rosenbloom, 1991] interpretive task is word pronunciation.

The remainder of this section of the thesis, however, focuses specifically on related work from CBR in the area of knowledge acquisition for NLP systems. Additional comparisons to particular components of existing CBR systems will be provided in the chapters that describe the details of Kenmore's case-based inductive learning algorithm (Chapters 5 and 6).

Although others have addressed language learning within the case-based paradigm, prior research in the area has focused on learning fairly low-level knowledge rather than the higher level parsing knowledge acquired within the Kenmore framework. Stanfill and Waltz [1986], for example, used a case-based approach in their MBRtalk system that learns how to pronounce English words. More recently, ANAPRON [Golding and Rosenbloom, 1991] tackled the problem of surname pronunciation using an architecture that combines rule-based and case-based reasoning. Additional work in low-level language acquisition has been done by researchers from the ATILA project — a joint effort between Antwerp and Tilburg Universities to study the process of human language acquisition by applying machine learning algorithms to a variety of language learning tasks. They have found that instance-based approaches to language acquisition work well for low-level language tasks like stress acquisition [Daelemans et al., 1994] and grapheme-to-phoneme conversion [Bosch and Daelemans, 1993] and have hypothesized that they will also perform well for higher level tasks.

In addition, they argue that language acquisition is a behavior-based, data-driven process rather than one guided by knowledge-based principles and parameter setting. Like the work presented here, the authors view many problems in NLP as categorization problems, to which case-based learning techniques can be applied [Daelemans, to appear].

A commercial system that uses a case-based architecture to acquire parsing knowledge is Parse-O-Matic [Goodman, 1991]. Parse-O-Matic is a conceptual sentence analyzer that builds semantic case frame representations directly from simple requests. The sentence analyzer used in Parse-O-Matic is a rule-based system that builds a semantic representation of the input by creating, linking, and modifying (partial) case frames in response to words in the input stream. The rules appear to be domain-dependent, however, and a new set is required when the system is moved from one domain to the next. The goal of Parse-O-Matic was to develop a case memory that effectively replaces the rules of the system. In short, when used to design a new NLP system, Parse-O-Matic was able to cut knowledge engineering time in half while losing none of the performance of the rule-based system.

Cases in Parse-O-Matic are much more complicated than those used in Kenmore. The "context portion" of each case consists of the current word, any previous cases generated for earlier parts of the sentence, and all active semantic representations for the sentence. The "solution" portion of a case contains a case frame representation of the actions to apply to the active semantic representations in response to the current word. In effect, each case represents a rule. Cases are created in a partially automated fashion. The system supplies the context portion and the user helps the system to create the solution portion by modifying existing knowledge structures. Some of the hand-crafted knowledge structures that may be modified include (1) case adaptation routines, (2) the lexicon, (3) a set of conceptual hierarchies, and (4) case frame definitions. In one sense, the system is an intelligent interface for designing natural language processing systems (although the final system is entirely automatic).

Because cases are so complex, indexing them for retrieval becomes a problem. Parse-O-Matic relies on an automated, inductive, statistical indexing scheme based on Hartigan's interaction detection algorithm [Hartigan, 1975]. In addition, it relies on a set of hand-crafted heuristics that decide which cases and which features of the current case to present to the clustering algorithm. It is the indexing scheme that generalizes the cases/rules for more flexible retrieval.

Kenmore differs from Parse-O-Matic in a number of ways. First, Kenmore currently relies on case-based approaches only to handle ambiguities, not to perform all functions for sentence analysis. Second, Parse-O-Matic requires extensive hand-coding of all background knowledge required by the CBR component and the parser. It is not at all clear how much, if any, of this knowledge is reusable when Parse-O-Matic must tackle a new language processing task in a new domain. Finally, Kenmore has been shown to work with real-world corpora that contain long, complicated sentences rather than the short, highly structured sentences handled in Parse-O-Matic.
Another CBR system that is similar to Kenmore in its focus on knowledge acquisition is PROTOS [Bareiss, 1989, Bareiss et al., 1989]. PROTOS, however, is not designed specifically for knowledge acquisition in the domain of natural language processing. Instead, it provides a general architecture for knowledge acquisition and has been employed in a number of domains including the diagnosis of hearing disorders. Like Kenmore, however, PROTOS relies on case-based reasoning techniques to perform classification as well as knowledge acquisition. Given a new set of symptoms and laboratory test results, PROTOS retrieves the best-matching training case and uses its classification as the diagnosis in the new situation. During training, a human expert monitors the decisions of PROTOS and intervenes when errors occur. Unlike Kenmore, however, the human expert is responsible for supplying the missing knowledge that caused the error as well as the correct classification. This step would be difficult to implement in a natural language context where (1) the supervisor may not know what knowledge was missing, or (2) the problem may not have been due to missing knowledge in the first place. PROTOS also differs from Kenmore in that it relies extensively on a highly connected case library and on background knowledge in the form of a domain model. The domain model includes important functional, correlational, causal, and taxonomic relationships and specifies how and when the features of two cases should be considered a match. In addition, knowledge acquisition in PROTOS is driven solely by its mistakes.

2.8 Summary

In summary, Kenmore's machine learning and case-based approach to knowledge acquisition is especially useful for addressing the problem of domain-specific knowledge acquisition for conceptual sentence analysis. When the natural language system must process complex, real-world text, it is not feasible to generate the necessary background knowledge by hand or even with the help of an intelligent interface: the process is too slow and subsequent maintenance of the knowledge bases is difficult and generally requires the expertise of the NLP system designers. In addition, knowledge-based approaches to language acquisition do not necessarily avoid the knowledge acquisition bottleneck because the design and maintenance of the necessary background knowledge and of procedures that specify when and how to acquire new background knowledge is extremely difficult.

Statistical approaches to knowledge acquisition, nevertheless, appear to work well for many natural language tasks. However, we have concentrated on knowledge acquisition for domain-specific tasks where the large, tagged corpora generally required for statistical approaches are not readily available. To avoid the need for huge amounts of manually tagged text, Kenmore instead incorporates its inductive language-learning component within the parser, giving the learning algorithms access to higher level knowledge regarding the progress of sentence analysis. One final advantage of the Kenmore framework is its ability to handle many subproblems in sentence analysis uniformly and within the same architecture.

CHAPTER 3

KNOWLEDGE ACQUISITION FOR CONCEPTUAL SENTENCE ANALYSIS

As described briefly in the introduction, the Kenmore framework for knowledge acquisition is compatible with a variety of language processing theories and parsing frameworks. It requires only that the natural language system used as Kenmore's sentence analyzer component have the ability to examine its current state at the point of an ambiguity. Whenever the sentence analyzer component is replaced by a new one, only the underlying case representation needs to change to conform to the type and form of knowledge available in the new sentence analyzer. Despite the general applicability of the framework, however, this work emphasizes the acquisition of knowledge only for the task of conceptual sentence analysis [Riesbeck, 1975].

This chapter first describes the conceptual sentence analyzer called CIRCUS [Lehnert, 1990] that will be used in all experiments throughout this work. It is important to understand generally how CIRCUS processes sentences for two reasons: (1) because the context portion of each case in Kenmore's case-based reasoning component depends on the state of the CIRCUS parser, and (2) because CIRCUS's approach to sentence analysis constrains the types of knowledge that must be acquired before CIRCUS can process a text. The chapter begins by describing how CIRCUS understands simple and complex sentences (Section 3.1) and then provides examples of CIRCUS's state representation at different points in a sentence (Section 3.2). The chapter concludes with a summary of the kinds of knowledge necessary for parsing within the CIRCUS conceptual sentence analysis framework (Section 3.3) and a list of the specific knowledge acquisition tasks that will be addressed in later chapters (Section 3.3.4).

3.1 The CIRCUS Conceptual Sentence Analyzer

The goal of conceptual analysis is to produce a representation of the meaning of a text rather than a representation of its syntactic structure. Because this goal differs substantially from that of syntactically-oriented parsers, the relative role of syntactic and semantic analysis in conceptual sentence analyzers has been disputed (see Lytinen [1984]).

Traditionally, however, conceptual analyzers employ no distinct syntactic grammar and reflect a completely integrated theory of natural language interpretation (e.g., Birnbaum and Selfridge [1981], Cullingford [1986], Dyer [1983], Lebowitz [1980], Riesbeck [1975]). They not only allow syntactic and semantic processing to happen at the same time, but intertwine the parser's morphological, syntactic, semantic, and pragmatic knowledge in a monolithic and largely procedural representation.

CIRCUS [Lehnert, 1990] is a conceptual sentence analyzer that synthesizes three distinct processing techniques to produce a semantic case frame representation of an input sentence. It uses (1) a stack-oriented control for syntactic analysis, (2) a marker-passing design for predictive preference semantics, and (3) lexically-indexed control kernels [Cardie and Lehnert, 1991] for handling embedded clauses.1 Although CIRCUS makes no commitment to a particular style of semantic representation, we use "deep" semantic case frames of the sort found in conceptual dependency [Schank, 1975]. In general, CIRCUS recognizes low-level syntactic constituents like noun phrases, verbs, and prepositional phrases, and consults semantic knowledge prior to making any attachment decisions. In addition, CIRCUS explicitly attaches constituents only to the semantic representation of the sentence. Syntactic attachments occur implicitly as a side effect of the semantically-driven modifications.

The CIRCUS system has been used successfully to provide natural language processing capabilities for a variety of projects including summarization of wire service texts [Lehnert et al., 1993a, Lehnert et al., 1992, Lehnert et al., 1991], text classification [Riloff and Lehnert, 1992], and the analysis of citation sentences in research papers [Lehnert et al., 1990]. Its components will be described in the next three sections.

3.1.1 Syntactic Processing in CIRCUS

CIRCUS’s stack-oriented syntactic analyzer segments incoming text into constituent phrases. In the tradition of conceptual analyzers, this component produces no parse tree of the input and employs no global syntactic grammar. It is based on the McEli parser [Schank and Riesbeck, 1981] and uses lexically-indexed, local syntactic knowledge to recognize noun phrases, prepositional phrases, and verb phrases. These constituents are stored in global buffers that track the subject, verb, direct object, indirect object, and prepositional phrases of a sentence. Like lexical-functional grammars [Bresnan, 1982], CIRCUS stores the syntactic predictions associated with each word in the lexicon and retrieves them during sentence analysis. Because we restrict the buffer contents to simple syntactic structures with a strongly “local” sense of the sentence, larger constituents like clauses are not explicitly recognized by the syntactic component. To process the sentence that begins,

1CIRCUS also employs a numerical relaxation algorithm to perform bottom-up insertion of unpredicted slots into case frames. This module is not important for the purposes of this work, however, and will not be discussed here.

    John brought...

for example, CIRCUS scans the sentence from left to right, using its stack-oriented control and the lexically-indexed syntactic predictions to assign words and phrases to syntactic constituents. Initially, the stack contains a single prediction — the hypothesis for a subject of the sentence. When CIRCUS sees the word "John," it accesses its part-of-speech lexicon, finds that "John" is a proper noun, loads the standard set of syntactic predictions associated with proper nouns onto the stack, and recognizes "John" as a noun phrase (NP). Because the presence of a noun phrase satisfies the initial prediction for a subject, CIRCUS then places "John" in the subject buffer (*S*) and pops the satisfied syntactic prediction from the stack. Next, CIRCUS processes the word "brought," finds that it is a verb, and assigns it to the verb buffer (*V*) as shown in Figure 3.1.

    John brought ...
    *S*          *V*

    McEli Stack Predictions:
    {(1) if NP, NP -> *DO*; predict: if EndOfSent, NIL -> *IO*.
     (2) if NP, NP -> *DO*; predict: if PP(to), PP -> *PP*, NIL -> *IO*.
     (3) if PP(to), PP -> *PP*; predict: if NP, NP -> *DO*.
     (4) if NP, NP -> *IO*; predict: if NP, NP -> *DO*.}

Figure 3.1: CIRCUS Status After "John brought..."

In addition, the current stack contains a single packet encoding the syntactic expectations associated with "brought."2 This verb predicts (1) a direct object (DO), (2) a direct object followed by a "to" prepositional phrase (PP), (3) a "to" prepositional phrase followed by a direct object,3 or (4) an indirect object (IO) followed by a direct object. The following sentences illustrate each possibility:

1. John brought a cake.

2Each prediction in a packet is called a request. Whenever one request in the topmost packet on the stack is satisfied, the entire packet containing the request is popped from the stack and all subsequent predictions associated with the request are pushed onto the stack in a new packet.

3Although this sentence is ungrammatical, it is still possible to ascertain its intended meaning. As a result, we let CIRCUS derive a meaning representation for the sentence.

2. John brought a cake to the party.

3. *John brought to the party a cake.4

4. John brought Mary a cake.

If the next word in the sentence were the noun phrase "Mary," for example, CIRCUS would assign "Mary" to both the direct object and the indirect object buffers and update its stack of syntactic expectations. The new predictions resolve the momentary syntactic ambiguity by overwriting the contents of either *DO* or *IO* depending on the phrase that follows "Mary" in the sentence.
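The following toy fragment sketches this buffer-and-stack behavior. It is not the CIRCUS/McEli implementation: noun phrases are collapsed to single words, and only prediction (4) above (indirect object, then direct object) is followed, whereas CIRCUS keeps the competing predictions active until later words resolve them:

    # Toy sketch of stack-oriented constituent assignment driven by lexically
    # indexed predictions. Lexicon entries and the single prediction path are
    # simplifications; they are not CIRCUS's actual lexical definitions.
    LEXICON = {
        "john": {"cat": "NP"}, "mary": {"cat": "NP"}, "cake": {"cat": "NP"},
        "brought": {"cat": "V", "predictions": [("*IO*", "NP"), ("*DO*", "NP")]},
    }

    def parse(words):
        buffers = {"*S*": None, "*V*": None, "*DO*": None, "*IO*": None}
        stack = [("*S*", "NP")]                      # initial hypothesis: a subject NP
        for word in words:
            entry = LEXICON[word.lower()]
            if entry["cat"] == "V":
                buffers["*V*"] = word
                stack = list(entry["predictions"])   # the verb loads its expectations
            elif stack and entry["cat"] == stack[0][1]:
                slot, _ = stack.pop(0)
                buffers[slot] = word                 # a satisfied prediction fills a buffer
        return buffers

    print(parse(["John", "brought", "Mary", "cake"]))
    # {'*S*': 'John', '*V*': 'brought', '*DO*': 'cake', '*IO*': 'Mary'}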

3.1.2 Predictive Semantics in CIRCUS

As soon as CIRCUS recognizes a syntactic constituent, that constituent is made available to the mechanisms performing predictive semantics. The predictive semantics module (PSM) is responsible for making case role (or thematic role) assignments. In CIRCUS, this consists of top-down slot-filling for any active semantic case frames.5 Whenever a syntactic constituent becomes available in one of the global buffers, PSM examines any active case frame that expects a slot filler from that buffer. PSM then fills the slot if the constituent satisfies the slot's semantic constraints. CIRCUS allows both hard and soft slot constraints. A hard constraint is a predicate that must be satisfied. In contrast, a soft constraint defines a preference for a slot filler rather than a predicate that blocks slot-filling when it is not satisfied.

Consider, for example, the semantic case frame for a PTRANS event triggered by the word "brought" in the phrase "John brought" (Figure 3.2).6 The case frame definition indicates the mapping between surface constituents and case frame slots: subject → Actor; direct object → Object; prepositional phrase or indirect object → Destination. In addition, it specifies a set of enabling conditions that describe the linguistic context in which the case frame should be triggered. In this case, the PTRANS case frame should be triggered by "brought" only when the verb occurs in an active construction. As in lexical-functional grammar (LFG), a different case frame definition (with a different set of enabling conditions) would be needed to handle a passive sentence construction.

Figure 3.2 also depicts the hard and soft constraints associated with each slot.

4A “*” will be used to mark those examples that either are difficult for a person to understand or display ungrammatical or atypical usage of English.

5Semantic case frames in CIRCUS represent the meaning, or “deep” structure of the sentence rather than its surface structure; slots in the case frames represent conceptual roles (e.g., actor and object) rather than syntactic positions (e.g., subject, direct object).

6PTRANS is a primitive act in conceptual dependency describing a physical transfer (see Schank [1975]). The PTRANS case frame actually has a fourth slot — the original location or Source of the object. For the purposes of this example, however, we will ignore this slot.

John brought ...

[Figure: the PTRANS semantic case frame triggered by “brought”; slot names: Actor, Object, Destination; enabling condition: active?; hard constraint on the Destination prepositional phrase: to?; soft constraints: animate?, phys-obj?, location? (currently yes, no, no); syntactic constituent buffers: *S* = “John”, *V* = “brought”, *DO*, *IO*, *PP*]

Figure 3.2: PSM Status After “John brought...”

At this point in the parse, PSM successfully fills the Actor slot with “John” because “John” is the subject of the sentence and its entry in the lexicon indicates that “John” is animate. All of the other slots in the PTRANS frame remain empty. When a frame satisfies certain instantiation criteria, PSM “freezes” the case frame with its assigned slot fillers. Any instantiated case frames then become part of the semantic representation CIRCUS derives for the sentence. Figure 3.3, for example, shows the PTRANS case frame instantiation returned by CIRCUS after parsing “John brought Mary to Manhattan.”

Sentence: John brought Mary to Manhattan.

PTRANS
  Actor = “John”
  Object = “Mary”
  Destination = “Manhattan”

Figure 3.3: Instantiated Semantic Case Frame.

7This is a hard constraint.
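A minimal sketch of the PSM slot-filling behavior described above may help; the data structures and function names here (Slot, offer_constituent) are invented for the example and simplify CIRCUS’s actual machinery. In particular, a fuller implementation would use the soft constraints to rank competing fillers rather than ignoring them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Constituent = Dict[str, object]

@dataclass
class Slot:
    name: str                            # e.g., "Actor"
    sources: List[str]                   # syntactic buffers mapped to this slot, e.g., ["*S*"]
    hard: List[Callable[[Constituent], bool]] = field(default_factory=list)  # must hold to fill
    soft: List[Callable[[Constituent], bool]] = field(default_factory=list)  # preferences only
    filler: Optional[Constituent] = None

def offer_constituent(slots: List[Slot], buffer: str, constituent: Constituent) -> None:
    """Top-down slot filling: offer the constituent now in `buffer` to every empty slot expecting it."""
    for slot in slots:
        if buffer in slot.sources and slot.filler is None:
            if all(pred(constituent) for pred in slot.hard):
                # A violated hard constraint blocks filling; violated soft constraints do not.
                slot.filler = constituent

# The PTRANS frame triggered by an active "brought" (cf. Figure 3.2):
ptrans = [
    Slot("Actor", ["*S*"], soft=[lambda c: "animate" in c["features"]]),
    Slot("Object", ["*DO*"], soft=[lambda c: "phys-obj" in c["features"]]),
    Slot("Destination", ["*PP*", "*IO*"],
         hard=[lambda c: c.get("prep", "to") == "to"],   # a PP filler must begin with "to"
         soft=[lambda c: "location" in c["features"]]),
]
offer_constituent(ptrans, "*S*", {"text": "John", "features": {"animate"}})  # fills Actor
```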

3.1.3 Handling Embedded Clauses in CIRCUS

When sentences become more complicated, CIRCUS has to “partition” the stack processing in a way that recognizes embedded syntactic structures as well as conceptual dependencies. Consider, for example, the embedded clauses in the following sentences:

(a) John asked Bill to eat the leftovers.

(b) That’s the gentleman that the woman invited to go to the show.

(c) That’s the gentleman that the woman declined to go to the show with.

Understanding these constructions requires that the parser infer the existence of an invisible or phonetically null constituent in the embedded clause and then associate the missing constituent with an antecedent phrase that may be arbitrarily distant from it. In (a), for example, the parser should infer that “Bill” is the subject of “eat”; in (b), “gentleman” is the direct object of “invited” as well as the subject of “go”; and in (c), “woman” is the subject of “go” while “gentleman” is the prepositional object of “with.” This is accomplished in CIRCUS with lexically-indexed control kernels (LICKs) [Cardie and Lehnert, 1991]. We view the stack of syntactic predictions as a single control kernel whose expectations and binding instructions change in response to specific lexical items as we move through the sentence. When we come to a subordinate clause, the top-level kernel creates a subkernel that takes over to process the interior clause. In other words, when a subordinate clause is first encountered, the parent LICK spawns a child LICK, passes control over to the child, and later recovers control from the child when the subordinate clause is completed. Each control kernel essentially creates a new parsing environment with its own set of bindings for the syntactic buffers, its own copy of the main stack, and its own predictive semantics module. The spawning of LICKs is deterministic — the current LICK can spawn exactly one child LICK at a time. Once control is returned to the parent from a child, however, the parent is free to spawn another child if necessary as the sentence progresses. To understand the behavior of multiple LICKs, we need only specify rules for passing control among LICKs and rules for passing variable bindings across LICKs:

Inter-LICK Control Rules:

1. An existing LICK can create a new LICK at which time control moves from the parent LICK to the child LICK.

2. When a child LICK relinquishes control, control reverts back to the parent LICK.

Inter-LICK Communication Rules:

1. When moving from a parent LICK to a child LICK, all syntactic buffers in the child LICK are initialized by the parent LICK. The buffers are initialized to nil unless otherwise specified by the parent.

2. When moving from a child LICK to a parent LICK, the only buffer that can be initialized or reassigned in the parent LICK is the *LB* buffer.

*LB* (lick buffer) is a special syntactic buffer used only for inter-LICK communication. Typically, the conceptual representation for an entire subordinate clause is stored in *LB* until it can be incorporated into the representation being constructed by a parent control kernel. LICKs, then, embody the basic control mechanism of ATN’s [Woods, 1970] but enforce a much stricter set of communication rules. The ATN framework, for example, provides at least three different mechanisms for handling embedded clauses [Winograd, 1983] — one possibility is to use a global hold register to store a constituent until its gap can be found. None of these mechanisms, however, allows the parent to instantiate more than a single syntactic buffer8 in the embedded clause with the antecedent. Instead, ATN’s rely on nondeterminism to allow an antecedent to fill one of a number of possible gap sites. In addition, CIRCUS’s use of LICKs differs tremendously from the pervasive recursion of ATN’s — CIRCUS employs the LICK mechanism only at the clause level and selectively triggers the mechanism via lexically-indexed signals. Unlike ATN’s, the parsing of constituents within a clause remains deterministic and strictly bottom-up. The next sections walk through specific examples that use LICKs to process relative clauses (e.g., I saw the man who ate the ice cream cone) and infinitival complements (e.g., John asked Bill to go to the restaurant).
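The inter-LICK control and communication rules can be summarized in a short sketch; again, this is an illustrative simplification rather than CIRCUS code, and the class name LICK and the methods spawn and finish are invented for the example.

```python
from typing import Dict, List, Optional

BUFFERS = ["*S*", "*V*", "*DO*", "*IO*", "*PP*", "*LB*"]

class LICK:
    def __init__(self, parent: Optional["LICK"] = None,
                 init_buffers: Optional[Dict[str, object]] = None):
        # Each LICK is its own parsing environment: its own buffer bindings,
        # its own copy of the stack, and its own predictive semantics module.
        self.parent = parent
        self.buffers: Dict[str, object] = {b: None for b in BUFFERS}
        if init_buffers:                        # communication rule 1: the parent initializes the
            self.buffers.update(init_buffers)   # child's buffers (nil unless otherwise specified)
        self.stack: List[object] = []
        self.child: Optional["LICK"] = None

    def spawn(self, init_buffers: Dict[str, object]) -> "LICK":
        # Control rule 1: spawning is deterministic; a LICK has at most one child
        # at a time, and control passes to that child.
        assert self.child is None
        self.child = LICK(parent=self, init_buffers=init_buffers)
        return self.child

    def finish(self, clause_representation: object) -> Optional["LICK"]:
        # Control rule 2 and communication rule 2: control reverts to the parent,
        # and *LB* is the only parent buffer the child may set.
        if self.parent is not None:
            self.parent.buffers["*LB*"] = clause_representation
            self.parent.child = None
        return self.parent

# On seeing "who" after "the boy", the main-clause LICK spawns a child whose
# *S*, *DO*, *IO*, and *PP* buffers are all initialized with the antecedent:
main = LICK()
child = main.spawn({b: "boy" for b in ["*S*", "*DO*", "*IO*", "*PP*"]})
```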

3.1.3.1 Understanding Wh-Constructions

This section shows how sentences containing embedded wh-phrases (i.e., phrases introduced by a relative pronoun like “who,” “whom,” “that,” “which”) are handled by local syntactic predictions and interactions between cooperating LICKs. Consider the following sentence:

The policeman saw the boy who the crowd at the party accused of the crime.9

Figure 3.4 shows the state of CIRCUS after the word “who.” The LICK processing the main clause has triggered a semantic case frame for SAW and has successfully filled its Actor and Object slots. In addition, the lexicon entry for “who” indicates that processing of the main clause should be temporarily suspended and a child LICK spawned.

8The ATN equivalents to CIRCUS’s syntactic buffers in this case are called role registers.

9For most English dialects, “whom” is actually the grammatically correct relative pronoun to use in this example. We use an ungrammatical example for two main reasons. First, we want to emphasize the fact that CIRCUS should have the ability to understand sentences that people can understand, regardless of grammaticality. Second, we want to show that the LICK solution to understanding relative clauses handles ungrammatical input without sacrificing the ability to handle grammatical constructions.

Because the antecedent for “who” can bind to one of four possible syntactic constituents within the subordinate clause, CIRCUS initializes each of the child *S*, *DO*, *IO*, and *PP* buffers with “boy.” (The issue of how CIRCUS can learn that “boy” is the correct antecedent of “who” will be addressed in Chapter 8.)

The policeman saw the boy who...

[Figure: main-clause buffers *S*, *V*, *DO*; child LICK buffers *S*, *DO*, *IO*, *PP*, each initialized with the antecedent “boy”; McEli Stack: { (1) if NP, NP -> *S*; predict: if V, V -> *V*  (2) if V, V -> *V* }; SAW case frame: Actor (animate? yes), Object (phys-obj? yes)]

Figure 3.4: State of CIRCUS after “The policeman saw the boy who...”

When the child completes a semantic case frame instantiation, only the buffer associated with the gap (i.e., the missing or phonetically null constituent) should hold the filler (i.e., the antecedent). At the end of the example sentence, the filler “boy” should appear only in the direct object buffer — “the crowd accused [the boy] of the crime.” The other buffers initialized with the antecedent will either be overwritten with actual phrases from the embedded clause or eliminated as possible gaps by syntactic information associated with the verb. In any case, few case frame definitions will access all four buffers. As indicated in the McEli stack of Figure 3.4, “who” also sets up syntactic predictions for either a verb phrase or a subject-verb sequence before passing control to the embedded clause LICK. Figure 3.5 shows the state of the child LICK just after processing “accused.” “Crowd” has overwritten *S* and “party” has overwritten *PP*.10 In addition, “accused” activates a case frame and makes initial slot assignments based on the case frame definition: Actor = crowd and Patient = boy. The Accusation slot remains empty even though we have a prepositional phrase because the hard constraint that the preposition be “of” is violated.11

10Currently CIRCUS has only one prepositional phrase buffer (*PP*). The implication is that the parser only has access to the most recent prepositional phrase. Clearly, we should be using multiple buffers or a stack of *PP* buffers.

11As described above, a hard constraint is a predicate that must be satisfied before the filler is allowed to fill a slot.

...the crowd at the party accused... (the boy)

[Figure: child LICK buffers *S*, *PP*, *V*, *DO*, *IO*; ACCUSE case frame: Actor (animate? yes), Patient (animate? yes), Accusation (action? no; hard constraint of? no)]

Figure 3.5: Child LICK Status after “The policeman saw the boy who the crowd at the party accused...”

Note that although both *IO* and *DO* contain the antecedent “boy,” *IO* does not interfere with the semantic representation because the ACCUSE case frame does not access that buffer. Figure 3.6 shows the state of the child LICK at the end of the embedded clause: Actor = crowd, Patient = boy, Accusation = crime. At this point, CIRCUS freezes the ACCUSE case frame, assigns the instantiated representation to the *LB* buffer, exits the child LICK, and returns control to the main clause where *LB* is attached to the antecedent “boy.”

...the crowd at the party accused (the boy) of the crime.

[Figure: child LICK buffers *S*, *V*, *IO*, *DO*, *PP*, *LB*; ACCUSE case frame: Actor (animate? yes), Patient (animate? yes), Accusation (action? yes; hard constraint of? yes)]

Figure 3.6: Child LICK Status after “The policeman saw the boy who the crowd at the party accused of the crime.”
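Under the same simplified representation, gap-finding in this example amounts to checking which of the buffers the case frame actually reads still holds the antecedent; the function below is an illustrative sketch, not part of CIRCUS.

```python
def surviving_gaps(child_buffers, antecedent, frame_buffers):
    """Return the buffers the case frame actually reads that still hold the antecedent."""
    return [b for b in frame_buffers if child_buffers.get(b) == antecedent]

# At the end of the embedded clause, "crowd" has overwritten *S* and the "of" PP sits in *PP*;
# ACCUSE reads *S*, *DO*, and *PP* but never *IO*, so the antecedent survives only as the
# direct object (the Patient), which is exactly the intended gap.
child_buffers = {"*S*": "crowd", "*V*": "accused", "*DO*": "boy", "*IO*": "boy",
                 "*PP*": "of the crime"}
print(surviving_gaps(child_buffers, "boy", ["*S*", "*DO*", "*PP*"]))   # -> ['*DO*']
```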

3.1.3.2 LICK Caveats and Conclusions

The last section illustrated the use of LICKs to process only wh-phrases. However, CIRCUS currently handles many types of embedded clauses using a small number of LICKs. For example, the LICK mechanism is responsible for handling some sentential complements (e.g., The peasants thought the president had been assassinated) and compound verb phrases (e.g., Castellar was kidnapped in Achi and taken by car to El Salvador). In particular, verbs that optionally predict an infinitival complement (e.g., “ask,” “promise,” “want”)12 are handled by LICKs that initialize the child subject buffer with the subject or direct object of the main clause, depending on the constituent structure of the main clause.

Consider the following sentences:

1. John asked Bill_i ?_i to eat the leftovers.

2. John_i asked ?_i to eat the leftovers.

3. John_i asked Bill the question ?_i to shock the crowd.

In the first sentence, the presence of the verb “ask” followed by an indirect object and the word “to” causes a child LICK to be spawned that inherits the indirect object of the main clause as its subject. This *IO* inheritance makes “Bill” the Actor of “eat” in the embedded clause. If, on the other hand, there is no indirect object between “ask” and the infinitive (sentence 2), a LICK is triggered that clears all child buffers except *S*, which is inherited from the parental *S* buffer. In this case, the *S* inheritance makes “John” the Actor of “eat”.13 Finally, if the verb “ask” is followed by an indirect object and a direct object (sentence 3), no LICK is triggered by “ask” at all. However, many embedded-clause problems cannot be resolved by the simple inter-LICK control rules and communication rules described here. Infinitives in the subject position (e.g., To win would be a welcome change) and infinitival indirect questions (e.g., It is unknown what to do in this case) are two examples. In both of these cases, the actor of the infinitive is implied. In addition, a reduced relative clause presents an ambiguity that must be resolved by either a parent LICK (in the case of an active past tense verb form) or a child LICK (in the case of a passive past participle verb form). The control kernel formalism encourages us to view this disambiguation problem in terms of competition for control, but does not suggest how that competition should be resolved. This approach to syntactic-semantic interactions simply recasts the problems of embedded constructions as issues concerning communication across scoping environments.
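Using the simplified LICK class sketched earlier, the control-verb behavior for these three sentences might be expressed as follows; the function name and its boolean arguments are invented for illustration.

```python
def infinitival_complement_lick(parent, has_io, has_do):
    """Decide whether and how an "ask"-style verb spawns a child LICK (cf. sentences 1-3)."""
    if has_io and has_do:
        return None                                           # sentence 3: no LICK is triggered
    if has_io:
        return parent.spawn({"*S*": parent.buffers["*IO*"]})  # sentence 1: child inherits *IO*
    return parent.spawn({"*S*": parent.buffers["*S*"]})       # sentence 2: child inherits *S*
```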

3.2 Examples of “Context” in CIRCUS

As described in Chapter 1, Kenmore relies on a case-based approach to knowledge acquisition where each “case” encodes a description of the context of an ambiguity as well as its solution in that context. This section provides concrete examples of the “context” available to CIRCUS as it processes a sentence. In general, the context is a representation of the state of CIRCUS at any given point in time.

12These verbs are often called control verbs.

13For control verbs, it is the soft constraints of the associated semantic case frame that are actually responsible for triggering the appropriate LICK.

In practice, however, there are many viable representations of that state. The approach that we have taken is to choose a representation that is natural for CIRCUS and then to rely on automated techniques to modify this representation as needed.14 The state representation used throughout this work incorporates the following basic processing characteristics of CIRCUS:

• CIRCUS recognizes phrases as it finds them in its left-to-right traversal of a sentence.

• CIRCUS recognizes major constituents like the subject, verb, and direct object.

• CIRCUS makes no immediate decisions on structural attachment. In particular, it does not handle conjunctions or appositives, or perform prepositional phrase attachment.

• CIRCUS has access to constituent information for the current LICK only.

• CIRCUS uses one or more semantic features to describe every noun and adjective in its lexicon. For instance, in the terrorist domain, “mayor” is tagged as human, “ELN” as an organization, and the noun “murder” as an attack.

• CIRCUS treats punctuation marks as individual words.15

• CIRCUS has access to all of the words in an input sentence.

Consider the following sentence from the TIPSTER business joint ventures corpus:

Japan Airlines Co. (JAL) and Toyo Real Estate Co., a subsidiary of Sanwa Bank, bought a six million U.S. dollar luxury hotel Tuesday, Malaysia’s first hotel to be wholly owned by Japanese.

If we assume that CIRCUS has access to semantic feature information for each word and that each verb will trigger the appropriate semantic case frame, CIRCUS should have determined the following by the time it reaches the word following “bought”16:

Japan Airlines Co.       *S*, company name
JAL                      noun phrase, company alias
Toyo Real Estate Co.     noun phrase, company name
a subsidiary             noun phrase, joint venture entity
of Sanwa Bank            *PP*, company name
bought                   *V*, triggers a BOUGHT case frame
a                        current word

14Two methods for automatically modifying a baseline representation are described in Chapters 6 and 8.6.

15This lets us handle all tokens in an input sentence using the same general processing routines.

16The Actor slot of the BOUGHT case frame will be filled by “Japan Airlines Co.,” but not the full conjunction since no such attachments are made by CIRCUS at this time.

All of this information is part of CIRCUS’s “context” at this point in the parse. Not included in the above representation is information about the positions of nearby words. As noted above, CIRCUS also knows all of the words of the current sentence. However, we arbitrarily include only a window of four words surrounding the current word as part of the context representation. As a result, the following becomes part of the current context:

,           comma, 2nd preceding word
bought      verb, preceding word
six         following word
million     2nd following word
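Extracting that four-word window can be sketched as follows; the function and feature names are illustrative, not the exact representation used in later chapters.

```python
def window_features(words, i, size=2):
    """Collect the `size` words on either side of position i (the current word)."""
    feats = {}
    for offset in range(1, size + 1):
        feats[f"{offset}_preceding"] = words[i - offset] if i - offset >= 0 else None
        feats[f"{offset}_following"] = words[i + offset] if i + offset < len(words) else None
    return feats

tokens = ["of", "Sanwa", "Bank", ",", "bought", "a", "six", "million"]
print(window_features(tokens, tokens.index("a")))
# {'1_preceding': 'bought', '1_following': 'six', '2_preceding': ',', '2_following': 'million'}
```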

Just after “Tuesday,” the state representation should look like the following:

Japan Airlines Co.                         *S*, company name
JAL                                        noun phrase, company alias
Toyo Real Estate Co.                       noun phrase, company name
a subsidiary                               noun phrase, joint venture entity
of Sanwa Bank                              *PP*, company name
,
bought                                     *V*, triggers a BOUGHT case frame
a six million U.S. dollar luxury hotel     *DO*, money/facility
hotel                                      noun, facility, 2nd preceding word
Tuesday                                    adverb, time, preceding word
,                                          current word
Malaysia’s                                 following word
first                                      2nd following word

At the end of the sentence, however, the state representation includes only information from the embedded clause because LICKs maintain state information for one clause at a time:

Malaysia’s first hotel     *S*, facility
to be                      *AUX*
wholly                     adverb
owned                      *V*, triggers OWN case frame
by Japanese                *PP*, nationality
by                         preposition, 2nd preceding word
Japanese                   noun, nationality, preceding word
.                          period, current word

The following sentence fragment (also from the joint ventures corpus) shows how information that is carried from one clause to another via one of the lexically-indexed control kernels can affect CIRCUS’s state:

When you wanted to make a copy of something, you had to...

At “wanted,” CIRCUS’s state looks like the following:

when       connective, 2nd preceding word
you        *S*, human, pronoun, preceding word
wanted     current word
to         following word
make       2nd following word

By the word “of,” however, a child LICK has been spawned and the subject of the embedded clause instantiated with the subject from the main clause:

you           *S*, human
to make       *V*, triggers MAKE case frame
a copy        *DO*, entity
a             determiner, entity, 2nd preceding word
copy          noun, entity, preceding word
of            current word
something     following word
,             2nd following word

The above examples should give a general idea of the kinds of information that will be included in the “context” portion of Kenmore’s cases. In practice, however, this state information first will have to be mapped into a representation that can be used with the particular inductive learning algorithm employed. The details of how this is done will be described in later chapters (Chapter 5 for learning lexical disambiguation heuristics; Chapter 8 for learning relative pronoun disambiguation heuristics).

3.3 Knowledge Needed for Conceptual Analysis

Given the general description of CIRCUS presented above, this section simply lists the types of knowledge needed for each phase of conceptual sentence analysis within the CIRCUS system. Section 3.3.4 indicates the subset of these that we will attempt to learn within the Kenmore framework in subsequent chapters.

3.3.1 Syntactic Processing

1. For each part of speech distinguished by the system, CIRCUS requires a set of syntactic predictions to drive the McEli syntactic analyzer.

2. For each word in the lexicon, CIRCUS needs to know which parts of speech are associated with the word.

3. CIRCUS also needs a set of disambiguation routines to handle part-of-speech ambiguities.

3.3.2 Predictive Semantics Module

1. CIRCUS requires a set of semantic case frame definitions to pick up or extract information from a sentence. Each case frame definition includes:

(a) enabling conditions that specify the contexts in which the case frame should be activated (e.g., the PTRANS case frame in Figure 3.2 should be activated for “brought” only when the verb appears in an active construction);

(b) a mapping from syntactic buffers to slots;

(c) hard slot constraints;

(d) soft slot constraints in the form of semantic features (e.g., the Actor of a PTRANS should be animate).

2. Case frame definitions have to be explicitly linked to the lexical items that “trigger” the case frame. (The PTRANS case frame of Figure 3.2 might be linked to “brought” and “took.”)

3. Each noun and adjective in the lexicon (and optionally adverbs) has to be described in terms of one or more semantic features so that CIRCUS can test whether the word satisfies a slot’s hard and soft constraints. (We assumed, for example, that “John” was tagged as animate in the example in Section 3.1.2.)

4. CIRCUS requires a mechanism for word sense disambiguation whenever a word has more than one semantic feature assigned to it. (For instance, the word “ball” may be tagged with both the toy semantic feature and the social event semantic feature.)

3.3.3 LICK Processing

Each LICK describes how to handle a particular class of embedded clause and requires the following pieces of information:

1. a list of the Child LICK syntactic buffers that should be instantiated with the antecedent by the Parent;

2. a specification of the syntactic construction to expect in the Child clause; and, for some LICKs,

3. a set of heuristics that can locate an antecedent (or notice that there is none) so that it can be used to fill a gap in a subsequent clause. For example, “boy” should be recognized as the antecedent of “who” in “I saw the boy who...” but there is no antecedent for “who” in “I don’t know who...”

3.3.4 Knowledge Acquisition Tasks for Kenmore

The last three subsections enumerated the types of knowledge required by each component of CIRCUS for a minimal semantic analysis of a sentence. Later chapters present automated solutions to a subset of these knowledge acquisition tasks. Chapter 5 describes the learning of lexical knowledge required by CIRCUS, including both syntactic lexical knowledge (Section 3.3.1, items 2 and 3 above) and lexical knowledge used by the predictive semantics module (Section 3.3.2, items 2, 3, and 4 above). In addition, that chapter shows how Kenmore can learn the enabling conditions associated with each case frame definition (Section 3.3.2, item 1a above). Chapter 8 focuses on learning one class of structural disambiguation knowledge. It describes the learning of heuristics that find relative pronoun antecedents for LICK processing.

CHAPTER 4

DOMAIN-SPECIFIC KNOWLEDGE ACQUISITION

This thesis addresses the problem of knowledge acquisition for natural language processing systems. However, we have narrowed that broad topic along two dimensions. First, the thesis concentrates on learning knowledge for conceptual sentence analysis. This was discussed in the preceding chapter. Second, the thesis concentrates on domain-specific knowledge acquisition, the topic of this chapter. The goals of the chapter are to (1) describe the task of domain-specific knowledge acquisition and distinguish it from knowledge acquisition for open-ended texts (Section 4.1), (2) discuss the constraints that domain-specific text processing imposes on the knowledge acquisition task (Section 4.2), and (3) briefly describe the two corpora that will be used throughout this work as well as the associated text processing tasks (Section 4.3).

4.1 The Task

Domain-specific knowledge acquisition for natural language systems is the process of acquiring the knowledge needed to process texts from a particular domain of interest. The requirements of domain-specific text processing are generally less stringent than open-ended text processing because the NLP system only needs to understand those portions of the text that pertain to the area of interest. Any texts or portions of text that fall outside of the domain effectively can be ignored. As a result, system designers can concentrate on encoding knowledge from a single sphere of expertise. An NLP system designed to understand only texts about bicycle repair, for example, can ignore descriptions of bicycle routes or training regimens that appear in the text and can ignore entire documents if they describe something outside the realm of bicycle repair and maintenance.

On the other hand, a domain-specific natural language system needs some method for distinguishing relevant texts or sections of text from irrelevant ones. This distinction may be less of a problem for open-ended text processing, where the option to ignore text may not be available. Usually, however, there is an implicit relationship between domain-specific text processing and the level at which entire texts must be understood. Analysis of domain-specific documents promotes the use of variable-depth text processing — the NLP system can spend a lot of time on sections of the text that contain information relevant to the domain of interest while just skimming the rest.1 Conversely, open-ended text understanding often implies in-depth analysis of all input documents.

Although even open-ended text processing does not make sense without the existence of global goals to drive the understanding process, domain-specific text processing systems may have to handle very detailed constraints that address the specialized information-processing needs of a particular audience. A system designed specifically to summarize texts about recent terrorist acts in Great Britain, for example, will need to operationalize the idea of recency and pay particular attention to the location of events. Normally, summarization of open-ended texts will not have to handle constraints of this type.

4.2 Constraints on the Knowledge Acquisition Process

Domain-specific text processing imposes the following two constraints on the knowledge acquisition process:

1. The acquired knowledge must be able to support variable-depth text processing. To meet this constraint, we will use variable-depth representations of semantic knowledge.

2. The acquired knowledge must be adequate for enforcing domain-specific constraints. In particular, any semantic feature taxonomies must be able to distinguish relevant objects from irrelevant ones.

For CIRCUS, the first constraint is handled by (1) creating semantic feature taxonomies that provide deep coverage of categories pertaining to the domain of interest and cursory coverage of objects and events unconnected to the domain, (2) defining semantic case frames that organize topics relevant to the domain, and (3) ensuring that these case frames are activated only in relevant contexts. In the business joint ventures domain, for example, we might want to recognize nouns that describe company names, governments, business persons, products, factory names, and dates, but all other nouns can be merged into a single “don’t care” semantic class. In addition, we might define an INDUSTRY-PRODUCT case frame that extracts from a sentence a company name and a description of the product that the company plans to produce. The case frame would be triggered by phrases like “will produce” in a sentence like: IBM Japan will produce 2000 laptop computers a month. Here, the INDUSTRY-PRODUCT case frame would recognize “IBM Japan” as the company and “2000 laptop computers” as the product. However, it is important that the INDUSTRY-PRODUCT case frame not be activated in a non-business context, such as:

1Ultimately, variable-depth text processing should recognize and make use of discourse structures and cues that indicate a change of focus or a change of topic because these changes can help the system identify important passages in a text. However, focus is generally considered a discourse problem, that is, a problem that spans multiple sentences [Sidner, 1983, Grosz and Sidner, 1986]. Because this work concentrates on the analysis of individual sentences, issues of focus will not be explicitly addressed.

The new recipe will produce approximately five dozen cookies.

The second constraint imposed on the knowledge acquisition process states that the acquired knowledge must distinguish differences that matter for the domain. In the terrorism domain, for example, the NLP system must distinguish civilian victims from military ones. Therefore, CIRCUS’s semantic feature hierarchy must make that distinction — a single “human” category would not be adequate. In the business joint ventures domain, however, the civilian vs. military distinction is unimportant and a single “human” category might be appropriate. The use of variable-depth semantic representations that encode important domain-specific distinctions transforms CIRCUS from a sentence analyzer that is, in theory, capable of in-depth processing of all types of text, into a sentence analyzer designed expressly for domain-specific text processing. The section below describes the two corpora used in all experiments in the context of the domain-specific text-processing tasks in which they were used.

4.3 The MUC and TIPSTER Performance Evaluations

TIPSTER and MUC2 are ARPA-sponsored performance evaluations of natural language systems. The general task in the evaluations is one of domain-specific information extraction [Lehnert et al., 1994] — the NLP system processes a collection of texts, finds all information of interest, and produces a summary of that information in a rigid template format. In the TIPSTER and MUC performance evaluations, each participating site is initially given a corpus of texts and the corresponding “answer keys” to use for system development. The answer keys are manually encoded templates that capture all information from the corresponding source text that is relevant to the domain, as specified in a set of written guidelines.3 After a six-to-nine month development phase, the NLP systems are evaluated by comparing the summaries each produces with the summaries generated by human experts for the same test set of previously unseen texts. The comparison is performed using an automated scoring program that rates the system according to measures of recall and precision.4

2MUC is an acronym for Message Understanding Conference.

3In spite of the guidelines, the notion of “relevance” remains somewhat vague and human encoders often disagree as to the contents of an individual answer key.

4Each answer key template and system output template is a series of slots and possibly empty slot fillers. Recall = (# correct slot-fillers in output template) / (# slot-fillers in key). Precision = (# correct slot-fillers in output template) / (# slot-fillers in output template). Therefore, if a system correctly filled 600 of a possible 1000 template slots, its recall is 60%. If it attempted to fill only 800 of the 1000 slots, however, its precision is 75%.
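For concreteness, the footnote’s computation can be sketched as below, assuming single-filler templates represented as dicts; the real MUC scorer also accepts any of several allowable fillers per key slot and handles many further details.

```python
def score(output_template, key_template):
    """Recall and precision over slot fills, as defined in footnote 4."""
    key_slots = {s for s, f in key_template.items() if f is not None}
    attempted = {s for s, f in output_template.items() if f is not None}
    correct = {s for s in attempted if output_template[s] == key_template.get(s)}
    recall = len(correct) / len(key_slots) if key_slots else 0.0
    precision = len(correct) / len(attempted) if attempted else 0.0
    return recall, precision

# 600 correct fills out of 1000 key slots with 800 fills attempted gives
# recall = 0.60 and precision = 0.75, matching the example in the footnote.
```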

In the 1991 MUC-3 performance evaluation, 15 industry and university research labs participated; 19 labs took part in the 1992 MUC-4 evaluation. Both evaluations used a corpus of texts describing terrorist activity in Latin America that was provided by the Foreign Broadcast Information Service. The 1993 MUC-5 and TIPSTER performance evaluations ran concurrently and together included 19 sites. This time, however, information was extracted from texts in two new domains — newswire accounts of business joint ventures and advances in microelectronics. The work presented here draws from both the terrorism and joint ventures corpora. The sections below present examples of the information extraction task in each of those domains and describe each corpus in a little more detail.

4.3.1 MUC Corpus of Latin American Terrorism

The MUC-3/MUC-4 development corpus contains 1300 documents and the associated answer keys. These texts were selected as relevant to the Latin American terrorism domain from a number of on-line sources using keyword-based algorithms. Slightly over half of the texts actually describe terrorist acts given the more specific definition of terrorism provided in the MUC-3 guidelines. The remaining texts are considered irrelevant and should generate empty summary templates when presented as input to a MUC NLP system. Also associated with the MUC corpus are four sets of 100 texts and answer keys originally used for evaluation purposes only. Documents in this corpus vary in length from a paragraph to a page or two. Average sentence length in the corpus is 27 words; the average length of each article is 12 sentences. There are newswire stories, speeches, radio and TV broadcasts, interviews, and even rebel communiques. Texts contain both well-formed and ungrammatical sentences. All texts are entirely in upper case. The MUC-3/MUC-4 task is to extract information about terrorist incidents from each text and store that information in a template format, one template per event. Although the details of the extraction task are not important for the work described here, it is useful to look at a sample text and its associated answer key. Below is a text taken from the development corpus.

DEV-MUC3-0008 (NOSC)

BOGOTA, 9 JAN 90 (EFE) -- [TEXT] RICARDO ALFONSO CASTELLAR, MAYOR OF ACHI, IN THE NORTHERN DEPARTMENT OF BOLIVAR, WHO WAS KIDNAPPED ON 5 JANUARY, APPARENTLY BY ARMY OF NATIONAL LIBERATION (ELN) GUERRILLAS, WAS FOUND DEAD TODAY, ACCORDING TO AUTHORITIES.

CASTELLAR WAS KIDNAPPED ON 5 JANUARY ON THE OUTSKIRTS OF ACHI, ABOUT 850 KM NORTH OF BOGOTA, BY A GROUP OF ARMED MEN, WHO FORCED HIM TO ACCOMPANY THEM TO AN UNDISCLOSED LOCATION.

POLICE SOURCES IN CARTAGENA REPORTED THAT CASTELLAR'S BODY SHOWED SIGNS OF TORTURE AND SEVERAL BULLET WOUNDS.

CASTELLAR WAS KIDNAPPED BY ELN GUERRILLAS WHILE HE WAS TRAVELING IN A BOAT DOWN THE CAUCA RIVER TO THE TENCHE AREA, A REGION WITHIN HIS JURISDICTION.

IN CARTAGENA IT WAS REPORTED THAT CASTELLAR FACED A ''REVOLUTIONARY TRIAL'' BY THE ELN AND THAT HE WAS FOUND GUILTY AND EXECUTED.

CASTELLAR IS THE SECOND MAYOR THAT HAS BEEN MURDERED IN COLOMBIA IN THE LAST 3 DAYS.

ON 5 JANUARY, CARLOS JULIO TORRADO, MAYOR OF ABREGO IN THE NORTHEASTERN DEPARTMENT OF SANTANDER, WAS KILLED APPARENTLY BY ANOTHER GUERILLA COLUMN, ALSO BELONGING TO THE ELN.

TORRADO'S SON, WILLIAM; GUSTAVO JACOME QUINTERO, THE DEPARTMENTAL GOVERNMENT SECRETARY; AND BODYGUARD JAIRO ORTEGA, WERE ALSO KILLED.

THE GROUP WAS TRAVELING IN A 4-WHEEL DRIVE VEHICLE BETWEEN CUCUTA AND THE RURAL AREA KNOWN AS CAMPANARIO WHEN THEIR VEHICLE WAS BLOWN UP BY FOUR EXPLOSIVE CHARGES THAT DETONATED ON THE HIGHWAY.

The text includes descriptions of two terrorist events — the kidnapping of Castellar and the bombing of Torrado’s four-wheel drive vehicle. This means that there should be two output templates generated for the text. For the terrorism domain, each template includes the same 24 slots describing the date and location of the incident, the type of incident (one of 24 possible types), the perpetrators and victims, and any physical objects targeted in the attack. The answer key templates for the above story are shown in Figures 4.1 and 4.2.5 Slots that have more than one legal filler will list each option individually, separated by a “/.” The output templates generated for each story by the NLP system, however, may only have one filler per slot. The slot “matches” the corresponding slot in the answer key if its filler matches any of the allowable fillers for that slot in the answer key. There were a number of constraints for the domain that complicated this information extraction task. For example:

1. Only terrorist acts that occurred in one of nine Latin American countries are considered relevant.

2. Only “recent” terrorist acts (i.e., those that occurred within three months of the wire date) or texts that provide new information regarding old terrorist events are considered relevant.

3. Generic descriptions of terrorist events are not relevant. The text must provide information regarding either the specific type of terrorist attack or the specific targets of the attack. For example, a newswire noting a “terrorist attack on peasants in El

5Some of the slots in a template are to be filled with phrases taken directly from the text (e.g., the “string fills” denoting names of victims). Others hold information converted into a canonical form (e.g., dates and locations), and the rest are “set fills” or fillers taken from a fixed set of options (e.g., effects on physical targets can be one of SOME DAMAGE, DESTROYED, NO DAMAGE).

1. message: template                 1
2. incident: date                    05 JAN 90
3. incident: location                COLOMBIA: BOLIVAR (DEPARTMENT): ACHI (TOWN)
4. incident: type                    KIDNAPPING
5. incident: stage of execution      ACCOMPLISHED
6. incident: instrument id           *
7. incident: instrument type         *
8. perp: incident category           TERRORIST ACT
9. perp: individual id               “GUERRILLAS” / “GROUP OF ARMED MEN” / “ARMED MEN”
10. perp: organization id            “ARMY OF NATIONAL LIBERATION” / “ELN”
11. perp: organization confidence    SUSPECTED OR ACCUSED / REPORTED AS FACT: “ARMY OF NATIONAL LIBERATION” / “ELN”
12. phys tgt: id                     *
13. phys tgt: type                   *
14. phys tgt: number                 *
15. phys tgt: foreign nation         *
16. phys tgt: effect of incident     *
17. phys tgt: total number           *
18. hum tgt: name                    “RICARDO ALFONSO CASTELLAR”
19. hum tgt: description             “MAYOR OF ACHI”: “RICARDO ALFONSO CASTELLAR”
20. hum tgt: type                    GOVERNMENT OFFICIAL: “RICARDO ALFONSO CASTELLAR”
21. hum tgt: number                  1: “RICARDO ALFONSO CASTELLAR”
22. hum tgt: foreign nation          -
23. hum tgt: effect of incident      DEATH: “RICARDO ALFONSO CASTELLAR”
24. hum tgt: total number            -

Figure 4.1: Terrorism Corpus Answer Key (Part 1).

0. message: id                       DEV-MUC3-0008 (NCCOSC)
1. message: template                 2
2. incident: date                    05 JAN 90
3. incident: location                COLOMBIA: CUCUTA (CITY) - CAMPANARIO (RURAL AREA)
4. incident: type                    BOMBING
5. incident: stage of execution      ACCOMPLISHED
6. incident: instrument id           “FOUR EXPLOSIVE CHARGES” / “EXPLOSIVE CHARGES”
7. incident: instrument type         EXPLOSIVE: “FOUR EXPLOSIVE CHARGES” / “EXPLOSIVE CHARGES”
8. perp: incident category           TERRORIST ACT
9. perp: individual id               “GUERILLA COLUMN”
10. perp: organization id            “ARMY OF NATIONAL LIBERATION” / “ELN”
11. perp: organization confidence    SUSPECTED OR ACCUSED: “ARMY OF NATIONAL LIBERATION” / “ELN”
12. phys tgt: id                     “VEHICLE” / “4-WHEEL DRIVE VEHICLE”
13. phys tgt: type                   TRANSPORT VEHICLE: “VEHICLE” / “4-WHEEL DRIVE VEHICLE”
14. phys tgt: number                 1: “VEHICLE” / “4-WHEEL DRIVE VEHICLE”
15. phys tgt: foreign nation         -
16. phys tgt: effect of incident     DESTROYED: “VEHICLE” / “4-WHEEL DRIVE VEHICLE”
17. phys tgt: total number           -
18. hum tgt: name                    “CARLOS JULIO TORRADO”
                                     “TORRADO’S SON, WILLIAM” / “WILLIAM”
                                     “GUSTAVO JACOME QUINTERO”
                                     “JAIRO ORTEGA”
19. hum tgt: description             “MAYOR OF ABREGO”: “CARLOS JULIO TORRADO”
                                     “SON”: “TORRADO’S SON, WILLIAM” / “WILLIAM”
                                     “DEPARTMENTAL GOVERNMENT SECRETARY”: “GUSTAVO JACOME QUINTERO”
                                     “BODYGUARD”: “JAIRO ORTEGA”
20. hum tgt: type                    GOVERNMENT OFFICIAL: “CARLOS JULIO TORRADO”
                                     GOVERNMENT OFFICIAL: “GUSTAVO JACOME QUINTERO”
                                     CIVILIAN: “TORRADO’S SON, WILLIAM” / “WILLIAM”
                                     SECURITY GUARD: “JAIRO ORTEGA”
21. hum tgt: number                  1: “CARLOS JULIO TORRADO”
                                     1: “TORRADO’S SON, WILLIAM” / “WILLIAM”
                                     1: “GUSTAVO JACOME QUINTERO”
                                     1: “JAIRO ORTEGA”
22. hum tgt: foreign nation          -
23. hum tgt: effect of incident      DEATH: “JAIRO ORTEGA”
                                     DEATH: “GUSTAVO JACOME QUINTERO”
                                     DEATH: “TORRADO’S SON, WILLIAM” / “WILLIAM”
                                     DEATH: “CARLOS JULIO TORRADO”
24. hum tgt: total number            -

Figure 4.2: Terrorism Corpus Answer Key (Part 2).

Salvador” is specific enough to warrant an output template, but a “terrorist attack in El Salvador” is not.

4. Terrorist acts against military targets or military personnel are not relevant, unless a civilian target (human or otherwise) is injured or damaged as a result.

It will be important that the knowledge acquired by Kenmore aid the NLP system in meeting such constraints. The same will be true for the joint ventures domain described below.

4.3.2 TIPSTER Corpus of Business Joint Ventures

The goal of the extraction task for the MUC-5/TIPSTER Joint Ventures (JV) domain is to find information relevant to all business joint ventures mentioned in a text. A joint venture (also known as a “tie-up”) is defined as “a cooperative association between two or more parties to own and/or develop a project together.” As in the terrorism domain, all relevant information is to be captured in template format. The JV development corpus contains approximately 1000 documents, mainly newswire accounts. Some documents include tables, graphs, or charts. Some texts are mixed case, but most are all upper case. Documents vary considerably in length, ranging from a single paragraph to about 18 pages. Like the terrorism corpus, this corpus also currently has four additional sets of texts used for evaluation purposes. Unlike the terrorism corpus, however, most (approximately 90%) of the texts in the JV corpus are relevant to the goals of the domain and should generate non-empty output templates. A sample text from the development corpus is included below:

0024

FEBRUARY 18, 1991, MONDAY

Copyright (c) 1991 Jiji Press Ltd.;

DAIWA SECURITIES CO. SAID MONDAY ITS HUNGARIAN JOINT VENTURE, DAIWA-MKB, HAS OBTAINED A SECURITIES BUSINESS LICENSE FROM AUTHORITIES IN THAT COUNTRY. DAIWA-MKB IS THE FIRST JAPANESE -AFFILIATED BROKERAGE TO RECEIVE THE LICENSE FROM THE STATE SECURITIES SUPERVISION OF HUNGARY. THE JOINT VENTURE BETWEEN DAIWA AND MAGYAR KUELKERESKEDELMI BANK WILL START BROKERAGE, DEALING AND UNDERWRITING SERVICES IN EARLY MARCH, TAKING OVER THE HUNGARIAN FOREIGN TRADE BANK'S MEMBERSHIP OF THE BUDAPEST STOCK EXCHANGE, DAIWA OFFICIALS SAID.


Given this text, an NLP system should note that there is a single joint venture event that involves three entities. The system must also distinguish parent companies in the joint venture from child companies. In this text, Daiwa Securities Co. and Magyar Kuelkereskedelmi Bank are parent companies of the Daiwa-MKB joint venture company. The system also must determine the locations of the entities, if possible, and find any aliases used to describe them. In addition, the NLP system should determine the reason that the companies are collaborating. Figure 4.3 depicts a simplified version of the answer key associated with this text. The actual answer key is shown in Figure 4.4 — its structure is more complicated than that used in the terrorism domain because slot fillers are often references or pointers to other slots. TIPSTER systems must generate output templates in this latter format.

TEMPLATE-0024-1 :=
  DOC NR: 0024
  DOC DATE: 180291
  DOCUMENT SOURCE: “Jiji Press Ltd.”

TIE UP RELATIONSHIP-0024-1 :=
  TIE-UP STATUS: EXISTING

  ENTITY-1:
    NAME: DAIWA SECURITIES CO
    ALIASES: “DAIWA”
    TYPE: COMPANY
    ENTITY RELATIONSHIP: PARENT

  ENTITY-2:
    NAME: MAGYAR KUELKERESKEDELMI BANK
    TYPE: COMPANY
    ENTITY RELATIONSHIP: PARENT

  JOINT VENTURE CO:
    NAME: DAIWA-MKB
    LOCATION: HUNGARY (COUNTRY)
    TYPE: COMPANY
    ENTITY RELATIONSHIP: CHILD

  ACTIVITY:
    INDUSTRY-TYPE: FINANCE
    PRODUCT/SERVICE: (“BROKERAGE, DEALING AND [UNDERWRITING SERVICES]”)

Figure 4.3: Simplified JV Answer Key.

Finally, like the terrorism domain, vague guidelines that distinguish a relevant joint venture event from an irrelevant one complicate the extraction task:

{TEMPLATE-0024-1} :=
  DOC NR: 0024
  DOC DATE: 180291
  DOCUMENT SOURCE: “Jiji Press Ltd.”
  CONTENT: {TIE UP RELATIONSHIP-0024-1}

{TIE UP RELATIONSHIP-0024-1} :=
  TIE-UP STATUS: EXISTING
  ENTITY: {ENTITY-0024-1}
          {ENTITY-0024-2}
  JOINT VENTURE CO: {ENTITY-0024-3}
  ACTIVITY: {ACTIVITY-0024-1}

{ENTITY-0024-1} :=
  NAME: DAIWA SECURITIES CO
  ALIASES: “DAIWA”
  TYPE: COMPANY
  ENTITY RELATIONSHIP: {ENTITY RELATIONSHIP-0024-1}

{ENTITY-0024-2} :=
  NAME: MAGYAR KUELKERESKEDELMI BANK
  TYPE: COMPANY
  ENTITY RELATIONSHIP: {ENTITY RELATIONSHIP-0024-1}

{ENTITY-0024-3} :=
  NAME: DAIWA-MKB
  LOCATION: Hungary (COUNTRY)
  TYPE: COMPANY
  ENTITY RELATIONSHIP: {ENTITY RELATIONSHIP-0024-1}

{INDUSTRY-0024-1} :=
  INDUSTRY-TYPE: FINANCE
  PRODUCT/SERVICE: (62 “BROKERAGE, DEALING AND [UNDERWRITING SERVICES]”)

{ENTITY RELATIONSHIP-0024-1} :=
  ENTITY1: {ENTITY-0024-1}
           {ENTITY-0024-2}
  ENTITY2: {ENTITY-0024-3}
  REL OF ENTITY2 TO ENTITY1: CHILD
  STATUS: CURRENT

{ACTIVITY-0024-1} :=
  INDUSTRY: {INDUSTRY-0024-1}
  ACTIVITY-SITE: (- {ENTITY-0024-3})
  START TIME: {TIME-0024-1}

{TIME-0024-1} :=
  DURING: EA0391

Figure 4.4: Actual JV Answer Key.

1. Long-term business relationships are reportable as joint ventures as long as the relationship is not purely a standard practice for the business.

2. Joint ownership of a company does not necessarily count as a joint venture. The owners must be pursuing some common activity.

3. Agreements between governments are not reportable as joint ventures unless some business activity is mentioned.

Whenever possible, the lexical and structural knowledge acquired by Kenmore for use in this domain should help the NLP system adhere to and satisfy these guidelines.

CHAPTER 5

LEARNING LEXICAL KNOWLEDGE

This chapter demonstrates Kenmore’s ability to acquire lexical knowledge in a system called MayTag.1 MayTag simultaneously learns the part of speech, semantic feature(s), and domain-specific concept(s) to be activated for all open-class words in a corpus using a case-based reasoning algorithm. It first creates a case base of context-sensitive word interpretations during a human-supervised training phase. Each interpretation encodes the context in which the word was encountered as well as the “definition” of the word as it would be specified in the lexicon of the NLP system. For CIRCUS, this “definition” consists of the word’s part of speech, semantic class, and activated concepts. After training, given an unknown word and the context in which it occurs, MayTag retrieves similar past cases of interpretation from the case base to infer the word’s syntactic and semantic class information in the new context. As such, MayTag is one of a number of interpretive CBR systems including HYPO [Ashley, 1990, Rissland and Ashley, 1986], GREBE [Branting and Porter, 1991, Branting, 1991], CABARET [Rissland and Skalak, 1991], and ANAPRON [Golding and Rosenbloom, 1991]. Very generally, case similarity in MayTag is assessed using a hybrid case retrieval algorithm that combines a k-nearest neighbors matching routine with a decision tree approach for finding relevant features. All retrieved cases then vote on the syntactic and semantic interpretation of the current word. The details of the case retrieval and case selection algorithms will be described in Section 5.4 as well as in the next chapter.

By encoding context as part of a word interpretation case, the meaning of a word can change dynamically in response to surrounding phrases without the need for explicit lexical disambiguation heuristics. Moreover, as an instantiation of the Kenmore framework, MayTag acquires all three classes of knowledge using the same case representation and requires relatively little training and no hand-coded knowledge acquisition heuristics.

The chapter first briefly introduces the problem of lexical acquisition (Section 5.1) and then presents the specifics of MayTag (Sections 5.2 - 5.4). We then evaluate the system in experiments that explore two of many practical applications of the technique (Sections 5.5.1 and 5.5.2). In the first application, we assume the existence of a nearly complete domain-specific dictionary and use MayTag to infer the features of occasional unknown words.

1Some of the material from this chapter appeared originally in Cardie [1993a].

In the second, more ambitious application, we assume only a small dictionary of function words and use MayTag to determine the definition of all open-class words in an input text. Results of these experiments indicate that MayTag provides a promising approach to automated dictionary construction and knowledge acquisition for sentence analysis in limited domains. Sections 5.5.3 and 5.5.4 briefly explore MayTag’s ability to create static lexicons and to handle domain-independent lexical disambiguation.

5.1 The Problem

The ability or inability of a natural language processing (NLP) system to handle gaps in lexicon coverage ultimately affects the system’s performance on novel texts. Suppose, for example, that a natural language system is processing a text and unexpectedly encounters an unknown word. Rather than stop and wait for a knowledge engineer to enter the missing lexical information, or skip the offending word altogether, a robust sentence analyzer should infer the necessary syntactic and semantic knowledge for the unknown word and then continue processing the text. Although the exact type and form of the knowledge required by a parser varies from system to system, knowledge-based domain-specific language processing systems typically rely on at least the following information: for each word encountered in a text, the system must (1) know which parts of speech, word senses, and concepts are plausible in the given domain, and (2) determine which part of speech, word sense, and concepts apply, given the particular context in which the word occurs. For example, consider the following sentences within the context of the MUC domain of Latin American terrorism2:

1. The terrorists killed General Bustillo.

2. The general concern was that children might be killed.

It is clear that in this domain the word “general” has at least two plausible parts of speech (noun and adjective) and two plausible word senses (the military officer sense and the universal entity sense). A sentence analyzer has to know that these options exist3 and then choose the noun/military officer form of “general” for sentence 1 and the adjective/universal entity form in sentence 2.

2See Chapter 4 for a description of this domain and the associated corpus.

3In addition, the system should know when its existing knowledge will not handle the current usage of the word. This aspect of lexical ambiguity will be discussed in detail in Section 5.4.1.

Thus far, we have used the term word sense ambiguity to refer to ambiguities with respect to the meaning of a word; however, this type of ambiguity is often more accurately described as a semantic feature ambiguity because many NLP systems (including CIRCUS) represent word meanings using one or more semantic features drawn from a predefined taxonomy (e.g., military officer and universal entity).4 Together these semantic features denote the meaning of the word in the current context.5 Sometimes the same semantic feature will be used to describe two words with distinct meanings. For example, an NLP system may assign the human semantic feature to both “child” and “policeman” if a more specific distinction is not represented in the system’s semantic feature taxonomy. On the other hand, a word considered to have a single word sense may be represented by one of a number of distinct semantic features depending on the context in which it is used. In the business joint ventures domain, for example, “Japan” can refer to a location, a government, or part of a company name even though it has only one real word sense.

In addition to part-of-speech and semantic feature ambiguity, sentences 1 and 2 also illustrate a form of concept ambiguity with respect to the domain of terrorism. Sentence 1, for example, clearly describes an instance of the terrorist act concept — the word “killed” implies that a murder took place and the perpetrators of the crime were “terrorists.” This is not the case for sentence 2 — the verb “killed” appears, but no murder has occurred and there is no implication of terrorist activity. This distinction is important in the terrorism domain where the goal is to extract from texts only information concerning eight classes of terrorist events, including murders, bombings, attacks, and kidnappings. All other information effectively should be ignored. To be successful in this selective concept extraction task [Lehnert et al., 1991], a sentence analyzer not only needs access to word-concept pairings (e.g., the word “killed” is linked to the TERRORIST MURDER concept), but must also accurately distinguish legitimate concept activation contexts from bogus ones (e.g., the phrase “terrorists killed” implies that a TERRORIST MURDER occurred, but “children might be killed” probably does not).

Domain-specific concept activation is akin to the more general problem of recognizing and handling open-textured concepts — concepts for which necessary and sufficient conditions of membership are difficult or impossible to describe. Open-textured concepts feature prominently in tasks other than natural language processing and have been studied from a number of perspectives: philosophy (e.g., Wittgenstein [1953]), law (e.g., Hart [1961]), psychology (e.g., Rosch [1973], Rosch and Mervis [1975]), cognitive science (e.g., Lakoff [1987]), and artificial intelligence and law (e.g., Rissland [1990] and Rissland and Skalak [1991]). Although the problems of representing and recognizing open-textured concepts remain largely unsolved, a number of CBR systems have offered a promising computational framework for the interpretation of open-textured concepts. In particular, mixed-paradigm reasoners like ANAPRON [Golding and Rosenbloom, 1991] and CABARET [Rissland and

4We will see shortly that this taxonomy will be a component of the concept description language made available to MayTag’s learning algorithm. As a result, the taxonomy limits the range of concepts that can be learned by the inductive learning component.

5Some systems, however, restrict semantic feature tagging to a subset of syntactic classes. This subset usually includes nouns and adjectives and sometimes includes adverbs. CIRCUS (Chapter 3), for example, does not assign semantic features to verbs.

Skalak, 1991] use a combination of rules and cases to handle concept ambiguity within their domains: heuristic rules are used to recognize unambiguous instances of a concept and cases are used to handle exceptions to the general rules. As will be described shortly, MayTag relies entirely on case-based reasoning for concept disambiguation — it gathers examples of instances and non-instances of the concepts during training and then draws from those examples to interpret concepts in the application phase.

The relationship between semantic features and concept activation requires some discussion. In conceptual sentence analyzers, and in most NLP systems, “concept activation” implies the triggering of a semantic case frame whose slots eventually will contain the information to be extracted from the sentence. The word “killed” in sentence 1, for example, might trigger a MURDER semantic case frame that would be instantiated as shown in Figure 5.1 by the end of the sentence.

MURDER
  Actor: terrorists
  Victim: General Bustillo

Figure 5.1: MURDER Semantic Case Frame.

Unlike semantic features, which may be assigned to all words in a sentence to represent word meanings, only words that indicate objects or events that are important in the current domain and that organize the surrounding information should activate a domain-specific semantic case frame. Not all words assigned the attack semantic feature, for example, may activate the ATTACK concept. In addition, as shown in the “killed” examples above, a word that activates a concept in one context may not activate the concept in a different context.

The remainder of this chapter describes MayTag, an instantiation of the Kenmore architecture that performs lexical knowledge acquisition. MayTag begins with an empty lexicon and learns (1) to tag each word in an incoming text with the appropriate part of speech and semantic feature(s), and (2) to recognize which lexical items should activate domain-specific concepts. Thus, MayTag learns part-of-speech, semantic feature, and concept activation knowledge for all words in the corpus. No explicit system lexicon is constructed, however. Instead, the lexical knowledge associated with all words in the corpus is stored implicitly in the case base.

5.2 Instantiating the Kenmore Framework for MayTag

To learn domain-specific lexical knowledge, we instantiate the three components of the Kenmore architecture as illustrated in Figure 5.2. As described in Section 4.3.1, the business joint ventures (JV) corpus contains over 1000 documents that describe world-wide activity in the area of joint ventures or "tie-ups" between businesses. CIRCUS [Lehnert,

Corpus: Business Joint Ventures
Sentence Analyzer: CIRCUS
Inductive Learning Algorithm: CBR (k-nearest neighbor)

Figure 5.2: Instantiating Kenmore for MayTag.

1990] is a conceptual sentence analyzer that produces a semantic case frame representation of the meaning of an input sentence. Details regarding CIRCUS can be found in Chapter 3. The heart of the learning component employed by MayTag is a nearest-neighbor case-based reasoning (CBR) algorithm. In this algorithm, the case base is effectively a set of training examples, each of which describes a single episode in lexical disambiguation. After training, each word in an incoming text is tagged using a case retrieval algorithm that compares the current context to those stored in the case base, finds the most similar cases, and then uses them to assign a syntactic and semantic interpretation to the current word. As described in the introduction, we chose a case-based reasoning algorithm as the inductive learning component because of the lack of a strong domain theory for natural language learning tasks. In addition, there is evidence from psychology that some language acquisition tasks, including the learning of word meanings, proceed through instance-based stages (e.g., Keil and Kelly [1987]). Finally, we chose the nearest-neighbor similarity metric for its simplicity: it allows us to test the basic case-based approach on lexical acquisition tasks. For reference, the components of MayTag's case-based inductive learning algorithm are summarized in Figure 5.3; the specifics of the CBR module and comparisons to related work will be described in Section 5.4.

Case Indexing: performed automatically by decision trees
Similarity Metric: 10-nearest neighbors
Case Selection: prefer cases that match unknown word
Solution Policy: majority vote
Automated Improvements to Baseline Case Rep: via decision tree approach

Figure 5.3: Components of MayTag’s Case-Based Inductive Learning Algorithm.

Very generally, MayTag acquires lexical knowledge in its training phase with the help of a human supervisor and then uses the acquired knowledge to handle lexical ambiguities during the application phase without any human intervention. The next two sections (5.3 and 5.4) describe MayTag's acquisition and application phases in detail.

5.3 MayTag’s Training Phase

The goal of MayTag’s training phase (Figure 5.4) is to create a case base of episodes in lexical ambiguity resolution. Since each case encodes the part of speech, semantic features, and activated concepts for a single word in a specific context, cases also can be seen as context-sensitive word interpretations. To create the case base, MayTag randomly selects sentences from the JV corpus and provides them as input to CIRCUS, which processes each sentence, one word at a time, and, together with a human supervisor, creates a case as each word is encountered. As described in Chapter 1 and shown in the training case of Figure 5.4, cases have two parts. The context portion describes the context in which the current word occurred. This is just a representation of the state of the CIRCUS parser when the current word is recognized and is supplied automatically by CIRCUS. (See Section 3.2

Joint Ventures Human Corpus Supervisor interpretation selected sentences Training Case CIRCUS context of unknown context interpretation word

interpretation word interpretation case

Case Base

Case-Based Reasoning Component

Figure 5.4: MayTag Training Phase. for examples of the kinds of knowledge that are maintained by CIRCUS.) The second part of the case (called the solution in Chapter 1) encodes the syntactic and semantic interpretation of the current word and is supplied by a person using a menu-driven interface. The supervisor simply chooses the part of speech, semantic feature(s), and domain-specific concepts to associate with the current word from a menu of predefined options. The choices are encoded into the solution part of the case, and MayTag stores the case in the case base. In addition, the part-of-speech, semantic feature, and concept information are directed back to CIRCUS so that CIRCUS will know how to interpret the current word, can update its state, and continue with the next word in the training sentence. At the end of training, MayTag will have created one case for each word that occurred

in the training sentences. In particular, if a word appears n times in the training sentences,

there will be n cases associated with it in the case base. As of yet, we have not addressed the specifics of MayTag’s case representation. Because the same representation is used both in the training and application phases, we will describe the case representation in detail in the next subsections before discussing its use during the application phase (Section 5.4). 62

5.3.1 The Taxonomies

Cases in MayTag are represented as a set of attribute-value pairs. Although some case-based reasoning systems use a richer case representation (e.g., CHEF [Hammond, 1989], GREBE [Branting and Porter, 1991, Branting, 1991]), the simple attribute-value pair knowledge representation is a very general representation scheme that can be used in conjunction with case-based algorithms (see HYPO [Ashley, 1990]) as well as the vast majority of symbolic inductive machine learning algorithms.6 More specifically, MayTag cases consist of 37 attribute-value pairs: 33 features represent the context in which the current word was encountered and 4 represent the syntactic and semantic interpretation of the current word in this context. The case representation relies on three predefined taxonomies, one for each class of knowledge that we are trying to learn. These will be described briefly before presenting the solution and context parts of the case.

MayTag first requires a taxonomy of semantic features. This taxonomy was designed for use with the business joint ventures corpus and is shown in Tables 5.1 and 5.2. It is a two-level taxonomy that includes 14 general semantic features and 42 more specific semantic features. As discussed in Section 4.2, this taxonomy should reflect any distinctions in word meanings that are important for processing texts from the joint ventures domain. For example, in this domain it is important to know when a word is describing a party involved in a joint venture. Any word used in this fashion should be tagged with the jv-entity general semantic feature. In addition, it is also important to note whether that word describes a more specific joint venture entity like a company name, a government, or a person.

Next, MayTag requires a set of domain-specific concepts for use with texts from the JV corpus. MayTag's current taxonomy contains 11 domain-specific concept types (Table 5.3). These represent a subset of the concepts that are important in the joint ventures domain and act as semantic case frame types for CIRCUS. (See Chapter 3 for more information about the use of semantic case frames in the CIRCUS parser.)

Finally, MayTag uses a taxonomy of 18 parts of speech (Table 5.4). The taxonomy specifies 7 parts of speech for open-class words and reserves the remaining 11 parts of speech for closed-class words. Closed-class words are function words like prepositions, auxiliaries, and connectives, whose meanings vary little from one domain to another. All other words (e.g., nouns, verbs, adjectives) are open-class words. Open-class words are generally considered content words. Although the semantic feature and concept taxonomies are clearly domain-specific, the part-of-speech taxonomy is parser-dependent rather than domain-dependent.

6Although we concentrate on case-based classification methods, we will also investigate the use of decision trees as MayTag's inductive learning component in Chapter 6.

Table 5.1: Semantic Feature Taxonomy (Part 1).

General semantic features, each with its description and its specific semantic features indented beneath it:

jv-entity: party involved in a tie-up
    company-name: name of company
    company-alias: alias for company
    generic-company: e.g., "Co." in "Plastics Co."
    government: government-affiliated entity
    person: an individual
industry: type of business or industry
    research: research and development
    production: manufacturing, production
    sales: sales, marketing, trade
    service: consumer services, service industry
    finance: e.g., banking, finance, real estate
capitalization: total cash capitalization
person-position: position of a person within company
    ceo: chief executive officer
    cob: chairman of the board
    pres: head of company
    offic: any officer of company
    srexec: senior executive
    exec: other management
    owner: owner, partner
    gov: government official
    spoke: spokesperson
    prof: titled professional
    emp: employee
    other-position: any position not included above

Table 5.2: Semantic Feature Taxonomy (Part 2).

General semantic features, each with its description and its specific semantic features indented beneath it:

facility: building or site
    communications: e.g., radio, television, satellite
    site: e.g., development sites, industrial buildings
    factory: e.g., manufacturing building, computing facilities
    farm: e.g., agricultural and forestry facilities, orchards
    office: e.g., auxiliary offices, departments, sales offices
    mine: e.g., coal or mineral mines, quarries, gas wells
    store: e.g., stores, shops, dealers, amusement parks
    transportation: land, air, water transportation, pipelines
    utilities: e.g., electric power, heat, lighting
    warehouse: general and specialized storage
    facility-name: name of facility
    other-facility: any facility not included above
ownership-percentage: share in company
entity: generic entity/thing, irrelevant object w.r.t. JV domain
money: any monetary reference
nationality: any nationality
human: person not a jv-entity or person-position
    human-name: person name
    human-title: person title
time: any reference to time or date
location: reference to a specific place
    country: country name
    city: city name
    province: province or state
    continent: name of continent
    other-loc: any other specific place name
    generic-location: general spatial area, e.g., north, Midwest
nil: no general semantic feature applies
    nil: no specific semantic feature applies; this feature can be used in conjunction with ANY general semantic feature

Table 5.3: Concept Type Taxonomy.

Concept type (abbreviation): description

tie-up-relationship (tie-up): indicates a tie-up activity
tie-up-relationship-secondary (tie-up-secondary): weak indicator of a tie-up
tie-up-relationship-jv-entity-only (tie-up-jv): indicates a tie-up activity but only mentions the joint venture company
total-capitalization: total cash capitalization
ownership-%: percentage share in the tie-up
industry: type of industry performed within the scope of the tie-up
industry-research: research and development
industry-production: manufacturing, production
industry-sales: sales, marketing, trade
industry-service: consumer services, service industry
industry-finance: e.g., banking, finance, real estate
nil: no concept should be triggered

Table 5.4: Part-of-Speech Taxonomy.

Open-Class: noun, noun modifier (nm), adverb (adv), verb, past participle (pasp), present participle (pres), gerund (ger)

Closed-Class: preposition (prep), auxiliary (aux), copular (cop), article (art), relative pronoun (rel), modal, infinitive (inf), conjunction (conj), negation (neg), connective (conn), particle (ptcl)
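For illustration only, the following sketch shows how a slice of these three taxonomies could be written down as plain data for a learning component; the entries are abridged from Tables 5.1 through 5.4 and the variable names are invented.

    # Illustrative encoding of a slice of MayTag's predefined taxonomies
    # (entries abridged from Tables 5.1-5.4; structure and names hypothetical).
    SEMANTIC_FEATURES = {
        # general feature -> its specific features
        "jv-entity": ["company-name", "company-alias", "generic-company",
                      "government", "person"],
        "industry": ["research", "production", "sales", "service", "finance"],
        "location": ["country", "city", "province", "continent",
                     "other-loc", "generic-location"],
        "nil": ["nil"],
    }

    CONCEPT_TYPES = ["tie-up", "tie-up-secondary", "tie-up-jv",
                     "total-capitalization", "ownership-%", "industry",
                     "industry-research", "industry-production",
                     "industry-sales", "industry-service",
                     "industry-finance", "nil"]

    OPEN_CLASS_POS = ["noun", "nm", "adv", "verb", "pasp", "pres", "ger"]
    CLOSED_CLASS_POS = ["prep", "aux", "cop", "art", "rel", "modal", "inf",
                        "conj", "neg", "conn", "ptcl"]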

5.3.2 Case Representation

As mentioned above, each case in MayTag represents the syntactic and semantic interpretation of a single word as well as the context in which it occurs in the corpus. It is a list of 37 attribute-value pairs in which 4 features encode the solution part of the case and the remaining 33 features encode the context. The best way to describe MayTag's case representation is to look at an example. Throughout this section, we will examine the case that would be generated for the word "venture" in the following sentence taken directly from the JV corpus:

Toyota Motor Corp. has set up a joint venture firm with Yokogawa Electric Corp.

Figure 5.5 shows the training case for "venture" as it is used in the sentence above. The 4 features that encode the syntactic and semantic interpretation of "venture" are depicted at the top of the figure and are labeled "Word definition features." These are the "solution" features of the case and are supplied by the human supervisor. In the current context, the supervisor has specified that (1) the part of speech (p-o-s) for "venture" is a noun modifier (nm),7 (2) its general semantic feature (gen-sem) is entity, (3) no specific semantic feature (spec-sem) applies, and (4) "venture" activates the domain-specific tie-up concept.

Word definition features:
    p-o-s: nm    gen-sem: entity    spec-sem: nil    concept: tie-up

Local context features:
    prev2:         word: a       p-o-s: art   gen-sem: nil        spec-sem: nil   concept: nil
    prev1:         word: joint   p-o-s: nm    gen-sem: entity     spec-sem: nil   concept: nil
    unknown word:  word: venture   morphol: nil
    fol1:          word: firm    p-o-s: noun  gen-sem: jv-entity  spec-sem: nil   concept: nil
    fol2:          word: with    p-o-s: prep  gen-sem: nil        spec-sem: nil   concept: nil

Sentence: Toyota Motor Corp. has set up a joint venture firm with Yokogawa Electric Corp. ...

Global context features:
    subject:        gen-sem: jv-entity   spec-sem: company-name, generic-company-name   concept: nil
    verb:           concept: nil
    last constit:   syn-type: verb   gen-sem: nil   spec-sem: nil   concept: nil
    direct object:  gen-sem: nil     spec-sem: nil  concept: nil

Figure 5.5: Case for “venture.”

The case’s 33 “context” features are divided conceptually into two sets:

7The noun modifier category covers both adjectives and nouns that act as modifiers. We reserve the noun category for head nouns only.

• local context features (22) that represent semantic and syntactic knowledge for the words within a five-word window centered on the unknown word; and

• global context features (11) that encode information for any major syntactic constituents that have been recognized in the current clause at the point of the unknown word.

The general idea behind the representation of context is to include any information that the parser has that might be useful for inferring the definition of the current word. This representation of context is necessarily dependent on the kinds of information available to and maintained by the sentence analyzer employed by MayTag. Therefore, the context features described below are consistent with the CIRCUS parser. If MayTag uses a different parser, the representation of context will change.

The 22 local context features include two attribute-value pairs that describe surface features for the unknown word. First, CIRCUS knows that the unknown word is "venture." Second, the morphol feature indicates morphological information for the word based on its suffix. The nil value used here means that no such information was available for "venture." The remaining 20 local context features describe information that CIRCUS maintains for the two words that precede and the two words that follow the unknown word (labeled prev1, prev2, fol1, and fol2 in Figure 5.5). For each of these positions, CIRCUS should know (1) the word in that position, (2) its part of speech, (3) semantic features, and (4) activated concepts. The word immediately preceding "venture," for example, is the noun modifier "joint." It has been recognized as an entity and activates no domain-specific concept in this context.

Finally, the 11 global context features encode information about any major syntactic constituents that CIRCUS has recognized at the point of the unknown word. When the parser reaches "venture," for example, it has recognized two major constituents — the subject and verb phrase. Neither activates any domain-specific concepts, but the subject does have general and specific semantic features. These are acquired by taking the union of the semantic features of each word in the noun phrase. As described above (Section 5.3.1), verbs in CIRCUS are assigned no general or specific semantic features, but can activate domain-specific concepts, although none was activated in the example sentence. Because CIRCUS has not yet recognized the direct object (this will not happen until after the word "firm"), all of the direct object features in the case are empty. In addition to specifying information about each of the main constituents, the global context features also include syntactic and semantic knowledge for the most recent low-level constituent (last constit). A low-level constituent can be either a noun phrase, verb, or prepositional phrase, and sometimes coincides with one of the major constituents — the subject, verb phrase, or direct object. This is the case in Figure 5.5 where the low-level constituent preceding "venture" is the verb.

Because CIRCUS generally maintains syntactic information for at most one clause at a time, the global context features are limited to constituents recognized in the current clause. An exception to this is when the LICK mechanism instantiates constituent buffers in the current clause with phrases from the preceding clause (see Section 3.1.3). When this happens, the global context features may refer to portions of the preceding clause. In addition, the last constit feature takes on the value clause at the beginning of a new clause, indicating that the last item recognized by CIRCUS was a clause.
Clause boundaries are only implicitly noted in the local context features, via the presence of punctuation marks or connectives, for example.
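Written out as a flat set of attribute-value pairs, the "venture" case of Figure 5.5 looks roughly like the following sketch; the feature names are abbreviations chosen for the example, not MayTag's internal labels, and multi-valued features are shown as tuples.

    # The "venture" training case from Figure 5.5 as a flat attribute-value
    # dictionary (illustrative only: 4 solution + 33 context features).
    venture_case = {
        # solution ("word definition") features, supplied by the supervisor
        "p-o-s": "nm", "gen-sem": "entity", "spec-sem": "nil", "concept": "tie-up",
        # local context: the unknown word itself
        "word": "venture", "morphol": "nil",
        # local context: two preceding and two following words
        "prev2-word": "a", "prev2-p-o-s": "art", "prev2-gen-sem": "nil",
        "prev2-spec-sem": "nil", "prev2-concept": "nil",
        "prev1-word": "joint", "prev1-p-o-s": "nm", "prev1-gen-sem": "entity",
        "prev1-spec-sem": "nil", "prev1-concept": "nil",
        "fol1-word": "firm", "fol1-p-o-s": "noun", "fol1-gen-sem": "jv-entity",
        "fol1-spec-sem": "nil", "fol1-concept": "nil",
        "fol2-word": "with", "fol2-p-o-s": "prep", "fol2-gen-sem": "nil",
        "fol2-spec-sem": "nil", "fol2-concept": "nil",
        # global context: constituents recognized so far in the clause
        "subj-gen-sem": ("jv-entity",),
        "subj-spec-sem": ("company-name", "generic-company-name"),
        "subj-concept": "nil",
        "verb-concept": "nil",
        "last-constit-syn-type": "verb", "last-constit-gen-sem": "nil",
        "last-constit-spec-sem": "nil", "last-constit-concept": "nil",
        "dobj-gen-sem": "nil", "dobj-spec-sem": "nil", "dobj-concept": "nil",
    }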

5.3.3 Case Base Construction

Using the case representation described in the last section, MayTag creates a case base of context-dependent word interpretations from a small, randomly selected subset of the sentences in the JV corpus. As each word in the training sentences is encountered, MayTag creates a case by alternately consulting a human supervisor and the CIRCUS parser. The human supervisor supplies the current word's part-of-speech, semantic features, and concept activation information. These values are stored in the p-o-s, gen-sem, spec-sem, and concept word definition features and are used by the parser to process the current word. CIRCUS automatically supplies the global and local context features of the case after examining its state. The prev1 and prev2 features are immediately available because a person has already provided CIRCUS with the necessary syntactic and semantic information for those words. The local context features associated with the words in the fol1 and fol2 positions, however, are added to the current case after those words have been tagged by the supervisor in CIRCUS's left-to-right traversal of the training sentence. After training, MayTag will have produced one case for every word in the training sentences. In particular, if "venture" appears n times during training, there will be n cases associated with it in the case base.
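The construction procedure just described can be summarized in the following sketch, in which parser and supervisor objects stand in for the CIRCUS interface and the menu-driven tagging interface; all helper names are hypothetical.

    # Sketch of MayTag's training loop (hypothetical helpers stand in for
    # the CIRCUS interface and the supervisor's menu-driven interface).
    def train_maytag(training_sentences, parser, supervisor):
        case_base = []
        for sentence in training_sentences:
            parser.start_sentence(sentence)
            for word in sentence:
                context = parser.context_features(word)   # 33 context features
                solution = supervisor.tag(word, context)   # p-o-s, gen-sem,
                                                           # spec-sem, concept
                case_base.append({**context, **solution})
                # feed the tags back so CIRCUS can interpret the word,
                # update its state, and continue with the next word
                parser.accept_interpretation(word, solution)
                # (in the real system, the fol1/fol2 features of earlier cases
                # are completed once the following two words have been tagged)
        return case_base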

5.4 MayTag’s Application Phase

Once the case base has been constructed, MayTag can use it to determine the interpretation of new words in the corpus (Figure 5.6). Assume, for example, that CIRCUS needs to know the part of speech, semantic features, and activated concepts for "Toyo's" in the sentence:

Yasui said this is Toyo's and JAL's third hotel joint venture.

Based on the state of the CIRCUS parser at the word "Toyo's," MayTag creates a problem case for "Toyo's," filling in its global and local context features just as it would during training.8 The only difference between a problem case and a training case is the word definition features — the gen-sem, spec-sem, p-o-s, and concept features for the unknown word. During training, the human supervisor specifies values for these features. During the application phase, it is the job of the case retrieval algorithm to find the training cases

8There is a bootstrapping problem in that the fol1 and fol2 features are needed to specify the problem case for "Toyo's." This problem will be addressed in the second experiment. For now, assume that the parser has access to all fol1 and fol2 features at the position of the unknown word.


that are most similar to the problem case and then use them to determine values for the missing word definition features of the unknown word.

Figure 5.6: MayTag Application Phase.

We use the following algorithm for this task:

1. Compare the problem case to each case in the case base, counting the number of context features that match (i.e., match = 1, mismatch = 0). Only give partial credit (.5) for matches on nil's and for partial matches.9

2. Keep the ten highest-scoring cases.

3. Of these, return those case(s) whose word feature matches the unknown word, if any exist. Otherwise, return all ten cases.10

4. Let the retrieved cases vote on the values for the four word definition features of the problem case. Ties are broken randomly.
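A minimal sketch of this retrieval-and-voting procedure appears below. It simplifies the scoring (partial credit for overlapping values and the tie handling of footnote 10 are omitted), and the helper names are invented for the example.

    # Sketch of 10-nearest-neighbor retrieval with a bias toward cases for the
    # same word and a majority vote over the retrieved cases (simplified:
    # only exact matches and the nil rule are scored).
    from collections import Counter
    import random

    SOLUTION_FEATURES = ["p-o-s", "gen-sem", "spec-sem", "concept"]

    def similarity(problem, case, features):
        score = 0.0
        for f in features:
            if problem.get(f) == case.get(f):
                score += 0.5 if problem.get(f) == "nil" else 1.0
        return score

    def retrieve_and_vote(problem, case_base, features, k=10):
        scored = sorted(case_base,
                        key=lambda c: similarity(problem, c, features),
                        reverse=True)
        top = scored[:k]                       # ties beyond k are ignored here
        same_word = [c for c in top if c.get("word") == problem.get("word")]
        winners = same_word or top             # prefer cases for the same word
        solution = {}
        for f in SOLUTION_FEATURES:
            votes = Counter(c[f] for c in winners)
            best = max(votes.values())
            solution[f] = random.choice([v for v, n in votes.items() if n == best])
        return solution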

The case retrieval algorithm is essentially a k-nearest neighbors algorithm (k = 10) with a bias toward cases whose word matches the unknown word.11 Because MayTag uses the retrieved cases to derive an appropriate syntactic and semantic interpretation for the current word, it is considered an interpretive CBR system. (For examples of other interpretive CBR systems, see HYPO [Ashley and Rissland, 1988, Ashley, 1990], GREBE [Branting and Porter, 1991, Branting, 1991], CABARET [Rissland and Skalak, 1991],

9Partial matches occur when two features have overlapping, but not identical values. A partial match occurs, for example, when the subject of the problem case is (company-name product) and the subject of the training case is (company-name).

10More than ten cases will be returned if there are ties.

11Other values of k were tested, but setting k to ten produced the best results overall given the corpus, taxonomies, and training sets used in the experiments.

ANAPRON [Golding and Rosenbloom, 1991].12) MayTag, however, differs from many traditional case-based problem solvers in that it employs no case adaptation phase — the retrieved solution is used directly rather than being modified to fit the new situation. In addition, MayTag’s case representation contains none of the deep or derived features that typically comprise a case in a CBR system.13 MayTag’s emphasis on case-based classification using simple feature vectors and without case adaptation makes it more similar to instance-based machine learning algorithms [Aha et al., 1991].

5.4.1 Recognizing Incomplete Word Definitions

As a result of its case-based approach to lexical disambiguation, MayTag allows a word to take on an interpretation different from any it received during the training phase. This will happen if the word appears in a context that is different from any in which it occurred during training. The case retrieval algorithm delivers this behavior by first looking for cases that are the best overall matches for the problem case (step 2) and then searching among these for cases generated during training in response to the current unknown word (step 3). In a preliminary training phase, for example, the word “lettuce” appeared once across all of the training sentences where it was tagged as a noun/industry/product: Guam United Agricultural Management...has developed a small computerized water culture system which can grow vegetables like tomatoes, lettuce, spring onions, melons... When running MayTag in test/application mode, however, “lettuce” was recognized as a nm/jv-entity/company-name. Although it initially seemed that this definition must be incorrect, the opposite was actually the case. “Lettuce” had been seen in the following context: Ogden Allied Services Corp. had formed a joint venture with Richard Melman’s Lettuce Entertain You Enterprises. This type of adaptive behavior is generally very difficult to achieve in natural language processing systems. It is much easier for an NLP system to assume that it knows either all or none of the definitions of a word rather than deal with the complications that accompany partial definitions. The ability to recognize novel uses of words is a natural side effect of MayTag’s case-based approach to lexical acquisition.

5.4.2 Modified Case Retrieval Algorithm

One problem with the case retrieval algorithm presented above is that it assumes that all context features are equally important for learning part-of-speech, semantic feature,

12See Section 2.7 for a brief description of the interpretive tasks performed by these systems.

13It could be argued, however, that MayTag's global context features are actually derived features since they represent higher level knowledge obtained by the sentence analyzer from one or more words.

and concept activation knowledge. Intuitively, it seems that accurate prediction of each part of the definition may rely on very different subsets of the features. Unfortunately, it is difficult to know which combinations of features will best predict each class of knowledge without trying many (or all) of them. To avoid this additional knowledge engineering bottleneck, we developed an automated approach for locating the relevant features in a baseline instance representation and have incorporated the approach into the original case retrieval algorithm.14 The modified algorithm uses the C4.5 decision tree system [Quinlan, 1992] and will be described in detail in Chapter 6. (Related work will also be discussed in that chapter.) For the remainder of this chapter, however, it will suffice simply to think of this feature selection algorithm as a black box (Figure 5.7) that takes as input a set of training cases for a particular classification task. Each training case is described in terms of n features, f_1 through f_n, and the class value c to be associated with the case. For MayTag, f_1 through f_n are the local and global context features and c is one of the p-o-s, gen-sem, spec-sem, or concept word definition features. As output, the algorithm produces a list, rel_c,15 of the context features that are useful or relevant for prediction of c.


Figure 5.7: Black Box View of Feature Selection Algorithm.

All experiments described in this chapter rely on a case retrieval algorithm that uses this automated feature selection algorithm to select the subset of context features that should take part in the nearest-neighbor calculations. To use the feature selection algorithm in this manner, the original case retrieval algorithm must be altered in two ways. First, the training phase must include an additional offline step:

14It should be noted that the problem of automated feature selection is not new: it was first addressed by Samuel [1963,1967] in the context of evaluation function learning for game tree search. MayTag, however, encounters the problem in the context of learning an appropriate similarity function.

15One can also view the feature selection algorithm as a feature weighting algorithm in which relevant features are assigned a weight of one and irrelevant features a weight of zero.

• Given MayTag's case base of training cases as input, use the feature selection algorithm to create a list of relevant features, rel, for each word definition feature to be predicted. This produces four feature lists: rel_p-o-s, rel_gen-sem, rel_spec-sem, and rel_concept.

Second, we alter the original case retrieval algorithm to use these lists as follows:

• Instead of invoking the case retrieval algorithm once for each problem case, run it four times, once for each class value to be predicted.16 In the retrieval for any particular class attribute c, however, include only the features in rel_c in the nearest-neighbor calculations.

By using the feature selection algorithm to discard irrelevant features, we automatically tune the case retrieval algorithm for independent prediction of part of speech, semantic features, and domain-specific concepts. Chapter 6 describes the series of experiments that compare this modified algorithm for feature specification to the original case retrieval algorithm and to knowledge-based methods for feature specification. It also discusses related work in the area of automated feature set specification. The next section describes experiments that evaluate the performance of MayTag on a number of lexical acquisition tasks. As stated above, all experiments use the modified case retrieval algorithm.
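Assuming a retrieval routine with the shape of the sketch in Section 5.4 (a problem case, a case base, and a list of features to compare), the modification amounts to something like the following, where rel_lists is the output of the offline feature selection step; the names are hypothetical.

    # Sketch of the modified retrieval: one retrieval pass per word definition
    # feature, each restricted to that feature's selected context features
    # (rel_lists is assumed to come from the offline C4.5 step).
    def tag_word(problem, case_base, rel_lists, retrieve_and_vote):
        solution = {}
        for target, relevant_features in rel_lists.items():
            # run the nearest-neighbor retrieval using only the relevant features
            result = retrieve_and_vote(problem, case_base, relevant_features)
            solution[target] = result[target]
        return solution

    # Example shape of rel_lists (contents invented for illustration):
    # rel_lists = {
    #     "p-o-s":    ["prev1-p-o-s", "fol1-p-o-s", "morphol"],
    #     "gen-sem":  ["word", "prev1-gen-sem", "subj-gen-sem"],
    #     "spec-sem": ["word", "fol1-gen-sem"],
    #     "concept":  ["word", "prev1-word", "last-constit-syn-type"],
    # }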

5.5 Evaluation of MayTag

We ran a number of experiments to test MayTag's ability to acquire lexical knowledge from a corpus using the modified case retrieval algorithm. Each experiment drew its training and test cases from the same set of 120 randomly selected sentences from relevant texts in the JV corpus. For each experiment, we supplied CIRCUS with a small lexicon of function words (e.g., the, a, an, is, are, was, and, or) and used MayTag to supply syntactic and semantic information for all remaining (open-class) words in the test sentences. The function word lexicon maintains the part of speech and semantic features (if any apply) for 129 words. None of the function words has any associated domain-specific concepts.

All experiments report average results using 10-fold cross validation. We first randomly partition the 120 sentences into ten segments. Altogether, the 120 sentences generate 2056 cases, one context-sensitive word interpretation case for each occurrence of an open-class word. In each of ten experiments, we use the sentences in nine of the segments to generate the case base (approximately 1850 cases) while reserving the sentences in the remaining segment for testing (approximately 205 cases). Results are averaged across the ten runs, which progress through all training/testing combinations. The same training/test set combinations are used for each experiment.

We evaluate MayTag with respect to two main tasks. In the first set of experiments, we demonstrate MayTag's ability to provide the part of speech, semantic features, and

16There is a single case base, but each case has four class labels associated with it, values for the four solution features.

domain-specific concept for occasional unknown words in an input text (Section 5.5.1). Next, we demonstrate MayTag's ability to tag all open-class words in an input text from scratch (Section 5.5.2). Finally, we explore the possibility of using MayTag to generate explicit domain-specific lexicons for a corpus (Section 5.5.3). Chapter 7 provides a more qualitative analysis of MayTag's ability to handle each type of lexical ambiguity.
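A rough sketch of the sentence-level 10-fold cross-validation protocol used throughout the evaluation, with placeholder helpers for case construction and scoring, is given below.

    # Sketch of sentence-level 10-fold cross-validation (make_cases and
    # evaluate are placeholders, not MayTag code).
    import random

    def cross_validate(sentences, make_cases, evaluate, folds=10, seed=0):
        rng = random.Random(seed)
        order = sentences[:]
        rng.shuffle(order)
        segments = [order[i::folds] for i in range(folds)]
        accuracies = []
        for i in range(folds):
            test_sents = segments[i]
            train_sents = [s for j, seg in enumerate(segments) if j != i
                           for s in seg]
            case_base = make_cases(train_sents)     # ~9/10 of the cases
            accuracies.append(evaluate(case_base, test_sents))
        return sum(accuracies) / folds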

5.5.1 Handling Occasional Unknown Words

In this section we describe experiments that use MayTag to determine the part of speech, semantic features, and domain-specific concepts for occasional unknown words in an input text. This task is important when an NLP system is using a nearly complete domain-specific dictionary, but still occasionally encounters words for which the lexicon has no entry. We simulate this situation by assuming that MayTag has perfect knowledge of all words in a test sentence except the word that it currently is trying to tag. This means that the representation of context that becomes part of the problem case is relatively accurate. Table 5.5 shows MayTag’s average percentage correct for prediction of each of the four word definition features and compares them to two baselines.17 The first

Table 5.5: Handling Occasional Unknown Words (% correct for prediction of word definition features).

Word Definition Feature   Random Selection   Default   MayTag (open-class words)   MayTag (all words)
p-o-s                     34.3               81.5      93.0                        96.1
gen-sem                   17.0               25.6      78.0                        87.7
spec-sem                  37.3               58.1      80.4                        89.0
concept                   84.2               91.7      95.1                        97.2

baseline indicates the expected accuracy of a system that randomly guesses a legal value for each missing feature based on the distribution of values across the training set. The second baseline shows the performance of a system that always chooses the most frequent value across the training set as a default. The third column of results indicates MayTag's performance on the open-class words in the test sentences and the final column indicates MayTag's performance when function words are included. Chi-square significance tests on the associated frequencies show that MayTag performs significantly better than both baselines (p = .01).

17Note that we allow mismatches between the noun and noun modifier part-of-speech categories for all experiments and baselines because the parser can accurately fix these errors.

5.5.2 Tagging All Open-Class Words From Scratch — Acquiring a Domain-Specific Lexicon

In the second, more ambitious, task, we use MayTag to acquire definitions of all open-class words in the test sentences from scratch. Unlike the experiments above, we make no assumptions about the availability of definitions for words surrounding the current unknown word. Instead, MayTag has to rely on its own predictions for preceding words even though some of those may have been incorrect. As a result, the representation of context that comprises each problem case may contain many errors. More specifically, MayTag calls on CIRCUS to parse each test sentence and to create a problem case each time an open-class word is encountered, filling in its global context features and the local context features for the preceding two words. Determining values for the fol1 and fol2 features is more difficult because CIRCUS has not processed those words yet in its left-to-right traversal of the input sentence. If the following two words are both function words, then the fol1 and fol2 features can also easily be specified by looking up the words in the function word lexicon. Most of the time, however, one or both of fol1 and fol2 are open-class words for which the system has no definition. In these cases, MayTag makes an educated guess based on the training cases:

1. If the word appeared during training, let each fol1 and fol2 feature be the union of the values that occurred in the word's training phase definitions.

2. If the word did not appear during training, fill in the word features, but use nil as the value for the remaining fol1 and fol2 attributes.

We also relax the case retrieval matching algorithm and allow a non-empty intersection on any fol1 or fol2 feature to count as a full match during the nearest-neighbor matching.18 Results for this tagging-from-scratch task are shown in Table 5.6 along with the same baseline comparisons from the first experiment. Not surprisingly, all of the results have dropped somewhat; however, chi-square analysis still shows that MayTag's performance is significantly better than the baselines (p = .01).
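The educated guess for untagged following words can be sketched as follows; training_definitions is assumed to map each word seen during training to the sets of values it received for each word definition feature.

    # Sketch of the educated guess used for untagged fol1/fol2 words during
    # tagging-from-scratch (training_definitions is an assumed lookup table).
    def guess_following_word(word, training_definitions):
        guess = {"word": word}
        seen = training_definitions.get(word)
        for feature in ("p-o-s", "gen-sem", "spec-sem", "concept"):
            if seen is None:
                guess[feature] = "nil"     # unseen word: leave the feature nil
            else:
                # union of all values the word received during training
                guess[feature] = frozenset(seen[feature])
        return guess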

5.5.2.1 MayTag in the MUC-5/TIPSTER Evaluation

MayTag handled the semantic feature tagging task for the UMass/Hughes CIRCUS system that participated in the MUC-5 and TIPSTER evaluations [Lehnert et al., 1993a, Lehnert et al., 1993b]. Very generally, the goal for each system in this evaluation was to read a set of previously unseen texts from the business joint ventures domain and to produce a template summary for any joint venture event described in the texts. Chapter 4 provides a more detailed description of this information extraction task. In the UMass/Hughes

18Matches on nil continue to receive only half credit. In the original case retrieval algorithm, both partial matches (i.e., matches where two cases have overlapping, but not identical values) and matches on nil received half credit.

Table 5.6: Tagging from Scratch (% correct for prediction of word definition features).

Word Definition Feature   Random Selection   Default   MayTag (open-class words)   MayTag (all words)
p-o-s                     34.3               81.5      91.0                        95.0
gen-sem                   17.0               25.6      65.3                        80.6
spec-sem                  37.3               58.1      74.0                        85.5
concept                   84.2               91.7      94.3                        96.8

system, the OTB tagger [Lehnert et al., 1993a] supplied parts of speech for each word in the texts and domain-specific concept triggering information was generated by the AutoSlog system [Riloff, 1993], which was described briefly in Section 2.6. In addition, the CIRCUS system included three “specialists” that recognized date expressions, money expressions, and some location expressions, and converted them into canonical forms. Instead of the case base used in the above experiments (2056 cases based on 120 sentences), MayTag relied on a new, larger case base for this task. The new case base contained 3060 cases and was created from 174 training sentences from the JV corpus. Table 5.7 shows the performance of this version of MayTag on the base set of 174 sentences from the JV corpus using 10-fold cross validation. Experiments tested the system’s ability to perform semantic feature tagging from scratch. Results from the original semantic feature tagging experiments are included for comparison.

Table 5.7: Using MayTag with Larger Case Base for the Tagging-From-Scratch Task (% correct).

Word Definition   Original Results            Larger Case Base            Larger Case Base
Feature           open-class (2056 cases)     open-class (3060 cases)     incl. function words (3060 cases)
gen-sem           65.3                        74.0                        85.7
spec-sem          74.0                        75.0                        86.3

In an attempt to determine the impact of MayTag on the overall performance of the UMass/Hughes MUC-5/TIPSTER system, we ran two variations of the system on the three test sets from the final TIPSTER evaluation.19 In the first run, the system used MayTag (and the usual lexicon of 129 function words) to assign semantic features to all open-class words in the test texts. In the second run, the system had access to the function word lexicon, but MayTag was turned off and all open-class words were assigned the nil general

19Thanks to Jonathan Peterson for running these experiments.

and specific semantic features. In this run, the system was forced to rely on its default heuristics for determining object types rather than relying on MayTag for that information. Without MayTag, precision on the test sets increases by 1%, but recall falls by 41%, resulting in a 35% drop in F-measure.20 These results can be explained by the fact that the default heuristics handle only clearcut cases (e.g., "Co." indicates the name of a company), but were not designed to handle novel or ambiguous situations. In addition, the module of the system that is responsible for generating output templates uses decision trees that rely on semantic feature information. It should be emphasized that this experiment was meant only to give an indication of the extent to which MayTag contributed to the UMass/Hughes MUC-5/TIPSTER system. More rigorous testing would be required to determine the exact role played by MayTag.

Training MayTag for use with the UMass/Hughes MUC-5/TIPSTER system took approximately 14 hours. The case base acquired during training acts both as the lexicon for all open-class words and as the set of accompanying disambiguation heuristics that would be required. The 14-hour training time is minimal in comparison to the task of building the lexicon and disambiguation heuristics for the approximately 32,000 distinct words in the joint ventures corpus by hand. Specifying semantic features for only the 5,000 most frequently occurring words would take 1,250 hours at 15 minutes per word. Building the disambiguation heuristics would take an additional day or two at the least, and the resulting system would require additional routines to handle words that were missing from the lexicon. It is doubtful that a lexicon created by hand would produce significant improvements over MayTag's automated approach for a number of reasons: (1) people can introduce unnecessary ambiguity in lexical entries by listing semantic features that are seldom (or never) used, (2) conversely, people can omit important semantic features for a word, (3) novel uses of words will be impossible for the resulting system to recognize without the addition of special routines, and (4) the disambiguation heuristics and heuristics for handling unknown words are difficult for people to specify accurately.

5.5.3 Creating Explicit System Lexicons

MayTag’s approach to lexical acquisition and lexical ambiguity does not create any explicit lexicons. The cases and the case retrieval algorithm together represent an implicit domain-specific lexicon tuned expressly for the JV corpus. However, MayTag can also be used to construct explicit domain-specific lexicons. To do this, one first creates the case base by training MayTag on a subset of sentences from the selected corpus. Then MayTag is run on the entire corpus in application mode, keeping track of the interpretations assigned to each lexical item. By examining the frequency with which a word was assigned various “definitions” across the corpus and by setting thresholds appropriately, a static dictionary

20The F-measure combines recall (R) and precision (P) scores: F = 2PR / (P + R). This version of the F-measure weights recall and precision equally and gives a higher score to systems with a high recall and precision sum and with relatively close recall and precision values.

that lists the allowable parts of speech, semantic features, and domain-specific concepts for all words in the corpus can be generated automatically. At the very least, this method can be used to create a static dictionary of words that are unambiguous with respect to the current domain. MayTag might then rely on the static lexicon entry whenever one of these unambiguous words is encountered rather than perform the relatively expensive case retrieval operation to infer its definition.

As an exploratory investigation, we trained MayTag on 174 sentences from the JV corpus (as in Section 5.5.2.1) and then ran MayTag in application mode on 100 novel texts from the corpus.21 As in the above experiments, the OTB tagger supplied parts of speech for each word in the texts and domain-specific concept triggering information was generated by the AutoSlog system. Whenever a word in the sample was assigned a noun, nm, gerund, or adverb part-of-speech tag by the OTB tagger, we gathered information regarding the semantic features assigned to the word by MayTag. Table 5.8 lists the semantic features assigned to a handful of the 5546 unique words appearing in that sample along with their associated frequencies. If this pattern of usage continued as more texts in the corpus were tagged, one might want to conclude that: "also" should always be assigned the nil/nil tags; "largest" and "proposed" should always be assigned the entity/nil tags; "Subaru," "Sugetsu," and "Sumitomo" should always be considered jv-entity/company-names; "years" is a time/nil; and "still" and "technology" are too ambiguous to assign any default semantic feature tags.
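A sketch of this lexicon-construction step is given below: it tallies the definitions assigned to each word across the corpus and keeps a static entry only when one definition dominates. The 0.9 threshold and minimum count are illustrative values, not figures from the thesis.

    # Sketch of building a static lexicon from MayTag's corpus-wide taggings.
    # tagged_words is an iterable of (word, definition) pairs produced in
    # application mode; threshold and min_count are illustrative only.
    from collections import Counter, defaultdict

    def build_static_lexicon(tagged_words, threshold=0.9, min_count=5):
        counts = defaultdict(Counter)
        for word, definition in tagged_words:
            counts[word][definition] += 1
        lexicon = {}
        for word, defs in counts.items():
            total = sum(defs.values())
            definition, freq = defs.most_common(1)[0]
            if total >= min_count and freq / total >= threshold:
                lexicon[word] = definition   # word is (nearly) unambiguous
        return lexicon

    # e.g., from Table 5.8: "SUMITOMO" would get the static entry
    # "jv-entity/company-name", while "TECHNOLOGY" would be left out
    # as too ambiguous.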

21This included every other text of the 200 TIPS2 evaluation texts.

Table 5.8: Creating an Explicit Lexicon.

Word         Semantic Feature Combination   Frequency
ALSO         nil/nil                        86
LARGEST      entity/nil                     23
             jv-entity/company-name          1
PROPOSED     entity/nil                     10
STILL        nil/nil                        11
             time/nil                        9
SUBARU       jv-entity/company-name          8
             jv-entity/government            2
SUGETSU      jv-entity/company-name          5
SUMITOMO     jv-entity/company-name          8
TECHNOLOGY   entity/nil                     10
             industry-product/nil            5
             industry-product/research       4
             industry-product/product        3
             jv-entity/company-name          1
YEARS        time/nil                       28
             entity/nil                      5

5.5.4 Extending MayTag for Domain-Independent Lexical Disambiguation

Kenmore has been presented as a framework for domain-specific knowledge acquisition for conceptual sentence analysis. Hence, MayTag has been presented as a system that can acquire domain-specific lexical knowledge. An important question, however, is how this approach can be extended to handle lexical ambiguities in unrestricted domains. One solution would be to build up a collection of case bases, each one of which was designed to cover texts in a single specialized domain. A more feasible solution might be to start with a general, on-line dictionary and use the case-based approach to decide which of the definitions listed for a word is appropriate in a given context. Each case in the case base includes the usual context and solution portions, but the context part could be expanded to include a description of the options available for the current word in the on-line dictionary. The solution part of the case specifies the definition that is correct in the current context or, if none were correct, describes the novel definition. This approach allows one to start with a general lexicon (rather than creating one from scratch) and make implicit domain-specific or parser-specific additions to it via the case base. Unfortunately, broadening the domain of the texts without limiting the scope of the natural language task may make MayTag's training phase prohibitively long — both proposed solutions still require supervised training. We will return to these issues again in Chapter 10 following a more detailed analysis of the results presented in this chapter.

5.6 Summary

MayTag illustrates Kenmore's ability to acquire lexical knowledge. It simultaneously learns three classes of knowledge using the same case representation, and requires no hand-coded acquisition heuristics and relatively little training. By encoding context as part of a word's representation, MayTag also allows the syntactic and semantic interpretation of a word to change dynamically in response to new contexts without the use of lexical disambiguation heuristics. In addition, MayTag's context-sensitive interpretations provide a simple mechanism for detecting when an existing "definition" for a word is inappropriate or incomplete. MayTag has been tested in two practical applications — (1) assigning interpretations for occasional unknown words and (2) tagging all words in an incoming text from scratch. In these tasks, MayTag performs statistically significantly better than baselines that randomly guess or choose default values for the features of the unknown word. In addition, the system was used to provide semantic feature tagging for the UMass/Hughes CIRCUS system, which competed in the MUC-5 and TIPSTER national performance evaluations of text processing systems. Finally, we have demonstrated that MayTag provides a uniform computational mechanism for handling ambiguous words and unambiguous words, as well as completely unknown words.

CHAPTER 6

USING DECISION TREES TO DISCARD IRRELEVANT FEATURES

The Kenmore framework uses an inductive learning component to acquire knowledge for conceptual sentence analysis.1 In the last chapter, we saw how MayTag successfully used a case-based reasoning algorithm as its inductive learning component to acquire lexical disambiguation knowledge for the CIRCUS sentence analyzer. In general, however, the performance of any learning algorithm depends to a large extent on the representation used to encode the training cases. Unfortunately, choosing an appropriate case representation is both a time-intensive and knowledge-intensive task and is another potential knowledge engineering bottleneck during system development. As a result, we take the following approach to designing an adequate case representation within the Kenmore framework. First, because Kenmore's cases encode the progress of sentence analysis, we choose a case representation that is natural for the sentence analyzer. This means that the representation should adhere to the sentence analyzer's major language processing assumptions. CIRCUS, for example, recognizes major syntactic constituents like the subject, verb, and direct object of a sentence, so any case representation used with the CIRCUS sentence analyzer should include those constituents. Second, once this baseline case representation has been created, we will rely on entirely automated approaches to improve the representation by discarding irrelevant features, associating appropriate weights with the features, and adding new features.

This thesis presents two approaches for improving a baseline case representation. One method has been used in conjunction with the WHirlpool system (Chapter 8) and will be described in Section 8.6. The second method has been used in conjunction with MayTag and was described at a high level in the last chapter. It is a domain-independent approach that uses decision trees to find the relevant features in the representation, those features that aid a particular classification task. Any features that are not deemed relevant to the task can then be discarded from the representation. The main goal of this chapter is to describe and evaluate this decision tree approach to feature selection. Section 6.1 first presents the approach; Section 6.2 evaluates it using the lexical acquisition task described in the last chapter. We compare three case-based solutions to that problem: (1) a solution that accesses all context features during case retrieval (i.e., MayTag's original case-based

1Some of the material from this chapter appeared originally in Cardie [1993b].

reasoning component), (2) a solution that uses only the "relevant" context features during case retrieval as determined by the decision tree subsystem, and (3) a solution that uses a set of human-generated "relevant" features during case retrieval. In addition, all of the case-based solutions are compared to a solution that uses the decision tree directly. Experiments show that the solution that includes the automated approach to feature set selection performs the best of the case-based solutions and also outperforms the pure decision tree approach. Moreover, a related result emerges in the experiments of this chapter: MayTag consistently achieves better results when the context made available to the learning component is appropriately limited, indicating that lexical ambiguity resolution in CIRCUS can proceed without the use of all available syntactic and semantic cues.

6.1 A Decision Tree Approach to Feature Set Selection

In Section 5.4.2, we described the decision tree approach to feature set selection at a high level as a black box that takes as input a set of training cases for a particular classification task described in terms of attribute-value pairs. It produces as output a list of those attributes that are useful for the task. This section provides the details of the feature selection algorithm.

A decision tree is a data structure or knowledge structure that represents a set of rules for classification. Figure 6.1 shows a simple decision tree that might be used to classify balls into one of five classes — a tennis ball, basketball, baseball, kickball, or football. Each internal node in a decision tree represents a single feature of a ball (e.g., its diameter, color, or covering) and edges from a parent node to its children represent various values of the parent node feature (e.g., the covering of a ball can be one of leather or fuzzy). Leaves in the tree provide the class information, in this case the type of ball. Once the tree is created, a new instance of a ball is classified by starting at the root, testing the specified feature at each node, and branching to the appropriate child until a leaf node is reached. The decision tree in Figure 6.1, for example, would classify a brown, fuzzy ball, four inches in diameter as a football.

    diameter
        > 8 inches: basketball
        < 8 inches: color
            brown: football
            red: kickball
            other: covering
                leather: baseball
                fuzzy: tennis ball

Figure 6.1: Simple Decision Tree for Classifying Balls.

Although there are a number of methods for building a decision tree that are consistent with a set of training examples, we use the C4.5 decision tree system [Quinlan, 1992]. In building a decision tree, C4.5 employs a metric from information theory (i.e., information gain ratio) to decide which feature to test at each branching point. The information gain ratio metric finds the feature that best discriminates among the remaining training cases assigned different class values. Features that become part of the decision tree, therefore, are features that C4.5 has found to be important for the current classification task; features omitted from the tree are regarded as irrelevant.

Figure 6.2 illustrates the decision tree approach to finding the relevant features in a case representation. The algorithm begins with a set of training cases, each of which is described in terms of n features, f_1 through f_n, and the class value to be associated with the case. For MayTag's lexical acquisition task, features f_1 through f_n are the local and global context features that encode the state of the parser at an ambiguity point and the class c to be predicted is one of the word definition features for the current word — the part of speech, semantic features, or concept to associate with the word. C4.5 then uses the training cases to create a decision tree for predicting the value of class c. Finally, the algorithm notes which features occur in the tree.2 These are the features that C4.5 found useful for predicting the value of c for the training cases; they are returned as the "relevant" features in the representation. Features that are not referenced in the decision tree can be discarded from the representation as irrelevant to the task.


Figure 6.2: Decision Tree Feature Selection.

2C4.5 produces a full decision tree and a pruned tree. Unless otherwise noted, we use the pruned tree in all experiments.
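C4.5 is not reproduced here; as a rough modern stand-in, the sketch below fits a CART-style decision tree (scikit-learn, entropy criterion) to integer-encoded training cases and returns exactly those context features that the induced tree tests. C4.5's gain ratio criterion and pruning differ from this, so the sketch only approximates the procedure described above.

    # Approximate sketch of tree-based feature selection: keep the context
    # features that the induced tree actually tests. (scikit-learn's CART
    # implementation stands in for C4.5; criteria and pruning differ.)
    from sklearn.tree import DecisionTreeClassifier

    def relevant_features(X, y, feature_names):
        """X: integer-encoded context features (one row per training case),
        y: class values for one word definition feature (e.g., gen-sem)."""
        tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
        tree.fit(X, y)
        # tree_.feature holds the feature index tested at each node
        # (negative entries mark leaves), so keep the non-negative ones.
        used = {feature_names[i] for i in tree.tree_.feature if i >= 0}
        return sorted(used)

    # Usage sketch: run once per word definition feature to obtain the four
    # lists rel_p-o-s, rel_gen-sem, rel_spec-sem, and rel_concept.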

6.2 Evaluation of the Approach

In the next sections, we evaluate this decision tree method for feature set selection. In all experiments we use the following simplification of the problem tackled by the MayTag system: Given the context in which an unknown word occurs as a set of attribute-value pairs, predict the word's part of speech, general semantic feature(s), and specific semantic feature(s). It is a simplification of the original problem in that no concept activation information will be predicted. To evaluate the decision tree feature selection algorithm on this task, we compare three case-based solutions to the problem:

1. The first solution employs all context features during case retrieval (Section 6.2.1). It is equivalent to MayTag's original case-based reasoning algorithm (see Section 5.4).

2. Next, we use the same general case-based algorithm as above, but rely on a human-generated set of "relevant" features during case retrieval (Section 6.2.2).

3. Finally, we use the decision tree feature selection algorithm to find the "relevant" context features and then access just these features during case retrieval (Section 6.2.3). This variation is equivalent to MayTag's modified case retrieval algorithm and was employed in all experiments in Chapter 5.

From a case-based reasoning perspective, the human- and decision-tree-generated "relevant" feature sets are being used to handle the case indexing problem — the problem of assigning labels to cases so that only applicable ones are retrieved at the appropriate times. Most CBR systems rely on indexing schemes specified by the system designers that are based on expert knowledge of a particular domain. The HYPO system [Rissland et al., 1984, Ashley, 1990], for example, retrieves relevant cases in the trade secrets legal reasoning domain based on a predetermined set of dimensions that influence the outcome of cases in this domain. Examples of dimensions in that domain are the competitive-advantage dimension (i.e., the extent to which the defendant's access to the plaintiff's trade secrets gave the defendant a competitive advantage), and the disclose-secrets dimension (i.e., the extent to which the plaintiff had already disclosed the trade secrets to people outside the company). Given a set of dimensions for the domain, HYPO restricts case retrieval to those cases that address dimensions that are present in the problem case.

Similarly, the ReMind case-based reasoning tool [Cognitive Systems, 1992] provides two mechanisms for specifying features that will be important during case retrieval. First, the user can assign weights to each feature in the case representation or can specify that some features should be ignored entirely when ReMind's nearest-neighbor case retrieval algorithm is used. Second, the user can define prototypes — a handcrafted combination of specific attribute-value pairs that will always be associated with a particular outcome or solution feature. An example of a prototype in the "balls" domain might be the following: If

diameter > 8 inches and covering = leather, then the ball is a basketball. When prototypes are 83 used as the case retrieval mechanism and an incoming case matches one of the predefined prototypes, ReMind retrieves all training cases indexed under the prototype. MayTag’s decision tree approach to case indexing differs from HYPO’s dimensions and ReMind’s feature weights and prototypes in that it automatically determines the features that are relevant for the current retrieval task. ReMind, however, also has a mechanism for using decision trees to create indexes automatically.3 It will be discussed in Section 6.2.3 along with other automated papproaches to case-indexing after the details of MayTag’s hybrid case retrieval algorithm have been presented. In the experiments below that compare three case-based solutions to MayTag’s lexical acquisition problem, we draw the training and test instances from MayTag’s base set of 2056 cases, one case for each occurrence of an open-class word in 120 sentences of the JV corpus. In addition, all experiments use 10-fold cross validation: we randomly partition the case base into ten segments and, in each of ten runs, use nine segments as the case base and keep the remaining segment for testing. The runs progress through all ten training/testing combinations. In particular, we emphasize that the same ten training and test set combinations were used in the 10-fold cross validation of each experiment. All experiments simulate the occasional-unknown-word task introduced in the last chapter. The baseline case representation used in the experiments below predates the one used in the MayTag experiments of Chapter 5. Nevertheless, it is nearly identical to the MayTag case representation described in detail in Section 5.3. The only difference is that the representation used here includes “semantic” features for verbs. These are used to provide information about the use and tense of the verb. This difference is highlighted in Figure 6.3, which depicts the training case for “venture” as it is used in a sentence from the JV corpus.
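To make the cross-validation protocol concrete, the short sketch below builds ten fixed training/testing combinations from a case base. It is an illustration only, not MayTag's implementation; the use of Python, the fixed random seed, and the treatment of cases as arbitrary objects are assumptions introduced here.

    import random

    def ten_fold_partitions(cases, seed=0):
        """Split the case base into ten fixed segments and return the ten
        (training, test) combinations used by every experiment."""
        shuffled = list(cases)
        random.Random(seed).shuffle(shuffled)            # fixed seed: identical splits across experiments
        segments = [shuffled[i::10] for i in range(10)]  # ten roughly equal segments
        folds = []
        for held_out in range(10):
            test = segments[held_out]
            training = [c for i, seg in enumerate(segments) if i != held_out for c in seg]
            folds.append((training, test))
        return folds

    # Example: ten runs over 2056 dummy cases; each run tests one held-out segment.
    for training, test in ten_fold_partitions(range(2056)):
        assert len(training) + len(test) == 2056

Because the seed is fixed, every experiment that calls this function receives the same ten training and test set combinations, which is the property emphasized above.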

6.2.1 Case Retrieval Using All Context Features

MayTag's original case-based reasoning algorithm for learning lexical disambiguation knowledge was described in Chapter 5. During the training phase a flat case base of training examples is created. During the application phase, cases most similar to the current context are retrieved and used to resolve the current lexical ambiguity. MayTag's original algorithm for case retrieval as described in Section 5.4 employs all context features in its similarity metric and is essentially a k-nearest neighbors (k-nn) algorithm with a bias toward examples of the unknown word encountered during training: 1. Compare the test case to each case in the case base, counting the number of context features that match (i.e., match = 1, mismatch = 0). Only give partial credit (.5) for partial matches and matches on nil's.

2. Keep the k highest-scoring cases.4

3Prototypes are actually used in conjunction with ReMind’s decision tree indexing scheme.

4In the original MayTag case retrieval algorithm, k = 10. Here we will test three values of k.

Figure 6.3: Training Case for "venture" as it is Used in the Sentence "Toyota Motor Corp. has set up a joint venture firm with Yokogawa Electric Corp. ..."

3. Of these, return the case(s) whose word matches the unknown word, if any exist.

Otherwise, return all k cases.

4. Let the retrieved cases vote on the definition of the current unknown word. Break ties randomly.

The case retrieval algorithm first finds cases that best match the context of the problem case and then restricts retrieval, if possible, to cases that were generated during training in response to the same unknown word. As discussed in the previous chapter, this allows the system to generate novel interpretations for words that were seen during training, but encountered in very different contexts than the current one.

Table 6.1 shows the accuracy of the above algorithm in predicting the part of speech and semantic features of occasional unknown words in the test sentences for k = 1, 5, and 10.5 The results are compared to two baselines. The first baseline (Random Selection) indicates the expected accuracy of a system that randomly guesses a legal value for each missing word definition feature based on the distribution of values across the training set. The second baseline (Default) shows the performance of a system that always chooses the most frequent value as a default. Chi-square significance tests indicate that the case-based approaches always outperform the baselines at the 99% significance level.

5As in Chapter 5, we allow mismatches between the noun and noun modifier parts of speech because the parser can accurately fix these errors.

Table 6.1: Case Retrieval Using All Context Features (% correct).

Word Definition Feature    CBR (k=1)    CBR (k=5)    CBR (k=10)    Random Selection    Default
p-o-s                           86.6         88.8          89.4                34.3       81.5
gen-sem                         58.5         66.2          69.1                15.9       25.6
spec-sem                        62.9         70.4          72.2                24.7       45.3
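The four-step retrieval algorithm of Section 6.2.1 can be restated compactly in code. The sketch below is a minimal illustration under assumptions of my own (Python, dictionary-valued cases, a "word" key, and set-valued features standing in for list-valued attributes); it is not the MayTag implementation, and ties beyond the k-th case as well as random tie-breaking are simplified away.

    from collections import Counter

    def similarity(test_case, training_case, features):
        """Step 1: count matching context features (1 per match; .5 for a
        partial match or a match on nil)."""
        score = 0.0
        for f in features:
            a, b = test_case.get(f), training_case.get(f)
            if a is not None and a == b:
                score += 0.5 if a == "nil" else 1.0
            elif isinstance(a, set) and isinstance(b, set) and a & b:
                score += 0.5                      # partial match on list-valued features
        return score

    def retrieve_and_vote(test_case, case_base, features, target, k=10):
        # Step 2: keep the k highest-scoring cases (ties beyond k dropped in this sketch).
        ranked = sorted(case_base, key=lambda c: similarity(test_case, c, features), reverse=True)
        top = ranked[:k]
        # Step 3: prefer cases generated for the same unknown word, if any exist.
        same_word = [c for c in top if c.get("word") == test_case.get("word")]
        voters = same_word or top
        # Step 4: let the retrieved cases vote on the missing word definition feature.
        return Counter(c[target] for c in voters).most_common(1)[0][0]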

6.2.2 Case Retrieval Using Human-Generated Relevant Features

The problem with the original MayTag case retrieval algorithm as described above is that it assumes that all context features are equally important for predicting each of the word definition features for an unknown word. Intuitively, however, it seems that accurate prediction of each class of missing information may rely on very different subsets of the feature set. Indeed, it has been shown that nearest-neighbor algorithms perform poorly in the presence of irrelevant features [Aha et al., 1991, Aha, 1989].

One general method for optimizing our case retrieval algorithm for a particular interpretation task is to include in the k-nn calculations only those features deemed relevant to the task according to some collection of expert knowledge. As described at the beginning of Section 6.2, HYPO's dimensions and ReMind's prototypes allow expert domain knowledge to restrict case retrieval to relevant portions of the case base. In the same way, we can optimize MayTag's case retrieval algorithm for prediction of each word definition feature by incorporating into the similarity metric informed intuitions about the nature of each class of knowledge to be predicted. Some successful part-of-speech taggers, for example, make decisions based only on knowledge of the words in a window to either side of the unknown word. This implies that the k-nn routine should only include the local context features in its calculations — the global context features may be irrelevant to the part-of-speech tagging task. On the other hand, the semantic features of an unknown word seem to depend partially on local context and partially on knowledge about the global state of the parse. For example, the semantic class of a noun in the direct object position may depend on the semantic class of the clause's subject. The direct object is much more likely to be a company if it follows "IBM bought..." than "John bought..." Therefore, when predicting the semantic class of a lexical item (e.g., human, company-name), it might be better first to find the most similar cases using the local context features and then to choose from these the cases that match best along both the local and global context dimensions.

We incorporated these observations into two variations of the original MayTag case retrieval algorithm. The first variation, labeled the "p-o-s" variation, was designed to improve part-of-speech (p-o-s) prediction and uses only the local context features in its k-nn comparisons. The second variation, labeled the "semantic feature" variation, was designed to improve prediction of the gen-sem and spec-sem word definition features. It submits those cases initially selected using just local context features to an additional k-nn filter that includes the global context features as well. Like the original case retrieval algorithm, both intuitive variations also prefer cases whose word feature matches the unknown word.

Table 6.2 shows the results of using the human-generated relevant feature sets for case retrieval and compares them to MayTag's original case retrieval algorithm that accessed all context features in the nearest-neighbor calculations. Only the results for k = 10 are shown, but runs using k = 1 and 5 exhibited similar behavior. Also shown in the table are annotations for statistical significance. (*'s indicate performance of human-generated feature sets as compared to the "all features" variation.)

Table 6.2: Case Retrieval Using Human-Generated Relevant Features (Table shows % correct for k = 10; ** indicates significance with respect to the "all features" variation at p = .01).

Word Definition Feature    P-O-S Variation    Semantic Feature Variation    All Features Variation
p-o-s                              91.4**                       91.0**                      89.4
gen-sem                              70.3                         69.4                      69.1
spec-sem                             73.6                         72.2                      72.2

As expected, focusing on local features improved part-of-speech prediction when compared to the original case retrieval algorithm. This result is consistent with other results that have shown part-of-speech tagging to rely mainly on local, lexical information (e.g., Brill [1994]).6 As shown in Table 6.2, the semantic feature variation unfortunately did not improve performance of semantic feature prediction. Instead, it unexpectedly improved performance of part-of-speech prediction. This seems to indicate that semantic feature disambiguation does not require access to global context features often enough to warrant their inclusion in the nearest-neighbor matching. In more general terms, the above experiments indicate that it is possible to use informed intuitions and expert domain knowledge to discard irrelevant attributes from a case representation, but that the results are not always predictable.

6It should also be noted, however, that a global context feature can only contribute fully to the similarity score when (1) the same syntactic constituent (e.g., subject, direct object) exists in both the problem case and the training case, and (2) their semantic classes match. As a result, the system is biased toward using the global context features only for tasks that rely on semantic rather than syntactic cues; this suggests that accuracy in part-of-speech assignment does not rely on the kind of semantic information encoded in MayTag's global context features.
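For concreteness, the "semantic feature" variation described in this section can be sketched as a two-stage filter: rank the case base on the local context features alone, then re-rank the surviving pool on the local and global features together. The pool size, the simple match count, and the dictionary case format are assumptions of this sketch, not details of MayTag.

    def overlap(test_case, training_case, features):
        """Simple count of exact matches over the named context features."""
        return sum(test_case.get(f) == training_case.get(f) for f in features)

    def two_stage_retrieval(test_case, case_base, local_feats, global_feats, k=10, pool=50):
        """Filter on local context first, then choose from those the cases that
        match best along both the local and global context dimensions."""
        by_local = sorted(case_base,
                          key=lambda c: overlap(test_case, c, local_feats),
                          reverse=True)[:pool]
        return sorted(by_local,
                      key=lambda c: overlap(test_case, c, local_feats + global_feats),
                      reverse=True)[:k]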

6.2.3 Hybrid Case Retrieval Algorithm

The experiments in the last section relied on handcrafted feature sets to improve the performance of MayTag's k-nn case retrieval algorithm. Given that feature set specification is a notoriously time-consuming and knowledge-intensive task [Quinlan, 1983], however, it would be better if the feature set could be chosen systematically and automatically. This is exactly what the decision tree algorithm for feature selection described in Section 6.1 was designed to do. The experiments below use a hybrid case retrieval algorithm that combines the decision tree feature selection algorithm with the usual nearest-neighbor case retrieval algorithm — the decision tree selects the features to be included in the k-nn calculations. This is MayTag's modified case retrieval algorithm that was described briefly in Section 5.4.2 and used in all experiments of that chapter.

As described in Section 5.4.2, the hybrid case retrieval algorithm requires two modifications to MayTag's original case-based reasoning algorithm. First, after the case base has been created, we use the decision tree feature selection algorithm to find three sets of relevant features — one set for each of the three classes of lexical knowledge to be predicted for the unknown word (rel_pos, rel_gensem, and rel_specsem). This is performed once, prior to the first case retrieval operation, and appears in Figure 6.4.7 Then, instead of invoking the case retrieval algorithm once for each test case, we run it three times, once for each of the three word definition features to be predicted. In the retrieval for attribute a, however, only the features in rel_a should be included in the k-nn calculations.8 The hybrid case retrieval algorithm is depicted in Figure 6.5.

Table 6.3 shows the average performance of the hybrid case retrieval algorithm across ten runs and compares it to the "all features" and "p-o-s" variations as well as to two additional baselines.9 The first of these baselines (Random Features) is a system that randomly chooses the features to be used in the k-nn calculations while controlling for feature set size (i.e., we use the same number of features that were used in the corresponding run for the hybrid approach).10 The second baseline (Pure Decision Tree) shows the accuracy of a system that uses the C4.5-generated decision tree directly to predict each word definition feature. Again, only results for k = 10 are shown although results for k = 1 and 5 were much the same.11

7Note that as part of the 10-fold cross validation scheme, we actually create ten decision trees for each word definition feature — one for each group of training cases.

8We actually only compare each test case to the entire case base once (not three times) and use the results of that comparison for each of the three k-nn calculations.

9We omitted the “semantic features” variation from the table because it performed poorly.

10This baseline was included to ensure that the performance of the decision tree approach to feature set selection is not simply a result of choosing a smaller set of features.
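A minimal sketch of the feature-selection step depicted in Figure 6.4 follows. It uses scikit-learn's entropy-based decision tree as a stand-in for C4.5 and assumes the symbolic feature values have already been integer-encoded; the function names and data layout are mine, not MayTag's.

    from sklearn.tree import DecisionTreeClassifier

    def relevant_features(X, y, feature_names):
        """Fit a decision tree and return the set of features it actually tests
        (the analogue of rel_pos, rel_gensem, or rel_specsem)."""
        tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)
        node_features = tree.tree_.feature            # per-node feature index; negative at leaf nodes
        return {feature_names[i] for i in node_features if i >= 0}

    # One relevant-feature set per word definition feature, computed once before
    # any retrieval (and once per cross-validation fold), e.g.:
    #   rel = {target: relevant_features(X, y[target], names)
    #          for target in ("p-o-s", "gen-sem", "spec-sem")}

The k-nn retrieval for a given target attribute would then restrict its similarity computation to the returned feature set, as described in the text above.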

Figure 6.4: Selecting Relevant Features for the Lexical Acquisition Task.

Figure 6.5: Hybrid Case Retrieval.

In all but one instance in Table 6.3, the case retrieval algorithm that used the decision tree features significantly outperforms the other approaches at the 99% level (p = .01). The only exception was for part-of-speech prediction, in which the decision tree features variation outperformed the "p-o-s" variation at the 95% significance level and the remaining approaches at the 99% level.

Table 6.3: Case Retrieval Using C4.5-Generated Relevant Features (Table shows % correct for k = 10; * and ** indicate the significance of the decision tree features variation with respect to all other variations: ** at p = .01 and * at p = .05. See accompanying text for explanation of **/*.)

Word Definition Feature    Decision Tree Features Variation    All Features Variation    P-O-S Variation    Random Features    Pure Decision Tree
p-o-s                                        92.5**/*                      89.4                 91.4               89.7                 89.0
gen-sem                                        73.4**                      69.1                 70.3               62.9                 66.0
spec-sem                                       76.7**                      72.2                 73.6               71.1                 69.9

The use of decision trees to create indexes for the case base may initially appear similar to a number of automated approaches to case indexing. For example, the CYRUS system [Kolodner, 1983] and many follow-on CBR systems (e.g., CHEF [Hammond, 1989]) use E-MOPs to organize cases hierarchically in a dynamic memory [Schank, 1982]. The resulting discrimination network is then traversed when processing new problem cases. The E-MOPs indexing scheme, however, explicitly encodes generalizations of individual cases as higher level nodes in the network. In addition, E-MOPs index a new case in terms of all of its non-"normative" features — that is, in terms of any features that are not part of the generalized event description. MayTag's use of decision trees as a case-indexing method is very different from E-MOPs: no generalizations are calculated or represented and an incoming case is indexed in terms of all attributes deemed relevant by the decision tree rather than in terms of its individual differences from a norm.

The ReMind case-based reasoning shell [Cognitive Systems, 1992] also uses decision trees as one of its available case-indexing methods. Like MayTag, the system creates a decision tree from the case base during training by determining which context features correspond to each solution feature. ReMind's decision tree generator, however, keeps track of the training cases affected by decisions at each node in the tree. After training, when a new case enters the system, ReMind presents the case to the decision tree and returns all cases indexed at the leaf. ReMind's use of decision trees, then, is equivalent to our experiments that use the decision tree directly to determine the missing word definition features. MayTag, on the other hand, uses the decision tree for feature set selection and then indexes each case with respect to this set of relevant features.

11In general, we did not expect the decision tree to perform as well as the case-based approaches because attribute values in MayTag can be a list of values (e.g., the subject of the sentence may contain both a company-name and an industry-product). The case-based approaches allow partial matches on features of this type; the decision tree forces the use of a single, combined attribute value (e.g., company-name-industry-product).

6.3 Analysis of Approach

The sections below take a closer look at the decision tree approach to feature set selection. In each section we provide concrete examples of the decision trees used in MayTag’s hybrid case retrieval algorithm. In addition, we look more generally at the classes of features deemed relevant by the decision tree algorithm for each of MayTag’s lexical acquisition tasks. We find that, for CIRCUS, the solution to each lexical ambiguity task accesses a limited subset of the available context features.

6.3.1 Decision Trees for Part-of-Speech Prediction

Figure 6.6 shows one of the decision trees created by C4.5 for part-of-speech prediction. (There were ten such trees created for the experiments above and in Chapter 5 as part of the 10-fold cross validation evaluation scheme.) The tree indicates that the part of speech of the current word should be determined by first checking the general semantic feature associated with the following word (fol1-gen-sem). For most values of fol1-gen-sem, the tree dictates that the current word is a noun modifier. However, if the general semantic feature for the following word is industry-product, time, or nil, then the part of speech of the preceding word (prev1-pos) plays a role in determining the current word's part of speech. For part-of-speech prediction, fol1-gen-sem was the root node feature in all ten decision trees. Although this one-word lookahead seems reasonable, it may cause problems for the tagging-from-scratch task where information regarding the next word is unavailable unless the word appeared during training.

Figure 6.7 shows the frequency with which the non-"word" context features occurred in the ten rel_pos relevant features lists for the MayTag part-of-speech experiments of Chapter 5 (Section 5.5).12 Preceding and following context features appear to be equally important for part-of-speech prediction, especially those for the immediately preceding and immediately following words. In addition, 100% of the context features associated with syntactic knowledge appeared in the relevant features lists. The same was not true for context features associated with semantic knowledge. Finally, very few global context features appear to play a part in part-of-speech prediction although the semantic features associated with the subject of the current clause are sometimes important.
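The histogram in Figure 6.7 (and those in Figures 6.11, 6.12, and 6.15 below) is just a tally of how many of the ten per-fold relevant-feature lists mention each context feature. A small sketch of that bookkeeping follows; the three example lists are invented for illustration.

    from collections import Counter

    def feature_histogram(rel_lists):
        """Count, for each context feature, how many of the ten relevant-feature
        lists (one per cross-validation fold) include it."""
        counts = Counter()
        for rel in rel_lists:
            counts.update(set(rel))          # each fold contributes at most one count per feature
        return counts

    # Invented example showing three of the ten folds:
    h = feature_histogram([
        {"fol1-gen-sem", "prev1-pos", "morphol"},
        {"fol1-gen-sem", "prev1-pos"},
        {"fol1-gen-sem", "fol1-pos"},
    ])
    # h["fol1-gen-sem"] == 3, h["morphol"] == 1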

6.3.2 Decision Trees for Semantic Feature Prediction

Figure 6.8 shows one of the ten decision trees for general semantic feature prediction that was generated during the MayTag experiments of Chapter 5.

12Recall that all "word" features were omitted from the cases used to train the decision trees.

fol1-gen-sem = generic-loc, location, person-position, ownership-percentage, capitalization, human, facility, money, nationality, entity, jv-entity -> nm
fol1-gen-sem = industry-product ->
| prev1-pos = conjunction ->
| | prev2-gen-sem = industry-product -> nm
| | prev2-gen-sem = [others] -> verb
| prev1-pos = [others] -> nm, verb
fol1-gen-sem = time ->
| prev1-pos = noun ->
| | prev1-gen-sem = human -> verb
| | prev1-gen-sem = jv-entity -> verb
| | prev1-gen-sem = [others] -> adv
| prev1-pos = nm ->
| | fol1-pos = noun -> nm
| | fol1-pos = [others] -> noun
| prev1-pos = [others] -> adv, verb, nm
fol1-gen-sem = nil ->
| prev1-pos = connective -> noun, adv
| prev1-pos = adv -> adv, verb
| prev1-pos = aux, verb, comma -> pasp, verb, adv, noun
| prev1-pos = noun ->
| | morphol = ed -> pasp, verb
| | morphol = nil -> verb, adv, pasp, modl
| | morphol = [others] -> verb, conn, pres, adv, noun
| prev1-pos = determiner -> noun, adv, nm
| prev1-pos = [others] -> noun, verb

Figure 6.6: Sample Decision Tree for Part-of-Speech Prediction.

Figure 6.7: Histogram of Features From rel_pos Lists (part-of-speech prediction).

The tree indicates that the general semantic feature of the current word should be determined by first checking the specific semantic feature of the preceding word (prev1-spec-sem). If this information is missing (i.e., nil), then the specific semantic feature of the following word (fol1-spec-sem) should be noted. If we look at the decision trees associated with all ten rel_gensem feature lists, we find that prev1-spec-sem is always the top-ranked feature. A sample decision tree for specific semantic feature prediction is shown in two parts in Figures 6.9 and 6.10. Not surprisingly, many of the same context features are tested at the top levels of both the general and specific semantic feature decision trees. Like general semantic feature prediction, prev1-spec-sem is always the root node feature in the C4.5 decision trees for specific semantic feature prediction.

Figures 6.11 and 6.12 show the frequency with which the non-"word" context features occurred in the ten rel_gensem and rel_specsem relevant feature lists for the MayTag experiments of Chapter 5 (Section 5.5). As was the case for part-of-speech prediction, 100% of the syntactic context features are referenced by at least one decision tree for general and specific semantic feature prediction. For general semantic feature prediction, the "following" local context features appear to be slightly more important than the "preceding" local context features. In addition, three semantic global context features appear to play a role — the domain-specific concepts associated with the verb and the last constituent, and the specific semantic features associated with the subject. For specific semantic feature prediction, the "preceding" and "following" local context features are equally important. Although a greater variety of global context features are referenced for specific semantic feature prediction than for general semantic feature prediction, the rel_specsem lists access the global context features less often (8 vs. 23 times).

6.3.3 Decision Trees for Concept Type Prediction

Figures 6.13 and 6.14 show a sample decision tree for predicting the domain-specific concept associated with the current word. Unlike the decision trees for part-of-speech and semantic feature prediction, however, the concept type tree is the full C4.5 decision tree rather than the pruned version. The full trees were used to derive the rel_concept lists because the pruned trees test no context features at all. They simply state that the nil concept type should be returned.13 In any case, the decision trees indicate that the first feature to check for concept type prediction of an unknown word is the specific semantic feature of the second following word, i.e., fol2-spec-sem. Fol2-spec-sem is the root node feature in all ten rel_concept feature lists. This implies that the concept associated with the current word can be determined by looking for the semantic type of potential slot fillers for the concept. If this information is missing (i.e., nil), then the concept associated with the second following word (fol2-cn) should be noted.

13The pruned trees were of this form because relatively few words actually activate concepts in this domain and returning the default value of nil works quite well overall.

prev1-spec-sem = ceo, exec, sr-exec -> person-position prev1-spec-sem = generic-company, company-name -> jv-entity prev1-spec-sem = factory, mine -> entity prev1-spec-sem = research, government -> industry-product prev1-spec-sem = product+site, facility+production, other-facility, facility-name -> facility prev1-spec-sem = sales+store, service+store -> facility+industry-product prev1-spec-sem = person, human-title, human-name -> human prev1-spec-sem = other-loc, continent -> location prev1-spec-sem = sales -> | last-constit = clause, prep-phrase, noun-phrase, nil -> jv-entity | last-constit = verb -> entity prev1-spec-sem = country -> | prev2-gen-sem = location, jv-entity -> jv-entity | prev2-gen-sem = entity -> money | prev2-gen-sem = [others] -> location prev1-spec-sem = finance -> | prev2-gen-sem = nationality -> jv-entity | prev2-gen-sem = [others] -> industry-product prev1-spec-sem = city -> | fol2-pos = infinitive -> jv-entity | fol2-pos = adv -> facility | fol2-pos = verb -> entity | fol2-pos = [others] -> location prev1-spec-sem = nil -> | fol1-spec-sem = ceo, sr-exec -> person-position | fol1-spec-sem = mine, transportation -> entity | fol1-spec-sem = production+site -> money | fol1-spec-sem = site, factory, production, store -> industry-product | fol1-spec-sem = person, finance, government -> nationality | fol1-spec-sem = other-loc, other-facility, city -> location | fol1-spec-sem = continent, exec -> generic-loc | fol1-spec-sem = service -> time | fol1-spec-sem = spoke, generic-company, company-name -> jv-entity | fol1-spec-sem = human-name -> human | fol1-spec-sem = facility-name -> facility | fol1-spec-sem = research -> | | prev1-pos ... | | fol2-gen-sem ... | fol1-spec-sem = nil -> | | morphol ... | | | prev2-pos ... | | | fol2-cn ... | | | | fol1-pos ... | | | | | prev1-pos ... | | | last-constit-cn ... | | | fol1-gen-sem ......

Figure 6.8: Sample Decision Tree for General Semantic Feature Prediction (partial tree).

prev1-spec-sem = company-name -> | fol1-gen-sem = person-position -> exec | fol1-gen-sem = product-service -> product | fol1-gen-sem = time -> generic-company | fol1-gen-sem = [others] -> company-name | fol1-gen-sem = nil -> | | morphol = ing, proper -> company-name | | morphol = [others] -> generic-company | | morphol = s -> | | | fol1-pos = prep -> sales | | | fol1-pos = aux, conj -> company-name | | | fol1-pos = [others] -> spoke prev1-spec-sem = ceo -> ceo prev1-spec-sem = generic-company -> generic-company prev1-spec-sem = exec -> exec prev1-spec-sem = factory, mine, sales, service, city, finance -> nil prev1-spec-sem = research -> research prev1-spec-sem = sr-exec -> sr-exec prev1-spec-sem = production+site -> site prev1-spec-sem = sales+store -> sales+store prev1-spec-sem = person -> person prev1-spec-sem = other-loc -> other-loc prev1-spec-sem = human-title, human-name -> human-name prev1-spec-sem = government -> production prev1-spec-sem = factory+production -> factory prev1-spec-sem = other-facility -> other-facility prev1-spec-sem = facility-name -> facility-name prev1-spec-sem = service+store -> service+store prev1-spec-sem = country -> | prev2-gen-sem = location, entity, jv-entity -> nil | prev2-gen-sem = [others] -> country

Figure 6.9: Sample Decision Tree for Specific Semantic Feature Prediction (partial tree, part 1).

Figure 6.15 shows the frequency with which the non-"word" context features occurred in the ten rel_concept relevant features lists for the MayTag experiments of Chapter 5 (Section 5.5). In general, the "following" local context features are accessed more often than the "preceding" local context features. In addition, it is interesting that the decision tree accessed many syntactic features (i.e., the last-constit attribute and p-o-s features) in spite of the fact that concept activation would seem to be a purely semantic question. One explanation for this is that accurate concept activation requires clause boundary recognition and some knowledge of the type of syntactic structure that the parser is processing. Although neither of these pieces of information is included explicitly in the case representation14, both can be inferred by examining the last-constit and p-o-s features. Finally, many global context features appear to play a role in concept activation — only the semantic features associated with the last constituent are not accessed by any of the decision trees for concept type prediction. It should be noted, however, that many more context features appear in this chart than in the previous charts simply because we are using the full C4.5 decision tree rather than the pruned version.

14The last-constit feature does note when clause boundaries have been crossed, but clause boundaries to the right of the current word are not recognized.

prev1-spec-sem = nil -> | fol1-spec-sem = ceo -> ceo | fol1-spec-sem = mine -> nil | fol1-spec-sem = sr-exec -> sr-exec | fol1-spec-sem = production+site, continent, exec, sales, finance, service, government, transportation, store, service+store -> nil | fol1-spec-sem = site -> production+site | fol1-spec-sem = sales+store -> sales+store | fol1-spec-sem = person -> person | fol1-spec-sem = other-loc -> other-loc | fol1-spec-sem = factory -> factory+production | fol1-spec-sem = spoke, generic-company, company-name -> company-name | fol1-spec-sem = other-facility -> city | fol1-spec-sem = facility-name -> facility-name | fol1-spec-sem = research -> | | prev1-pos = determiner -> nil | | prev1-pos = [others] -> research | fol1-spec-sem = country -> | | fol2-gen-sem = money -> nil | | fol2-gen-sem = [others] -> country | fol1-spec-sem = production -> | | prev2-gen-sem = time -> company-name | | prev2-gen-sem = entity -> | | | fol1-gen-sem ... | | prev2-gen-sem = nil -> | | | prev1-cn = nil -> nil | | | prev1-cn = [others] -> production | | prev2-gen-sem = [others] -> production | fol1-spec-sem = human-name -> | | prev1-pos ... | fol1-spec-sem = nil -> | | prev2-spec-sem ... | | | fol1-pos ... | | | morphol ... | | | prev1-pos ... | | | | morphol ...

Figure 6.10: Sample Decision Tree for Specific Semantic Feature Prediction (partial tree, part 2).

6.4 Problems with the Hybrid Case Retrieval Algorithm

MayTag's modified case retrieval algorithm (Section 6.2.3) combines the decision tree approach to feature set selection with a nearest-neighbor matching routine. In spite of the promising performance of this hybrid case-based reasoning algorithm on the lexical acquisition task, there are problems with the current approach. In general, the speed of case retrieval degrades linearly with the size of the case base. This is a problem with the nearest-neighbors component rather than the decision tree component, however. On the other hand, introducing decision trees for feature selection transforms MayTag from an incremental learning algorithm to a nonincremental one. This is a disadvantage if we want to employ the case-based learning algorithm during the training phase to propose definitions for the current word. One solution to this problem is to replace C4.5 with an incremental decision tree algorithm [Utgoff, 1989, Utgoff, 1994]. This would allow the system to determine the relevant attributes associated with each word definition feature as training cases arrive without reprocessing all preceding instances. Alternatively, one might wait until the case base was relatively stable before employing the decision tree approach to feature selection.

Figure 6.11: Histogram of Features From rel_gensem Lists (general semantic feature prediction).

Figure 6.12: Histogram of Features From rel_specsem Lists (specific semantic feature prediction).

fol2-spec-sem = government -> tie-up-relationship fol2-spec-sem = research -> | last-constit = verb -> industry-research | last-constit = [others] -> nil fol2-spec-sem = finance -> | fol1-pos = determiner -> industry | fol1-pos = [others] -> nil fol2-spec-sem = production -> | do-cn = industry-research -> industry-production | do-cn = nil -> | | prev2-cn = industry-research -> industry-research | | prev2-cn = industry-production -> | | | prev1-pos = nm -> nil | | | prev1-pos = [others] -> industry-sales | | prev2-cn = nil -> | | | prev1-pos = adverb, noun-phrase -> industry-production | | | prev1-pos = verb -> | | | | s-gen-sem = entity+nationality -> industry-sales | | | | s-gen-sem = entity+jv-entity, jv-entity -> industry-production | | | | | prev1-cn ... | | | prev1-pos = aux -> | | | | s-cn ... | | | prev1-pos = [others] -> nil | | prev2-cn = [others] -> nil | do-cn = [others] -> nil fol2-spec-sem = country -> | last-constit = verb -> | | prev2-pos = determiner -> tie-up-relationship | | prev2-pos = [others] -> nil | last-constit = [others] -> nil fol2-spec-sem = company-name -> | prev1-gen-sem = ownership-percentage -> ownership-% | prev1-gen-sem = product-service -> | | fol1-pos = preposition -> tie-up-relationship | | fol1-pos = [others] -> nil | prev1-gen-sem = entity -> | | morphol = er -> tie-up-relationship | | morphol = nil -> | | | fol1-pos = connective -> ownership-% | | | fol1-pos = conjunction -> nil | | | fol1-pos = comma, nm -> tie-up-relationship-jv-entity-only | | morphol = [others] -> nil | prev1-gen-sem = nil -> | | prev2-gen-sem = jv-entity -> | | | prev1-pos = aux -> tie-up-relationship-jv-entity-only | | | prev1-pos = [others] -> nil | | prev2-gen-sem = [others] -> nil | prev1-gen-sem = [others] -> nil

Figure 6.13: Sample Decision Tree for Concept Type Prediction (part 1).

fol2-spec-sem = nil -> | fol2-cn = industry-production -> | | do-gen-sem = jv-entity, entity+jv-entity -> industry-research | | do-gen-sem = [others] -> nil | fol2-cn = industry-sales -> | | fol1-pos = infinitive, aux, comma -> nil | | fol1-pos = [others] -> industry-production | fol2-cn = tie-up-relationship -> | | morphol = number -> ownership-% | | morphol = ed -> tie-up-relationship-secondary | | morphol = nil -> | | | fol2-pos = nm -> industry | | | fol2-pos = [others] -> nil | | morphol = [others] -> nil | fol2-cn = nil -> | | fol2-gen-sem = ownership-percentage -> | | | s-gen-sem = nil -> nil | | | s-gen-sem = [others] -> ownership-% | | fol2-gen-sem = jv-entity -> | | | last-constit-cn = tie-up-relationship-secondary -> tie-up-relationship | | | last-constit-cn = [others] -> nil | | fol2-gen-sem = [others] -> nil | | fol2-gen-sem = nationality -> | | | v-cn ... | | | | prev1-gen-sem ... | | | | | fol1-pos ... | | fol2-gen-sem = entity -> | | | prev1-pos ... | | | | do-cn ... | | | | | fol1-spec-sem ... | | | | morphol ... | fol2-cn = [others] -> nil fol2-spec-sem = [others] -> nil

Figure 6.14: Sample Decision Tree for Concept Type Prediction (part 2).

Figure 6.15: Histogram of Features From rel_concept Lists (domain-specific concept prediction).


Another option is to recompute the relevant feature sets (rel_pos, etc.) only occasionally. In the last two solutions, however, we might lose some of the inherent advantages associated with the incremental nature of case-based reasoning algorithms. Finally, our decision tree approach to feature set selection should be tested on additional data sets to determine its generality and to determine the class of problems that will respond favorably to the technique.

6.5 Related Work in Automated Feature Set Selection

We are not the first to use decision trees to find a set of relevant features in a case representation. Aha and Kibler [1987] used decision trees to specify the features to be included in k-nearest neighbor case retrieval. They evaluated their approach using two datasets from a medical domain task. However, they obtained mixed results — the hybrid case retrieval approach marginally improved overall classification accuracy for one dataset (from 96% to 97% correct) but degraded accuracy for the other (from 89% to 84%). In related work, Aha [1989] presents a method for learning concept-dependent attribute relevancies in a case-based paradigm. He dynamically updates the similarity function for each concept by modifying its attribute weight vector in response to classification performance. Here we use decision trees essentially to create an attribute weight vector for each concept where the weights are either zero or one. However, one possibility that we have not yet explored is to use the information gain metric directly to derive attribute weights. This approach has been used in Daelemans et al. [1993] and Daelemans [in press] to find attribute weights for a number of low-level language learning tasks.

The results of the experiments in this chapter indicate that the decision tree approach to feature set selection is a viable method for improving a baseline case representation. This result has important implications for work in case-based paradigms because it clearly indicates that decision tree algorithms can be used to improve the performance of some CBR systems without reliance on potentially expensive expert knowledge. On one hand, these results may not seem surprising since previous research has found the converse to be true — Skalak and Rissland [1990] show that a case-based reasoning system can successfully perform the feature selection task for a decision tree classification system. However, Almuallim and Dietterich show that the ID3 decision tree system [Quinlan, 1986] is not particularly good at selecting a minimum set of features from an original set containing possibly many irrelevant attributes [Almuallim and Dietterich, 1991]. While their results may hold in general, we claim that there is at least one important class of problems for which decision tree algorithms can perform feature specification reasonably well. In particular, this approach may succeed when the baseline case representation contains many irrelevant features.
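As a rough illustration of the information-gain weighting mentioned above, the sketch below computes a gain-based weight for each symbolic attribute. It is my own toy rendering of the idea, not the Daelemans et al. method or MayTag code, and the dictionary-valued case format is an assumption.

    import math
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def information_gain(values, labels):
        """H(class) - H(class | attribute) for one symbolic attribute."""
        n = len(labels)
        by_value = {}
        for v, y in zip(values, labels):
            by_value.setdefault(v, []).append(y)
        remainder = sum(len(ys) / n * entropy(ys) for ys in by_value.values())
        return entropy(labels) - remainder

    def gain_weights(cases, features, target):
        """One candidate attribute weight vector: the information gain of each feature."""
        labels = [c[target] for c in cases]
        return {f: information_gain([c[f] for c in cases], labels) for f in features}

Such weights could replace the zero/one weights that the decision tree approach effectively assigns, at the cost of losing the hard pruning of irrelevant attributes.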

CHAPTER 7

MAYTAG PERFORMANCE ANALYSIS

The last two chapters described MayTag, an instantiation of the Kenmore framework that acquires lexical knowledge, as well as its decision tree approach to feature set selection. This chapter examines more closely MayTag’s ability to handle lexical ambiguities. In it, we discuss when and why our case-based approach to lexical disambiguation succeeds and when and why it fails. In addition, we outline possible solutions to some of the problems that were discovered.

7.1 Part-of-Speech Disambiguation

Figure 7.1 summarizes MayTag's performance for part-of-speech prediction in two different tasks: (1) predicting the part of speech for occasional unknown words in the test sentences and (2) predicting the part of speech for all words in a test sentence — tagging from scratch. These results were presented in Chapter 5 where a detailed description of the experiments can be found. Briefly, the experiments of that chapter draw from a base set of 2056 cases created during a human-supervised training phase. Each case represents the context-dependent interpretation of a single occurrence of an open-class word from 120 sentences in the JV corpus. In each experiment, MayTag relied on a hand-coded lexicon of 129 function words to provide part-of-speech information for closed-class words and drew from the interpretations stored in the case base to predict the part of speech for all other (open-class) words in each test sentence. The results in Figure 7.1 show MayTag's average percentage correct using 10-fold cross validation and are compared to two baselines. The first baseline (Random Selection) indicates the expected accuracy of a system that randomly guesses a legal part of speech based on the distribution of parts of speech (for the open-class words) across the training set. The second baseline (Default) shows the performance of a system that always chooses the most frequent part of speech as a default. MayTag's performance on the open-class words is shown in the third column of each block of results and MayTag's performance across all words in the test sentences is shown in the final column of each block. In general, MayTag's overall performance for part-of-speech prediction is comparable to the performance of statistical part-of-speech taggers, which generally achieve accuracies in the 96% range [Charniak, 1993].

Figure 7.1: Part-of-Speech Prediction (% correct).

As described in Chapter 5, the part-of-speech taxonomy used by MayTag contains 18 parts of speech (see Table 5.4). Seventeen of 18 part-of-speech tags occur across the prev1-pos, prev2-pos, fol1-pos, and fol2-pos features of the training cases. (Only copulars do not occur.) However, only 11 parts of speech were selected by the human supervisor as values for the part of speech of the unknown word represented by each training case, i.e., for the part-of-speech word definition feature. The distribution of these values is shown in Table 7.1. Although most closed-class words are defined in the function word lexicon, the table shows that some are represented in the case base (e.g., modl, conn, prep). This happens in one of two situations: (1) a closed-class word is missing from the function word dictionary (e.g., the connective "otherwise" appeared in the training sentences, but was not in the function word dictionary), or (2) a particular closed-class word is very ambiguous for the domain and, hence, was removed from the function word dictionary so that MayTag might perform the part-of-speech disambiguation. For example, in the joint ventures domain the word "may" is sometimes a modal (e.g., "IBM may begin building...") and sometimes a noun (e.g., "The company will open in May"). Rather than design a complicated procedural definition for "may" that can determine when the noun form is preferred over a modal (and vice versa), we simply exclude the word from the function word dictionary altogether and let MayTag provide the disambiguation. In any case, it is clear from Table 7.1 that the vast majority of open-class words are either nouns or noun modifiers.1 Because MayTag receives more training on words of these classes, one would expect MayTag's accuracy for noun and noun modifier prediction to be quite good. This is, in fact, the case.

1The number of nouns and noun modifiers is high for two reasons: (1) the texts contain many company names and product descriptions comprised of noun modifiers and nouns, and (2) the part-of-speech taxonomy does not have separate categories for proper nouns and pronouns — both will be labeled as nouns or noun modifiers depending on their position within the noun phrase.

Table 7.1: Part-of-Speech Frequencies for Unknown Words (2056 cases).

Part of Speech               Frequency
modl (modal)                         1
conn (connective)                    2
prep (preposition)                   3
ger (gerund)                         5
pres (present participle)           15
ptcl (particle)                     24
pasp (past participle)              73
adv (adverb)                        83
verb (verb)                        175
noun (head noun)                   792
nm (noun modifier)                 883

Figure 7.2 shows the breakdown of MayTag's performance on individual parts of speech for the occasional-unknown-word task using the 2056-instance case base.2 The graph shows two values for each part of speech, p: the •'s indicate the percentage of unknown words that should have been tagged with p, and the corresponding x's show MayTag's accuracy in predicting p. The graph clearly shows that MayTag performs well overall because it performs well on the frequently occurring parts of speech. Taken together, the noun and noun modifier categories account for 81.5% of the test cases, of which MayTag tags 98.6% correctly. In general, the graph indicates that accuracy on individual part-of-speech prediction improves as representation of that part of speech in the training set increases.
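The per-part-of-speech breakdown plotted in Figure 7.2 reduces to two numbers per tag: how often the tag occurs among the test cases and how often it is predicted correctly. The sketch below computes both from parallel lists of gold and predicted tags; the list-based layout is an assumption of the sketch, not MayTag's data format.

    from collections import Counter

    def per_class_breakdown(gold, predicted):
        """For each gold tag: (% of instances carrying the tag, % of those predicted correctly)."""
        total = len(gold)
        freq, correct = Counter(gold), Counter()
        for g, p in zip(gold, predicted):
            if g == p:
                correct[g] += 1
        return {tag: (100.0 * freq[tag] / total, 100.0 * correct[tag] / freq[tag])
                for tag in freq}

    # e.g. per_class_breakdown(["noun", "nm", "verb", "noun"], ["noun", "nm", "noun", "noun"])
    #      -> {"noun": (50.0, 100.0), "nm": (25.0, 100.0), "verb": (25.0, 0.0)}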

7.1.1 Types of Errors

It is informative to examine the kinds of errors made by MayTag during part-of-speech prediction. MayTag’s errors for the occasional-unknown-word task are shown in Table 7.2.

Each row of the table corresponds to a part of speech, p, while the columns indicate how words that should have been tagged with p were misclassified by MayTag. Verb particles (PTCL), for example, were mistaken once for a preposition (PREP) and twice for noun modifiers (NM). In sentence analysis, some errors cause more trouble than others. Recognizing an adverb (ADV) as a verb particle (PTCL), for example, will have little effect on the meaning representation derived by CIRCUS for a sentence. The most grievous part-of-speech tagging errors are those that mistake non-verbs for verbs and vice versa.3 All such errors are marked with *'s in Table 7.2. Approximately 46% of MayTag's errors were of the verb/non-verb variety. Of these, 66% were due to false hits on nouns or noun modifiers.

2The average accuracy for part-of-speech prediction of the open-class words in this 10-fold cross validation run was 92.8%.

3Here we use "verb" to denote any of MODL, PASP, PRES, or VERB.

Figure 7.2: Performance on Individual Parts of Speech for Occasional-Unknown-Word Tagging. (x's indicate % correct for predicting part of speech p; •'s indicate % of instances that should have been assigned p; line is best linear approximation to the x's.)

For the tagging-from-scratch task, 55% of MayTag's errors were of the verb/non-verb variety and 79% of these were due to false hits on nouns and noun modifiers. There are a number of possible reasons for the large number of false hits.

First, the case retrieval algorithm uses a rather high value for k (i.e., ten) in its nearest-neighbor retrieval and all retrieved cases vote on the value of the unknown word's part of speech. In fact, more than ten cases are sometimes retrieved when there are ties. Because there are many more noun/nm cases in the case base, the majority vote tends to be one of noun or noun modifier even though some of the retrieved cases cast (correct) dissenting votes. For example, MayTag retrieved 15 cases in determining the part of speech for "decided" in the sentence, "Tamotsu Goto, JAL senior vice president, said the airline decided to take part in the project..."

The words associated with these 15 cases are shown in rank order in Table 7.3 with their corresponding part of speech. Of the 15 cases, five vote that “decided” should be a noun, four vote that it should be a verb, four vote for pasp, and two vote for noun modifier. In spite of the fact that together the verb and pasp votes account for the majority of retrieved cases (and indicate at the very least that “decided” should be one of the verb classes), MayTag’s simple voting scheme assigns “decided” the noun part of speech.
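The vote counts in Table 7.3 make the failure easy to reproduce. The sketch below tallies the flat vote and, for contrast, a two-stage vote that first decides the verb/non-verb question, one of the modifications considered in Section 7.1.2 below; the grouping of tags into verb classes and the tie-breaking are assumptions of the sketch, not MayTag's behavior.

    from collections import Counter

    VERB_CLASSES = {"verb", "pasp", "pres", "modl"}   # assumed grouping of the verb tags

    # Parts of speech of the 15 retrieved cases from Table 7.3
    retrieved = ["noun"] * 5 + ["verb"] * 4 + ["pasp"] * 4 + ["nm"] * 2

    flat_vote = Counter(retrieved).most_common(1)[0][0]            # -> "noun"

    # Two-stage alternative: vote verb vs. non-verb first, then vote within that group.
    verbish = sum(p in VERB_CLASSES for p in retrieved) > len(retrieved) / 2
    pool = [p for p in retrieved if (p in VERB_CLASSES) == verbish]
    two_stage_vote = Counter(pool).most_common(1)[0][0]            # -> "verb" (or "pasp"; they tie)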

7.1.2 Reducing False Hits on Majority Classes

One method for reducing the number of verb/non-verb errors is to tackle the general problem of reducing the number of false hits on nouns and noun modifiers.

Table 7.2: Part-of-Speech Error Matrix for Occasional-Unknown-Word Tagging. (*’s indicate verb/non-verb errors.)

Columns (MayTag's predicted part of speech): MODL, CONN, PTCL, PREP, PASP, NOUN, PRES, ADV, NM, GER, VERB
MODL: 1
CONN: 2
PTCL: 1 2
PREP: 1 1 1*
PASP: 7* 6* 6
NOUN: 4 3 5*
PRES: 2 1* 4* 5
ADV: 2 1* 18 1* 21 6*
NM: 2 2* 1 6*
GER: 3
VERB: 1* 5 7* 20*

Table 7.3: Unknown Words and Parts of Speech for Retrieved Cases.

Unknown Word    Part of Speech
plan            verb
wanted          verb
expected        pasp
involved        pasp
held            pasp
due             nm
firm            noun
venture         noun
said            verb
venture         noun
construction    noun
intends         verb
scheduled       pasp
project         noun
established     nm

One way to do this might be to use smaller values for k in the case retrieval algorithm. In all, 62% of MayTag's errors were false hits on nouns and noun modifiers for the occasional-unknown-word task when k was ten. As shown in Table 7.4, the false hit rate drops to 57% by lowering k to five, and to 45% when k is set to one. The second row of the table shows that MayTag's overall accuracy for part-of-speech prediction is unaffected when k is set to five and declines only slightly when k is set to one.4 A bigger question is whether smaller values of k will reduce the number of verb/non-verb errors. In the "decided" sentence from above, for example, retrieving just the top-ranked case would result in the correct part-of-speech assignment and retrieving the top five cases would at least allow MayTag to choose one of the verb parts of speech (i.e., pasp) rather than a non-verb. The results shown in the bottom row of Table 7.4, however, do not support this hypothesis. As described above, 46% of MayTag's errors in the occasional-unknown-word task are of the verb/non-verb variety when k is ten. When k is five, 50% of MayTag's errors in the occasional-unknown-word task are verb/non-verb errors. When k is one, this percentage drops back to 46%.

Table 7.4: Using Smaller Values of k for Part-of-Speech Prediction in the Occasional-Unknown-Word Task (Results shown are for open-class words only using the 2056-case case base).

MayTag Results              k = 10    k = 5    k = 1
% false hits                    62       57       45
% correct                       92       92       91
% verb/non-verb errors          46       50       46

There are additional methods available for reducing the number of false hits and, thus, possibly reducing the number of verb/non-verb errors. One option is to modify the case retrieval voting scheme to assign greater weight to cases with low-frequency part-of-speech values. Alternatively, each vote might be weighted by the degree of similarity between the retrieved case and the problem case, or additional cases for under-represented parts of speech (and for the verb classes in particular) could be included in the training set. Finally, we might modify the case retrieval algorithm to use information regarding the verb/non-verb distinction explicitly. The retrieved cases could first vote as to whether the unknown word was a verb or a non-verb and then only those cases of the appropriate general class would participate in the final vote for a particular part of speech.

A final source of the false hit problem might be lurking in the hybrid similarity metric itself. As described in Section 5.4, MayTag's nearest-neighbor calculations include only those features associated with nodes in the part-of-speech decision tree — the decision tree generated to predict the part of speech for a word given its context.

However, 81.5% of the instances used to train the decision tree fall into either the noun or noun modifier class. As a result, the features that appear as nodes in the corresponding decision tree more than likely will be those that aid the recognition of nouns and noun modifiers rather than one of the minority parts of speech. One potential solution to this problem is to use the decision tree to find case-specific relevant feature sets rather than a single set of relevant features that applies to all problem cases. Given a problem case, for example, we would first send the instance through the decision tree and note the features that were tested during the tree traversal. Then just those features would be included in the nearest-neighbors calculations. This approach is different from the current case retrieval algorithm which includes all features that appear in the decision tree in all nearest-neighbors calculations.

4Again, note that we are using results from a different 10-fold cross validation run than those presented in Chapter 5.
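A minimal sketch of the case-specific alternative described above: route the problem case through the decision tree and keep only the features tested along its path. The nested (feature, branches) encoding of the tree and the "[others]" default branch are assumptions made for the sketch (loosely in the spirit of Figure 6.6), not C4.5's actual representation.

    def features_on_path(tree, case):
        """Collect the features tested while routing `case` through a tree encoded
        as either a leaf value or a (feature, {value: subtree}) pair."""
        used = []
        node = tree
        while isinstance(node, tuple):
            feature, branches = node
            used.append(feature)
            node = branches.get(case.get(feature), branches.get("[others]"))
        return used

    # Toy tree loosely modeled on the top of Figure 6.6:
    toy_tree = ("fol1-gen-sem", {
        "time": ("prev1-pos", {"noun": "verb", "[others]": "adv"}),
        "[others]": "nm",
    })
    # features_on_path(toy_tree, {"fol1-gen-sem": "time", "prev1-pos": "noun"})
    #   -> ["fol1-gen-sem", "prev1-pos"]; only these would enter the k-nn calculation.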

7.2 Semantic Feature Disambiguation

Figure 7.3: General Semantic Feature Prediction (% correct).

Figures 7.3 and 7.4 summarize MayTag's performance for semantic feature prediction for tagging occasional unknown words and for the tagging-from-scratch task using a case base with 2056 cases. These results and the associated experiments were presented in Chapter 5. As was the case in analyzing MayTag's performance on part-of-speech prediction, MayTag's ability to predict the semantic features of an open-class word depends on the quality of its training cases, that is, on how well the training cases cover the semantic feature disambiguation concepts to be learned. As described in Chapter 5, the semantic feature taxonomy (see Tables 5.1 and 5.2) includes 14 general semantic features and 42 specific semantic features and reflects domain-specific distinctions that are important for an NLP system processing texts from the JV corpus. All 14 general semantic features occur across the prev1, prev2, fol1, and fol2 features in the 2056 training cases. However, 12 of the 42 specific semantic features did not occur across those local context features in the training cases: province, other-position, emp, prof, gov, owner, offic, warehouse, utilities, office, farm, and communications.

Figure 7.4: Specific Semantic Feature Prediction (% correct).

The unknown open-class word represented by each training case assumed one of 18 general semantic feature combinations and one of 35 specific semantic feature combinations.5 The distribution of these values is shown in Tables 7.5 and 7.6. As was the case with part-of-speech prediction, a small number of semantic features are used to tag a large percentage of the unknown open-class words in the training sentences. The jv-entity and entity general semantic features, for example, were used to tag over 50% of the open-class words in the training sentences and the nil specific semantic feature was assigned to 58% of them. As shown in Figures 7.5 and 7.6, MayTag attains greater predictive accuracy for a semantic feature as the number of training cases with that feature increases. These graphs show the breakdown of MayTag's performance on individual general and specific semantic features, respectively. Both graphs are based on a 10-fold cross-validation run in which MayTag simulates semantic feature prediction for occasional unknown words using the usual 2056-instance case base.6 For each semantic feature, s, the graphs show two values: the • indicates the percentage of unknown words that should have been tagged with s and the corresponding x shows MayTag's performance on s. Like the part-of-speech performance analysis, the graphs indicate that MayTag performs best on frequently occurring semantic features. For general semantic feature prediction, the three most frequent tags (i.e., jv-entity, entity, nil) account for 66.3% of the test instances and MayTag attains 88.0% accuracy across these features. For specific semantic feature prediction, two features (i.e., nil, company-name) account for 74.6% of the instances and MayTag achieves an average of 91.1% correct for these features.

5All 14 general semantic features and 30 of the possible 42 specific semantic features were supplied as gen-sem and spec-sem values for the open-class word defined in each training case. In addition, some words were assigned more than one semantic feature in certain contexts.

6The average accuracy for open-class words across the ten runs was 77.4% for general semantic features and 80.1% for specific semantic features.

Table 7.5: General Semantic Feature Frequencies for Unknown Words (2056 cases).

General Semantic Feature          Frequency
NATIONALITY, JV-ENTITY                    1
LOCATION, JV-ENTITY                       1
CAPITALIZATION                            5
INDUSTRY, JV-ENTITY                       6
FACILITY, INDUSTRY                        9
MONEY                                    24
GENERIC-LOC                              25
PERSON-POSITION                          25
OWNERSHIP-PERCENTAGE                     31
HUMAN                                    41
FACILITY                                 43
NATIONALITY                              65
TIME                                     95
LOCATION                                131
INDUSTRY                                191
NIL                                     322
ENTITY                                  515
JV-ENTITY                               526

Figure 7.5: Performance on Individual General Semantic Features for Occasional-Unknown-Word Tagging. (x’s indicate % correct for predicting general semantic feature, g; *’s indicate % of instances that should have been assigned g; line is best linear approximation to x’s.)

Table 7.6: Specific Semantic Feature Frequencies for Unknown Words (2056 cases).

Specific Semantic Feature         Frequency
PRES                                      1
COB                                       1
FACTORY, PRODUCTION                       1
SITE, PRODUCTION                          1
MINE                                      1
COUNTRY, GOVERNMENT                       1
CEO                                       2
SITE                                      2
STORE, SALES                              2
OTHER-FACILITY                            2
TRANSPORTATION                            3
PERSON                                    3
STORE, SERVICE                            5
GOVERNMENT                                5
EXEC                                      5
HUMAN-TITLE                               6
SREXEC                                    7
COMPANY-ALIAS                             7
STORE                                     7
SERVICE                                   8
RESEARCH                                  8
CONTINENT                                 9
SPOKE                                     9
FACILITY-NAME                            11
FINANCE                                  12
FACTORY                                  13
SALES                                    17
OTHER-LOC                                22
CITY                                     25
HUMAN-NAME                               28
COUNTRY                                  75
GENERIC-COMPANY                          78
PRODUCTION                              146
COMPANY-NAME                            338
NIL                                    1195

Figure 7.6: Performance on Individual Specific Semantic Features for Occasional-Unknown-Word Tagging. (x’s indicate % correct for predicting specific semantic feature, s; *’s indicate % of instances that should have been assigned s; line is best linear approximation to x’s.)

7.2.1 Types of Errors

The error matrix of Table 7.7 displays MayTag’s errors for general semantic feature prediction. Of the seven errors in tagging money words, for example, one was mistakenly given the jv-entity feature, five were presumed to be entities, and one an industry. A portion of the error matrix for specific semantic feature prediction is shown in Table 7.8.7 Neither table shows errors for words tagged with more than one semantic feature by the human supervisor (for a single occurrence of the word). There were 16 such cases for general semantic feature tagging in the test sentences; in 44% of them, MayTag correctly assigned at least one of the required features. For example, in one test sentence the word “China” should have been assigned both the location and jv-entity general semantic features, but MayTag thought that only the location feature applied. It is clear from Table 7.7 that a majority of general semantic feature tagging errors (67%) are due to false hits on the jv-entity and entity features. A similar situation holds for specific semantic feature tagging, where 77% of the errors are caused by false hits on either the nil or company-name semantic features. General solutions to the problem of false hits were presented in Section 7.1.2 and would apply here as well.
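For concreteness, an error matrix of this kind can be tallied from paired gold/predicted tags as in the short Python sketch below; the money-word example is taken from the discussion above, while the function itself is only an illustration, not MayTag's evaluation code.

from collections import defaultdict

def error_matrix(gold_tags, predicted_tags):
    # matrix[gold][pred] counts test words whose correct feature was `gold`
    # but which were labeled `pred`; the diagonal holds the correct decisions.
    matrix = defaultdict(lambda: defaultdict(int))
    for gold, pred in zip(gold_tags, predicted_tags):
        matrix[gold][pred] += 1
    return matrix

# Example: among seven money-word errors, one tagged jv-entity,
# five tagged entity, and one tagged industry.
gold = ["money"] * 7
pred = ["jv-entity"] + ["entity"] * 5 + ["industry"]
m = error_matrix(gold, pred)
assert m["money"]["entity"] == 5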

7Only those features used to tag more than nine open-class words in the test sentences are included.

Table 7.7: General Semantic Feature Error Matrix for Occasional-Unknown-Word Tagging.

Key: A = CAPITALIZATION, B = GENERIC-LOC, C = LOCATION, D = HUMAN, E = PERSON-POSITION, F = OWNERSHIP-PERCENTAGE, G = NIL, H = MONEY, I = JV-ENTITY, J = ENTITY, K = FACILITY, L = INDUSTRY, M = NATIONALITY, N = TIME

A B C D E F G H I J K L M N
A 2
B 6 1 12 1 1
C 1 11 18 1 2
D 2 3 14 3
E 1 8 5
F 2 1 7
G 4 15 16 3 2
H 1 5 1
I 12 1 1 11 1 24 6 1 2
J 1 5 1 3 12 2 21 12 2 5
K 2 8 19 2 1
L 6 2 20 38 1
M 1 2 2 9 15 1
N 1 3 1 11 6 25

Table 7.8: Specific Semantic Feature Error Matrix for Occasional-Unknown-Word Tagging.

Key: A = SPOKE, B = CONTINENT, C = FACILITY-NAME, D = FINANCE, E = FACTORY, F = PRODUCTION, G = SALES, H = OTHER-LOC, I = CITY, J = HUMAN-NAME, K = COUNTRY, L = GENERIC-COMPANY, M = GOVERNMENT, N = COMPANY-NAME, O = NIL

A B C D E F G H I J K L M N O
A 1 1
B 1 3
C 2 1 2 4
D 1 9
E 1 7
F 1 5 66
G 1 2 7
H 4 2 15
I 7 1 2 10
J 5 13
K 1 2 16
L 3 6
M 1 4
N 1 2 21 47
O 1 13 5 9 32

7.2.2 Domain-Dependent vs. Domain-Independent Features

The set of semantic features used with the JV corpus can be divided into two major groups — domain-dependent features and domain-independent features (see Table 7.9). Eight general semantic features, for example, represent distinctions that must be made in the domain of business joint ventures but that may not pertain outside the domain. The remaining entries in the general semantic feature hierarchy, however, might be of use in other domains. The domain-independent features of Table 7.9 were, in fact, also part of the semantic feature taxonomy for the terrorism domain [Lehnert et al., 1991].

Table 7.9: Domain-Dependent and Domain-Independent Semantic Features. (General semantic features are shown in UPPER CASE; specific semantic features are shown in lower-case.)

Domain-Dependent Domain-Independent Features Features JV-ENTITY MONEY company-name company-alias NATIONALITY generic-company government HUMAN person human-name human-title INDUSTRY TIME research production LOCATION sales service country city finance province continent CAPITALIZATION other-loc PERSON-POSITION GENERIC-LOCATION ceo cob NIL pres offic nil srexec exec owner gov spoke prof emp other-position FACILITY communications site factory farm office mine store transportation utilities warehouse facility-name other-facility OWNERSHIP- PERCENTAGE ENTITY

It is useful to evaluate MayTag’s performance on domain-dependent vs. domain-independent semantic feature prediction because it is directly related to MayTag’s ability to acquire variable-depth knowledge representations.8 Given that the corpus is domain-specific, one may hypothesize that the texts naturally will provide contexts that make domain-specific distinctions easier to predict. On the other hand, the domain-independent semantic features seem to represent more general semantic categories, which may be easier to predict. The results in Tables 7.10 and 7.11 indicate that MayTag is better at assigning domain-dependent semantic features than assigning domain-independent semantic features.9 For the smaller case base (2056 cases), 34.2% of the open-class words were given domain-independent semantic features by the human supervisor. In the occasional-unknown-word task, MayTag provides the correct general semantic feature for 71.5% of these cases (see Table 7.10). Words with domain-dependent features, on the other hand, account for 65.3% of the open-class words in the training sentences and MayTag correctly assigns the general semantic feature for these words 80.4% of the time. For specific semantic features, there is a question as to whether the value nil should be considered domain-dependent or domain-independent because it can be used to indicate either that no specific semantic feature from the taxonomy applies or that a word is of a type for which no semantic features should be assigned (e.g., a verb). The first use of nil seems to be a very general, but domain-dependent one, while the second use of nil seems to be a domain-independent one. Excluding nil’s, MayTag correctly tags 47.9% of the unknown words that require domain-independent features (see Table 7.11). This rate increases to 63.5% for domain-dependent features. When nil is considered both a domain-independent and a domain-dependent semantic feature, the results are much closer — 89.1% for domain-independent semantic features and 83.5% for domain-dependent features. Tables 7.10 and 7.11 indicate that these trends continue as the case base grows and hold for the more difficult tagging-from-scratch task.

Table 7.10: Results for Domain-Independent vs. Domain-Dependent General Semantic Features (% correct).

General Semantic       Occasional Unknown Word   Occasional Unknown Word   Tagging From Scratch
Feature Type           (2056 cases)              (3060 cases)              (2056 cases)
domain-dependent       80.4                      80.7                      71.2
domain-independent     71.5                      69.3                      64.2

8These were proposed in Section 4.2 to support variable-depth text processing, which, in turn, is needed for most domain-specific text summarization tasks.

9Note that the results do not include cases (two) in which the human supervisor supplied a domain-dependent as well as a domain-independent semantic feature for the unknown word.

Table 7.11: Results for Domain-Independent vs. Domain-Dependent Specific Semantic Features (% correct). (Values in parentheses indicate percentage correct if nil included.)

Specific Semantic      Occasional Unknown Word   Occasional Unknown Word   Tagging From Scratch
Feature Type           (2056 cases)              (3060 cases)              (2056 cases)
domain-dependent       63.5 (83.5)               70.2 (83.8)               45.2 (77.2)
domain-independent     47.9 (89.1)               51.2 (91.6)               37.9 (89.1)

7.3 Domain-Specific Concept Disambiguation

[Figure 7.7 is a bar chart comparing, on the occasional-unknown-word / tagging-from-scratch tasks: Random Selection (84.2 / 84.2), Default (91.7 / 91.7), MayTag on open-class words (95.1 / 94.3), and MayTag including function words (96.8 / 97.2).]

Figure 7.7: Concept Type Prediction (% correct).

Concept activation in CIRCUS was described in Chapter 3. It is a complex task that amounts to selectively triggering a semantic case frame, and requires that MayTag implicitly learn three pieces of information. First, MayTag must decide which word should trigger the case frame. Next, it must determine the type of case frame to trigger (i.e., the concept type). Finally, it must learn to recognize the context in which the case frame should be activated: in CIRCUS’s terminology, MayTag must learn a set of enabling conditions. Figure 7.7 summarizes MayTag’s ability to predict domain-specific concept activation for both the occasional-unknown-word task and the tagging-from-scratch task using the 2056-case case base. These results were presented in Chapter 5. Possibly the most striking result from this chart is the exceptional performance of the default baseline, a system that always decides that a word activates no domain-specific concepts (i.e., activates the nil concept). This strategy achieves an accuracy of 91.7% correct across open-class words in the test sets. Although chi-square significance tests indicate that MayTag performs significantly better than this baseline, a closer examination of these results is provided below.

MayTag’s concept taxonomy is shown in Table 5.3 of Chapter 5. It represents a subset of the events that are important to recognize in the joint ventures domain. Of the 11 concept types in the taxonomy (12 including nil), 9 occur across the open-class words in the training sentences. Only the industry-sales and industry-finance concepts are not represented. Table 7.12 shows the distribution of concept types across the 2056 open-class words in the training sentences and Figure 7.8 displays MayTag’s ability to predict individual concept types.

Table 7.12: Concept Type Frequencies for Unknown Words (2056 cases).

Concept Type                              Frequency
INDUSTRY-RESEARCH                                 4
TOTAL-CAPITALIZATION                              9
TIE-UP-RELATIONSHIP-SECONDARY                    11
TIE-UP-RELATIONSHIP-JV-ENTITY-ONLY               12
INDUSTRY-SALES                                   16
INDUSTRY                                         18
INDUSTRY-PRODUCTION                              24
OWNERSHIP-TIE-UP-RELATIONSHIP                    48
NIL                                            1885

As was the case for part-of-speech and semantic feature prediction, the graph indicates that MayTag performs best on frequently occurring concept types. The nil concept type that was used in the default baseline, for example, accounts for 91.7% of the test cases and MayTag achieves 98.8% accuracy for this value. If we discard all words associated with the nil concept type, however, MayTag predicts only 49% of the concept types of open-class words correctly. In the tagging-from-scratch task, this value drops to 43%. Not surprisingly, false hits on nil account for 68% of MayTag’s errors in the occasional-unknown-word task and for 75% of MayTag’s errors in the tagging-from-scratch task.

7.3.1 Types of Errors

Table 7.13 shows MayTag’s concept activation errors in the occasional-unknown-word task. For this task, 37% of the errors that were not false hits on nil were near misses in the sense that the main event type (e.g., tie-up, industry) was predicted correctly. In one case, for example, MayTag recognized an industry-production concept as an industry-sales concept. In another case, a tie-up-relationship event was recognized as a tie-up-relationship-secondary. For the tagging-from-scratch task, 34% of the errors were near misses. If we count near misses as correct, MayTag’s accuracy for prediction of non-nil concept activations increases from 49% to 52% correct (occasional-unknown-word task) and from 43% to 49% correct (tagging from scratch).

Figure 7.8: Performance on Individual Concept Types for Occasional-Unknown-Word Tagging. (x’s indicate % correct for predicting concept type, c; *’s indicate % of instances that should have been assigned c; line is best linear approximation to x’s.)

As discussed in Section 7.1.2, using a smaller value of k may help reduce the number of false hits on majority class values like the nil concept type. Table 7.14 summarizes the effects of changing the value of k on domain-specific concept activation and shows that lowering k does reduce the percentage of false hits. While this seems encouraging, the reduction in the number of false hits on nil does not increase MayTag’s accuracy in predicting non-nil concept types. To keep the false hit rate low and the accuracy for non-nil concept prediction high, we might use small values of k in conjunction with one or more methods for increasing the representation of minority concept types in the training set (described in Section 7.1.2).
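The role of k in these false hits can be seen in a bare-bones majority-vote retrieval step. The Python sketch below uses a simple unweighted feature-overlap similarity and hypothetical dictionary-based cases rather than MayTag's actual metric; with a large k the vote tends to be dominated by nil-concept neighbors, while k = 1 simply copies the closest case.

from collections import Counter

def knn_concept_prediction(problem, case_base, k=10):
    # Rank stored cases by the number of matching context features and let
    # the k closest cases vote on the concept type.
    def similarity(case):
        return sum(1 for f, v in problem.items() if case["context"].get(f) == v)
    neighbors = sorted(case_base, key=similarity, reverse=True)[:k]
    votes = Counter(case["concept"] for case in neighbors)
    return votes.most_common(1)[0][0]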

7.3.2 Expanding the Concept Type Taxonomy

Thus far, all concept activation experiments drew from a taxonomy that represented just a subset of the concepts that should be recognized in the joint ventures domain. To process texts from the JV corpus to the degree required in the official MUC and TIPSTER performance evaluations, however, an NLP system requires a more detailed and expanded taxonomy of concept types. The taxonomy used by the UMass/Hughes CIRCUS system for the MUC-5 evaluation contained sixteen concept types. Table 7.15 lists all concept types from this expanded taxonomy that appeared in the 174 training sentences used to

Table 7.13: Concept Type Error Matrix for Occasional-Unknown-Word Tagging.

Key: A = INDUSTRY-RESEARCH, B = INDUSTRY-PRODUCTION, C = TIE-UP-RELATIONSHIP-JV-ENTITY-ONLY, D = NIL, E = INDUSTRY, F = TIE-UP-RELATIONSHIP, G = TIE-UP-RELATIONSHIP-SECONDARY, H = TOTAL-CAPITALIZATION, I = OWNERSHIP-%, J = INDUSTRY-SALES

A B C D E F G H I J
A 2
B 8 1
C 4 8
D 2 3 11 1 5
E 16
F 1 12 1
G 9 2
H 1
I 14
J 9

Table 7.14: Effect of Smaller Values of k for Prediction of Non-nil Concept Types. (Results shown are for open-class words using the 2056-case case base for the occasional-unknown-word task.)

Domain-Specific Concept Prediction        k = 10    k = 5    k = 1
% false hits                                  68       61       36
% non-nil correct                             49       49       47
% non-nil correct (incl. near misses)         56       56       58

generate MayTag’s larger case base of 3060 cases, along with its corresponding concept type from our original taxonomy. The MUC-5 taxonomy includes three entirely new concept types and expands two of the original concepts into six more specialized ones. For the

Table 7.15: Expanded Concept Type Taxonomy.

MUC-5 Concept Type                Corresponding MayTag Concept
jv-entity-person-not-jv           tie-up-relationship
jv-entity-company-not-jv          tie-up-relationship
jv-person                         tie-up-relationship
jv-entity-not-jv                  tie-up-relationship
jv-entity-company-jv              tie-up-relationship-jv-entity-only
jv-entity-jv                      tie-up-relationship-jv-entity-only
industry                          industry
industry-production               industry-production
industry-sales                    industry-sales
industry-service                  industry-service
facility                          none
ownership                         none
own percent                       ownership-%
revenue                           none
nil                               nil

MUC-5 performance evaluation, the UMass/Hughes system used the AutoSlog [Riloff, 1993] system to acquire concept activation knowledge for the JV corpus in the form of concept activation rules. For example, consider the following sentence:

IBM recently formed a subsidiary for production of PowerPC chips.

Given this sentence, the CIRCUS parser, a small set of rules, and the fact that the phrase “IBM” denotes a parent company involved in a joint venture, AutoSlog would suggest the following rule:

If the current word is “subsidiary” and the verb is “formed”, used in active voice, then activate a JV-ENTITY-COMPANY-NOT-JV concept.
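Rules of this form amount to a trigger word plus a constraint on the governing verb and its voice. The Python sketch below shows one hypothetical way to encode and test such a rule; it is not AutoSlog's actual rule representation, and the clause fields are assumed to come from the parser.

from dataclasses import dataclass

@dataclass
class ConceptActivationRule:
    # One AutoSlog-style trigger: word + verb context -> concept type.
    trigger_word: str      # e.g., "subsidiary"
    verb: str              # e.g., "formed"
    voice: str             # "active" or "passive"
    concept: str           # e.g., "jv-entity-company-not-jv"

    def fires(self, clause):
        # `clause` is a dict such as {"word": ..., "verb": ..., "voice": ...}.
        return (clause.get("word") == self.trigger_word
                and clause.get("verb") == self.verb
                and clause.get("voice") == self.voice)

rule = ConceptActivationRule("subsidiary", "formed", "active",
                             "jv-entity-company-not-jv")
clause = {"word": "subsidiary", "verb": "formed", "voice": "active"}
assert rule.fires(clause)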

AutoSlog was used to generate concept activation rules like the above for the entire JV corpus. However, because many of AutoSlog’s concept activation rules are too general, too specific, or just incorrect, a person first must peruse the proposed rule set and discard bogus entries. We use the AutoSlog-generated concept activation rules in experiments that investigate MayTag’s ability to perform concept activation for the expanded concept taxonomy. During the training phase in these experiments, we use the human-filtered AutoSlog concept activation rules to indicate the domain-specific concept that should be activated for each open-class word rather than asking the human supervisor to supply that information.10 In the application/test phase, therefore, MayTag is trying to predict when the AutoSlog concept activation rules should fire without explicitly knowing the rules. Table 7.16 shows the results for the occasional-unknown-word task using 10-fold cross validation for a case base of 3060 cases.11 The overall accuracy for concept prediction declines from 95% correct to 92% correct using the expanded taxonomy, despite the fact that more training is provided. On the other hand, the new task is much more difficult because there are many more non-nil concepts to predict across the test sentences — only 83% of the open-class words are associated with the nil concept as compared to 92% in the original experiments. This means that the use of the expanded concept taxonomy and a larger set of training sentences produce a case base that is better balanced than the original, and this may at least partially explain the improved performance on words that activate a non-nil concept type (see the last three rows of the table). The new task is more difficult for another reason: many words activate more than one domain-specific concept, but MayTag receives credit only when it predicts all of the expected concepts. The frequency with which different concept type combinations are used across the training set is shown in Table 7.17. The final column in the table shows that, in general, MayTag performs better as the representation of the concept type in the training sentences increases.

Table 7.16: Prediction of Concept Types from the Expanded Taxonomy (occasional-unknown-word task, k = 10).

MayTag Results                            Original Concept Taxonomy   Expanded Concept Taxonomy
                                          (2056 cases)                (3060 cases)
% correct                                 95                          92
% nil concepts                            92                          83
% false hits on nil                       68                          61
% non-nil correct                         49                          63
% non-nil correct incl. near misses       56                          68

One problem with the concept activation rules acquired by AutoSlog is their limited notion of the linguistic context that is adequate for activating the concept. In the terrorism domain, for example, whenever “killed” is used as a verb in the passive voice, the concept

10In addition, OTB [Lehnert et al., 1993b] was used for part-of-speech tagging.

11This is the same case base and experimental design as was used in Section 5.5.2.1, which tested MayTag’s ability to predict semantic features for the MUC-5/TIPSTER evaluation. In particular, the current experiments use the same subset of relevant features in the nearest-neighbor calculations as were used for those experiments.

Table 7.17: Concept Type Frequencies for Unknown Words Using the Expanded Concept Taxonomy (3060 cases).

Concept Type                                                     Frequency   % Correct
jv-entity-company-not-jv, facility, industry                             1           0
jv-entity-jv, jv-person                                                  1           0
jv-entity-jv, industry                                                   1           0
jv-entity-company-not-jv, industry-production, revenue                   1           0
jv-entity-company-jv, jv-entity-jv, own percent                          1           0
jv-entity-company-not-jv, jv-entity-not-jv, own percent                  1           0
industry-service                                                         2           0
jv-entity-company-not-jv, jv-entity-person-not-jv                        2           0
jv-entity-company-not-jv, industry, revenue                              2           0
jv-entity-company-not-jv, industry-sales, revenue                        2           0
jv-entity-company-not-jv, facility, industry-production                  3          67
jv-entity-company-not-jv, jv-entity-not-jv, industry, revenue            3           0
jv-entity-not-jv                                                         5           0
revenue                                                                  5          40
jv-entity-company-jv, ownership                                          6          40
jv-person                                                                6          50
industry-sales                                                           6           0
jv-entity-company-not-jv, industry-sales                                 6          50
jv-entity-company-not-jv, own percent                                    7          29
ownership                                                                8          63
jv-entity-not-jv, jv-person                                              9         100
jv-entity-company-not-jv, industry-production                            9          22
facility, industry-production                                           10          70
jv-entity-company-jv, own percent                                       10          63
industry-sales, revenue                                                 11          64
own percent                                                             11           9
facility                                                                12          58
jv-entity-company-not-jv, industry                                      15          46
facility, industry                                                      15          73
jv-entity-jv, jv-entity-not-jv                                          17          94
jv-entity-company-jv, jv-entity-company-not-jv, own percent             17          88
jv-entity-company-jv, jv-entity-jv                                      24          71
jv-entity-company-jv                                                    28          71
jv-entity-jv                                                            32          34
industry                                                                34          52
jv-entity-company-not-jv                                                50          53
jv-entity-company-jv, jv-entity-jv, own percent, revenue                65          98
industry-production                                                     78          68
nil                                                                   2545          98

activation rule for TERRORIST-MURDER is satisfied. This means that the TERRORIST-MURDER concept would be activated in the following sentence:

Gun law legislation was killed in Congress by powerful NRA lobbyists.

Because MayTag activates concepts based on a more flexible set of (implicit) enabling conditions, it can recognize these inappropriate contexts more readily. The current experiments, however, do not test MayTag’s ability in this regard because no human supervisor was used to discard erroneous AutoSlog concept activations during the training phase.

7.4 Generalization Issues

One question that arises for MayTag is whether the system learns any useful generalizations. For example, after training on the sentences from the joint ventures corpus, does MayTag know that “in” implies “location”? It is difficult to address generalization issues because the case-based approach does not encode any explicit generalizations. However, given that MayTag learns to predict that “locations” follow “in” in certain contexts and “time” phrases follow “in” in other contexts, we would argue that MayTag is learning to make implicit generalizations based on the training cases. There is a further complication in determining the extent of MayTag’s generalizations — some features are discarded from the case representation by the decision tree subsystem. If the features that represent “in” in a test sentence are discarded from the case representation, for example, then the case retrieval algorithm will not be able to use “in” as a cue that a location will follow.

7.5 Summary

MayTag provides a data-driven approach to lexical acquisition (see Chapter 5). It should not be surprising, therefore, that its success depends to a large degree on the examples it receives for training and how well they cover the lexical ambiguity resolution concepts to be learned. In general, the preceding sections determined that MayTag’s performance on a particular class value (e.g., a particular part of speech, semantic feature, or concept) increases as the number of training cases with that value increases. In particular, if those classes that are most important to recognize for the natural language processing task are well-represented in the case base, then MayTag’s approach will be successful for that NLP task. In the business joint ventures domain, for example, it is very important to recognize company names and, at the very least, to distinguish them from products and people. Since the training cases for the JV corpus provide adequate representation of these classes, MayTag’s approach is sufficient. If it were more important to recognize facilities, however, our machine learning approach to lexical disambiguation would fail. Happily, it is usually the case that classes that are important for a domain or task occur frequently in the corpus, and, hence, occur frequently in the case base. One major problem in MayTag is that cases associated with majority classes overwhelm the case base and make it difficult for the system to perform well on minority classes.

Section 7.1.2 outlined a number of modifications to the case retrieval algorithm (and to the training phase) that may help to solve this problem. In addition, MayTag could also be improved if the training and/or case retrieval algorithm were tailored to reduce specific errors that are especially detrimental to system performance (e.g., mistaking verbs for non-verbs during part-of-speech disambiguation). These modifications, however, require expert domain knowledge that is often difficult, expensive, and time-consuming to obtain.

CHAPTER 8

FINDING ANTECEDENTS OF RELATIVE PRONOUNS

This chapter demonstrates Kenmore’s ability to acquire heuristics for one class of structural disambiguation problem — finding the antecedents of relative pronouns.1 We instantiate Kenmore in a system called WHirlpool that uses a case-based reasoning algorithm to predict the antecedent of a relative pronoun like “who” or “whom” given a description of the clause that precedes it. During a human-supervised training phase, WHirlpool creates a case base of context-sensitive heuristics for relative pronoun disambiguation. Because context is encoded as part of each case, no explicit disambiguation rules need to be formulated. After training, WHirlpool draws from the case base to find the antecedent of a relative pronoun without any human intervention. Our experiments show that the heuristics learned within the Kenmore framework equal the performance of a set of hand-coded rules in terms of prediction accuracy. The chapter first introduces the problem (Section 8.1) and then presents WHirlpool, the instantiation of Kenmore that learns to find relative pronoun antecedents (Section 8.2). We describe WHirlpool’s training phase, application phase, and initial case representation in Sections 8.3-8.4, and then evaluate WHirlpool using this initial case representation in Section 8.5. Next, we describe and evaluate WHirlpool’s approach to improving a baseline case representation, an approach in which cognitive biases are explicitly incorporated into an existing case representation (Section 8.6). Experiments indicate that WHirlpool performs as well as a set of hand-coded heuristics for relative pronoun disambiguation and performs significantly better than a default rule that always chooses the most recent constituent as the antecedent. In addition, we find that WHirlpool’s prediction accuracy increases monotonically as each of three cognitive biases is added to the baseline case representation.

8.1 The Problem

Relative clauses consistently create problems for language processing systems. Consider, for example, the sentence in Figure 8.1. Its semantic interpretation should include

1Some of the material from this chapter appeared originally in Cardie [1992a, 1992b, 1992c].

Tony saw the boy who won the award.

Figure 8.1: Understanding Relative Clauses.

the fact that “the boy” is the actor of “won” even though the phrase does not appear in the embedded clause. Making this inference depends on the accurate resolution of two ambiguities, each of which must be performed over a potentially unbounded distance. The system has to:

1. find the antecedent of the relative pronoun “who,” and

2. determine the antecedent’s implicit position in the embedded clause — this is also called finding the gap.

This chapter focuses only on the first problem: locating the antecedent of the relative pronoun.2 Although relative pronoun disambiguation seems a simple enough task, there are many factors that make it difficult:

S1. We wondered who stole the watch.
S2. Tony saw the boy who won the award.
S3. The boy who gave me the book had red hair.
S4. Tony ate dinner with the men from Detroit who sold computers.
S5. I spoke to the woman with the black shirt and green hat over in the far corner of the room who wanted a second interview.
S6. I'd like to thank Jim, Terry, and Shawn, who provided the desserts.
S7. I'd like to thank our sponsors, GE and NSF, who provide financial support.
S8. We talked with the woman and the man who were/was dancing.
S9. We talked with the woman and the man who danced.
S10. The woman from Philadelphia who played soccer was my sister.
S11. The awards for the children who pass the test are in the drawer.

Figure 8.2: Relative Pronoun Antecedents.

Initially there is the problem of distinguishing uses of wh-words that are relative pronouns from uses that are not. Making this distinction will be important because uses of “who” that are not relative pronouns (e.g., interrogatives like S1 of Figure 8.2) have no antecedent.

2Locating the gap is an equally difficult problem because the gap may appear in a variety of positions in the embedded clause: the subject, direct object, indirect object, or object of a preposition. Chapter 3.1.3 describes the solution to the gap-finding problem employed by the CIRCUS parser.

The head of the antecedent of a relative pronoun does not appear in a consistent position or syntactic constituent. In both S2 and S3, for example, the antecedent is “the boy.” In S2, however, “the boy” is the direct object of the preceding clause, while in S3 it appears as the subject of the preceding clause. On the other hand, the head of the antecedent is the phrase that immediately precedes “who” in both cases. S4, however, shows that this is not always the case.

The antecedent head may be very distant from its coreferent wh-word3 (e.g., S5).

The antecedent is not always a single noun or compound noun. In S6, for example, the antecedent of “who” is a conjunction of three phrases, and in S7, either “our sponsors” or its appositive “GE and NSF” is a semantically valid antecedent.

Disambiguation of the relative pronoun may depend on information in the embedded clause. In S8, for example, the antecedent of “who” is either “the man” or “the woman and the man,” depending on the number of the embedded clause verb.

Sometimes, the antecedent is truly ambiguous. For sentences like S9, the real antecedent depends on the surrounding context.

Locating the antecedent requires the assimilation of both syntactic and semantic knowledge. The syntactic structure of the clause preceding “who” in sentences S10 and S11, for example, is identical (a noun phrase followed by a prepositional phrase). The antecedent in each case is different, however. In S10, the antecedent is the subject, “the woman;” in S11, the antecedent is the prepositional phrase modifier, “the children.”

As a direct result of these difficulties, NLP system builders have found the manual coding of rules that find relative pronoun antecedents to be very hard. In addition, the resulting heuristics are prone to errors of omission and may not generalize to new contexts. For example, the UMass/MUC-3 CIRCUS system4 [Lehnert et al., 1991] began with nineteen rules for finding the antecedents of relative pronouns. These rules included both structural and semantic knowledge and were based on approximately 50 instances of relative pronouns. As counter-examples were identified, new rules were added (approximately ten) and existing rules changed. Over time, however, we became increasingly reluctant to modify the rule set because the global effects of local rule changes were difficult to measure.5 In addition, most rules were based on a class of sentences that the UMass/MUC-3 system had found to contain important information. As a result, the hand-coded rules tended to work well for relative pronoun disambiguation in sentences

3Relative pronouns like who, whom, which, that, where, etc. are often referred to as wh-words.

4UMass/MUC-3 is the version of the CIRCUS parser (see Chapter 3) used for the MUC-3 performance evaluation (see Chapter 4).

5One possibility not considered in this work is to use a mixed-paradigm approach to finding antecedents of relative pronouns. In the same way that ANAPRON [Golding, 1991, Golding and Rosenbloom, 1991] and CABARET [Rissland and Skalak, 1991] combine case-based and rule-based reasoning, we might use handcrafted rules to represent a small set of safe, clear-cut relative pronoun disambiguation heuristics and then rely on cases to represent the exceptions.

of this class (93% correct in one set of 50 texts), but did not generalize to sentences outside of this class (78% correct for the same set of 50 texts). The rules do play a large role in processing texts from the MUC-3 corpus, however — approximately 25% of the sentences contain at least one relative pronoun, and “who” occurs in approximately one out of ten sentences. Given that the system as a whole performed very well on texts from this corpus in the MUC-3 evaluation, we believe that the hand-coded heuristics were fairly reliable.

8.1.1 Current Approaches to Relative Pronoun Disambiguation

Although descriptions of natural language processing systems do not usually include the algorithms used to find relative pronoun antecedents, current high-coverage parsers seem to employ one of three approaches for relative pronoun disambiguation. Systems that use a formal syntactic grammar often directly encode information for relative pronoun disambiguation in the grammar. Alternatively, a syntactic filter is applied to the parse tree and any noun phrases for which coreference with the relative pronoun is syntactically legal are passed to a semantic component, which determines the antecedent using inference or preference rules (see, for example, Correa [1988], Hobbs [1986], Ingria and Stallard [1989], Lappin and McCord [1990]). The third approach employs hand-coded disambiguation heuristics that rely mainly on semantic knowledge but also include syntactic constraints (e.g., CIRCUS). However, there are problems with all three approaches:

1. the grammar must be designed to find relative pronoun antecedents for all possible syntactic contexts;

2. the grammar and/or inference rules require tuning for new corpora;

3. in most cases, the approaches unreasonably assume a completely correct parse of the clause preceding the relative pronoun.

In the remainder of the chapter, we describe WHirlpool, an instantiation of the Kenmore knowledge acquisition framework that learns heuristics for relative pronoun disambiguation and avoids the above problems.

8.2 Instantiating the Kenmore Framework for WHirlpool

To learn heuristics for relative pronoun disambiguation we instantiate the three components of the Kenmore architecture as shown in Figure 8.3. The Latin American Terrorism corpus was developed for the MUC-3 and MUC-4 performance evaluations and was described in Section 4.3.1. The CIRCUS conceptual sentence analyzer [Lehnert, 1990] was described in Chapter 3. It takes as input a single sentence and produces as output a semantic case frame representation of the meaning of the sentence. As was the case for MayTag (Chapter 5), the learning algorithm employed by WHirlpool is a case-based reasoning (CBR) algorithm. As an instantiation of Kenmore, WHirlpool’s method for relative pronoun disambiguation closely resembles MayTag’s approach to acquiring lexical knowledge:

Corpus: Latin American Terrorism
Sentence Analyzer: CIRCUS
Inductive Learning Algorithm: CBR (1-nearest neighbor)

Figure 8.3: Instantiating Kenmore for WHirlpool.

1. In the training phase, the sentence analyzer and a human supervisor together create a case base of relative pronoun disambiguation cases for a randomly selected set of sentences from the corpus.

2. After training, for each new occurrence of a wh-word, WHirlpool retrieves the most similar case(s) from the case base using a representation of the context of the wh-word as the problem case.

3. Finally, WHirlpool uses the antecedent information in the retrieved case(s) to find the antecedent for the wh-word in the problem case.

One difference between WHirlpool and MayTag is the approach each uses to automate the design of an adequate case representation. As described in Chapter 6, MayTag relied on a decision tree approach to discard the irrelevant features in a baseline case representation. The modified case representation was shown to perform better than a baseline representation that included all features and better than a representation in which expert knowledge was used to discard irrelevant features. We expected irrelevant features to be only part of the feature set selection problem in the relative pronoun disambiguation task. As a result, WHirlpool employs a very different approach for feature set selection that is based on evidence from psycholinguistic studies of sentence processing. It augments a baseline case representation by explicitly encoding cognitive biases into the representation. The possibility of using both the decision tree and cognitive bias approaches to improving a baseline case representation is discussed in Section 9.5.3.

Indexing: none, flat case base
Similarity Metric: weighted 1-nearest neighbor
Case Selection: none (only one case retrieved)
Solution Policy: weak adaptation of retrieved solution
Automated Improvements to Baseline Case Rep: cognitive bias approach

Figure 8.4: Components of WHirlpool’s Case-Based Inductive Learning Algorithm.

For reference, the components of WHirlpool’s case-based inductive learning algorithm are summarized in Figure 8.4. In the sections that follow, we describe and evaluate

WHirlpool’s training and application phases in detail using a baseline case representation (Sections 8.3-8.5). The final section of the chapter presents and evaluates WHirlpool’s cognitive bias approach to improving the baseline case representation (Section 8.6).

8.3 WHirlpool’s Training Phase

[Figure 8.5 diagram: CIRCUS processes selected sentences from the terrorism corpus that contain a relative pronoun and supplies the context of each RP; the human supervisor supplies the antecedent; the resulting RP disambiguation cases (context + antecedent) are stored in the case base by the case-based learning component.]

Figure 8.5: WHirlpool Training Phase.

The goal of WHirlpool’s training phase (Figure 8.5) is to create a case memory of relative pronoun disambiguation decisions. Our initial efforts have concentrated on the resolution of “who” because the hand-coded heuristics for that relative pronoun were the most complicated and difficult to modify. We expect that the resolution of other wh-words will fall out without difficulty. To create the case base, WHirlpool first randomly selects a set of sentences that contain wh-words6 from the terrorism corpus and presents them to CIRCUS for analysis. CIRCUS processes each sentence and, with a human supervisor, creates a case as each wh-word is encountered. As described in Chapter 1, cases have two parts. The context portion describes the context in which the wh-word occurred and is a representation of the current state of the CIRCUS parser. It is supplied automatically by CIRCUS. The second part of the case, called the solution part in Chapter 1, encodes the antecedent of the wh-word for the current example. It is supplied by a person during training using a menu-driven interface. The human supervisor simply chooses the phrase or phrase combination from the current sentence that represents the antecedent. WHirlpool encodes the user’s choice into the solution part of the case and then stores the entire case in the case base. Finally, WHirlpool sends the antecedent back to CIRCUS so that CIRCUS can update its state information and continue processing the training sentence. At the end of training, WHirlpool will have created one case for each wh-word that occurred in the training sentences.

6Not all of the uses of the wh-words in those sentences may be relative pronouns. Some will be interrogatives, for example, but WHirlpool will be responsible for noting the distinction.

8.3.1 Case Representation

As described above, each WHirlpool case represents the disambiguation decision for a single occurrence of a relative pronoun. In theory, all knowledge acquisition systems designed within the Kenmore framework should employ the same case representation. Currently, however, the case representations for MayTag and WHirlpool differ in a few respects that are noted below. Chapter 9 re-examines the role of using a single ambiguity resolution case representation in the Kenmore framework. Very generally, WHirlpool cases consist of a set of attribute-value pairs, and include any information about the state of the CIRCUS parser that may help determine the antecedent of the current relative pronoun. Like MayTag, WHirlpool begins with a baseline case representation and then uses an automated approach to tune it for the current acquisition task. The context portion of WHirlpool’s baseline case representation contains two types of features: one or more constituent attribute-value pairs, and two most-recent features. The solution portion of the case contains a single antecedent attribute-value pair. The paragraphs below briefly describe each attribute type in the baseline representation using the cases in Figure 8.6 as examples.

S1: [The judge] [of the 76th court] [,] who

constituents: (s human) (s-pp1 physical-target)
most-recent: (mr-phrase physical-target) (mr-syn-type comma)
antecedent: (antecedent ((s)))

S2: [The police] [detained] [Juan Bautista] [and] [Rogoberto Matute] [,] who ...

constituents: (s human) (v t) (do proper-name) (do-np1 proper-name)
most-recent: (mr-phrase proper-name) (mr-syn-type comma)
antecedent: (antecedent ((do-np1 mr-phrase)))

S3: [Her DAS bodyguard] [,] [Dagoberto Rodriquez] [,] who...

constituents: (s human) (s-np1 proper-name)
most-recent: (mr-phrase proper-name) (mr-syn-type comma)
antecedent: (antecedent ((mr-phrase) (s mr-phrase) (s)))

Figure 8.6: Relative Pronoun Disambiguation Training Cases.

WHirlpool cases include one constituent attribute-value pair for each phrase in the current clause.7 The attribute describes the syntactic class and position of the phrase as

7Note that we are using the term “constituent” rather loosely to refer to all noun phrases, prepositional phrases, and verb phrases prior to any attachment decisions.

it was encountered by CIRCUS. In S1 of Figure 8.6, for example, “of the 76th court” is represented with the attribute s-pp1 because it is a prepositional phrase that immediately follows the subject. In S2, “Juan Bautista” and “Rogoberto Matute” are represented with the attributes do and do-np1, respectively, because CIRCUS recognizes them as the direct object and the noun phrase that directly follows the direct object. (The true direct object of the sentence is the conjoined noun phrase, but CIRCUS does not make these attachment decisions.) The subject and verb constituents (e.g., “the judge” in S1 and “detained” in S2) retain their traditional s and v labels, however — no positional information is included for those attributes.

The value associated with a constituent attribute is simply the semantic feature that best describes the phrase’s head noun. Like the semantic feature taxonomy used in the joint ventures domain (Section 5.3.1), the taxonomy used in the terrorism domain is a multi-level hierarchy of general and specific semantic features. However, only values at the top level of the hierarchy (i.e., the general semantic features) were used for this application because the handcrafted heuristics for relative pronoun disambiguation accessed the semantic features at this level. The top level of the hierarchy contains the following semantic features: human, proper-name, location, entity, physical-target, organization (org), and weapon. The subject constituents in Figure 8.6 receive the value human, for example, because their head nouns are marked with the “human” semantic feature in the lexicon. Because CIRCUS does not associate semantic features with verbs, cases note only the presence of verbs using the value t.

In addition, every case includes two most-recent features that represent the phrase last recognized by CIRCUS. The mr-syn-type attribute-value pair specifies the syntactic class of the most recent phrase and the mr-phrase attribute represents its semantic class. The only exception to this is when the relative pronoun is preceded by an item of punctuation. In these cases the value of mr-syn-type is simply the item of punctuation. In all sentences of Figure 8.6, for example, the value of mr-syn-type was comma. The value of the mr-phrase feature, however, varies across the sentences. In S2, the phrase recognized just before “who” was “Rogoberto Matute,” resulting in an mr-phrase value of proper-name. In S1, “of the 76th court” is the most recent phrase and a physical-target.

WHirlpool’s context features intersect with the set of “global context features” used in the MayTag lexical acquisition system. WHirlpool’s most-recent features correspond to the last-constituent features in MayTag, for example. However, WHirlpool includes an attribute-value pair for every phrase recognized in the current clause, while MayTag’s case representation includes global context features only for the major constituents of the current clause. In addition, WHirlpool cases include none of the “concept activation” features, “specific semantic feature” attributes, or “local context features” used in MayTag’s case representation.

As mentioned above, the solution portion of a WHirlpool case contains a single antecedent attribute-value pair. This feature encodes information regarding the position of the relative pronoun antecedent.
For CIRCUS, the antecedent is the head of the phrase that the relative pronoun refers to — without any post-modifier phrases. Therefore, the value of this attribute is a list of the constituent attributes (and/or the mr-phrase attribute) that represent the location of the antecedent head. In S1, for example, the antecedent of “who” is “the judge.” Because this phrase occurs in the subject position, the value of the antecedent attribute is (s). A value of (none) is used if no antecedent is required for this use of “who” or if none can be determined. Sometimes, however, the antecedent is actually a conjunction of constituents. In these cases, we represent the antecedent as a list of the constituent attributes associated with each element of the conjunction. Look, for example, at sentence S2. Because “who” refers to “Juan Bautista” and “Rogoberto Matute,” the antecedent can be described as (do mr-phrase) or (do do-np1). Although the lists represent equivalent surface forms, we choose the more general (do mr-phrase).8 S3 shows yet another variation of the antecedent attribute-value pair. In this example, an appositive creates three semantically equivalent antecedent values, all of which become part of the antecedent feature:

“Dagoberto Rodriguez” (mr-phrase)
“her DAS bodyguard” (s)
“her DAS bodyguard, Dagoberto Rodriguez” (s mr-phrase)
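Putting the pieces together, a baseline WHirlpool case can be viewed as a small set of attribute-value pairs plus the supervisor-supplied antecedent. The Python sketch below builds such a case for S1 of Figure 8.6; the dictionary layout and helper name are illustrative only, since the real cases are produced directly by CIRCUS.

def build_whirlpool_case(constituents, most_recent_phrase, most_recent_syntax,
                         antecedent=None):
    # Assemble a baseline WHirlpool-style case as attribute-value pairs.
    # `constituents` is a list of (position-label, semantic-feature) pairs as
    # the parser recognized them, e.g. [("s", "human"), ("s-pp1", "physical-target")].
    case = {"context": dict(constituents)}
    case["context"]["mr-phrase"] = most_recent_phrase
    case["context"]["mr-syn-type"] = most_recent_syntax
    # The solution part is filled in by the human supervisor during training.
    case["antecedent"] = antecedent
    return case

# Training case for S1 of Figure 8.6: "The judge of the 76th court, who ..."
s1 = build_whirlpool_case([("s", "human"), ("s-pp1", "physical-target")],
                          most_recent_phrase="physical-target",
                          most_recent_syntax="comma",
                          antecedent=(("s",),))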

8.3.2 Case Base Construction

Using the case representation described above, WHirlpool creates a case base of relative pronoun disambiguation cases from a small set of sentences in the terrorism corpus. Each training sentence is presented to CIRCUS, which processes the sentence and creates a case each time a wh-word is encountered. The CIRCUS parser automatically creates all constituent and most-recent features of the training cases as a side effect of syntactic analysis. As noted above, CIRCUS recognizes only low-level constituents like noun phrases, prepositional phrases, and verb phrases, but makes no initial attachment decisions. This makes construction of the cases easier for CIRCUS, but transfers responsibility for many difficult attachment decisions to the case-based reasoning component. During WHirlpool’s application phase, that component must recognize all phrases that comprise a conjunction of antecedents and must specify at least one of the semantically valid antecedents in the case of appositives. Sometimes, however, postponing an attachment decision until after the relative pronoun antecedent has been located makes the decision easier. Consider the prepositional phrase attachment decision for “in the restaurant” in the sentence, “I talked with the men in the restaurant.” Depending on the context, “in the restaurant” modifies either “talked” or “the men.”9 If we know that “the men” is the antecedent of a relative pronoun, however

8This form is more general because it represents both (do do-np1) and (do do-pp1).

9In yet another reading, “in the restaurant” may modify the entire clause, i.e., as if the sentence had been “In the restaurant, I talked with the men.”

(e.g., “I talked with the men in the restaurant, who offered me the job”), it is probably the case that “in the restaurant” modifies “the men.” It should be noted that only specification of the antecedent attribute-value pair requires human intervention via a menu-driven interface that displays the antecedent options. The user chooses the phrase or phrases that comprise the antecedent and this information is encoded by WHirlpool in the antecedent feature. As each training case becomes available, it is stored in the case base of relative pronoun disambiguation decisions. In addition, the antecedent is passed back to CIRCUS so that the appropriate syntactic buffers can be instantiated with the antecedent and processing of the embedded clause can begin. After training, WHirlpool will have produced one case for every occurrence of a wh-word in the training sentences.
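The training loop itself is simple in outline. The Python sketch below assumes hypothetical interfaces (a parser object that yields a context for each wh-word and can resume with the chosen antecedent, and a supervisor callback standing in for the menu-driven interface); it summarizes the flow described above rather than mirroring the CIRCUS/WHirlpool implementation.

def train_whirlpool(sentences, parser, ask_supervisor):
    # Build the case base of relative pronoun disambiguation decisions.
    # `parser` and `ask_supervisor` are hypothetical stand-ins for CIRCUS and
    # the menu-driven antecedent interface.
    case_base = []
    for sentence in sentences:
        for context in parser.wh_word_contexts(sentence):
            # The supervisor picks the antecedent phrase(s) for this wh-word.
            antecedent = ask_supervisor(sentence, context)
            case_base.append({"context": context, "antecedent": antecedent})
            # Hand the antecedent back so parsing of the embedded clause can continue.
            parser.resume_with_antecedent(antecedent)
    return case_base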

8.4 WHirlpool’s Application Phase

[Figure 8.7 diagram: for new sentences from the terrorism corpus that contain a relative pronoun, CIRCUS builds a probe case (context only); the case retrieval component compares the probe against the case base, and the antecedent of the retrieved RP disambiguation case supplies the answer.]

Figure 8.7: WHirlpool Application Phase.

Once the case base has been constructed, WHirlpool can use it to predict the antecedent of a wh-word in new contexts (Figure 8.7). When WHirlpool encounters a wh-word in application mode, it first creates a problem case. The problem case is just a training case that is missing the antecedent attribute-value pair. During training, the human supervisor specified the value of this feature, but during the application phase, the case retrieval algorithm will retrieve the training case that is most similar to the problem case and use its antecedent feature to select the antecedent for the novel case. (The specific algorithm used to find the best training case will be described in the next section.) Consider sentence S1 of Figure 8.8, for example. In response to the problem case for this sentence, WHirlpool retrieved a case that specifies the direct object (do) as the location of the antecedent. Therefore, WHirlpool chooses the current contents of the direct object — “the hardliners” — as the antecedent of “who” in S1. Sometimes, however, the single retrieved case lists more than one option as the antecedent. In these cases, we rely on the following simple heuristics to choose an antecedent:

1. Choose the first option whose constituents are all present in the problem case.

2. Otherwise, choose the first option that contains at least one constituent present in the problem case and ignore those constituents in the retrieved antecedent that are missing from the problem case.

S1: [It] [encourages] [the hardliners] [in ARENA] [,] who...

Probe Case: (s entity) (v t) (do human) (do-pp1 org) (mr-phrase org) (mr-syn-type comma)
Antecedent of Retrieved Case: ((do))
Antecedent of Probe: (do) = "the hardliners"

S2: [There] [are] [criminals] [like] [Vice President Merino] [,] [a man] who...

Probe Case: (s entity) (v t) (do human) (do-np1 proper-name) (do-np2 human) (mr-phrase human) (mr-syn-type np)
Antecedent of Retrieved Case: ((do-np1 do-pp2 mr-phrase))
Antecedent of Probe: (do-np1 mr-phrase) = "Vice President Merino, a man"

Figure 8.8: WHirlpool Case Retrieval.

From a case-based reasoning perspective, this post-processing of the retrieved antecedent could be considered a very weak and purely syntactic form of case adaptation. In general, WHirlpool tries to choose an antecedent that is consistent with the context of the problem case (heuristic 1) or to generalize the retrieved antecedent so that it is applicable in the current context (heuristic 2). S1 of Figure 8.8 illustrates the first case adaptation filter. In S2, however, the retrieved case specifies an antecedent from three constituents, only two of which are actually represented in the problem case. Therefore, we ignore the missing constituent do-pp2 and look to just do-np1 and mr-phrase for the antecedent.
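These two adaptation heuristics can be written down directly; the Python sketch below mirrors them over hypothetical constituent-label tuples and reproduces the S2 example, in which the missing do-pp2 constituent is dropped from the retrieved antecedent.

def adapt_antecedent(retrieved_options, problem_constituents):
    # `retrieved_options` is an ordered list of constituent-attribute tuples,
    # e.g. [("do", "mr-phrase")]; `problem_constituents` is the set of
    # attributes that actually appear in the problem case.
    # Heuristic 1: first option whose constituents are all present.
    for option in retrieved_options:
        if all(c in problem_constituents for c in option):
            return option
    # Heuristic 2: first option with at least one present constituent,
    # ignoring the constituents the problem case is missing.
    for option in retrieved_options:
        kept = tuple(c for c in option if c in problem_constituents)
        if kept:
            return kept
    return None

# S2 of Figure 8.8: retrieved ((do-np1 do-pp2 mr-phrase)); do-pp2 is missing.
assert adapt_antecedent([("do-np1", "do-pp2", "mr-phrase")],
                        {"s", "v", "do", "do-np1", "do-np2", "mr-phrase"}) \
       == ("do-np1", "mr-phrase")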

8.4.1 Case Retrieval in WHirlpool

As of yet, we have not described the specific case retrieval algorithm used in WHirlpool. It is basically the same as MayTag’s nearest-neighbor approach to case retrieval:

1. Compare the problem case to each case in the case base, counting the number of context features that match (i.e., match = 1, mismatch = 0).

2. Return the highest scoring case as well as any ties.

3. Let the retrieved cases vote on the value for the antecedent of the problem case.

As described, WHirlpool’s case retrieval algorithm is a simple 1-nearest neighbor (1-nn) algorithm. Unfortunately, there is a problem with the algorithm that didn’t arise for MayTag. All MayTag cases have the same number of features, but the same is not true for cases in WHirlpool. Although WHirlpool cases always have two most-recent features, there is no guarantee that two cases will have any additional attributes in common. The problem cases for S1 and S2 in Figure 8.8, for example, have three additional attributes in common (i.e., both have s, v, and do constituent attributes). A sentence like the following, however, produces a problem case that has no overlapping constituent attributes with either S1 or S2:

Sentence: In most respects, the man who...
Problem case: (pp1 entity) (np2 human) (mr-phrase human) (mr-syn-type np)

The question for the similarity metric is how to handle attributes that appear in the problem case but are missing from a training case and vice versa. If the case retrieval algorithm determines similarity by simply counting the number of attribute-value pairs in a training case that match those in the problem case, then two training cases that match the same number of problem case features will achieve the same similarity score regardless of the percentage of training case features correctly matched. Consider the following scenario:

Training case 1: ((s attack) (s-pp1 human) (v t) (do organization) (mr-phrase human) (mr-syn-type comma))
Training case 2: ((s attack) (s-pp1 human) (mr-phrase human) (mr-syn-type comma))
Problem case: ((s attack) (s-pp1 human) (mr-phrase human) (mr-syn-type comma))

Given the problem case, the case retrieval algorithm described above assigns the same similarity score to both training cases even though training case 2 is an exact match for the problem. The case retrieval algorithm employed by WHirlpool is actually a modified version of the 1-nn algorithm that systematically handles this problem. First, we describe all cases in terms of a normalized set of features:

1. Derive a normalized feature set by keeping track of every attribute that occurs in the training cases.

2. Augment the training cases and problem case to include every feature of the normalized feature set, filling in a nil value if the feature does not pertain to the particular case.

This means that WHirlpool cases, like MayTag cases, now will always have the same number of features. However, most of the features in a normalized case will have nil values because they will not apply to the current context. To ensure that the 1-nn case retrieval algorithm focuses on features that are present rather than missing from the problem case, we modify the original 1-nn case retrieval algorithm to award full credit for matches on features present in the problem case and to allow partial credit for matches on missing features. This is done by associating with each feature a weight that indicates the importance of the feature in determining case similarity and by using a weighted 1-nn case retrieval algorithm:

1. Given a problem case, first set the weight associated with each feature in the normalized feature set, N. For each feature, f, in N, set the weight, w_f, associated with f:

   w_f = 0.2 if f is missing from the (unnormalized) problem case,10
   w_f = 1 otherwise.

2. Compare the (normalized) problem case to all cases in the case base. That is, compare the problem case, P, to each training case, T, in the case base and calculate, for each pair:

   score(P, T) = \sum_{i=1}^{|N|} w_{N_i} \times match(P_{N_i}, T_{N_i})

   where N_i is the ith feature in N, P_{N_i} is the value of N_i in the problem case, T_{N_i} is the value of N_i in the training case, and match(a, b) is a function that returns 1 if a and b are equal and 0 otherwise.

3. Return the case with the highest score as well as all ties.

4. Let the retrieved cases vote on the value of the antecedent.
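A compact Python rendering of this normalization and weighted retrieval procedure is sketched below. It is illustrative only (the original system was implemented on top of the CIRCUS parser, and all function names here are hypothetical); cases are represented as dictionaries of attribute-value pairs, with the antecedent kept alongside each training case.

def retrieve(problem_case, training_cases, missing_weight=0.2):
    """Weighted 1-nn retrieval over normalized attribute-value cases.

    problem_case: dict of attribute-value pairs, e.g., {"s": "attack", ...}.
    training_cases: list of (case_dict, antecedent) pairs.
    Features absent from a case take the nil value None after normalization;
    features absent from the problem case contribute only missing_weight.
    """
    # Step 1 of normalization: collect every attribute that occurs anywhere.
    feature_set = set(problem_case)
    for case, _antecedent in training_cases:
        feature_set.update(case)
    # Weights: full credit for features present in the problem case.
    weights = {f: (1.0 if f in problem_case else missing_weight) for f in feature_set}
    # Weighted similarity of the problem case to each (normalized) training case.
    scored = []
    for case, antecedent in training_cases:
        score = sum(w for f, w in weights.items()
                    if problem_case.get(f) == case.get(f))
        scored.append((score, case, antecedent))
    best = max(score for score, _, _ in scored)
    # Return the highest-scoring case(s); the caller lets them vote on the antecedent.
    return [(case, antecedent) for score, case, antecedent in scored if score == best]

Applied to the earlier scenario, training case 2 now scores 4.4 against the problem case while training case 1 scores only 4.0 (it receives no credit for its v and do values, which the problem case lacks), so the exact match is correctly preferred.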

8.5 Evaluation of WHirlpool Using the Baseline Case Representation

To evaluate WHirlpool and its weighted 1-nn case retrieval algorithm, we use a 10-fold cross validation testing scheme. Specifically, WHirlpool first generates cases for all 241 examples of "who" from three sets of 50 texts from the terrorism corpus. Next, the 241 examples are randomly partitioned into ten non-overlapping segments of 24 cases. (One case appears in none of the ten segments.) In each of ten runs, we reserve the cases in one segment for testing and store the remaining 217 cases in the case base. For each test case, we invoke the weighted 1-nn case retrieval algorithm to predict the antecedent. Results are averaged across the ten runs. The results using the weighted 1-nn case retrieval algorithm and the baseline case representation are shown in Table 8.1. WHirlpool's performance is compared with that of the hand-coded heuristics of the UMass/MUC-3 system and a default strategy that simply chooses the most recent phrase as the antecedent. The hand-coded heuristics perform significantly better (at the 95% level) than the default rule, and WHirlpool's performance falls somewhere between the two. Chi-square significance tests indicate that WHirlpool's prediction accuracy does not differ significantly from either the default rule or the hand-coded heuristics.
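As a concrete, purely illustrative rendering of this evaluation setup, the sketch below partitions the cases as described; the seed, function name, and data layout are ours, not part of the original system.

import random

def ten_fold_runs(cases, folds=10, fold_size=24, seed=0):
    """Partition the 241 "who" cases into ten non-overlapping segments of 24
    and yield one (case_base, test_set) pair per cross-validation run."""
    rng = random.Random(seed)
    shuffled = list(cases)
    rng.shuffle(shuffled)
    segments = [shuffled[i * fold_size:(i + 1) * fold_size] for i in range(folds)]
    # The single case that falls into no segment is never tested but remains
    # available in the case base (241 - 24 = 217 cases per run).
    leftover = shuffled[folds * fold_size:]
    for held_out in range(folds):
        test_set = segments[held_out]
        case_base = leftover + [c for j, seg in enumerate(segments)
                                if j != held_out for c in seg]
        yield case_base, test_set

With 241 cases this yields ten runs, each with 24 test cases and 217 cases in the case base, matching the procedure above.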

10Other values for the missing features weight were tested as well.

Table 8.1: WHirlpool Results (% correct).

WHirlpool    Default Rule    Hand-Coded Heuristics
76.2         74.3            80.5

8.6 Using Cognitive Biases to Improve a Baseline Case Representation

The above results were posted using WHirlpool's baseline case representation described in Section 8.3.1 and the weighted 1-nn case retrieval algorithm described in Section 8.4.1. This section describes WHirlpool's automated approach to improving a baseline case representation. It shows that explicitly encoding cognitive biases into a case representation can improve the performance of the case-based reasoning algorithm. Like MayTag's decision tree approach to feature set selection, the cognitive bias approach to improving a baseline case representation is able to discard irrelevant features from a representation. This is accomplished by reducing the weight associated with a feature, possibly to zero. The cognitive bias approach, however, entails a number of additional manipulations to the original case representation including increasing the importance of an attribute and adding new features to the feature set. Because the cognitive bias and decision tree approaches to improving a case representation draw from very different sources of bias (i.e., psychological bias vs. information theoretic bias), it may prove advantageous to combine the approaches. This possibility is addressed in Chapter 9. In the experiments below, we modify WHirlpool's baseline case representation in response to three cognitive biases:

1. the tendency to rely on the most recent information,

2. restricted memory limitations, and

3. the heightened accessibility of the subject of a sentence.

Using the weighted 1-nn case retrieval algorithm, we compare each of the modified case representations to the baseline and measure the effects of each bias on relative pronoun antecedent prediction. Our experiments show that the overall performance of the case-based reasoning algorithm improves as each of the cognitive biases and limitations is incorporated into the case representation. Unless otherwise noted, all results shown will be 10-fold cross validation averages. In particular, we emphasize that the same ten training and test set combinations as the baseline experiments of Section 8.5 will be used in all subsequent experiments. This procedure ensures that differences in performance are not attributable to the random partitions chosen for the test set.

8.6.1 Incorporating the Recency Bias

In processing language, people consistently show a bias towards the use of the most recent information (e.g., Frazier and Fodor [1978], Gibson [1990], Kimball [1973], Nicol [1988]). In particular, Cuetos and Mitchell [1988], Frazier and Fodor [1978], and others have investigated the importance of recency in finding the antecedents of relative pronouns. They found that there is a preference for choosing the most recent noun phrase in sentences of the form NP V NP OF-PP, with ambiguous relative pronoun antecedents, e.g.:

The journalist interviewed the daughter of the colonel who had had the accident.

In addition, Gibson et al. [1993] looked at phrases of the form: NP1 PREP NP2 OF NP3 RELATIVE-CLAUSE. E.g.,

...the lamps near the paintings of the house that was damaged in the flood.
...the lamps near the painting of the houses that was damaged in the flood.
...the lamp near the paintings of the houses that was damaged in the flood.

They found that the most recent noun phrase (NP3) was initially preferred as the antecedent. People had the greatest difficulty when the middle noun phrase (NP2) was the antecedent of the relative pronoun, although this difference was not significantly greater than when NP1 was the antecedent. Finally, recognizing antecedents in the NP2 and NP1 positions was significantly harder than recognizing the most recent noun phrase as the antecedent. We translate this recency bias into representational changes for the training and problem cases in two ways. The first is a direct modification to the attributes that comprise the case representation, and the second modifies the weights to indicate a constituent's distance from the relative pronoun. In the first approach, we label the constituent attribute-value pairs by their position relative to the relative pronoun. This establishes a right-to-left labeling of constituents rather than the left-to-right labeling that the baseline representation incorporates.

Sentence: [It] [was] [the hardliners] [in Congress] who...
Baseline Representation: (s entity) (v t) (do human) (do-pp1 entity) (mr-phrase entity) (mr-syn-type prep-phrase) (antecedent ((do)))
Right-to-Left Labeling: (s entity) (v t) (np2 human) (pp1 entity) (mr-phrase entity) (mr-syn-type prep-phrase) (antecedent ((np2)))

Figure 8.9: Incorporating the Recency Bias Using a Right-to-Left Labeling.

In Figure 8.9, for example, "in Congress" receives the attribute pp1 in the right-to-left labeling because it is a prepositional phrase one position to the left of "who." Similarly, "the hardliners" receives the attribute np2 because it is a noun phrase two positions to the left of "who." The right-to-left ordering yields a different feature set and, hence, a different case representation. For example, the right-to-left labeling assigns the same antecedent value (i.e., pp2) to both of the following sentences:

- "it was a message from the hardliners in Congress, who..."

- "it was from the hardliners in Congress, who..."

The baseline (left-to-right) representation, on the other hand, labels the antecedents with distinct attributes — do-pp1 and v-pp1, respectively. In the second approach to incorporating the recency bias, we increment the weight associated with a constituent attribute as a function of its proximity to the relative pronoun (see Table 8.2). The feature associated with the constituent farthest from the relative pronoun receives a weight of one, and the weights are increased by one for each subsequent constituent. All features added to the case as a result of feature normalization (not shown in Table 8.2) receive a weight of one. In addition, we assign the maximum weight to both the mr-phrase and mr-syn-type features.

Table 8.2: Incorporating the Recency Bias by Modifying the Weight Vector.

Phrase           Feature        Baseline Weight    Recency Weight
It               s              1                  1
was              v              1                  2
the hardliners   do             1                  3
in Congress      do-pp1         1                  4
                 mr-phrase      1                  5
                 mr-syn-type    1                  5
who...
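The recency weighting of Table 8.2 can be written down directly; the sketch below is illustrative (the function name is hypothetical), and it treats the right-to-left relabeling as a user-supplied attribute-mapping function, as Table 8.7 later specifies.

def recency_weights(constituent_attrs, normalized_feature_set):
    """Weights that grow with proximity to the relative pronoun (cf. Table 8.2).

    constituent_attrs: the constituent attributes of the problem case, listed
    left to right, i.e., farthest from the relative pronoun first
    (e.g., ["s", "v", "do", "do-pp1"]).
    """
    # Features introduced only by normalization keep a weight of one.
    weights = {f: 1.0 for f in normalized_feature_set}
    # The farthest constituent gets weight 1; each closer constituent adds 1.
    for rank, attr in enumerate(constituent_attrs, start=1):
        weights[attr] = float(rank)
    # Following Table 8.2, the most-recent-phrase features receive the maximum
    # weight, one more than the constituent closest to the relative pronoun.
    max_weight = float(len(constituent_attrs)) + 1.0
    weights["mr-phrase"] = max_weight
    weights["mr-syn-type"] = max_weight
    return weights

For the sentence of Table 8.2, recency_weights(["s", "v", "do", "do-pp1"], feature_set) reproduces the weights 1, 2, 3, 4 for the constituents and 5 for mr-phrase and mr-syn-type.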

The results of experiments that use each of the recency representations separately and in a combined form are shown in Table 8.3. To combine the two implementations of the recency bias, we first relabel the attributes of a case using the right-to-left labeling and then initialize the weight vector using the recency weighting procedure described above. The table shows that the recency weighting representation alone tends to degrade prediction of relative pronoun antecedents as compared to the baseline. Both the right-to-left labeling and combined representations improve performance — they perform significantly better than the default heuristic, but do not yet exceed the level of the hand-coded heuristics. The final row of results will be described below. As shown in Table 8.3, the combined recency bias outperforms the right-to-left labeling despite the fact that the recency weighting tends to lower the accuracy of relative pronoun antecedent prediction when used alone. The right-to-left labeling appears to provide a representation of the local context of the relative pronoun that is critical for finding antecedents. The disappointing performance of the recency weighting representation, on the other hand, may be caused by (1) its lack of such a representation of local context, and (2) its bias against antecedents that are distant from the relative pronoun (e.g., "...to help especially those people living in the Patagonia region of Argentina, who are being treated inhumanely...").

Table 8.3: Results for the Recency Bias Representations (% correct).

Baseline                 76.2
R-to-L Labeling          79.2
Recency Weighting        75.8
R-to-L + RecWt           80.0
Hand-Coded Heuristics    80.5
Default Heuristic        74.3
Baseline w/o Recency     69.2

Nineteen of the 241 cases have antecedents that include the often distant subject of the preceding clause. Finally, the recency bias performs well despite the fact that the baseline representation already provides a built-in recency bias. The baseline represents the constituent that precedes the relative pronoun up to three times — as a constituent feature, the mr-phrase feature, and the mr-syn-type feature.11 The last row in Table 8.3 shows the performance of the baseline representation when this built-in bias is removed by discarding the mr-phrase and mr-syn-type features and disallowing references to them in the antecedent class value.

11This means that when the constituent immediately preceding "who" in the problem case and a training case match, that constituent accounts for a greater percentage of the similarity score than does any other constituent.

8.6.2 Incorporating the Restricted Memory Bias

Psychological studies have determined that people can remember at most seven plus or minus two items at any one time [Miller, 1956]. More recently, Daneman and Carpenter [1983, 1980] show that working memory capacity affects a subject's ability to find the referents of pronouns over varying distances. King and Just [1991] show that differences in working memory capacity can cause differences in the reading time and comprehension of certain classes of relative clauses. Moreover, it has been hypothesized that language learning in humans is successful precisely because limits on information processing capacities allow children to ignore much of the linguistic data they receive [Newport, 1990]. Some computational language learning systems (e.g., Elman [1990]) actually build a short term memory directly into the architecture of the system. WHirlpool's baseline case representation does not necessarily make use of this restricted memory bias, however. Each case is described in terms of the normalized feature set, which contains an average of 38.8 features across all partitions of the 10-fold cross validation. Unfortunately, incorporating the restricted memory limitations into WHirlpool's case representation is problematic. Previous restricted memory studies (e.g., short term memory studies) do not state explicitly what the memory limit should be — it varies from five to nine depending on the cognitive task and depending on the size and type of the "chunks" that have to be remembered. In addition, the restricted memory bias alone does not state which chunks, or features, to keep and which to discard.

To apply the restricted memory bias to the baseline case representation, we let n represent the memory limit and, in each of five runs, set n to one of five, six, seven, eight, or nine. Then, for each test case, the system randomly chooses n features from the normalized feature set, sets the weights associated with those features to one, and sets the remaining weights to zero. This effectively discards all but the n selected features from the case representation.
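A sketch of how these restricted-memory weight vectors might be produced follows; the function names are hypothetical, and the tie-breaking variant anticipates the combination with the recency weights described below.

import random

def random_memory_weights(normalized_feature_set, memory_limit, rng=random):
    """Baseline restricted-memory bias: randomly keep memory_limit features
    (weight 1) and discard the rest (weight 0)."""
    kept = set(rng.sample(sorted(normalized_feature_set), memory_limit))
    return {f: (1.0 if f in kept else 0.0) for f in normalized_feature_set}

def keep_top_weighted(weights, memory_limit, rng=random):
    """Keep only the memory_limit highest-weighted features, breaking ties
    randomly, and zero out the remainder (used when this bias is combined
    with the recency weighting)."""
    ranked = sorted(weights, key=lambda f: (-weights[f], rng.random()))
    kept = set(ranked[:memory_limit])
    return {f: (w if f in kept else 0.0) for f, w in weights.items()}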

Table 8.4: Results for the Restricted Memory Bias Representation (% correct). (*'s indicate significance with respect to the original baseline result shown in boldface, p ≤ 0.05).

Memory Limit    Baseline    R-to-L + RecWt
none            76.2        80.0
9               78.3        81.2*
8               74.2        81.2*
7               76.2        80.0
6               75.8        80.4
5               75.0        81.7*

Results for the restricted memory bias representation are shown in Table 8.4. The first column of results shows the effect of memory limitations on the baseline representation. In general, the restricted memory bias with random feature selection degrades the ability of the system to predict relative pronoun antecedents although none of the changes is statistically significant. This is not surprising given that the current implementation of the bias is likely to discard relevant features as well as irrelevant features. We expect that this bias will have a positive impact on performance only when it is combined with cognitive biases that provide feature relevancy information. This is, in fact, the case. The final column in Table 8.4 shows the effect of restricted memory limitations on the combined recency representation. To incorporate the restricted memory bias and the combined recency bias into the baseline case representation, we (1) apply the right-to-left labeling, (2) rank the features of the case according to the recency weighting, and (3) keep the n features with the highest weights (where n is the memory limit). Ties are broken randomly. We expected the merged representation to perform rather well because the combined recency bias representation worked well on its own and because the restricted memory (RM) bias essentially discards features that are distant from the relative pronoun and rarely included in the antecedent. As shown in the last column of Table 8.4, four out of five RM/recency variations posted higher accuracies than the combined recency representation. In fact, three of the RM/recency representations now outperform the original baseline representation (shown in boldface) at the 95% significance level. (Until this point, the best representation had been the combined recency representation, which significantly outperformed the default heuristic, but not the baseline case representation.)

8.6.3 Incorporating the Subject Accessibility Bias

A number of studies in psycholinguistics have noted the special importance of the first item mentioned in a sentence. In particular, it has been shown that the accessibility of the first discourse object, which very often corresponds to the subject of the sentence, remains high even at the end of a sentence [Gernsbacher et al., 1989]. This subject accessibility bias is an example of a more general focus of attention bias. In vision learning problems, for example, the brightest object in view may be a highly accessible object for the learning agent; in aural tasks, very loud or high-pitched sounds may be highly accessible. We incorporate the subject accessibility bias into WHirlpool’s baseline representation by increasing the weight associated with the constituent attribute that represents the subject of the clause preceding the relative pronoun whenever that feature is part of the normalized feature set.
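One simple way to realize this bias, assuming the baseline weight of one on features present in the problem case, is sketched below (illustrative only; the attribute name s follows the case representation used throughout this chapter, while the function name is ours).

def subject_accessibility_weights(normalized_feature_set, subject_weight=2.0):
    """Boost the weight of the subject constituent attribute (s) so that a match
    on the subject contributes subject_weight times as much as in the baseline."""
    return {f: (subject_weight if f == "s" else 1.0) for f in normalized_feature_set}

When several weighting biases are combined, their weight vectors are simply added together, as in the combination procedure described in Section 8.6.4.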

Table 8.5: Results for the Subject Accessibility Bias Representation (% correct).

Baseline    Baseline SubjWt=2    Baseline SubjWt=5    Baseline SubjWt=7    Baseline SubjWt=10
76.2        75.0                 74.2                 73.7                 73.3

Table 8.5 shows the effects of allowing matches on the subject attribute (s) to contribute two, five, seven, and ten times as much as they did in the baseline representation. Weights on the s attribute were chosen arbitrarily. Results indicate that incorporation of the subject accessibility bias never improves performance of the learning algorithm, although dips in performance are never statistically significant. At first it may seem surprising that this bias does not result in a better representation. Like the recency bias, however, WHirlpool's baseline representation already encodes the subject accessibility bias by explicitly recognizing the subject as a major constituent of the sentence (i.e., s) rather than by labeling it merely as a low-level noun phrase (i.e., np). It may be that this built-in encoding of the bias is adequate or that, like the restricted memory bias, additional modifications to the baseline representation are required before the subject accessibility bias can have a positive effect on the learning algorithm's ability to find relative pronoun antecedents. Table 8.6 shows the effects of merging the subject accessibility bias with both recency biases (R-to-L refers to the right-to-left labeling and RecWt refers to the recency weighting representation) and the restricted memory bias (RM). The results in the first column (Baseline) are just the results from Table 8.5 — they indicate the performance of WHirlpool's baseline case representation with various levels of the subject accessibility bias.

Table 8.6: Additional Results for the Subject Accessibility Bias Representation (% correct). (*'s indicate significance with respect to the original baseline result shown in boldface, * p ≤ 0.05, ** p ≤ 0.01; RM refers to the memory limit).

Subject    Baseline    SubjAcc    SubjAcc    SubjAcc    SubjAcc    SubjAcc    SubjAcc
Weight                 R-to-L     R-to-L     R-to-L     R-to-L     R-to-L     R-to-L
                       RecWt      RecWt      RecWt      RecWt      RecWt      RecWt
                                  RM=5       RM=6       RM=7       RM=8       RM=9
none       76.2        80.0       81.7*      80.4       80.0       81.2*      81.2*
2          75.0        79.6       84.2**     82.5*      82.1*      81.2*      80.8
5          74.2        78.3       79.6       79.2       78.3       80.4       79.6
7          73.7        77.5       79.6       79.2       77.9       76.7       77.9
10         73.3        76.7       79.6       79.2       78.3       80.4       79.6

The second column shows the effect of incorporating the subject accessibility bias into the combined recency bias representation. To create this merged representation, we first establish the right-to-left labeling of features and then add together the weight vectors recommended by the recency weighting and subject accessibility biases. As was the case with the baseline representation, incorporation of the subject accessibility bias steadily decreases performance of the learning algorithm as the weight on the subject constituent is increased. None of the changes is statistically significant. The remaining five columns of Table 8.6 show the effects of incorporating all three cognitive biases into the baseline case representation. To create this representation, we (1) relabel the attributes using the right-to-left labeling, (2) incorporate the subject and recency weighting representations by adding the weight vectors proposed by each bias, and (3) apply the restricted memory bias by keeping only the n features with the highest weights (where n is the memory limit) and choosing randomly in case of ties. Results for these experiments indicate that some combinations of the cognitive bias parameters work very well together and others do not. In general, associating a weight of two with the subject constituent improves the accuracy of the learning algorithm as compared to the corresponding representation that omits the subject accessibility bias. (Compare the first and second rows of results). In particular, three representations (shown in italics) now outperform the best previous representation (which had the r-to-l labeling, recency weighting, memory limit = 5 and achieved 81.7% correct).12

12In addition, the best-performing representation now outperforms the hand-coded relative pronoun disambiguation rules (84.2% vs. 80.5%) at the 90% significance level.

8.6.4 Discussion

It should be emphasized that modifications to the baseline case representation in response to each of the individual cognitive biases are performed automatically by WHirlpool, subject to the constraints provided in Table 8.7. Upon invocation of WHirlpool, the user need only specify (1) the names of the biases to incorporate into the case representation, and (2) any parameters required for those biases (e.g., the memory limit for the restricted memory bias). WHirlpool's approach to improving a case representation should also be applicable to learning tasks in other domains. The right-to-left labeling implementation of the recency bias, for example, can be applied to any learning task for which (1) there is a temporal ordering among the features, and (2) this ordering is denoted in the attribute names. To invoke this bias, the user must supply a function that maps the original attribute names to new attribute names.

Table 8.7: Cognitive Bias Modifications.

Bias                       Assumptions                            Parameters
Recency                    Attribute names indicate recency       Function mapping original attribute
(r-to-l labeling)                                                 names to new attribute names
Recency                    Attributes in original case are        None
(recency weighting)        provided in inverse recency order
Restricted Memory          None                                   Memory limit
Focus of Attention         None                                   Weight factor, attribute associated
(subject accessibility)                                           with object of focus, e.g., the subject

In addition, WHirlpool relies on the following general procedure when incorporating more than one cognitive bias into the baseline representation:

1. First, incorporate any bias that relabels attributes (e.g., r-to-l labeling).

2. Then, incorporate biases that modify feature weights by adding the weight vectors proposed by each bias (e.g., recency weighting, subject accessibility bias).

3. Finally, incorporate biases that discard features (e.g., restricted memory bias), but give preference to those features assigned the highest weights in Step 2.
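The three-step procedure can be sketched as follows; the function names are hypothetical, and the weight vectors would come from routines like the recency and subject accessibility sketches earlier in this chapter.

import random

def combine_biases(case, weight_vectors, relabel=None, memory_limit=None, rng=random):
    """Apply the general bias-combination procedure: relabel attributes, sum the
    proposed weight vectors, then keep only the highest-weighted features."""
    # Step 1: attribute-relabeling biases (e.g., the right-to-left labeling).
    if relabel is not None:
        case = {relabel(attr): value for attr, value in case.items()}
    # Step 2: weighting biases (e.g., recency weighting, subject accessibility);
    # their weight vectors are added together.
    weights = {}
    for vector in weight_vectors:
        for feature, w in vector.items():
            weights[feature] = weights.get(feature, 0.0) + w
    # Step 3: feature-discarding biases (e.g., restricted memory), preferring
    # the features assigned the highest weights in Step 2; ties broken randomly.
    if memory_limit is not None and memory_limit < len(weights):
        ranked = sorted(weights, key=lambda f: (-weights[f], rng.random()))
        kept = set(ranked[:memory_limit])
        weights = {f: (w if f in kept else 0.0) for f, w in weights.items()}
    return case, weights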

One problem that has not been addressed in the discussions above is how to select automatically the combination of cognitive biases that will achieve the best performance for a particular learning task. For the relative pronoun task, we exhaustively enumerated all combinations of available cognitive biases and chose the combination that performed best in cross-validation testing. Because this method will quickly get out of hand as additional biases are included or parameters tested, future work should investigate less costly alternatives to cognitive bias selection.

In spite of its incorporation of cognitive biases, WHirlpool tends to make two classes of errors. First, a portion of WHirlpool's errors is due to insufficient training: either the problem case required an antecedent combination never seen in any of the training cases or the problem case involved a new syntactic context for the wh-word. In one sentence, for example, "who" was preceded by a preposition, i.e., "regardless of who," and no context similar to this one had been encountered during training. When the semantic and syntactic structure of the novel clause differs significantly from any represented in the context features of the training cases, we cannot expect accurate retrieval from the case base. The remaining errors exhibited by WHirlpool's learned heuristics generally involve relative pronoun antecedents that are difficult even for people to specify. Some examples are provided below. (In each example below, the antecedent chosen by WHirlpool is indicated in italics; the correct antecedent is shown in boldface type.)

1. "... 9 rebels died at the hands of members of the civilian militia, who resisted the attacks"

2. "...the number of people who died in Bolivia..."

3. “...the rest of the contra prisoners, who are not on this list...”

8.7 Summary

This chapter presented WHirlpool, an instantiation of the Kenmore framework that locates the antecedent of the relative pronoun "who." We find that our automated approach to relative pronoun disambiguation performs as well as a set of hand-coded rules for the task and performs significantly better than a default heuristic that chooses the most recent phrase as the antecedent. The chapter also presented a cognitive bias approach to improving a baseline case representation. The experiments of Sections 8.6.1-8.6.3 showed that performance of WHirlpool's case-based algorithm steadily improved as each of the available cognitive biases was incorporated into its baseline case representation. Although one would not expect monotonic improvement to continue forever, it is clear that explicit incorporation of cognitive biases into the case representation can improve the learning algorithm's performance for the relative pronoun disambiguation task. Table 8.8 summarizes these results.

Table 8.8: Summary of Cognitive Bias Results.

Case Representation                                                    % Correct
Baseline w/o Built-in Recency Bias                                     69.2
Default Heuristic: Choose Most Recent Phrase                           74.3
Baseline                                                               76.2
Baseline + Recency Bias                                                80.0
Hand-Coded Heuristics                                                  80.5
Baseline + Recency Bias + Restricted Memory Bias (limit=5)             81.7
Baseline + Recency Bias + Restricted Memory Bias (limit=5)
  + Subject Accessibility Bias (subj wt=2)                             84.2

CHAPTER 9

ESTABLISHING CONSTRAINTS ON THE KENMORE FRAMEWORK

This thesis presents the Kenmore framework for knowledge acquisition for natural language processing systems. As described in the introduction and elsewhere, Kenmore requires three major components: a robust sentence analyzer, a corpus of texts, and an inductive learning module. This chapter draws from the results of previous chapters and establishes constraints on each component that must be satisfied for successful instantiation of the framework on a new problem. In addition, it offers practical advice on how to attack a new parsing ambiguity problem within the Kenmore framework. In general, the following tasks must be performed:

1. Choose a corpus.

2. Choose a sentence analyzer.

3. Set up variable-depth semantic knowledge structures as well as any necessary background syntactic knowledge.

4. Choose the parsing problem.

5. Specify the case representation and the method for improving case representation.

6. Choose the inductive learning algorithm. (For case-based approaches, this includes choosing the case-indexing method, the similarity metric, and the method for combining, or deciding among, solutions proposed by the retrieved cases.)

7. Train and test the system.

The remaining sections of the chapter discuss each step separately. The final section of the chapter shows how one might use Kenmore to learn heuristics for many parsing decisions simultaneously rather than one at a time.

9.1 The Corpus

Kenmore espouses a corpus-based view of knowledge acquisition for natural language processing. Disambiguation heuristics are implicitly encoded in the case base as training sentences from the corpus are processed by the sentence analyzer and a human supervisor. In particular, the thesis demonstrates the performance of Kenmore only using domain-specific corpora. As discussed in Section 5.5.4, it is not clear that Kenmore will be useful with corpora containing totally unrestricted texts because the length of the human-supervised training phase would increase dramatically. If the supervised training phase can be transformed into an unsupervised one, however, then Kenmore should be usable with unrestricted text. (Extending Kenmore in this direction will be discussed in Chapter 10.)

In general, Kenmore will be judged successful if the heuristics it learns perform accurate disambiguation within a specific language processing task. Therefore, a very basic criterion for the corpus is simply that it contain examples of the kinds of sentences, stories, or text fragments that are most important for the NLP system to understand. If the goal of the NLP system is to summarize stories about terrorist events, for example, then the training corpus primarily should contain texts that describe terrorist events. If, on the other hand, it will be important for the NLP system to distinguish sentences/texts that contain relevant information from those containing no relevant information, then the training corpus should contain a mixture of both relevant and irrelevant text excerpts.

The minimum size of the corpus will vary depending on four related factors: the scope of the overall natural language task, the particular problem in sentence analysis for which heuristics need to be acquired, the variety of sentence structures that the system must process, and the size of any syntactic and semantic taxonomies or knowledge structures. As the scope of the natural language understanding task expands, for example, it may be important that the NLP system obtain a deeper understanding of the input texts. This, in turn, may require that (1) the system process sentences with a greater variety of syntactic structures, and (2) the system recognize finer distinctions among semantic categories. Both (1) and (2) imply that Kenmore will require additional training sentences. In general, as any associated semantic and syntactic taxonomies grow or the NLP task becomes more complicated, Kenmore will require additional training sentences, and hence, a bigger corpus.

Even so, experiments with the MayTag and WHirlpool systems indicate that surprisingly few training sentences are necessary to obtain adequate performance for many real-world tasks. WHirlpool used 108 training sentences (90% of the base set of 120 sentences) to acquire relative pronoun disambiguation heuristics for use with texts in the terrorism domain. MayTag used 157 training sentences to learn lexical disambiguation heuristics for use with texts in the business joint ventures domain. Kenmore's reliance on knowledge beyond word frequencies may account for its needing such a small training corpus — training cases in Kenmore encode the state of the parser as well as semantic information for individual words while many statistical approaches maintain only individual word counts and possibly part-of-speech information. As a result, Kenmore needs fewer training cases than approaches that fail to include abstract views of the current context.
Our current experiments, however, indicate that the corpus should contain a minimum of 100-200 sentences.

9.2 The Sentence Analyzer

All work described here relies on the CIRCUS conceptual sentence analyzer. However, Kenmore can be instantiated with any parser as long as it conforms to the following constraints. First, the sentence analyzer must be robust enough to process large amounts of text. However, only a partial syntactic parse of each sentence is required. The sentence analyzer need only recognize individual noun phrases, prepositional phrases, and verb phrases without making any attachment decisions. Parsers that perform full syntactic analyses may, in fact, have to "undo" some attachment decisions to create cases in attribute-value pair format. Alternatively, the inductive learning component could accept cases that represent trees rather than flat structures. In addition, the parser should be able to locate the major syntactic constituents of each clause — the verb, subject, direct object, and/or indirect object. In theory, either a serial or parallel parser can be used: Kenmore would be consulted to resolve the current ambiguity directly or to prune one or more existing representations maintained by the parser.

Second, the parser must also be able to observe its own state. That is, it should be able to provide a description of all syntactic and semantic structures it builds and maintains during sentence processing. Although the sentence analyzer used here relies heavily on semantic knowledge, this is not a constraint enforced by the Kenmore framework. Nonetheless, all results obtained in earlier chapters relied on case representations that included this semantic information and further experimentation would be required to determine whether the framework is successful without this semantic knowledge. We emphasize, however, that the type of information maintained by a parser affects how well and how easily disambiguation heuristics can be learned rather than whether the parser can be used within the Kenmore framework.
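To make these constraints concrete, the sketch below shows one possible shape for the minimal interface a sentence analyzer would need to expose to Kenmore. It is purely illustrative: none of the class or attribute names comes from CIRCUS or from the original implementation, and a real parser would expose considerably more state.

from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Constituent:
    """A low-level phrase from the partial parse (no attachment decisions)."""
    syn_type: str                     # e.g., "np", "pp", "vp"
    words: List[str]
    semantics: Optional[str] = None   # e.g., "human", "location", if known

@dataclass
class ParserState:
    """The self-description a parser must be able to provide at any point."""
    constituents: List[Constituent] = field(default_factory=list)   # left to right
    roles: Dict[str, Constituent] = field(default_factory=dict)     # "s", "v", "do", "io"

    def as_attribute_values(self) -> Dict[str, str]:
        """Flatten the current state into the attribute-value pairs that form
        the context portion of a case."""
        features = {}
        for role, constituent in self.roles.items():
            features[role] = constituent.semantics or constituent.syn_type
        if self.constituents:
            most_recent = self.constituents[-1]
            features["mr-phrase"] = most_recent.semantics or most_recent.syn_type
            features["mr-syn-type"] = most_recent.syn_type
        return features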

9.3 The Taxonomies

To instantiate the Kenmore framework to solve a new problem in sentence analysis, system developers must also set up any taxonomies that will be required. In the lexical acquisition task, for example, MayTag tagged words in an incoming text with the appropriate syntactic and semantic class. In order to do this, we provided MayTag with part-of-speech and semantic feature hierarchies. The taxonomies used in both the MayTag and WHirlpool systems observed the guidelines established in Chapter 4 regarding variable-depth knowledge representations. To ensure that new instantiations of the Kenmore framework are successful in the larger language processing task in which they are being used, we suggest that system developers abide by these guidelines as well.

In short, any semantic hierarchies should focus on the representation of domain-specific knowledge and should be detailed enough to make the distinctions required in the domain of interest. Any syntactic hierarchies should be detailed enough to allow the parser to accomplish its goals, e.g., segmentation of incoming text into low-level constituent phrases. Further experimentation would be required to determine which of these constraints can be relaxed without any loss in performance. In Section 7.2.2 we found that MayTag was better at assigning domain-dependent semantic features than domain-independent ones. This implies that Kenmore will be more successful when the NLP tasks require primarily domain-dependent taxonomies. On the other hand, the results of Section 7.2.2 indicate that Kenmore should perform adequately on even domain-independent semantic features if those features are fairly general ones. The key in determining the success of Kenmore on any particular task is to balance two conflicting constraints. First, the underlying taxonomies must be detailed enough to support the natural language task. Second, the underlying taxonomies should be general enough to allow adequate coverage of each value in the taxonomy across the training cases. In semantic feature tagging, for example, Kenmore will not learn to recognize words denoting humans or company-names unless it is given a reasonable number of examples of each. In general, the first constraint supersedes the second because the second constraint can always be satisfied by extending the training phase until enough cases have been created.

9.4 The Parsing Task To Be Learned

Kenmore provides a framework within which solutions to problems in sentence analysis can be learned. The only constraints on the choice of parsing problem are that

1. the sentence analyzer be able to recognize an instance of the problem, and

2. the problem be recast as a classification problem.

As an example, consider the problem of prepositional phrase attachment. To satisfy the first constraint, the parser must know when a prepositional phrase attachment decision will need to be made — just after each prepositional phrase is recognized. This is important because the parser must create a case whenever an instance of the problem is encountered. In general, lexical ambiguities (e.g., concept activation, semantic feature tagging) require that a decision be made for individual words. Therefore, the parser should create a lexical disambiguation case as each word is recognized. Structural ambiguities (e.g., relative pronoun disambiguation, prepositional phrase attachment, understanding conjunctions and appositives, analysis of compound nouns), on the other hand, represent attachment decisions rather than tagging decisions. As a result, a case should be created when the attachment decision must be made (e.g., after the prepositional phrase has been recognized for prepositional phrase attachment and after the wh-word has been recognized for relative pronoun disambiguation).

To satisfy the second constraint, prepositional phrase attachment must be recast as a classification problem: we must specify the features that will encode the context of the ambiguity as well as what kind of “class” information should be associated with each prepositional phrase attachment decision. This amounts to determining what the “context” and “solution” parts of the cases should look like. In the Kenmore framework, the representation of context depends more on the parser than on the ambiguity task — it will be discussed in the next section. The structure of the class information depends on the type of ambiguity being resolved. For lexical ambiguities, the class information is an item(s) from the appropriate taxonomy (e.g., the concept taxonomy for concept tagging, the semantic feature hierarchy for semantic feature tagging). For structural ambiguities like prepositional phrase attachment, the class information specifies an attachment point. In relative pronoun disambiguation, for example, the attachment point is the antecedent of the relative pronoun, i.e., the phrase that the upcoming embedded clause modifies. For each phrase in a conjunction (other than the first one), the attachment point is the preceding phrase in the conjunction (e.g., in “I saw Nelson and Winnie,” “Winnie” attaches to “Nelson”). For some attachment decisions, we may also want to include additional information as part of the class information. In compound noun analysis, for example, it is also important to indicate the semantic relationship between nouns in the compound as well as their attachment points. In “the Northampton sushi bar,” “Northampton” and “sushi” modify “bar,” but it would be more informative to note that the “bar” is “in Northampton” and “serves sushi.” Given our experience with MayTag and WHirlpool, we can generalize as to how the two constraints described above can be satisfied for new problems in sentence analysis. It should be noted that specifying the nature of the class information to associate with some parsing problems may be difficult. The problem arises not because it is difficult to view the task as a classification problem, but because the problem involves a number of decisions, some of which need to be modified as more of the sentence is uncovered. Consider the following sentence:

I saw Mary Ford, the mayor of Northampton, and Bill Clinton, our president.

In this sentence, “Mary Ford” and “the mayor of Northampton” are in apposition to one another and the conjunction joins two entities, “Mary Ford” and “Bill Clinton.” Without the appropriate background knowledge, however, we might initially interpret “the mayor of Northampton” as an element of the conjunction (rather than as an appositive) until we reach the phrase “our president.” Examples like this indicate that the class information associated with an ambiguity sometimes may have to include adjustments to prior decisions that are now deemed incorrect.

9.5 The Case Representation: Representing Context

The "context" portion of Kenmore cases encodes the state of the parser at the point of an ambiguity. As discussed in earlier chapters, the success of the inductive learning algorithm, and hence, the success of Kenmore, depends critically on using an appropriate representation of context. Unfortunately, designing a good case representation is by far the most difficult step in the instantiation of the Kenmore framework. It is time-consuming and knowledge-intensive. Given the results from preceding chapters, we have established a single overriding constraint on case representation design that must be satisfied for the learning task to succeed: the baseline case representation must be one that is natural for the sentence analyzer. This constraint is necessary because the sentence analyzer is solely responsible for providing the context portion of the cases. This constraint also limits the number of appropriate representations to consider. A purely syntactic sentence analyzer, for example, will not be able to include semantic information in the case representation. In addition, the learning algorithm constrains case representation design because it limits the structure of the cases. The inductive learning components used in MayTag and WHirlpool, for example, require that cases be presented in attribute-value pair format.

9.5.1 Using a Uniform Case Representation

One question that was not addressed in the experiments of earlier chapters is the degree to which a single case representation should be used across all parsing problems within the Kenmore framework. Does each problem in sentence analysis require its own representation of context or will a single representation of context suffice? Does the real answer lie somewhere in between? MayTag was able to learn solutions to a number of problems in lexical ambiguity using the same baseline case representation in conjunction with a decision tree approach to feature set selection that located relevant features in the baseline representation. This implies that Kenmore may be able to handle many problems in lexical ambiguity resolution using the same baseline case representation as long as that baseline representation is tuned for each lexical ambiguity task using some feature selection techniques. Because we tackled only one structural ambiguity task within the Kenmore framework, it is impossible to know whether a single case representation will also suffice for all structural ambiguities. Still, given the results of experiments with MayTag and WHirlpool, it is not clear whether Kenmore should use the same baseline case representation for lexical and structural ambiguities alike. MayTag and WHirlpool used very similar, but not identical, baseline case representations. For example, MayTag cases include information for each word in a five word window surrounding the unknown word, but WHirlpool cases include word-level information only for the word that precedes the relative pronoun. WHirlpool cases, on the other hand, include an attribute-value pair for all major constituents and low-level phrases in the current clause. MayTag cases include only information for the major constituents of the current clause. Given these differences in case representation, one might conclude that lexical ambiguity resolution and structural ambiguity resolution require access to different sources of knowledge. In particular, it would appear that lexical ambiguity resolution requires access to a finer-grained representation of local context than does structural ambiguity resolution, while structural ambiguity resolution requires a more complete representation of global context.

[Figure 9.1 (diagram): the modified Kenmore framework. Components shown: the Corpus (supplying selected sentences), the Sentence Analyzer (providing the context of the ambiguity and the ambiguity type), a Subsystem for Tuning the Case Representation (producing a modified probe case), and the inductive learning component's Case Retrieval procedure and Case Base, which return the retrieved case(s) and the ambiguity solution.]

Figure 9.1: Modified Kenmore Framework.

9.5.2 Righthand Context

Another issue in the representation of context for Kenmore that remains to be addressed is the role of righthand context. Sometimes the resolution of an ambiguity problem hinges on information that follows the ambiguity. As an example, consider the sentence:

I saw the daughter of the colonel, who fought in Vietnam.

Before seeing the embedded clause, it is impossible to know if the antecedent of "who" is "the daughter" or "the colonel." However, after the embedded clause has been processed, it appears fairly certain that "who" refers to "the colonel." Kenmore cases, however, include little or none of that righthand context. MayTag cases, for example, include information for just the two words that follow the unknown word and WHirlpool cases encode only the preceding context. We have chosen to ignore the succeeding context because the parser has not processed that portion of the input in its left-to-right traversal of the text. To handle circumstances where the righthand context is important, however, the general procedure for ambiguity resolution in Kenmore could be modified as follows. During training, when an ambiguity is encountered that cannot be resolved based on the lefthand context (and possibly a very limited lookahead), the solution portion of the case might note the need for righthand context as well as indicating the best guess for the ambiguity resolution decision. If this case were retrieved during the application phase, the parser would know to resolve the ambiguity initially based on the "best guess" and to consider revising the decision as more of the sentence is processed. These modifications complicate both the generation of the solution part of the case during training and the processing of the solution part of the case during application.

9.5.3 Improving the Case Representation Automatically

In general, Kenmore needs to modify a baseline case representation in three ways in order to learn solutions to problems in sentence analysis. It must:

1. Discard irrelevant attributes.

2. Determine the relative importance of relevant attributes.

3. Add new attributes when the existing ones are inadequate for the learning task.

Chapter 6 and Section 8.6 present two domain-independent methods for improving a case representation. MayTag's decision tree approach to feature set selection performs the first task (and could be extended to perform the second by taking advantage of the specific information gain value associated with each attribute). WHirlpool's cognitive bias approach to improving a baseline case representation spans all three types of modifications. When combined with the recency bias, the restricted memory bias discards irrelevant features from the representation by setting their weights to zero. The recency weighting and subject accessibility biases note the relative importance of features by modifying the weights associated with them. The right-to-left labeling recency bias is a weak method for adding new attributes to the representation. In the experiments of Chapter 6 and Section 8.6, we applied each of the methods for improving a case representation separately — the decision tree approach was used for the lexical ambiguity task and the cognitive bias approach was used with the structural ambiguity task. However, it is also possible to apply both approaches to an ambiguity task if the combination improves performance.1

1To implement the combined approach, we first apply the decision tree approach to the baseline case representation and then apply the cognitive bias approach to the resulting case representation. Alternatively, we can use the information gain measure to assign an initial set of weights to the features in the baseline case representation and then apply the cognitive biases.

Because creating a good instance representation is crucial for the success of the learning algorithm, and because the learning algorithm itself does not typically actively help in choosing or tuning this representation2, additional methods for automatically improving a baseline case representation should be explored. In particular, it will be important to find methods that perform modifications of the third type — adding new features to improve performance of the learning algorithm. The sections below describe two such methods. Like the techniques used in MayTag and WHirlpool, any new approaches should be automated.

9.5.3.1 Making Finer Distinctions

One method for adding features to a case representation is to expand an existing attribute-value pair into two or more features so that each dimension of knowledge represented by the original feature is encoded separately. As an example, we might expand a constituent feature in WHirlpool into two features that represent syntactic and semantic knowledge separately. Instead of encoding the phrase “in New York” as (pp1 location), we might represent it as (pp1 in) and (pp1-gen-sem location) or even (constit1 pp), (constit1-prep in), and (constit1-gen-sem location). This type of modification allows the case retrieval algorithm to match on each piece of knowledge separately and is especially useful when retrieval of conflicting cases is a problem. (Two retrieved cases conflict when they have the same similarity score, but different class information.) By using a more detailed constituent representation, fewer such conflicts should occur.
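A sketch of this kind of expansion, using the "in New York" example above (the function name and exact attribute naming are illustrative):

def split_constituent_feature(index, syn_type, head_word, general_semantics):
    """Expand a single constituent feature into separate syntactic and semantic
    features, e.g., (pp1 location) for "in New York" becomes
    (constit1 pp), (constit1-prep in), and (constit1-gen-sem location)."""
    expanded = {
        f"constit{index}": syn_type,                   # e.g., "pp"
        f"constit{index}-gen-sem": general_semantics,  # e.g., "location"
    }
    if syn_type == "pp":
        expanded[f"constit{index}-prep"] = head_word   # e.g., the preposition "in"
    return expanded

For example, split_constituent_feature(1, "pp", "in", "location") yields the three attribute-value pairs shown above.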

9.5.3.2 Derived Features and the Issue of Transfer

A second method for adding features to an existing case representation is to use derived features — higher level features that are derived from low-level ones. Some of the features currently used in MayTag and WHirlpool cases can be considered derived features. All of the features that represent major syntactic constituents, for example, combine information from each of the words that comprise the constituent and also incorporate a decision by the sentence analyzer regarding the structure of the clause or sentence. There are many more opportunities to incorporate derived features in Kenmore cases, however. In particular, we might include features that allow Kenmore to benefit from its own ambiguity resolution decisions. Currently, Kenmore's cases represent a flattened view of the structure of the current clause. However, as Kenmore is accessed to resolve various structural ambiguities during sentence analysis, its decisions could be added to the existing case representation dynamically as derived features. Consider the following sentence:

2There are exceptions, of course. E.g., the constructive induction systems of Callan and Utgoff [1991] and Fawcett and Utgoff [1992].

Police also spoke with the man beside the white Ford Bronco, who allegedly drove the vehicle.

When WHirlpool encounters this sentence, it creates a case that contains constituent and most-recent features and is based on the state of the parser when "who" is recognized. If we assume that Kenmore will be used to make all attachment decisions, however, then the prepositional phrase attachment decision for "beside the white Ford Bronco" already would have been made. Because this decision may, in turn, inform the relative pronoun disambiguation decision, we could let the sentence analyzer include it as part of the context in the relative pronoun disambiguation case. This type of derived feature allows a "transfer" of knowledge among disambiguation tasks. In the above example, solving a prepositional phrase attachment ambiguity may help with relative pronoun disambiguation.
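A minimal sketch of how such a derived feature might be added to a case (the attribute naming is hypothetical):

def add_derived_attachment(case, pp_attr, attachment_point):
    """Record an earlier prepositional phrase attachment decision as a derived
    feature of the current case, e.g., noting that the phrase labeled pp1
    ("beside the white Ford Bronco") attaches to the preceding noun phrase."""
    derived = dict(case)                                  # leave the original case untouched
    derived[f"{pp_attr}-attaches-to"] = attachment_point  # e.g., {"pp1-attaches-to": "do"}
    return derived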

9.6 The Inductive Learning Component

There are a number of constraints on Kenmore's learning component. First, the learning component must be inductive. This is a basic assumption of the Kenmore framework. Second, the structure of the case representation used by the learning algorithm should be general enough to accommodate naturally the representations employed by the sentence analyzer. Although the attribute-value pair representation appears to be adequate for use with the CIRCUS parser, inductive learning algorithms that allow structured case representations (e.g., FOIL [Quinlan, 1990]) would be worth exploring, especially for use with sentence analyzers that create and maintain such structured representations. Finally, it is best if the learning algorithm is incremental: the description of the concept being learned should be updated after each training case without reprocessing all of the preceding training cases. This allows the human-supervised training phase to become progressively easier as more cases are acquired without slowing down the training process.

9.7 Training Issues

In order for Kenmore to learn solutions to problems in sentence analysis, the training phase must meet the following constraints. First, training sentences should be selected randomly from the corpus. In addition, because the work discussed here assumes the task of domain-specific text processing, it may make sense to choose the majority of training sentences from relevant rather than irrelevant texts for some, but not all, learning tasks. For semantic feature and part-of-speech disambiguation, for example, it is best if the training sentences are randomly chosen from sentences that contain information relevant to the text-processing task because it will be performance on words from these sentences that affects the overall performance of the NLP system. Irrelevant sentences effectively should be ignored by the NLP system, making the assignment of correct semantic feature and part-of-speech tags for lexical items in these sentences unimportant. However, there are disambiguation tasks that can benefit from the presence of irrelevant sentences in the training corpus. One example is the activation of domain-specific concepts, which was performed by MayTag. If a concept is activated, the entire sentence is considered relevant by CIRCUS. Therefore, it will be important to include irrelevant sentences in the training set so that the learning component can learn to distinguish relevant from irrelevant concept activation contexts.

A second constraint on the training phase is that it continue long enough to gather an adequate number of training cases. We saw in Section 5.5.2.1 that increasing the number of training cases by 50% provided increases in the accuracy of general and specific semantic feature tagging by 19% and 12%, respectively. Kenmore's training phase can end when one of two conditions is met: (1) overall classification accuracy is above some predetermined value, or (2) learning curves indicate that performance is no longer increasing or that it is increasing at some very small rate. In training MayTag for the UMass-Hughes MUC-5/TIPSTER system, we loosely adhered to the first condition and stopped training when MayTag's performance across all words in the training sentences (including the function words) surpassed the 85% accuracy level. In general, however, learning curve data will provide an earlier indication of when the training phase should end. One training scenario is to choose training sentences randomly until the learning curves indicate that learning has slowed considerably. Then, we can examine the system's performance on individual classes and select additional training sentences to cover underrepresented classes.

Third, the training phase should be sensitive to the different types of knowledge being learned because different types of training examples may be required to learn a particular concept in sentence analysis. For example, MayTag's semantic feature set contains two types of features — set list features that are used to tag a small, fixed set of words and open list features that will be associated with a much broader set of lexical items.
Virtually any word can be tagged as a company-name or company-alias, for example, but generally only a few expressions are used in the joint ventures domain to denote the president of a company — approximately 99% of the 113 texts that refer to the president of a company use either the word “president” or “head.” Therefore, company-name and company-alias would be considered open list semantic features, while president would be a set list feature. Choosing training sentences randomly will be adequate for learning open list features, but we may want to choose specific examples for the set list features to ensure adequate representation in the case base since many of the set list features tend to appear less frequently in the corpus.
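The stopping conditions described above can be made concrete with a small sketch. The following Python fragment is illustrative only; the accuracy target, window size, and minimum gain are hypothetical parameters rather than values used in the thesis. It checks both conditions against a history of accuracy estimates gathered from periodic cross-validation runs.

    # Sketch of the two stopping conditions for the training phase:
    # stop once estimated accuracy exceeds a target, or once the learning
    # curve has flattened. The target, window, and minimum gain are
    # illustrative parameters, not values prescribed by the thesis.

    def should_stop_training(accuracy_history, target=0.85,
                             window=3, min_gain=0.005):
        if not accuracy_history:
            return False
        if accuracy_history[-1] >= target:            # condition (1)
            return True
        if len(accuracy_history) > window:            # condition (2): plateau
            recent_gain = accuracy_history[-1] - accuracy_history[-1 - window]
            if recent_gain < min_gain:
                return True
        return False

    # Accuracy estimates from periodic cross-validation runs during training.
    print(should_stop_training([0.61, 0.70, 0.74, 0.76, 0.762, 0.763, 0.764]))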

Finally, Kenmore’s training phase should test a number of values for k when a k-nearest neighbors inductive learning component is being used. All of our experiments tested a few values of k, but additional research is required in order to generalize from current results to determine when specific values of k are appropriate.
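As an illustration of how several values of k might be tested, the sketch below (Python, illustrative only) runs leave-one-out cross-validation over a labeled case base using a simple feature-overlap similarity; the candidate values of k are arbitrary.

    # Sketch of testing several values of k for a k-nearest-neighbors
    # component via leave-one-out cross-validation over a labeled case
    # base. The overlap similarity and candidate values of k are
    # illustrative.

    def similarity(a, b):
        keys = set(a) | set(b)
        return sum(1 for key in keys if a.get(key) == b.get(key))

    def knn_predict(cases, probe, k):
        ranked = sorted(cases, key=lambda c: similarity(c[0], probe),
                        reverse=True)
        votes = {}
        for _, label in ranked[:k]:
            votes[label] = votes.get(label, 0) + 1
        return max(votes, key=votes.get)

    def choose_k(cases, candidates=(1, 3, 5, 10)):
        best = None
        for k in candidates:
            correct = sum(
                1 for i, (probe, label) in enumerate(cases)
                if knn_predict(cases[:i] + cases[i + 1:], probe, k) == label)
            accuracy = correct / len(cases)
            if best is None or accuracy > best[1]:
                best = (k, accuracy)
        return best    # (k, leave-one-out accuracy)

In the Kenmore setting, the cases passed to choose_k would be the (context, solution) pairs accumulated in the case base for a single ambiguity type.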

9.8 Putting It All Together

Throughout the thesis we have assumed that Kenmore tackles each problem in sentence analysis more or less in isolation: MayTag handled three kinds of lexical ambiguity simultaneously, but relative pronoun disambiguation heuristics were learned in a separate instantiation of Kenmore. This section outlines how Kenmore might be used to learn solutions to multiple parsing problems for a new natural language system.

1. Determine all problems in sentence analysis that must be tackled. These will include both lexical and structural disambiguation problems, covering syntactic as well as semantic decisions.

2. Choose the corpus, sentence analyzer, and inductive learning algorithm(s). These should be chosen subject to the constraints described above in Sections 9.1, 9.2, and 9.6. If the inductive learning component is case-based, the case-indexing method, similarity metric, and case selection methods should also be chosen.

3. Set up taxonomies to be associated with each problem. In most cases, taxonomies will be needed for lexical ambiguities. However, some structural ambiguities may have associated taxonomies as well. Compound noun analysis, for example, may require a set of semantic relations that indicate the ways in which two nouns can be related.

4. Determine how the class information associated with each problem will be supplied by the human supervisor. For all lexical tagging problems tackled by MayTag, for example, the user was presented with the associated taxonomy and then asked to choose which item(s) should be used to tag the current word. For relative pronoun disambiguation, the user was presented with all low level constituents from the clause preceding the relative pronoun and was asked to choose which phrase(s) represented the antecedent of the relative pronoun. The same information could be presented to the user for prepositional phrase attachment decisions, but the user would be asked to choose the phrase that the current prepositional phrase modified.

5. Design the baseline case representation. This should be done subject to the constraints discussed in Section 9.5 above.

6. Choose the training sentences and train on all sentence analysis tasks simultaneously. The user proceeds through the training sentences, one by one, specifying supervisory information for all ambiguity decisions in the same pass. Given our results using MayTag, the lexical ambiguity decisions associated with each word can be stored together in a single case and added to a lexical ambiguity decision case base. Until additional experiments indicate otherwise, a separate case base should be maintained for each structural ambiguity decision.

7. Use Kenmore during training to suggest a solution to each ambiguity encountered. As cases are gathered, they can be immediately accessed by Kenmore to show the human supervisor the system’s best guess for each new ambiguity. In this way, training becomes progressively easier; a sketch of this training loop appears after the list.

8. Monitor Kenmore’s performance during training. This amounts to running cross-validation experiments in the background to determine when training on each subproblem in sentence analysis can terminate. As training progresses, the user need not be asked to supply supervisory information for tasks for which enough training cases have been gathered. Methods for improving the case representation also could be explored in the background during training.
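To make steps 6 and 7 concrete, the sketch below (Python, illustrative only) organizes the simultaneous training pass. The parse_ambiguities and ask_supervisor callbacks are hypothetical stand-ins for the sentence analyzer and the supervisor interface; the existing case base proposes a best guess for each ambiguity, and the completed case is stored incrementally.

    # Sketch of the simultaneous training pass in steps 6 and 7. The
    # parse_ambiguities and ask_supervisor callbacks are hypothetical
    # stand-ins for the sentence analyzer and the supervisor interface.

    def suggest(casebase, context):
        # Best guess from the existing case base: the solution of the most
        # similar stored case, or None while the case base is still empty.
        def sim(stored_context):
            keys = set(stored_context) | set(context)
            return sum(1 for k in keys
                       if stored_context.get(k) == context.get(k))
        if not casebase:
            return None
        best_context, best_solution = max(casebase, key=lambda c: sim(c[0]))
        return best_solution

    def train(sentences, parse_ambiguities, ask_supervisor):
        lexical_casebase = []        # one case base for all lexical decisions
        structural_casebases = {}    # one case base per structural ambiguity

        for sentence in sentences:
            for kind, context in parse_ambiguities(sentence):
                casebase = (lexical_casebase if kind == "lexical"
                            else structural_casebases.setdefault(kind, []))
                guess = suggest(casebase, context)           # step 7
                solution = ask_supervisor(kind, context, guess)
                casebase.append((context, solution))         # step 6
        return lexical_casebase, structural_casebases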

CHAPTER 10

CONCLUSIONS

This thesis presents a general framework for tackling the knowledge-engineering bottleneck for natural language processing systems at the level of sentence analysis. The Kenmore knowledge acquisition framework exploits an on-line corpus using a robust parsing strategy and symbolic machine learning techniques, and requires minimal human intervention.

In Kenmore’s partially automated acquisition phase, the solution to a particular problem in sentence analysis is learned. The object of the training phase is to create a case base of ambiguity resolution episodes for a particular type of ambiguity. This is accomplished by selecting a set of sentences from the corpus, presenting them to a sentence analyzer for processing, and creating a case every time an instance of the ambiguity occurs. Cases are created by the sentence analyzer and a human supervisor.

In Kenmore’s application phase, we use the case base without human intervention to resolve ambiguities in novel sentences. Whenever the sentence analyzer encounters an ambiguity, it creates a problem case, compares the problem case to each case in the case base, retrieves the most similar training case, and sends the solution stored there back to the parser for use in the current situation. Below we discuss the contributions of the work and describe a number of directions for future work.

10.1 Contributions of the Research

The thesis makes two major contributions to the field of natural language processing. First, it demonstrates that symbolic inductive machine learning and case-based techniques can be used to learn solutions to significant problems in sentence analysis. We instantiated Kenmore in systems that learn heuristics for handling relative pronoun disambiguation, part-of-speech disambiguation, semantic feature disambiguation, and domain-specific concept disambiguation. In addition, the solutions learned by Kenmore have been incorporated into a working natural language processing system that has demonstrated success in processing real-world text. As such, we have shown the feasibility of truly trainable, portable, customized sentence analyzers.

Although symbolic machine learning algorithms are designed to find regularities in data, it is still somewhat surprising that they can be used to learn solutions to problems in sentence analysis within the context of real-world natural language processing tasks. The natural language learning tasks present a number of challenges for the inductive learning system:

- The training cases contain noise. Some of the noise is in the form of inconsistent and/or incorrect class information provided by the human supervisor and some of the noise is in the form of incorrect feature values provided by the sentence analyzer. Because the concepts being learned are open-textured concepts that lack distinct boundaries, some portion of the noise is unavoidable.

- Kenmore cases contain missing features and irrelevant features.

- There is no guarantee that the cases will include all features necessary for the learning task to succeed. The case representation is limited by the information maintained by the CIRCUS sentence analyzer.

- Kenmore requires incremental learning of concepts.

- The machine learning system must be able to learn concepts with many classes. The tasks tackled within the MayTag and WHirlpool systems have a minimum of ten class values.

As a second major contribution of the work, we have introduced a new approach to ambiguity resolution in sentence analysis. By viewing all ambiguities as classification problems, Kenmore is able to address uniformly a range of problems in sentence analysis, each of which traditionally had required a separate computational mechanism. We believe that it is this uniform view of ambiguity resolution that allows Kenmore to learn solutions to such a wide variety of problems in sentence analysis. In particular, Kenmore has handled syntactic, semantic, lexical, and structural ambiguities, and has accommodated both the acquisition and application of disambiguation heuristics within a single architecture.

Kenmore’s approach to knowledge acquisition for natural language processing systems has a number of advantages over traditional methods for acquiring the background knowledge needed for sentence analysis. First, Kenmore entirely eliminates the need for generating and maintaining explicit, hand-coded disambiguation heuristics because the case base and case retrieval algorithm together implicitly define a set of disambiguation heuristics. By encoding context as part of a lexical definition, for example, MayTag allows the meaning of a word to change dynamically in response to surrounding phrases without the need for explicit lexical disambiguation heuristics. Applying the framework to solve new problems in sentence analysis is also easy because the same case-based method is used to acquire solutions to syntactic and semantic problems at both the lexical and structural levels. No hand-coded heuristics are required to drive the acquisition process. Builders of traditional NLP systems, on the other hand, design separate methods for each class of problem encountered during sentence analysis.

Compared to purely statistical approaches to knowledge acquisition for natural language systems, Kenmore requires relatively little training. In the experiments described throughout the thesis, we obtain promising results after training on only a small number of sentences. This makes the method especially appropriate for use with small corpora, where statistical approaches fail due to lack of data. In addition, Kenmore’s training phase does not require the expertise of computational linguists. With the possible exception of part-of-speech tagging, supervisory information can be provided by anyone who can understand texts from the training corpus. The main constraint for the human supervisor is that the training remain consistent, i.e., that the protocol for handling various classes of decisions does not change as training progresses.

Another advantage of Kenmore is its incremental learning of disambiguation heuristics, which enables training to become progressively easier for the human supervisor. Kenmore can access the existing case base to suggest solutions to each ambiguity and rely on the supervisor only to override incorrect predictions. Finally, Kenmore’s case-based approach to knowledge acquisition provides a flexible control structure for combining multiple sources of knowledge, making it easy to explore the usefulness of different types of knowledge in solving problems in sentence analysis.
An interesting result of our experiments within this case-based framework is that solutions to problems in sentence analysis invariably make only limited use of the available knowledge sources. Without exception, we found that the ambiguity resolution task was better performed when irrelevant pieces of knowledge in the case representation were ignored.

The thesis has also made a number of secondary contributions. We have shown that a decision tree algorithm can be used to find the relevant features in a case representation. A case retrieval algorithm that uses this decision tree approach to feature set selection was shown to perform better than a pure decision tree algorithm, a pure case-based algorithm, and an algorithm that relied on expert knowledge to discard irrelevant features. In addition, we have shown that cognitive biases can improve the performance of the learning algorithm when automatically incorporated into the case representation.
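The decision-tree approach to feature set selection can be sketched as follows. This is a simplified reconstruction rather than the exact algorithm used in the thesis: a modern decision tree learner (scikit-learn here, standing in for a C4.5-style learner) is trained on the case base, only the attributes the tree actually tests are retained, and nearest-neighbor retrieval is then performed over that reduced feature set. The example cases and feature names are invented.

    # Simplified reconstruction of decision-tree feature set selection for
    # case retrieval: train a decision tree on the case base, keep only
    # the attributes the tree actually tests, and run nearest-neighbor
    # retrieval over that reduced feature set.

    from sklearn.feature_extraction import DictVectorizer
    from sklearn.tree import DecisionTreeClassifier

    def tree_selected_features(cases, labels):
        vec = DictVectorizer(sparse=False)
        X = vec.fit_transform(cases)            # attribute-value -> one-hot
        tree = DecisionTreeClassifier().fit(X, labels)
        names = vec.get_feature_names_out()
        # Keep the original attribute behind any column the tree uses.
        return {names[i].split("=")[0]
                for i in range(len(names)) if tree.feature_importances_[i] > 0}

    def retrieve(cases, labels, probe, relevant):
        def sim(case):
            return sum(1 for f in relevant if case.get(f) == probe.get(f))
        best = max(range(len(cases)), key=lambda i: sim(cases[i]))
        return labels[best]

    cases = [{"prev-pos": "det", "word": "venture", "suffix": "ure"},
             {"prev-pos": "verb", "word": "joint", "suffix": "int"},
             {"prev-pos": "det", "word": "company", "suffix": "any"}]
    labels = ["noun", "adjective", "noun"]

    relevant = tree_selected_features(cases, labels)
    print(retrieve(cases, labels, {"prev-pos": "det", "word": "stake"},
                   relevant))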

10.2 Future Directions

The work presented in this thesis can be extended in many directions, some of which have been discussed in earlier chapters but are summarized again here.

10.2.1 Broader Domains

Kenmore has been presented as a framework for knowledge acquisition within limited domains. An obvious direction that must be explored is how to extend Kenmore to deal with texts from broader, unrestricted domains and, ultimately, with entirely open-ended text. We believe that the use of unrestricted domains will affect Kenmore’s ability to perform lexical ambiguity tasks, but will not critically affect its performance on structural ambiguity tasks. The problem is that the taxonomies that underlie lexical acquisition grow tremendously as the domains become broader. In turn, the length of the training phase must be extended in order to gather a representative set of training cases from the corpus. Exceptions to this phenomenon might occur if the natural language task associated with the unrestricted domain required only that high level distinctions be made among lexical items (e.g., if “soldier” and “president” could be tagged as “human” rather than “military” and “official”). Then the taxonomy would not be expected to grow very much, and the current Kenmore framework would be adequate. Our experience with the WHirlpool system, on the other hand, has shown that high level semantic distinctions may be adequate for handling structural ambiguities, and there will therefore be less of a problem with an expanding semantic feature taxonomy.

Unrestricted domains cause a different problem for structural disambiguation: as the domain becomes broader, the style of the texts becomes increasingly varied and, hence, more training examples will be required. Although exploratory investigations in this area are clearly needed, we believe that the increase in stylistic variation will not extend the training phase prohibitively because stylistic differences affect global sentence structure more than local sentence structure, and most ambiguities can be resolved using a very limited local context.

One approach to using Kenmore for lexical acquisition with texts from unrestricted domains is to build a collection of knowledge bases for specialized domains, e.g., business joint ventures, terrorism. Then, to understand a text, the NLP system first determines the domain of the text and then accesses the knowledge base associated with that domain. A better approach to dealing with unrestricted domains in Kenmore was discussed in Section 5.5.4. There we proposed that Kenmore begin with a general, on-line knowledge base (e.g., an online dictionary or thesaurus) and use its case-based approach to decide which of the definitions (e.g., parts of speech, semantic tags) listed for a word is appropriate in a given context. Each case in the case base includes the usual context and solution portions, but the context part could be expanded to include a description of the options available for the current word in the on-line dictionary. The solution part of the case specifies the definition that is correct in the current context or, if none is correct, describes the novel definition. This approach allows system developers to start with a predefined taxonomy of syntactic and semantic types rather than designing them from scratch.
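A minimal sketch of the proposed dictionary-backed case representation follows (Python, illustrative only; the dictionary contents and feature names are invented). The context portion carries the candidate definitions listed for the word, and the solution either selects one of them or records a novel definition.

    # Sketch of a dictionary-backed case: the context portion carries the
    # candidate definitions listed for the word in an on-line dictionary,
    # and the solution selects one of them or records a novel definition.
    # The dictionary contents and feature names are invented.

    DICTIONARY = {
        "head": [
            {"pos": "noun", "semantic-tag": "body-part"},
            {"pos": "noun", "semantic-tag": "leader"},
            {"pos": "verb", "semantic-tag": "lead"},
        ],
    }

    def make_case(word, local_context, chosen_index=None, novel=None):
        options = DICTIONARY.get(word, [])
        return {
            "context": {"word": word, "options": options, **local_context},
            # Solution: the correct listed definition, or a novel one.
            "solution": options[chosen_index] if chosen_index is not None
                        else novel,
        }

    case = make_case("head", {"prev-word": "company", "prev-pos": "noun"},
                     chosen_index=1)
    print(case["solution"])    # {'pos': 'noun', 'semantic-tag': 'leader'}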

10.2.2 Unsupervised Training Methods

The current major bottleneck in the Kenmore system is the human-supervised training phase and, as described above, this bottleneck will become even worse as Kenmore is used in increasingly broad domains. As a result, future work will explore the use of unsupervised inductive learning techniques rather than supervised ones. If we assume that Kenmore has already been trained to handle problems in lexical ambiguity (e.g., part-of-speech ambiguity, semantic feature ambiguity), then a minimally supervised approach that may work well for problems in structural ambiguity is the following. During the training phase, Kenmore should create cases that contain only their context portions. Because the parser supplies all of the needed information automatically, no human supervisor is needed. Next, the cases are clustered using any of a number of clustering techniques. Finally, a human supervisor is consulted to provide supervisory information for one case in each cluster, and all cases in the cluster are assigned the same class. At this point, all training cases have both context and solution portions, and Kenmore can be run in application mode using the usual case retrieval methods to find the training case most similar to the problem case.

The above approach will be less useful for lexical tagging tasks. The context portion of each lexical disambiguation case will only be able to include the individual words that precede and follow the current word. None of the higher level attributes like the part of speech or semantic features of the preceding or following words, or any constituent level information, will be available because that is the information that the system is trying to learn. Without this higher level information as part of the context vector, many more cases will be needed to obtain reasonable clusters. As a result, there will be many more clusters for the human supervisor to “seed” with supervisory information.
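The cluster-then-seed scheme can be sketched as follows (Python, illustrative only). K-means over one-hot encoded context vectors stands in for "any of a number of clustering techniques", and ask_supervisor is a hypothetical placeholder for the single supervisory decision made per cluster.

    # Sketch of the cluster-then-seed scheme: cluster context-only cases,
    # ask the supervisor to label one case per cluster, and propagate that
    # label to the rest of the cluster.

    from sklearn.cluster import KMeans
    from sklearn.feature_extraction import DictVectorizer

    def seed_clusters(context_cases, n_clusters, ask_supervisor):
        vec = DictVectorizer(sparse=False)
        X = vec.fit_transform(context_cases)
        assignments = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)

        labeled = []
        for cluster_id in range(n_clusters):
            members = [i for i, a in enumerate(assignments) if a == cluster_id]
            if not members:
                continue
            label = ask_supervisor(context_cases[members[0]])   # one decision
            labeled.extend((context_cases[i], label) for i in members)
        return labeled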

10.2.3 Smarter Supervised Training

We can also explore methods that reduce the amount of training required for purely supervised learning methods. Lewis and Gale [1994] present one such approach, which was found to reduce the number of manually tagged training cases needed for a given level of performance by as much as 500-fold. The approach is based on results from computational learning theory and involves making a better choice of training examples by training on the most difficult ones. It could be incorporated into the Kenmore framework as follows. First, generate a (fairly large) base set of cases that contain only the context portion of the case; we will refer to these as the “context-only cases.” Because no supervisory information is required, the context-only cases can be generated automatically by the sentence analyzer with no human intervention. Then set up a small case base with training cases for which the human supervisor has provided class information. Apply the case retrieval algorithm to the set of context-only cases and locate the case whose nearest neighbor has the lowest similarity score; this is the context-only case for which the case retrieval algorithm is least certain of its class. Next, let the human supervisor supply class information for this case and add it to the case base. Continue this process until cross-validation testing indicates that an adequate level of performance has been attained or that improvements in performance have ceased.

Another way to improve Kenmore’s training phase is to make more direct use of the errors made during training. Because Kenmore knows which training case(s) are responsible for each of its decisions, we might explore automated approaches for using this credit assignment information to modify the similarity metric or the case representation (see Section 10.2.5 below), or to add and remove cases from the case base.
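A minimal sketch of this selective-sampling loop appears below (Python, illustrative only). The overlap similarity, the ask_supervisor callback, and the fixed number of rounds are placeholders; in practice the loop would stop based on cross-validation performance, as described above.

    # Sketch of the selective-sampling loop: repeatedly label the
    # context-only case whose nearest neighbor in the current case base is
    # least similar to it, then add the labeled case to the case base.

    def similarity(a, b):
        keys = set(a) | set(b)
        return sum(1 for k in keys if a.get(k) == b.get(k))

    def select_and_label(context_only, casebase, ask_supervisor, rounds):
        pool = list(context_only)
        for _ in range(min(rounds, len(pool))):
            def certainty(case):
                # Best similarity to any labeled case currently in the base.
                return max((similarity(case, feats) for feats, _ in casebase),
                           default=0)
            hardest = min(pool, key=certainty)    # least certain case
            pool.remove(hardest)
            casebase.append((hardest, ask_supervisor(hardest)))
        return casebase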

10.2.4 Problems That Cross Clause and Sentence Boundaries

In addition to applying Kenmore to additional structural and lexical problems within a single clause, it will also be interesting to apply Kenmore to problems that span clauses or even sentences. These include pronoun resolution, general anaphora problems, noun and verb phrase ellipsis, and understanding infinitive complements. Such problems may require very different case representations than we have seen thus far because their scope is less local in nature. In addition, they may require different approaches to improving the case representation.

10.2.5 Dynamic Modification of Case Representation

This thesis presented two approaches for improving a baseline case representation automatically. However, additional machine learning methods for modifying a case representation should be explored. As discussed in Chapter 9, it will be especially important to devise methods that systematically add new features to a representation by exploiting the structure and meaning of the representations maintained and created by the sentence analyzer.

10.2.6 Environment for Natural Language System Development

We also plan to continue to explore the use of symbolic inductive machine learning techniques as tools for guiding natural language system development. This will include extending Kenmore into an integrated environment in which natural language systems can be designed, built, and evaluated, and in which the role that various knowledge sources play in the conceptual analysis of text can be explored. All of the extensions and modifications described in the sections above should be incorporated into this environment for system development.

10.2.7 Higher Level Knowledge Structures

The goal of this thesis is to address the knowledge engineering bottleneck for natural language processing systems. We have presented Kenmore as a framework in which the knowledge required by an NLP system can be systematically acquired. However, the work described in this thesis just begins to explore the role that machine learning techniques can play in the larger knowledge-bootstrapping process required for increasingly deeper analysis of increasingly complex natural language understanding tasks. An important research direction to pursue will be to test the framework’s ability to acquire some of the higher level knowledge structures that conceptual analysis of text requires. These include the acquisition of scripts, plans, causal knowledge, and mental models. As a side effect, we can begin to use Kenmore to acquire knowledge structures from text for use by other AI systems and to gain insight into the possibility of using raw text directly as a knowledge base.

BIBLIOGRAPHY

[Aha and Kibler, 1987] Aha, D. and Kibler, D. Learning Representative Exemplars of Concepts: An Initial Case Study. In Proceedings of the Fourth International Conference on Machine Learning, pages 24–30, U. C. Irvine, Irvine, CA, 1987. Morgan Kaufmann.

[Aha et al., 1991] Aha, D., Kibler, D., and Albert, M. Instance-Based Learning Algorithms. Machine Learning, 6(1):37–66, 1991.

[Aha, 1989] Aha, D. Instance-Based Learning Algorithms. In Proceedings of the Sixth International Conference on Machine Learning, pages 387–391, Cornell University, Ithaca, NY, 1989. Morgan Kaufmann.

[Almuallim and Dietterich, 1991] Almuallim, H. and Dietterich, T. G. Learning With Many Irrelevant Features. In Proceedings of the Ninth National Conference on Artificial Intelligence, pages 547–552, Anaheim, CA, 1991. AAAI Press / MIT Press.

[Ashley and Rissland, 1988] Ashley, K. D. and Rissland, E. L. A case-based approach to modeling legal expertise. IEEE Expert, 3(3):70–77, 1988.

[Ashley, 1990] Ashley, K. D. Modelling legal argument: Reasoning with cases and hypotheticals. MIT Press, Bradford Books, Cambridge, MA, 1990.

[Ayuso et al., 1987] Ayuso, D. M., Shaked, V., and Weischedel, R. M. An environment for acquiring semantic information. In Proceedings of the 25th Annual Meeting of the ACL, pages 32–40. Association for Computational Linguistics, 1987.

[Bareiss et al., 1989] Bareiss, E. R., Porter, B. W., and Murray, K. S. Supporting start-to-finish development of knowledge bases. Machine Learning, 4:261–285, 1989.

[Bareiss, 1989] Bareiss, E. R. Exemplar-based knowledge acquisition: a unified approach to concept representation, classification, and learning. Academic Press, Boston, MA, 1989.

[Berwick, 1983] Berwick, Robert. Learning word meanings from examples. In Proceedings of the Eighth International Joint Conference on Artificial Intelligence, pages 459–461, Karlsruhe, Germany, 1983.

[Birnbaum and Selfridge, 1981] Birnbaum, L. and Selfridge, M. Conceptual Analysis of Natural Language. In Schank, R. and Riesbeck, C., editors, Inside Computer Understanding. Lawrence Erlbaum Associates, Hillsdale, NJ, 1981.

[Bosch and Daelemans, 1993] Bosch, A. van den and Daelemans, W. Data-oriented methods for grapheme-to-phoneme conversion. In Proceedings of European Chapter of ACL, pages 45–53, Utrecht, 1993. Also available as ITK Research Report 42.

[Bozsahin and Findler, 1992] Bozsahin, H. Cem and Findler, V. Nicholas. Memory-Based Hypothesis Formation: Heuristic Learning of Commonsense Causal Relations from Text. Cognitive Science, 16(4):431–454, 1992.

[Branting and Porter, 1991] Branting, L. K. and Porter, B. W. Rules and Precedents as Complementary Warrants. In Proceedings of the Ninth National Conference on Artificial Intelligence, pages 3–9, Anaheim, CA, 1991. AAAI Press / MIT Press.

[Branting, 1991] Branting, L. K. Integrating Rules and Precedents for Classification and Explanation: Automating Legal Analysis. PhD thesis, University of Texas, Austin, TX, 1991.

[Brent, 1990] Brent, Michael R. Semantic Classification of Verbs from their Syntactic Contexts: Automated Lexicography with Implications for Child Language Acquisition. In Proceedings of the Twelfth Annual Conference of the Cognitive Science Society, pages 428–437, Massachusetts Institute of Technology, Cambridge, MA, 1990.

[Brent, 1991] Brent, Michael R. Automatic acquisition of subcategorization frames from untagged text. In Proceedings of the 29th Annual Meeting of the ACL, pages 209–214, University of California, Berkeley, CA, 1991. Association for Computational Linguistics. Also in Computational Linguistics, Vol. 19, No. 2.

[Bresnan, 1982] Bresnan, J. The Mental Representation of Grammatical Relations. MIT Press, Cambridge, 1982.

[Brill, 1992] Brill, E. A simple rule-based part of speech tagger. In Proceedings of the Third Conference on Applied Natural Language Processing. Association for Computational Linguistics, 1992.

[Brill, 1993] Brill, E. A Corpus-Based Approach to Language Learning. PhD thesis, University of Pennsylvania, Philadelphia, PA, 1993.

[Brill, 1994] Brill, E. Some Advances in Transformation-Based Part of Speech Tagging. In Proceedings of the Twelfth National Conference on Artificial Intelligence, pages 722–727. AAAI Press / MIT Press, 1994.

[Brown et al., 1991] Brown, P., Della Pietra, S., Della Pietra, V., and Mercer, R. Word Sense Disambiguation Using Statistical Methods. In Proceedings of the 29th Annual Meeting of the ACL, pages 264–270, University of California, Berkeley, CA, 1991. Association for Computational Linguistics.

[Brown et al., 1992] Brown, P., Della Pietra, V. J., deSouza, P. V., Lai, J. C., and Mercer, R. L. Class-Based n-gram Models of Natural Language. Computational Linguistics, 18(4), 1992.

[Callan and Utgoff, 1991] Callan, J. and Utgoff, P. A Transformational Approach to Constructive Induction. In Proceedings of the Eighth International Workshop on Machine Learning, pages 122–126, Northwestern University, Chicago, IL, 1991. Morgan Kaufmann.

[Carbonell, 1978] Carbonell, J. Politics: Automated ideological reasoning. Cognitive Science, 2(1):27–51, 1978.

[Cardie and Lehnert, 1991] Cardie, C. and Lehnert, W. A Cognitively Plausible Approach to Understanding Complicated Syntax. In Proceedings of the Ninth National Conference on Artificial Intelligence, pages 117–124, Anaheim, CA, 1991. AAAI Press / MIT Press.

[Cardie, 1992a] Cardie, C. Corpus-Based Acquisition of Relative Pronoun Disambiguation Heuristics. In Proceedings of the 30th Annual Meeting of the ACL, pages 216–223, University of Delaware, Newark, DE, 1992. Association for Computational Linguistics.

[Cardie, 1992b] Cardie, C. Learning to Disambiguate Relative Pronouns. In Proceedings of the Tenth National Conference on Artificial Intelligence, pages 38–43, San Jose, CA, 1992. AAAI Press / MIT Press.

[Cardie, 1992c] Cardie, C. Using Cognitive Biases to Guide Feature Set Selection. In Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society, pages 743–748, Indiana University, Bloomington, IN, 1992. Lawrence Erlbaum Associates. Also in Working Notes of the 1992 AAAI Workshop on Constraining Learning with Prior Knowledge, San Jose, CA.

[Cardie, 1993a] Cardie, C. A Case-Based Approach to Knowledge Acquisition for Domain-Specific Sentence Analysis. In Proceedings of the Eleventh National Conference on Artificial Intelligence, pages 798–803, Washington, DC, 1993. AAAI Press / MIT Press.

[Cardie, 1993b] Cardie, C. Using Decision Trees to Improve Case-Based Learning. In Utgoff, P., editor, Proceedings of the Tenth International Conference on Machine Learning, pages 25–32, University of Massachusetts, Amherst, MA, 1993. Morgan Kaufmann.

[Charniak, 1993] Charniak, E. Equations for Part-of-Speech Tagging. In Proceedings of the Eleventh National Conference on Artificial Intelligence, pages 784–789, Washington, DC, 1993. AAAI Press / MIT Press.

[Chen et al., 1993] Chen, T., Soo, V., and Lin, A. Learning to Parse with Recurrent Neural Networks. In Proceedings of European Conference on Machine Learning Workshop on Machine Learning and Text Analysis, pages 63–68, 1993.

[Chinchor et al., 1993] Chinchor, N., Hirschman, L., and Lewis, D. Evaluating Message Understanding Systems: An Analysis of the Third Message Understanding Conference (MUC-3). Computational Linguistics, 19(3):409–449, 1993.

[Church and Hanks, 1990] Church, K. and Hanks, P. Word association norms, mutual information, and lexicography. Computational Linguistics, 16, 1990.

[Church and Mercer, 1993] Church, K. and Mercer, R. Introduction to the Special Issue on Computational Linguistics Using Large Corpora. Computational Linguistics, 19:1–24, 1993.

[Church et al., 1991] Church, K., Gale, W., Hanks, P., and Hindle, D. Using Statistics in Lexical Analysis. In Zernik, U., editor, Lexical Acquisition: Exploiting On-Line Resources to Build a Lexicon, pages 115–164. Lawrence Erlbaum Associates, Hillsdale, NJ, 1991.

[Church, 1988] Church, K. A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text. In Proceedings of the Second Conference on Applied Natural Language Processing, pages 136–143. Association for Computational Linguistics, 1988.

[Cognitive Systems, 1992] Cognitive Systems, Inc. ReMind: Release Notes Version 1.1. Boston, MA, 1992.

[Correa, 1988] Correa, N. A Binding Rule for Government-Binding Parsing. In Proceedings of COLING-88, Budapest, 1988.

[Cottrell, 1986] Cottrell, G. A Connectionist Approach to Word Sense Disambiguation. Morgan Kaufmann, 1986.

[Cuetos and Mitchell, 1988] Cuetos, F. and Mitchell, D. C. Cross-Linguistic Differences in Parsing: Restrictions on the Use of the Late Closure Strategy in Spanish. Cognition, 30(1):73–105, 1988.

[Cullingford, 1986] Cullingford, R. Natural Language Processing. Rowman and Littlefield, Totowa, NJ, 1986.

[Daelemans et al., 1994] Daelemans, W., Durieux, G., and Gillis, S. The Acquisition of Stress: A Data-Oriented Approach. Computational Linguistics, 20(3):421–451, 1994.

[Daelemans, to appear] Daelemans, W. Memory-Based Lexical Acquisition and Processing. In Lecture Notes in Artificial Intelligence, Machine Translation and the Lexicon. To appear.

[Dagan and Itai, 1991] Dagan, I. and Itai, A. A Statistical Filter for Resolving Pronoun References. In Feldman, Y. A. and Bruckstein, A., editors, Artificial Intelligence and Computer Vision, pages 125–135. Elsevier Science Publishers, North Holland, 1991.

[Daneman and Carpenter, 1980] Daneman, M. and Carpenter, P. A. Individual Differences in Working Memory and Reading. Journal of Verbal Learning and Verbal Behavior, 19:450–466, 1980.

[Daneman and Carpenter, 1983] Daneman, M. and Carpenter, P. A. Individual Differences in Integrating Information Between and Within Sentences. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9:561–584, 1983.

[DeJong, 1979] DeJong, G. F. Skimming Stories in Real Time: An Experiment in Integrated Understanding. PhD thesis, Yale University, 1979. Also available as Tech Report YALEU/CSD/RR #158.

[Delannoy et al., 1993] Delannoy, J. F., Feng, C., Matwin, S., and Szpakowicz, S. Knowledge Extraction from Text: Machine Learning for Text-to-rule Translation. In Proceedings of European Conference on Machine Learning Workshop on Machine Learning and Text Analysis, pages 7–13, 1993.

[DeMarcken, 1990] DeMarcken, C. Parsing the LOB Corpus. In Proceedings of the 28th Annual Meeting of the ACL. Association for Computational Linguistics, 1990.

[Dyer, 1983] Dyer, M. In-Depth Understanding: A Computer Model of Integrated Processing for Narrative Comprehension. MIT Press, Cambridge, MA, 1983.

[Elman, 1990] Elman, J. Finding Structure in Time. Cognitive Science, 14:179–211, 1990.

[Fawcett and Utgoff, 1992] Fawcett, T. and Utgoff, P. Automatic Feature Generation for Problem Solving Systems. In Proceedings of the Ninth International Conference on Machine Learning, pages 144–153, University of Aberdeen, UK, 1992. Morgan Kaufmann.

[Fisher and Riloff, 1992] Fisher, D. and Riloff, E. Applying Statistical Methods to Small Corpora: Benefitting from a Limited Domain. In Working Notes of AAAI Fall Symposium Series, pages 47–53, Cambridge, MA, 1992. AAAI Press.

[Frazier and Fodor, 1978] Frazier, L. and Fodor, J. D. The Sausage Machine: A New Two-Stage Parsing Model. Cognition, 6:291–325, 1978.

[Gernsbacher et al., 1989] Gernsbacher, M. A., Hargreaves, D. J., and Beeman, M. Building and Accessing Clausal Representations: The Advantage of First Mention Versus the Advantage of Clause Recency. Journal of Memory and Language, 28:735–755, 1989.

[Gibson et al., 1993] Gibson, E., Pearlmutter, N., Canseco-Gonzalez, E., and Hickok, G. Cross-linguistic Attachment Preferences: Evidence from English and Spanish. In Sixth Annual CUNY Sentence Processing Conference, University of Massachusetts, Amherst, MA, 1993. Only abstract in the Sentence Processing Conference proceedings. Full manuscript to appear in journal.

[Gibson, 1990] Gibson, E. Recency Preferences and Garden-Path Effects. In Proceedings of the Twelfth Annual Conference of the Cognitive Science Society, Massachusetts Institute of Technology, Cambridge, MA, 1990. Lawrence Erlbaum Associates.

[Golding and Rosenbloom, 1991] Golding, A. R. and Rosenbloom, P. S. Improving Rule-Based Systems through Case-Based Reasoning. In Proceedings of the Ninth National Conference on Artificial Intelligence, pages 22–27, Anaheim, CA, 1991. AAAI Press / MIT Press.

[Golding, 1991] Golding, A. R. Pronouncing Names by a Combination of Case-Based and Rule-Based Reasoning. PhD thesis, Stanford University, 1991.

[Goodman, 1991] Goodman, M. A Case-Based, Inductive Architecture for Natural Language Processing. In Working Notes of the Machine Learning of Natural Language and Ontology AAAI Spring Symposium, Stanford University, March 1991.

[Granger, 1977] Granger, R. Foulup: A program that figures out meanings of words from context. In Proceedings of the Fifth International Joint Conference on Artificial Intelligence, pages 172–178. Morgan Kaufmann, 1977.

[Grefenstette, 1992] Grefenstette, G. SEXTANT: Exploring unexplored contexts for semantic extraction from syntactic analysis. In Proceedings of the 30th Annual Meeting of the ACL, pages 324–326, University of Delaware, Newark, DE, 1992. Association for Computational Linguistics.

[Grishman et al., 1986] Grishman, R., Hirschman, L., and Nhan, N. T. Discovery procedures for sublanguage selectional patterns: Initial experiments. Computational Linguistics, 12:205–215, 1986.

[Grosz and Sidner, 1986] Grosz, B. J. and Sidner, C.L. Attention, Intentions, and the Structure of Discourse. Computational Linguistics, 12(3):175–204, 1986.

[Hammond, 1989] Hammond, K. J. Case-based planning: Viewing planning as a memory task. Academic Press, Boston, 1989.

[Hart, 1961] Hart, H. L. A. The Concept of Law. Oxford University Press, Oxford, 1961.

[Hartigan, 1975] Hartigan, J. Clustering Algorithms. John Wiley & Sons, New York, 1975.

[Hastings et al., 1991] Hastings, P., Lytinen, S., and Lindsay, R. Learning Words from Context. In Proceedings of the Eighth International Workshop on Machine Learning, Northwestern University, Chicago, IL, 1991.

[Hindle and Rooth, 1993] Hindle, D. and Rooth, M. Structural Ambiguity and Lexical Relations. Computational Linguistics, 19(1):103–120, 1993.

[Hobbs, 1986] Hobbs, J. Resolving Pronoun References. In Grosz, B., Sparck Jones, K., and Webber, B., editors, Readings in Natural Language Processing, pages 339–352. Morgan Kaufmann, Los Altos, CA, 1986.

[Ingria and Stallard, 1989] Ingria, R. and Stallard, D. A computational mechanism for pronominal reference. In Proceedings of the 27th Annual Meeting of the ACL, Vancouver, 1989. Association for Computational Linguistics.

[Jacobs and Zernik, 1988] Jacobs, P. and Zernik, U. Acquiring Lexical Knowledge from Text: A Case Study. In Proceedings of the Seventh National Conference on Artificial Intelligence, pages 739–744, St. Paul, MN, 1988. Morgan Kaufmann.

[Jain, 1989] Jain, Ajay. A Connectionist Architecture for Sequential Symbolic Domains. Technical Report CMU-CS-89-187, Carnegie Mellon University, 1989.

[Jelinek, 1985] Jelinek, F. Markov Source Modeling of Text Generation. In Skwirzinski, J., editor, Impact of Processing Techniques on Communication. Dordrecht, 1985.

[Keil and Kelly, 1987] Keil, Frank C. and Kelly, Michael H. Developmental changes in category structure. In Harnad, S., editor, Categorical Perception: The Groundwork of Cognition, pages 491–510. Cambridge University Press, Cambridge, 1987.

[Kimball, 1973] Kimball, J. Seven Principles of Surface Structure Parsing in Natural Language. Cognition, 2:15–47, 1973.

[King and Just, 1991] King, J. and Just, M. A. Individual Differences in Syntactic Processing: The Role of Working Memory. Journal of Memory and Language, 30:580–602, 1991.

[Kolodner, 1983] Kolodner, J. L. Maintaining Organization in a Dynamic Long-Term Memory. Cognitive Science, 7(4):243–280, 1983.

[Kolodner, 1993] Kolodner, J. Case-Based Reasoning. Morgan Kaufmann, San Mateo, CA, 1993.

[Laird, 1987] Laird, J. E. Soar: An architecture for general intelligence. Artificial Intelligence, 33:1–64, 1987.

[Lakoff, 1987] Lakoff, G. Women, Fire, and Dangerous Things: What Categories Reveal about the Mind. The University of Chicago Press, Chicago, IL, 1987.

[Lange and Dyer, 1989] Lange, T. and Dyer, M. High-Level Inferencing in a Connectionist Network. Connection Science, 4(2), 1989.

[Lappin and McCord, 1990] Lappin, S. and McCord, M. A syntactic filter on pronominal anaphora for slot grammar. In Proceedings of the 28th Annual Meeting of the ACL, University of Pittsburgh, Pittsburgh, PA, 1990. Association for Computational Linguistics.

[Lebowitz, 1980] Lebowitz, M. Generalization and Memory in an Integrated Understanding System. PhD thesis, Yale University, 1980. Also available as Tech Report YALEU/CSD/RR #186.

[Lehman et al., 1993] Lehman, J., Lewis, R., and Newell, A. Integrating Knowledge Sources in Language Comprehension. In Rosenbloom, P., Laird, J., and Newell, A., editors, The SOAR Papers: Research on Integrated Intelligence, volume 2, pages 1309–1314. The MIT Press, Cambridge, MA, 1993.

[Lehnert et al., 1990] Lehnert, W., Cardie, C., and Riloff, E. Analyzing Research Papers Using Citation Sentences. In Proceedings of the Twelfth Annual Conference of the Cognitive Science Society, pages 511–518, Cambridge, MA, July 1990. Lawrence Erlbaum Associates.

[Lehnert et al., 1991] Lehnert, W., Cardie, C., Fisher, D., Riloff, E., and Williams, R. University of Massachusetts: Description of the CIRCUS System as Used in MUC-3. In Proceedings of the Third Message Understanding Conference (MUC-3), pages 223–233, San Mateo, CA, 1991. Morgan Kaufmann.

[Lehnert et al., 1992] Lehnert, W., Cardie, C., Fisher, D., McCarthy, J., Riloff, E., and Soderland, S. University of Massachusetts: Description of the CIRCUS System as Used in MUC-4. In Proceedings of the Fourth Message Understanding Conference (MUC-4), pages 282–288, San Mateo, CA, 1992. Morgan Kaufmann.

[Lehnert et al., 1993a] Lehnert, W., McCarthy, J., Soderland, S., Riloff, E., Cardie, C., Peterson, J., Feng, F., Dolan, C., and Goldman, S. University of Massachusetts/Hughes: Description of the CIRCUS System as Used for TIPSTER Text. In Proceedings, TIPSTER Text Program (Phase I), pages 241–256, San Mateo, CA, 1993. Morgan Kaufmann.

[Lehnert et al., 1993b] Lehnert, W., McCarthy, J., Soderland, S., Riloff, E., Cardie, C., Peterson, J., Feng, F., Dolan, C., and Goldman, S. University of Massachusetts/Hughes: Description of the CIRCUS System as Used in MUC-5. In Proceedings of the Fifth Message Understanding Conference (MUC-5), pages 277–291, San Mateo, CA, 1993. Morgan Kaufmann.

[Lehnert et al., 1994] Lehnert, W., Cardie, C., Fisher, D., McCarthy, J., Riloff, E., and Soderland, S. Evaluating an Information Extraction System. Journal of Integrated Computer-Aided Engineering, 1(6), 1994.

[Lehnert, 1978] Lehnert, W. The Process of Question Answering. Lawrence Erlbaum Associates, Hillsdale, NJ, 1978.

[Lehnert, 1990] Lehnert, W. Symbolic/Subsymbolic Sentence Analysis: Exploiting the Best of Two Worlds. In Barnden, J. and Pollack, J., editors, Advances in Connectionist and Neural Computation Theory, pages 135–164. Ablex Publishers, Norwood, NJ, 1990.

[Lewis and Gale, 1994] Lewis, D. D. and Gale, W. A. A Sequential Algorithm for Training Text Classifiers. In Proceedings of ACM SIGIR, Dublin, Ireland, 1994.

[Lytinen and Roberts, 1989] Lytinen, S. and Roberts, S. Lexical Acquisition as a By-Product of Natural Language Processing. In Proceedings, IJCAI-89 Workshop on Lexical Acquisition, 1989.

[Lytinen, 1984] Lytinen, S. The Organization of Knowledge in a Multi-lingual, Integrated Parser. PhD thesis, Yale University, 1984. Also available as Tech Report YALEU/CSD/RR #340.

[Marcus et al., 1993] Marcus, M., Marcinkiewicz, M., and Santorini, B. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330, 1993.

[Marcus, 1991] Marcus, Mitchell. The Automatic Acquisition of Linguistic Structure from Large Corpora: An Overview of Work at the University of Pennsylvania. In Working Notes of the Machine Learning of Natural Language and Ontology AAAI Spring Symposium, Stanford University, March 1991.

[Martin, 1992] Martin, J. Computer Understanding of Conventional Metaphoric Language. Cognitive Science, 16(2):233–270, 1992.

[Matwin and Szpakowicz, 1993] Matwin, S. and Szpakowicz, S. Text Analysis: How Can Machine Learning Help? In Proceedings of the First Conference of the Pacific Association for Computational Linguistics (PACLING), pages 33–42, Vancouver, BC, 1993.

[McClelland and Kawamoto, 1986] McClelland, J. L. and Kawamoto, A. H. Mechanisms of sentence processing: Assigning roles to constituents of sentences. In McClelland, J. L., Rumelhart, D.E., and PDP Research Group, the, editors, Parallel distributed processing: Explorations in the microstructure of cognition: Vol. 2. Psychological and biological models. MIT Press, Cambridge, MA, 1986.

[Miikkulainen and Dyer, 1987] Miikkulainen, R. and Dyer, M. Building Distributed Representations without Microfeatures. Technical Report TR UCLA-AI-87-17, UCLA, 1987.

[Miikkulainen and Dyer, 1989] Miikkulainen, R. and Dyer, M. A Modular Neural Network Architecture for Sequential Paraphrasing of Script-Based Stories. Technical Report UCLA-AI-89-02, UCLA, 1989.

[Miller, 1956] Miller, G. A. The Magical Number Seven, Plus or Minus Two: Some Limits on our Capacity for Processing Information. Psychological Review, 63(1):81–97, 1956.

[Muggleton, 1992] Muggleton, S. H. Inductive Logic Programming. Academic Press, 1992.

[Newell, 1990] Newell, A. Unified Theories of Cognition. Harvard University Press, Cambridge, MA, 1990.

[Newport, 1990] Newport, E. Maturational Constraints on Language Learning. Cognitive Science, 14:11–28, 1990.

[Nicol, 1988] Nicol, J. Coreference Processing During Sentence Comprehension. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, 1988.

[Pazzani, 1991] Pazzani, M. J. A computational theory of learning causal relationships. Cognitive Science, 15:401–424, 1991.

[Quinlan, 1983] Quinlan, J. R. Learning Efficient Classification Procedures and Their Application to Chess End Games. In Michalski, R. S., Carbonell, J. G., and Mitchell, T. M., editors, Machine Learning: An Artificial Intelligence Approach. Morgan Kaufmann, San Mateo, CA, 1983.

[Quinlan, 1986] Quinlan, J. R. Induction of decision trees. Machine Learning, 1:81–106, 1986.

[Quinlan, 1990] Quinlan, J. R. Learning Logical Definitions from Relations. Machine Learning, 5:239–266, 1990.

[Quinlan, 1992] Quinlan, J. R. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA, 1992.

[Resnik and Hearst, 1993] Resnik, P. and Hearst, M. Structural Ambiguity and Conceptual Relations. In Proceedings of the ACL Workshop on Very Large Corpora, Ohio State, 1993. Association for Computational Linguistics.

[Resnik, 1993] Resnik, P. Selection and Information: A Class-Based Approach to Lexical Relationships. PhD thesis, University of Pennsylvania, Philadelphia, PA, 1993.

[Riesbeck and Martin, 1985] Riesbeck, C. and Martin, C. Direct Memory Access Parsing. In Riesbeck, C. and Kolodner, J., editors, Experience, Memory and Reasoning. Lawrence Erlbaum, Hillsdale, NJ, 1985.

[Riesbeck and Schank, 1978] Riesbeck, C. and Schank, R. Comprehension by computer: Expectation-based analysis of sentences in context. In Levelt, W. J. M. and Flores d’Arcais, G. B., editors, Studies in the Perception of Language. John Wiley & Sons, New York, 1978.

[Riesbeck and Schank, 1989] Riesbeck, C. and Schank, R. Inside Case-Based Reasoning. Erlbaum, Northvale, NJ, 1989.

[Riesbeck, 1975] Riesbeck, C. Conceptual Analysis. In Schank, R., editor, Conceptual Information Processing. North Holland, Amsterdam, 1975.

[Riloff and Lehnert, 1992] Riloff, E. and Lehnert, W. Classifying Texts Using Relevancy Signatures. In Proceedings of the Tenth National Conference on Artificial Intelligence, pages 329–334, San Jose, CA, 1992. AAAI Press / MIT Press.

[Riloff, 1993] Riloff, E. Automatically Constructing a Dictionary for Information Extraction Tasks. In Proceedings of the Eleventh National Conference on Artificial Intelligence, pages 811–816, Washington, DC, 1993. AAAI Press / MIT Press.

[Rissland and Ashley, 1986] Rissland, E. L. and Ashley, K. Hypotheticals as Heuristic Device. In Proceedings of the Fifth National Conference on Artificial Intelligence, pages 289–297, Philadelphia, PA, 1986. Morgan Kaufmann.

[Rissland and Skalak, 1991] Rissland, E. L. and Skalak, D. B. CABARET: Rule Interpretation in a Hybrid Architecture. International Journal of Man-Machine Studies, 34:839–887, 1991.

[Rissland et al., 1984] Rissland, E. L., Valcarce, E. M., and Ashley, K. D. Explaining and Arguing with Examples. In AAAI-84, Proceedings of the National Conference on Artificial Intelligence, pages 288–294, Austin, TX, 1984. American Association for Artificial Intelligence.

[Rissland, 1990] Rissland, E. L. Artificial Intelligence and Law: Stepping Stones to a Model of Legal Reasoning. The Yale Law Journal, 99(8):1957–1981, 1990.

[Rosch and Mervis, 1975] Rosch, E. and Mervis, C. Family Resemblances: Studies in the Internal Structure of Categories. Cognitive Psychology, 7:573–605, 1975.

[Rosch, 1973] Rosch, E. Natural Categories. Cognitive Psychology, 4:328–350, 1973.

[Samuelsson and Rayner, 1991] Samuelsson, C. and Rayner, M. Proceedings of the AAAI Spring Symposium on Machine Learning of Natural Language and Ontology. In Working Notes of the Machine Learning of Natural Language and Ontology AAAI Spring Symposium, pages 143–145, Stanford University, 1991. DFKI: Kaiserslautern, FRG.

[Samuelsson, 1991] Samuelsson, C. Using Explanation-Based Learning to Speed Up Natural Language Systems. PhD thesis, Royal Institute of Technology, Stockholm, 1991.

[Schank and Riesbeck, 1981] Schank, R. and Riesbeck, C. Inside Computer Understanding: Five Programs Plus Miniatures. Lawrence Erlbaum, Hillsdale, NJ, 1981.

[Schank, 1975] Schank, R. Conceptual Information Processing. North Holland, Amsterdam, 1975.

[Schank, 1982] Schank, R. Dynamic Memory: A theory of learning in computers and people. Cambridge University Press, New York, 1982.

[Selfridge, 1986] Selfridge, M. A computer model of child language learning. Artificial Intelligence, 29:171–216, 1986.

[Sidner, 1983] Sidner, C. L. Focusing in the Comprehension of Definite Anaphora. In Brady, M. and Berwick, R. C., editors, Computational Models of Discourse, pages 267–330. Massachusetts Institute of Technology, Cambridge, MA, 1983.

[Simmons and Yu, 1992] Simmons, Robert F. and Yu, Yeong-Ho. The Acquisition and Use of Context-Dependent Grammars for English. Computational Linguistics, 18(4):391–418, 1992.

[Skalak and Rissland, 1990] Skalak, D. and Rissland, E. Inductive Learning in a Mixed Paradigm Setting. In Proceedings of the Eighth National Conference on Artificial Intelligence, pages 840–847, Boston, MA, 1990. AAAI Press / MIT Press.

[St. John, 1992] St. John, Mark F. The Story Gestalt: A Model of Knowledge-Intensive Processes in Text Comprehension. Cognitive Science, 16(2):271–306, 1992.

[Stanfill and Waltz, 1986] Stanfill, C. and Waltz, D. Toward Memory-based Reasoning. Communications of the ACM, 29:1213–1228, 1986.

[Stetham, 1991] Stetham, S. Proceedings of the AAAI Spring Symposium on Machine Learning of Natural Language and Ontology. In Working Notes of the Machine Learning of Natural Language and Ontology AAAI Spring Symposium, pages 169–173, Stanford University, 1991. DFKI: Kaiserslautern, FRG.

[Utgoff, 1989] Utgoff, P. Incremental induction of decision trees. Machine Learning, 4:161–186, 1989.

[Utgoff, 1994] Utgoff, P. An Improved Algorithm for Incremental Induction of Decision Trees. In Cohen, W. and Hirsh, H., editors, Proceedings of the Eleventh International Conference on Machine Learning, pages 318–325, Rutgers University, New Brunswick, NJ, 1994. Morgan Kaufmann.

[Waltz and Pollack, 1985] Waltz, D. and Pollack, J. Massively parallel parsing: A strongly interactive model of natural language interpretation. Cognitive Science, 9(1):51–74, 1985.

[Weischedel et al., 1993] Weischedel, R., Meteer, M., Schwartz, R., Ramshaw, L., and Palmucci, J. Coping with Ambiguity and Unknown Words through Probabilistic Models. Computational Linguistics, 19(2):359–382, 1993.

[Wermter and Lehnert, 1989] Wermter, S. and Lehnert, W. A hybrid symbolic/connectionist model for noun-phrase understanding. Connection Science, 1(3), 1989.

[Wermter, 1990] Wermter, S. Combining Symbolic and Connectionist Techniques for Coordination in Natural Language. In Proceedings of the 14th German Workshop on Artificial Intelligence, Eringerfeld, Germany, 1990.

[Wilensky, 1978] Wilensky, R. Understanding goal-based stories. PhD thesis, Yale University, 1978. Also available as Tech Report YALEU/CSD/RR #140.

[Wilensky, 1991] Wilensky, R. Extending the Lexicon by Exploiting Subregularities. Technical Report UCB/CSD 91/618, Computer Science Division (EECS), University of California, Berkeley, 1991.

[Winograd, 1983] Winograd, T., editor. Language as a Cognitive Process, volume 1. Addison-Wesley Publishing Company, Reading, MA, 1983.

[Wittgenstein, 1953] Wittgenstein, L. Philosophical Investigations. Macmillan, New York, 1953.

[Woods, 1970] Woods, W. Transition network grammars for natural language analysis. CACM, 13(10), 1970.

[Yarowsky, 1992] Yarowsky, David. Word-Sense Disambiguation Using Statistical Models of Roget’s Categories Trained on Large Corpora. In Proceedings, COLING-92, 1992.

[Yarowsky, 1994] Yarowsky, David. Decision Lists for Lexical Ambiguity Resolution: Application to Accent Restoration in Spanish and French. In Proceedings of the 32nd Annual Meeting of the ACL, 1994.

[Zelle and Mooney, 1993] Zelle, J. and Mooney, R. Learning Semantic Grammars with Constructive Inductive Logic Programming. In Proceedings of the Eleventh National Conference on Artificial Intelligence, pages 817–822, Washington, DC, 1993. AAAI Press / MIT Press.

[Zelle and Mooney, 1994] Zelle, J. and Mooney, R. Learning Semantic Grammars with Constructive Inductive Logic Programming. In Proceedings of the Twelfth National Conference on Artificial Intelligence, pages 748–753, Seattle, WA, 1994. AAAI Press / MIT Press.

[Zernik, 1991] Zernik, U. Introduction. In Zernik, U., editor, Lexical Acquisition: Exploiting On-Line Resources to Build a Lexicon, pages 1–26. Lawrence Erlbaum Associates, Hillsdale, NJ, 1991.