On the Effective Use of Cyc in a Question-Answering System

Jon Curtis, Gavin Matthews, David Baxter
Cycorp, Inc., 3721 Executive Center Drive, Suite 100, Austin, TX 78731
{jonc,gmatthew,baxter}@cyc.com

Topics: flexibility supported by discourse/user model, new types of questions, reasoning with incomplete knowledge, response generation.

Abstract

We describe a commercial question-answering system that uses AI – specifically, the Cyc system – to support passage retrieval and perform deductive QA, to produce results superior to what each question-answering technique could produce alone.

1 Introduction

This paper describes a working prototype of a commercial question-answering system that uses Cyc in conjunction with NLP-driven passage retrieval in a way that integrates the two markedly different approaches to question-answering and produces better results than either approach could yield alone.1 MySentient2 Answers 1.0 draws upon the Cyc system, a large knowledge base and common-sense reasoning engine. We describe three areas in which Cyc's knowledge contributes to the system's overall question-answering ability, both directly, in deductive question-answering, and indirectly, by supporting passage retrieval. In particular, we focus on the use of Cyc for:
1. augmentation of NLP-based passage retrieval by generating NL expansions of key concepts mentioned in a question;
2. answering question types that pose a challenge to passage-retrieval methods, such as procedural ("How do I …?") and cost/benefit ("Why should I …?"); and
3. paraphrasing the results of deductive question answering as NL strings for display to an end user.
We close with a discussion of the current limitations of the integrated system and a description of anticipated extensions to the use of Cyc in future versions.

2 Cyc

Cyc is a state-of-the-art artificial intelligence program that has been in development since 1984. Drawing upon the world's largest general-purpose knowledge base, of over 164,000 concepts and 3,300,000 facts (rules and ground assertions) relating them,3 Cyc is the only AI program in existence today that can reasonably claim to have some degree of common sense. Cyc's knowledge is represented in CycL, a higher-order logical language based on predicate calculus. Every assertion in Cyc is represented in a context, or microtheory, which allows the representation of competing theories. Like ordinary concepts, microtheories are explicitly represented as first-class objects in Cyc, giving Cyc a reflective capability to reason about its own representations. Microtheories form a hierarchy that facilitates knowledge re-use (assertions stored in the most general contexts are always available) and inferential focus (given a query posed in a specific microtheory, knowledge from sibling or more specific microtheories will not come into play). Cyc's inference engine combines general theorem proving (e.g., rule chaining) with specialized reasoning (e.g., subsumption and transitivity).
Cyc has been used in commercial web-search systems (e.g., HotBot) and in question-answering systems, most recently in a purely deductive system for answering AP chemistry questions, developed in collaboration with Vulcan, Inc. [Friedland et al., 2004]. Cyc's rôle in the MySentient system heralds its first appearance in a commercial question-answering system. MySentient makes use of Cyc pervasively: as a means to augment NLP-based QA, as the basis for a deductive QA module, and in other capacities, such as clarification and profiling, that will be touched upon here.

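To make the microtheory mechanism concrete, here is a minimal sketch, in Python, of how visibility through a hierarchy of contexts might be modeled. This is our illustration, not Cycorp's implementation; names such as UKFinanceMt and #$PaymentDevice are hypothetical, while #$genls and #$CreditCard are genuine CycL vocabulary.

    # Toy model of microtheory visibility; not Cycorp's implementation.
    class Microtheory:
        def __init__(self, name, genl_mts=()):
            self.name = name
            self.genl_mts = list(genl_mts)  # more general microtheories
            self.assertions = set()

        def visible_assertions(self):
            # An assertion is visible if it is stored here or in any
            # (transitively) more general microtheory.
            seen = set(self.assertions)
            for genl in self.genl_mts:
                seen |= genl.visible_assertions()
            return seen

    base = Microtheory("BaseKB")
    finance = Microtheory("UKFinanceMt", [base])  # hypothetical domain context
    base.assertions.add("(#$genls #$CreditCard #$PaymentDevice)")
    finance.assertions.add("(#$genls #$CreditCard #$UKPaymentCard)")

    # A query posed in the domain context sees both assertions; a query
    # posed in the general context is not affected by the domain theory.
    assert len(finance.visible_assertions()) == 2
    assert len(base.visible_assertions()) == 1
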
3 MySentient Answers 1.0

MySentient Answers 1.0 is a working question-answering system, designed by MySentient Ireland (R&D) Ltd. of Dublin, Ireland, and implemented in collaboration with Cycorp, Inc. of Austin, Texas, and the Center for Natural Language Processing in Syracuse, New York.4 MySentient Answers has been the subject of extensive demonstration to interested commercial parties and is expected to be available for public access in the near future.

1 The work described in this paper was made possible by the financial backing of MySentient.
2 "MySentient" is a registered trademark of MySentient Ireland (I.P.) Limited.
3 MySentient uses a carefully chosen subset of the full Cyc knowledge base, with 137,000 concepts and 1,700,000 facts.
4 See http://www.mysentient.com/, http://www.cyc.com/, and http://www.cnlp.org/.

3.1 Architecture
MySentient Answers is based on the S-Core architecture, which integrates disparate components into a uniform XML-based interface. Each component receives a storybook giving the history of the interaction, and its output is appended to the appropriate element. This design gives some of the flexibility of a blackboard architecture, yet allows some powerful simplifying assumptions.

Figure 1: MySentient Answers system architecture, showing selected components.

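As an illustration of the storybook pattern just described, consider the following sketch. The XML element and attribute names are our own assumptions; the paper does not specify S-Core's actual schema.

    # Hypothetical sketch of an S-Core component appending its output
    # to the storybook; element names are invented for illustration.
    import xml.etree.ElementTree as ET

    def run_component(storybook_xml, component_name, answer_text):
        """Read the interaction history, append this component's output."""
        root = ET.fromstring(storybook_xml)
        latest_question = root.findall("question")[-1]
        answer = ET.SubElement(latest_question, "answer")
        answer.set("source", component_name)
        answer.text = answer_text
        return ET.tostring(root, encoding="unicode")

    story = "<storybook><question>What is AIM?</question></storybook>"
    story = run_component(story, "cyc-dqa",
                          "AIM is the Alternative Investment Market.")
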
3.2 Question Answering Modules
The S-Core architecture allows any task to be attempted in parallel (or in sequence) by multiple competing modules. In MySentient Answers, there are several question-answering modules: various NLP passage-retrieval modules developed by CNLP, and a deductive question-answering module based on Cyc. All QA modules return NL answers.
CNLP's question-answering capabilities are grounded in a quasi-logical representation – Language-to-Logic, or "L2L" – that has proven successful in recent TREC question-answering tracks [Diekema, et al., 2000].

3.3 Methodology
The Cyc Knowledge Base is essentially open-domain, but deployments of MySentient Answers will be focused on specific domains that reflect the needs and interests of the customer. The final goal of the project is that the customer's domain experts will perform much of this specialization, using a suite of MySentient Authoring Tools. At the current stage of development, these tools are in a rudimentary state; therefore, much of the authoring done in support of the results described in this paper has been simulated by blending prototype tools with the intervention of skilled ontologists. This simulation not only provides a proof-of-concept of the run-time question-answering system, but also permits an informed comparison between the representation capabilities required and the feasible capabilities of the planned authoring tools (see Section 8, "Current Limitations and Future Directions," below). The corpus used for this simulation was provided by the Motley Fool UK, based on its website.5

5 "The Motley Fool" is a registered trademark of Motley Fool, Inc. See http://www.fool.co.uk/ for the corpus website.

While coverage of a corpus must start with the corpus itself, it is also necessary to concentrate on test queries; Cyc's coverage of the Motley Fool domain therefore had two foci. The corpus was subjected to automated analysis for noun-phrase identification and interpretation, plus extraction of glossary entries; this was followed by a manual pass of review and correction to ensure broad coverage of the domain. Known and blind question sets were prepared (by both Cycorp and MySentient) based on the corpus. The known question sets were analyzed by question type (see Section 6.1, "Question-types of Interest"), and strategies for broad coverage of each question type were devised and implemented.
Independently, the quality-assurance team performed daily tests based on the question sets. Results were evaluated on a primitive basis by automatic comparison with a growing set of input/output pairs. Each input/output pair was classified as correct, incorrect, or correct but badly paraphrased. Incorrect results were reviewed with a focus on identifying and resolving the broad class of defect (such as missing or erroneous knowledge) rather than fixing problems specific to particular questions.
The intensive ontological-engineering effort for the Motley Fool UK domain was performed over a four-week period and took 691.25 person-hours. The source corpus was equivalent to about 200 pages of text, and a total of 286 test questions were prepared for that domain. The NLP-based components also underwent a training process against the Motley Fool corpus; however, this training was done independently of the simulated authoring/ontological-engineering effort done for Cyc. As a limited test of how MySentient Answers benefits from integrating both NLP and deductive approaches to question-answering, MySentient prepared 132 questions that were posed to the system, and for each question, each QA system was scored on whether it produced a satisfactory answer. In borderline cases, a half-point was awarded.
Overall, the multiple CNLP QA modules scored 63% and the Cyc DQA module scored 34%. This asymmetry is to be expected because of the relative maturity of NLP systems in the QA domain. The federation of QA modules (taking the high score for each question) scored 79%, a significant improvement over the individual QA modules. In several cases, both modules gave usefully different answers that, taken together, form a rounded answer to the user's question. Two interesting examples: In response to "How do I protect myself from credit card fraud?", the CNLP module returned a passage that described online fraud guarantees and internet delivery protection, whereas the Cyc module returned sentences advocating PIN secrecy, comparing receipts, and reporting credit-card loss. In response to "Should we get married or live together?", the CNLP module returned a passage about the legal rights accorded to married couples, whereas the Cyc module returned sentences describing the economic benefits of cohabitation and the tax benefits of marriage.
It is important to note that this test does not isolate all federation factors. In particular, the Cyc-based DQA relies on upstream CNLP modules for the identification of noun and verb phrases, while the CNLP QA modules make use of Cyc-derived expansions. Nevertheless, the limited test described supports the view that the use of deductive question answering in tandem with an NLP QA system can significantly boost system effectiveness.

4 Discourse Modeling

Cyc's contributions to MySentient Answers are grounded in a discourse model, generated on the fly. This model stores information from a user's session in CycL, so that Cyc can reason over it. Each discourse model is associated with a microtheory structure that is defined for each user and can be extended across sessions to preserve useful information about the user and his or her interactions with the system.
The most general microtheory in the structure is the user profile, which contains data intended to persist between sessions, and so is available for inference any time the user logs in. Though not currently a mature feature of the system, the user profile gathers information that can be used to improve the quality of subsequent interactions. For example, if the user asks for recommendations for Mazda truck accessories during one session, and later asks for directions to service stations for "my vehicle," the system should be able to use the knowledge, gained during the previous session, that the user has a Japanese vehicle. These features are still prototype technologies, and are therefore described in the "Current Limitations and Future Directions" section of this paper. The ability to profile the user is an exciting distinguishing feature of the MySentient system.

Figure 2: Part of a discourse model. Boxes show microtheories; ovals show events. Not shown are user-independent super-contexts and other ontology.

For NL parsing and generation, the user-session model includes a lexical microtheory sub-structure that was originally developed for, and used in, the DARPA-funded, Cyc-based KRAKEN knowledge-acquisition system [Panton, et al., 2002]. The most general microtheory of this sub-structure is a user-specific lexicon, from which the contents of the appropriate general Cyc lexicon are visible. Though British English is the assumed default language for the test domain, the system can determine the appropriate language on a per-session basis. Once that language is determined, an assertion is added to the Cyc KB linking the user-specific lexicon to the general lexicon for that language. So for British English, Cyc will generate a sub-context link from the user-specific lexicon to #$BritishEnglishMt, making its data about British spellings, common words, etc., available. For more information on the representation and use of lexicons in Cyc, see [Burns and Davis, 1999].
Directly below the user-specific lexicon are two more lexical microtheories, which are used for inference by Cyc's parser and NL generation. By design, the parsing and generation microtheories are siblings; user-specific lexical information, such as information about how the user referred to a concept, is stored in the user-specific lexicon, where it is available for both parsing and generation.
Between the user profile and the user-specific lexicon are one or more session microtheories, containing the vast majority of user-specific assertions. Sessions themselves are explicitly represented as events, allowing their internal, temporal structure to be modeled. Each session revolves around the user asking one or more questions; these question-asking events are also explicitly represented as proper sub-events of the session to which they belong.

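Reusing the toy Microtheory class from the sketch in Section 2, the per-user context structure described above might be assembled roughly as follows; this is our illustration, and all names other than BritishEnglishMt are hypothetical.

    # Illustrative only: assembling a per-user discourse-model structure.
    def build_discourse_model(user_id, session_id, language_mt):
        profile = Microtheory("UserProfileMt-" + user_id)  # persists across sessions
        session = Microtheory("SessionMt-" + session_id, [profile])
        lexicon = Microtheory("UserLexiconMt-" + user_id,
                              [session, language_mt])      # sees the general lexicon
        # Parsing and generation contexts are siblings below the lexicon,
        # so user-specific lexical data is available to both.
        parsing = Microtheory("ParsingMt-" + session_id, [lexicon])
        generation = Microtheory("GenerationMt-" + session_id, [lexicon])
        return profile, session, lexicon, parsing, generation

    british_english = Microtheory("BritishEnglishMt")
    model = build_discourse_model("u42", "s1", british_english)
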

Information about the user's question, such as the grammatical features of the constituent phrases (e.g., "market indicators" is a bare plural expression, "the stock market" is definite singular), the CycL semantics for the question (when available), and hypotheses about what the question is about, are all related to the user's question-asking event, using special discourse-modeling vocabulary that can be leveraged by the Cyc inference engine to support query expansion, deductive question-answering, and natural-language generation. Those processes are described in the following sections.

5 Query Expansion

Query expansion is the process of altering an input question, or a (quasi-)formal representation thereof, typically by adding or replacing terms. The modifications a query undergoes during the expansion process are determined by analysis of the query terms; e.g., "AIM" might have the expansion "Alternative Investment Market." Expansions can be used to focus a query, often by contributing to a set of query words, or to categorize its answer-type. The most common approach to query expansion seen in the literature is to leverage a dictionary-based program, such as WordNet, to produce syn-sets, hypernyms, and hyponyms [Hovy, et al., 2000], to find appropriate, hard-to-predict part-of-speech variations for noun compounds ("attorneys general," and not "attorney generals," as an expansion of "attorney general") [Bilotti, 2004], or to find stemming information for query-words [Bacchin and Melucci, 2004].
A limiting feature of these approaches is a near-complete reliance on lexical methods: only the relationships between terms are considered; the semantic relationships between concepts are not.6 MySentient's expansion-generation is a departure in this regard. Key phrases in the user's question are identified, translated into CycL, and placed in the discourse model. The User Profile Manager then reasons over this formal representation of the meanings of query-words to identify concepts that form the semantic basis for expansions. Other modules, such as the Deductive Question Answering module and the Clarification Manager, can also use these phrase-level translations.

6 At least not directly. Some techniques (e.g., LSA) approximate semantic "closeness" by measuring co-occurrence in a corpus. Semantic closeness, however, is not a first-order semantic relationship; the authors contend that the semantic relationships that explain semantic closeness are more valuable for expansion.

Several expansion strategies are explicitly represented in the Cyc Knowledge Base, each defining criteria that a concept must meet to be used as the semantic basis for expansions. When a strategy is executed, inference seeks concepts that meet the relevant criteria. The resultant bindings are then sent to a natural-language generation function that generates NL for those bindings. The generated strings, along with the strategy used and a confidence, are output by the User Profile Manager as proposed expansions for the original input term.
Cyc is agnostic as to how the expansions should be used; for example, an NLP question-answering module might treat them as conjuncts in a Boolean rewrite, or they might be used in answer-type classification. Empirical evaluation of various strategies – based on CNLP's determination of their usefulness for passage retrieval by their technology, and on MySentient's exacting wall-clock performance criteria – led to the decision to include two strategies in MySentient Answers 1.0: a synonym-generation strategy that calls upon the Cyc Lexicon to simulate traditional NLP-based query expansion,7 and a "classification" strategy8 that uses definitional, or "type," information to generate expansions useful for categorization.

7 The synonym strategy is an exception to the rule that strategies first identify relevant CycL terms and then paraphrase each; instead, all synonyms are generated from a single CycL term that is the best interpretation of the user's phrase.
8 The classification strategy differs from other strategies in that it uses a semantic-closeness metric to assign confidences to its outputs. Thus, as an expansion of "auto loan," "loan," being closer in Cyc's generalization hierarchy, gets a higher confidence than "obligation," a more distant (and abstract) generalization.

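The strategy pattern can be sketched as follows. The data and confidence values are invented placeholders; in the deployed system the criteria are declarative CycL assertions and the candidates are found by inference.

    # Sketch of expansion-strategy execution; data are illustrative.
    from dataclasses import dataclass
    from typing import Callable, List, Tuple

    @dataclass
    class ExpansionStrategy:
        name: str
        candidates: Callable[[str], List[str]]  # term -> expansion strings
        confidence: float

    SYNONYMS = {"APR": ["annual percentage rate"]}  # stand-in for Cyc Lexicon
    GENERALIZATIONS = {"APR": ["interest rate"]}    # stand-in for type lookups

    strategies = [
        ExpansionStrategy("synonym", lambda t: SYNONYMS.get(t, []), 0.9),
        ExpansionStrategy("classification",
                          lambda t: GENERALIZATIONS.get(t, []), 0.7),
    ]

    def expand(term: str) -> List[Tuple[str, str, float]]:
        """Return (expansion, strategy-name, confidence) triples."""
        return [(exp, s.name, s.confidence)
                for s in strategies for exp in s.candidates(term)]

    # expand("APR") -> [("annual percentage rate", "synonym", 0.9),
    #                   ("interest rate", "classification", 0.7)]
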

Given an input string "APR," the User Profile Manager uses the synonym-generation strategy to return the unabbreviated "annual percentage rate," and the classification strategy to return the more general "interest rate." Although the use of expansions by the NLP question-answering module is not visible to the end user, the results of expansions are nevertheless discernible in MySentient Answers 1.0: MySentient has implemented a simple Clarification module prototype that substitutes high-confidence expansions into the original query and presents them to the user as proposed re-phrasings. For example, if a user asks, "Who is offering the best APR for an auto loan?", the Clarification manager will offer as re-phrasings "Who is offering the best interest rate for an auto loan?", "Who is offering the best annual percentage rate for an auto loan?", and "Who is offering the best APR for car loan?"9

9 The agreement error reflects the simplicity of the Clarification module's current substitution algorithm. That the input phrase "an auto" has an indefinite article is recorded in Cyc's discourse model; the module can be "smartened" to use this information.

Figure 3: MySentient's Clarification module uses expansions to suggest re-phrasings of the user's question.

As noted above, other Cyc-based expansion strategies are available but are turned off by default. Nevertheless, these are worth describing as examples of how the space of possible expansions is extended through a semantic approach. Among these strategies is a "conceptually related" algorithm that finds closely associated concepts using significant semantic relationships. For example, asked to expand "asthma," the conceptually-related strategy will return "lung," because asthma is known to be a specialization of lung disease, and lung diseases are ailments that affect the lungs. This same strategy will also return "medical insurance," because medical insurance provides coverage for medical problems, and asthma is a kind of medical problem.
Another strategy is the "specializations" strategy that, when given a term that maps to a collection, will return salient specializations of that collection. For example, given the input string "credit card," this strategy will return the names of the various brands of credit card, such as "VISA," "MasterCard," "American Express," and "Discover." Cyc draws on the knowledge that the collections representing these cards are all instances of a higher-order collection, #$CreditCardTypeByBrand, that facets #$CreditCard by the various brands. By restricting the search to collections that are part of a faceting hierarchy, the strategy is able to avoid returning less helpful specializations of #$CreditCard (e.g., "stolen credit card," "credit card printed at a factory in London") that the system might know about, but that are more or less arbitrary sub-collections, and not part of a more intuitive conceptual hierarchy.

Figure 4: Part of Cyc's credit card ontology with faceting. The rightmost node is a second-order collection; all other nodes are first-order collections.

Finally, Cycorp has implemented two parts-based strategies, parts_super and parts_sub, that use Cyc's knowledge of the structure of types and particular individuals to return expansions that, in the parts_super case, reflect a concept's placement in a larger structure, and, in the parts_sub case, reflect a concept's internal structure. For example, given the input string "GM" and using the parts_sub strategy, Cyc returns "Buick" and "Saturn Corporation," two sub-divisions of General Motors, as well as "G. Richard Wagoner, Jr.," the current CEO of GM.

Figure 5: For "GM", we get the CycL term #$GeneralMotors. The "parts" of this term include two sub-divisions and the CEO.

6 Deductive Question Answering

Like the User Profile Manager, the Cyc-based Deductive Question Answering module (DQA) uses explicitly represented strategies. The highest-confidence strategy queries the knowledge base with a CycL representation of the user's question. As such, this strategy depends on the total success of the Natural Language Preprocessor module (based on the parsing technology described in [Panton, et al., 2002]) in mapping English to CycL. In cases where syntactic or semantic ambiguity in the user's question results in competing CycL interpretations, simple heuristics (such as preferring the least complex CycL expression) are used to rank the interpretations. The top-ranked interpretation is identified in the discourse model as the default interpretation of the user's question, while the other candidates are recorded as possible interpretations, available for later clarification.
The DQA module retrieves that CycL interpretation and uses it to query the Cyc Knowledge Base. (If there are no sentential interpretations, DQA moves on to the next strategy.) In Cyc, a query consists of a CycL formula and several query-properties, including: a microtheory, or context, from which to ask the query; the temporal index and granularity (do we want bindings that satisfy the formula now, ever, all the time, etc.?); a limit on the number of transformations, or inference steps that chain rules; and a time limit.

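A minimal sketch of such a query object follows, with field names of our own devising; Cyc's actual query-property API is not given in this paper.

    # Sketch of the query-properties bundle described above.
    from dataclasses import dataclass

    @dataclass
    class CycQuery:
        formula: str              # CycL sentence, possibly with open variables
        microtheory: str          # context from which to ask
        temporal_index: str       # "now", "any time", "all times", ...
        max_transformations: int  # limit on rule-chaining inference steps
        time_limit_s: float       # wall-clock budget for this query

    query = CycQuery(
        formula="(#$typePromotesRisk #$OnlineBanking ?PRED ?RISK ?ROLE)",
        microtheory="SessionMt-s1",   # hypothetical session context
        temporal_index="any time",
        max_transformations=2,
        time_limit_s=5.0,             # one slice of an overall budget
    )
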

For DQA queries, the microtheory is the user's session; the temporal index and granularity are "any time," allowing the system to find temporally qualified answers; the number of transformations and the like are determined by the nature of the question (primarily the main predicate); and the time is distributed from a (configurable) 30-second budget.

6.1 Question-types of Interest
The problem of parsing arbitrary English to a formal, logical representation is only partly solved. Thus a deductive question-answering system that accepts arbitrary NL input will necessarily be limited both in the expressiveness of the formal language (the vocabulary) and by inherently difficult problems in resolving quantifier scope, negation, implicature, and context-sensitivity. At the same time, IR and passage-retrieval systems are limited by their lack of understanding: even systems with the ability to classify a question as "about" a type, or as falling into a certain common class, are fragile in some areas. Such systems are unable to handle questions that require comprehension of the relevant document corpus; though they can often return passages that contain an answer to the user's question, many questions do not have answers encapsulated in a particular passage of the text, but are nevertheless answerable from the content contained within the entire document set.
Given these restrictions, the Cyc-based Deductive QA module was optimized for questions that, given the corpus and test queries, appear representative of prevalent question types. Significantly, the cost of targeting DQA to handle such question types is amortized by its reusability. This is in distinction to NLP-based QA systems, which generally need to be re-trained from scratch against a new corpus.

Commonsense, Off-topic Questions
In general, IR and passage-retrieval systems are limited by the corpora they draw upon in answering questions. While answering (sometimes difficult or technical) questions relevant to the domain defined by the corpora, they can appear quite intelligent or insightful. However, such systems are, by their very nature, easily "gamed" by users who wish to disabuse an otherwise sympathetic audience of the notion that the system is genuinely smart, or capable of understanding what is being asked. Classic examples include questions that are easily answered by any intelligent agent that can understand the meanings of the words involved, such as "What colour is a blue car?", or simple general knowledge, like "What is an amoeba?" and "How long is a minute?" – questions that Cyc has the knowledge to answer. Though Cyc cannot answer every conceivable commonsense question, its ability to apply common sense to both in-domain and out-of-domain problems is expected to give MySentient Answers the general look and feel of intelligence.

Figure 6: DQA answering a commonsense question not covered in the document corpus.

Definitional Questions
Questions in this category are of the familiar "What is …?", "Who is …?", "Define …", "What can you tell me about …?" variety. Though many corpora, including the Motley Fool UK corpus used to develop MySentient Answers, include glossaries of important concepts, in general a passage-retrieval system will only succeed in reliably returning glossary entries if one of the two following conditions holds: 1) the glossary entries are formatted so as to contain "tip-off" key-words or phrases (e.g., "APR is defined as …") or distinctive formatting (e.g., a bolded entry followed by a colon) that the system is trained to recognize; or 2) the question-answering system is sufficiently permissive in what it will return that any passage (including the glossary entry) that contains the relevant query-word will be picked up.
Cyc's question-answering strategy for definitional questions is to infer good answers using the predicates #$definitionalDisplaySentence and #$interestingSentence, which relate statements to some of the concepts that they are centrally about. The Cyc Knowledge Base contains a handful of general rules that allow Cyc to return sentences constructed from definitional predicates, such as #$isa and #$genls, as well as others. If Cyc has the glossary entry for a term, it will use these rules to construct an "interesting sentence" for that term that includes the glossary entry.

Figure 7: Cyc's DQA module gives answers for a definitional question both from a glossary and from a formal representation of the concept.

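A toy version of this two-source behavior follows. The glossary wording and the constant #$AnnualPercentageRate are invented placeholders, not Motley Fool content, while #$isa and #$genls are genuine CycL predicates.

    # Simplified sketch of answering "What is X?". The real mechanism is
    # rule-driven inference over #$definitionalDisplaySentence and
    # #$interestingSentence; these dictionaries stand in for KB lookups.
    GLOSSARY = {"APR": "APR: the annual percentage rate on a loan."}  # placeholder
    TYPE_FACTS = {"APR": "(#$isa #$AnnualPercentageRate #$InterestRate)"}

    def answer_definitional(term):
        answers = []
        if term in GLOSSARY:     # glossary-derived "interesting sentence"
            answers.append(GLOSSARY[term])
        if term in TYPE_FACTS:   # sentence built from the formal definition
            answers.append(TYPE_FACTS[term])
        return answers

    # answer_definitional("APR") returns both a glossary answer and a
    # definitional sentence, mirroring Figure 7.
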

Taxonomic Questions
Taxonomic questions are those that ask the system to identify sub-types, or sub-classes, of a focal concept. For example, "What are the different types of bank account?" is answered by producing a list or hierarchy of bank-account types. Such questions are relatively easy for any ontology-based deductive system, yet somewhat difficult for a system that relies solely on IR or passage-retrieval techniques.

Procedural Questions
In many document corpora, "recipes" for achieving a goal are not condensed into a single passage or even article. Often, process knowledge is spread across a corpus, or portions of it are not made explicit, left to the reader as an exercise in small-step inference. Under such circumstances, procedural questions can be difficult for a passage-retrieval system to answer.
The DQA module succeeds in these circumstances, drawing upon Cyc's hierarchy of event types and vocabulary for describing the structure of events. The CycL predicate #$properSubEvents identifies the top-level sub-events of an event, and temporal relations such as #$startsAfterEndingOf and #$startsNoEarlierThanStartingOf describe the order of sub-events. Each sub-event can have its own internal structure in similar fashion, allowing for a recursively constructed, arbitrarily deep representation of an event.
In Cyc, processes and their stages are represented as collections of events, so that process knowledge is represented with rules concluding to #$properSubEvents and the temporal ordering predicates described above. Specialized "rule macro" vocabulary allows for a compact representation of these often complex rules, making efficient reasoning about processes possible.
When asked a procedural "How do I …?" question, the NL-to-CycL parser identifies the collection of events (the process) referred to. The DQA module then asks the Cyc KB for all top-level stages in temporal order. The result is a fully bound CycL sentence that relates a process to this list, which is then paraphrased appropriately in NL. The output is a step-by-step description of the process.

Figure 8: DQA answers a "How do I ...?" question with a step-by-step procedure.

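The ordering step lends itself to a standard topological sort, sketched below. The stage names are invented; in Cyc the dependency edges would come from #$properSubEvents plus assertions such as (#$startsAfterEndingOf CancelCard ReportCardLoss).

    # Sketch: ordering a process's top-level stages by temporal constraints.
    from graphlib import TopologicalSorter  # Python 3.9+

    # after[s] = stages that must end before s starts (hypothetical data).
    after = {
        "CancelCard": {"ReportCardLoss"},
        "ReviewStatements": {"CancelCard"},
    }

    ordered = list(TopologicalSorter(after).static_order())
    # ordered == ["ReportCardLoss", "CancelCard", "ReviewStatements"];
    # each stage is then paraphrased into NL as one numbered step.
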
Cost/Benefit Questions
Questions in this category often take forms such as "Why should I …?" or "What's the best reason to …?". Again, a passage-retrieval system will do well on such questions insofar as the corpus contains explicit FAQ pages or articles with helpful headings or titles such as "Why should I …?" Even then, either the question-answering system must have received just the right input (e.g., a "Should I X or Y?" input to get back a passage entitled "Should we X or Y?") or else contain an internal table of equivalent phrasings (e.g., a question of the form "Should I X or Y?" is answerable by any passage that contains "Why would I X over Y?"). In either case, unless a passage contains some sort of explicit header or clue as to its relevance, it will be passed over by a passage-retrieval system.

Figure 9: In answer to a "Why" question, the DQA module lists possible costs and benefits.

In the DQA module, such questions are handled by asking Cyc for CycL sentences that are salient for consideration in a cost/benefit analysis of a given action type. Using the higher-order features of CycL, these sentences are inferred from assertions that identify certain predicates as relevant for cost/benefit analysis. For example, the assertion:

    (#$costBenefitPredForSitType #$typePromotesRisk #$Event #$doneBy 1)

tells the inference engine that the relation #$typePromotesRisk is relevant in a cost/benefit analysis of "doing" any type of event. (The "1" is the argument position of the event-type in the #$typePromotesRisk sentence.) Thus, if the user asks, "Should I bank online?", Cyc tries to prove a sentence of the form:

    (#$typePromotesRisk #$OnlineBanking …)

For example:

    (#$typePromotesRisk #$OnlineBanking #$performedBy #$IdentityTheft #$victim)

which means that "performing" an online banking event increases one's vulnerability to being the victim of identity theft.

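Sketched procedurally, the lookup driven by #$costBenefitPredForSitType might look like this. Here ask() is a hypothetical stand-in for posing a CycL query to the inference engine, and the assumption that the relevant predicates take four arguments is ours.

    # Sketch of cost/benefit query construction from declarations like
    # (#$costBenefitPredForSitType #$typePromotesRisk #$Event #$doneBy 1).
    COST_BENEFIT_DECLS = [
        # (predicate, situation type, argument position of the event type)
        ("#$typePromotesRisk", "#$Event", 1),
    ]

    def cost_benefit_sentences(action_type, ask):
        sentences = []
        for pred, _sit_type, arg_pos in COST_BENEFIT_DECLS:
            # Put the action type in the declared argument position and
            # leave the remaining argument positions open for inference.
            args = ["?ARG%d" % i for i in range(1, 5)]  # assumes 4 arguments
            args[arg_pos - 1] = action_type
            formula = "(%s %s)" % (pred, " ".join(args))
            sentences.extend(ask(formula))
        return sentences

    # cost_benefit_sentences("#$OnlineBanking", ask) might then prove:
    # (#$typePromotesRisk #$OnlineBanking #$performedBy
    #  #$IdentityTheft #$victim)
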

Upon proving such a sentence, it is returned as a binding for the original query, and paraphrased into English.

6.2 Alternative DQA Strategies
As noted above, the general problem of parsing arbitrary English into inference-friendly CycL has not been fully solved. As such, the discourse model will not always contain a CycL translation of the user's question; indeed, for unfamiliar or complicated question-types, this will frequently be the case. Also, even when a CycL interpretation is available, there is no guarantee that the knowledge needed to answer the question is in the Cyc Knowledge Base. Thus, in order to be as effective as possible, the DQA module has been designed to reason from incomplete knowledge: it can invoke a number of lower-confidence question-answering strategies that do not depend on the total success of NL-to-CycL parsing. Because these strategies operate from a state of less information than the primary question-answering strategy, they are necessarily more brittle – specifically, more prone to returning inappropriate (but factually correct) answers. Nevertheless, these strategies provide some level of robustness against parse failure or unanticipated discourse-modeling problems, and have indeed resulted in a general increase in coverage over the target question-set. These strategies are described below in order of the level of discourse-model information required for them to apply, from most to least.

Topic-based Responsiveness
This strategy uses partial parse information, based on the translation into CycL of identified key phrases from the user's query. Where two or more phrases have been successfully assigned Cyc semantics, this strategy searches for interesting or informative links between them. For example, if asked a question from which only "LSE" and "AIM" are understood, Cyc would return a sentence about their relationship, such as "The Alternative Investment Market is a junior market to the London Stock Exchange."

Interesting Sentences about Terms
Like the Topic-based Responsiveness strategy, this strategy works by reasoning over the CycL semantics of phrases from the query. This strategy, however, is more broadly applicable (and so of lower confidence), using individual query phrases and #$interestingSentence reasoning to return summary or definitional information for each phrase.

Glossary-driven Sub-string Matching
Unlike the other backup strategies, Glossary-driven Sub-string Matching does not require that any part of the question be parsed. If a query matches the title of a slurped glossary entry, then that glossary entry is returned as the answer. This strategy thus guards against knowledge gaps in the Cyc KB (e.g., "Free Float" is not represented in the Knowledge Base, but the Motley Fool UK glossary entry for that term is), as well as unanticipated parser failure for definitional questions that would otherwise return glossary entries using standard question-answering methods.

7 Natural Language Generation

For a system that uses a formalized representation of the input question and performs deduction against a knowledge base whose content is also represented formally, the problem of presenting the results of a query to the end user in a readable and useful way is especially difficult.
Many systems that perform deduction will typically reduce the problem to that of generating natural-looking NL from the bindings that the inference engine returns in response to an open query. Others will attempt to augment this process by providing additional "context" – terse passages of relevant text, links to web pages relevant to some of the entities returned as bindings, or general information in the knowledge base about those entities [Vargas-Vera and Motta, 2004; Breck et al., 1999].
The generation of English answer-text from the results of deductive question answering is handled in MySentient Answers in a very different way. The answer-text generator has access not only to the variable bindings, but also to the proof tree produced by the Inference Engine, and hence to the supporting assertions in the KB. This allows for a more informative and nuanced presentation of inference answers. For instance, given the query "Who are the officers of Martha Stewart Omnimedia?", the Inference Engine finds two bindings: Susan Lyne and Martha Stewart.
Rather than somewhat misleadingly presenting these two bindings without qualification, the answer-text generator inspects the inference data structures to find that the "Susan Lyne" answer is supported by the following line of reasoning:
1. Susan Lyne is asserted to hold the position of Chief Executive Officer in Martha Stewart Omnimedia. This assertion is in a microtheory whose contents are temporally qualified to hold during the time period from November 11, 2004 through the present.
2. CEO is a specialization of Officer in Organization.
3. If someone holds a specialized version of some position in an organization, the person may be concluded to hold the more general position.
Of these supports, (2) and (3) do not mention the binding "Susan Lyne," so Cyc chooses to present (1), and passes it to the CycL-to-NL paraphrase module. This module is independent of the explanation-generation module, and is simply tasked with rendering a CycL sentence into English. It uses the Cyc Lexicon (part of the Cyc KB), which contains mappings from atomic concepts onto names and lexical entries, and phrase-generation templates for functors. These templates are recipes for the compositional construction of natural-language phrases (not strings) that carry syntactic and semantic information. This information permits grammatical manipulation, such as tense, agreement, or sentential force (e.g., question or statement). Once the phrase is built, a string is generated from it and returned.

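Before returning to the example, the support-selection heuristic just described can be sketched as follows; supports are shown here as plain strings, and the constant names are illustrative rather than actual KB constants.

    # Sketch: prefer supports that actually mention the answer binding.
    def best_support(supports, binding):
        mentioning = [s for s in supports if binding in s]
        # Fall back to the first support if none mentions the binding.
        return mentioning[0] if mentioning else supports[0]

    supports = [
        "(#$hasPositionIn #$SusanLyne #$MarthaStewartOmnimedia "
        "#$ChiefExecutiveOfficer)",                       # mentions the binding
        "(#$genls #$ChiefExecutiveOfficer #$OfficerInOrganization)",
    ]
    chosen = best_support(supports, "#$SusanLyne")  # picks the first support
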

For the Susan Lyne assertion, this module uses the temporal qualification on the assertion's microtheory to generate the adverbial phrase "since November 11, 2004" and to assign present perfect tense to the head verb, producing this sentence:

    Since November 11, 2004, Susan Lyne has held the position of chief executive officer in Martha Stewart Living Omnimedia.

Using a similar approach, the following answer text for the "Martha Stewart" binding is produced:

    From 1998 to March 15, 2004, Martha Stewart held the position of corporate president in Martha Stewart Living Omnimedia.

Thus the answer text includes not only the bindings found, but also the temporal qualification for each and the specific position held, while omitting more general, less pertinent facts and rules used to reach the conclusions. Furthermore, it does so using general principles for determining the best support to show, and existing KB assertions and paraphrase functionality.

8 Current Limitations and Future Directions

MySentient Answers 1.0 is a fully functional question-answering system, but certain areas require further development before full integration. These include clarification, anaphora resolution, external knowledge sources, and authoring tools.

8.1 Clarification
As described above, the MySentient Answers system includes a module that generates clarification questions by suggesting replacement questions drawn from the expansions generated by the User Profile Manager. Use of the Cyc KB and inference engine will allow the system not only to generate more sophisticated questions for the user, but also to solicit useful information about the user. As presently envisioned, this falls into the following types:
Term-Level Disambiguation: This type of question seeks to disambiguate a term (typically a noun phrase) from within a question.
    What did you mean by "IRA"? …
Sentence-Level Disambiguation: This type of question seeks to resolve ambiguity at the sentence level by suggesting replacements for the entire question.
    What did you mean by 'Can I get a mortgage and rent the house out?'? …
Precisification: This sounds very similar to Sentence-Level Disambiguation, but is subtly different in both implementation and effect. It attempts to take a (possibly answerable) question and suggest more precise forms for it. The new questions can not only help QA systems find answers, but also allow them to filter out irrelevant ones.
    How big is Afghanistan? → Which of the following questions did you mean to ask?
    What is Afghanistan's population?
    What is Afghanistan's gross domestic product?
    What is Afghanistan's land area?
Topic Redirects: This suggests potentially relevant information sources. It is intended that the author can suggest key topics and relevant resources (with associated URLs).
    Are you interested in sellers of mortgages?
Interview Questions: These use Cyc's knowledge base to determine what sorts of information about discourse entities are commonly available and important to know, in order to induce relevant questions:
    What breed is your dog?
Such interview questions are intended not only to make it easier for QA modules to answer the user's question, but also to gather profile information about the user. This technology is based on the Salient Descriptor, first developed for the KRAKEN system as part of DARPA's Rapid Knowledge Formation (RKF) programme [Witbrock, et al., 2003].

8.2 Anaphora Resolution
Cyc's discourse modeling enables the system to make significant headway into the problem of resolving anaphoric pronouns and noun phrases, which has been recognized as a difficult and important problem in question answering [Vicedo and Ferrandez, 2000].
The anaphora resolution implemented for MySentient Answers makes the simplifying assumptions that 1) definite NPs can be the antecedents of anaphoric NPs and pronouns, and 2) such NPs refer to instances of the relevant type (so "the shark" is interpreted as referring to a particular fish, though there are contexts where it does not, e.g., "The shark is a ferocious predator."). Cyc's anaphora resolution proceeds by searching backward through the discourse model for the most recent possible antecedent, eliminating candidates by applying both linguistic and semantic knowledge. On the linguistic side, for example, "he," being singular, would not resolve to "us," which is plural. On the semantic side, the referent of "he," presumably a male animal, cannot be identical to the referent of "my mother," who is represented in the discourse model as a woman.

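A simplified sketch of this backward search with linguistic and semantic filters follows; the entity representation is ours, and in the real system both kinds of evidence come from the discourse model and the KB.

    # Sketch of anaphora resolution by filtered backward search.
    def resolve_anaphor(anaphor, discourse_entities):
        """discourse_entities are ordered oldest first."""
        for candidate in reversed(discourse_entities):  # most recent first
            if candidate["number"] != anaphor["number"]:
                continue   # linguistic filter: e.g., "he" is singular
            if anaphor.get("gender") and candidate.get("gender") \
                    and candidate["gender"] != anaphor["gender"]:
                continue   # semantic filter: "he" cannot be "my mother"
            return candidate
        return None

    entities = [
        {"name": "my mother", "number": "sing", "gender": "female"},
        {"name": "the broker", "number": "sing", "gender": "male"},
    ]
    resolve_anaphor({"number": "sing", "gender": "male"}, entities)
    # -> {"name": "the broker", ...}, the most recent compatible antecedent
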

8.3 External Knowledge Sources
Cyc's inference engine has the capability of drawing information not only from its knowledge base, but also from external knowledge sources such as databases and structured websites [Masters and Güngördü, 2003]. Information from multiple external sources (and the KB) can be combined in one inference.
There are two main difficulties associated with the use of this technology in a system such as MySentient Answers: the task of authoring the formal semantics of an external knowledge source is still time-consuming and requires extensive training; and the use of external sources, especially websites, generally makes it difficult to ensure that the system is fast enough to be responsive to the user.

8.4 Authoring Tools
As described above, the prototype system was specialized for the Motley Fool UK domain by a combination of prototype authoring tools and manual ontological engineering. It is anticipated that, eventually, the customer will perform almost all authoring. A number of authoring tools are planned to support both NLP and Cyc-based modules. Those that most directly support Cyc's rôle are:
Concept Extractor: This component processes domain-relevant documents to identify concepts (primarily noun phrases), relate them to existing Cyc terms, and conjecture type information for novel terms. It uses Cycorp's Noun Learner, developed under Phase I of the AQUAINT project.
Coverage Checker: This component uses Cyc's Knowledge Base to ensure that the terms identified by the Concept Extractor are adequately represented, and to identify knowledge gaps in the form of questions. Like the Interview clarification strategy described in Section 8.1, the Coverage Checker is based on the Salient Descriptor.
Term Lexifier: This component allows an author to relate Cyc terms (both denotational and sentential) to their natural-language representations. These mappings can be used for both parsing and paraphrase generation. It is based on emerging Cycorp technology and early RKF experiments.
Ontology Editor: This component projects a stratified digraph onto a relevant subset of the Cyc ontology, permitting a GUI to display the graph to enable browsing of, and modification to, the ontology. This component ties the foregoing components together by visualizing the results of the Concept Extractor, allowing the user to answer the Concept Extractor's questions, and providing access to the Term Lexifier. It is novel technology developed for this project.

Acknowledgments

The authors would like to thank MySentient, especially Mike Mendoza, John Kranz, Dave Wade-Stein, Scott Gosling, Sean O'Connor, Mike Krell, and Rob Halsted. Also, the authors thank CNLP, in particular Elizabeth Liddy, Eileen Allen, Ozgur Yilmazel, Nancy McCracken, and Niranjan Balasubramanian. Finally, the authors acknowledge the other contributors to the MySentient project at Cycorp: Robert C. Kahlert, Karen Pittman, Dave Schneider, Ben Gottesman, Peter Wagner, Linda Aramil, Jennifer Sullivan, Larry Lefkowitz, Michael Witbrock, Steve Reed, Matt Watson, Jim Zaiss, Blake Shepard, Chris Deaton, Casey McGinnis, Brett Summers, Kevin Knight, Pace Reagan, Keith Goolsbey, Kathy Panton, Chester John, and Amanda Vizedom.

References

[Friedland et al., 2004] Noah S. Friedland, Paul G. Allen, Gavin Matthews, Michael Witbrock, David Baxter, Jon Curtis, Blake Shepard, Pierluigi Miraglia, Jurgen Angele, Steffen Staab, Eddie Moench, Henrik Oppermann, Dirk Wenke, David Israel, Vinay Chaudhri, Bruce Porter, Ken Barker, James Fan, Shaw Yi Chaw, Peter Yeh, Dan Tecuci, and Peter Clark. Project Halo: Towards a Digital Aristotle. AI Magazine, 25(4):29–48, Winter 2004.

[Diekema, et al., 2000] A. Diekema, X. Liu, J. Chen, H. Wang, N. McCracken, O. Yilmazel, and E.D. Liddy. Question Answering: CNLP at the TREC-9 Question Answering Track. In Proceedings of the 9th Text REtrieval Conference, pages 501–510, Gaithersburg, MD, USA, November 2000.

[Panton, et al., 2002] Kathy Panton, Pierluigi Miraglia, Nancy Salay, Robert C. Kahlert, David Baxter, and Roland Reagan. Knowledge Formation and Dialogue Using the KRAKEN Toolset. In Proceedings of the Eighteenth National Conference on Artificial Intelligence and Fourteenth Conference on Innovative Applications of Artificial Intelligence, pages 900–905, Edmonton, Canada, July 28–August 1, 2002.

[Burns and Davis, 1999] K.J. Burns and A.R. Davis. Building and Maintaining a Semantically Adequate Lexicon Using Cyc. In Breadth and Depth of Semantic Lexicons. Kluwer Academic Publishers, 1999.

[Hovy, et al., 2000] Eduard Hovy, Laurie Gerber, Ulf Hermjakob, Michael Junk, and Chin-Yew Lin. Question Answering in Webclopedia. In Proceedings of the 9th Text REtrieval Conference, pages 655–664, Gaithersburg, MD, USA, November 2000.

[Bilotti, 2004] Matthew W. Bilotti. Query Expansion Techniques for Question Answering. Master's thesis, Massachusetts Institute of Technology, 2004.

[Bacchin and Melucci, 2004] Michela Bacchin and Massimo Melucci. Expanding Queries using Stems and Symbols. In Proceedings of the 13th Text REtrieval Conference, Gaithersburg, MD, USA, November 2004.

[Vargas-Vera and Motta, 2004] Maria Vargas-Vera and Enrico Motta. AQUA – Ontology-based Question Answering System. In Proceedings of the Third Mexican International Conference on Artificial Intelligence, pages 468–477, Mexico City, Mexico, April 2004.

[Breck et al., 1999] Eric Breck, John Burger, Lisa Ferro, David House, Marc Light, and Inderjeet Mani. A Sys Called Qanda. In Proceedings of the Eighth Text REtrieval Conference, pages 499–506, Gaithersburg, MD, USA, November 1999.

[Witbrock et al., 2003] Michael Witbrock, David Baxter, Jon Curtis, Dave Schneider, Robert Kahlert, Pierluigi Miraglia, Peter Wagner, Kathy Panton, Gavin Matthews, and Amanda Vizedom. An Interactive Dialogue System for Knowledge Acquisition in Cyc. In Proceedings of the Workshop on Mixed-Initiative Intelligent Systems, pages 138–145, Acapulco, Mexico, August 2003.

[Vicedo and Ferrandez, 2000] Jose L. Vicedo and Antonio Ferrandez. Importance of Pronominal Anaphora Resolution in Question Answering Systems. In Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, pages 555–562, Hong Kong, China, October 2000.

[Masters and Güngördü, 2003] Chip Masters and Zelal Güngördü. Structured Knowledge Source Integration: A Progress Report. In Proceedings of the International Conference on Integration of Knowledge Intensive Multi-Agent Systems (KIMAS '03), pages 562–566, Piscataway, New Jersey, 2003.