Cyc: A Midterm Report
AI Magazine Volume 11 Number 3 (1990) (© AAAI)
Copyright ©1990 AAAI. All rights reserved. 0738-4602/90/$4.00

After explicating the need for a large commonsense knowledge base spanning human consensus knowledge, we report on many of the lessons learned over the first five years of attempting its construction. We have come a long way in terms of methodology, representation language, techniques for efficient inferencing, the ontology of the knowledge base, and the environment and infrastructure in which the knowledge base is being built. We describe the evolution of Cyc and its current state and close with a look at our plans and expectations for the coming five years, including an argument for how and why the project might conclude at the end of this time.

The majority of work in knowledge representation has dealt with the technicalities of relating predicate calculus to other formalisms and with the details of various schemes for default reasoning. There has almost been an aversion to addressing the problems that arise in actually representing large bodies of knowledge with content. However, deep, important issues must be addressed if we are to ever have a large intelligent knowledge-based program: What ontological categories would make up an adequate set for carving up the universe? How are they related? What are the important facts and heuristics most humans today know about solid objects? And so on. In short, we must bite the bullet.

We don’t believe there is any shortcut to being intelligent, any yet-to-be-discovered Maxwell’s equations of thought, any AI Risc architecture that will yield vast amounts of problem-solving power. Although issues such as architecture are important, no powerful formalism can obviate the need for a lot of knowledge.

By knowledge, we don’t just mean dry, almanac-like or highly domain-specific facts. Rather, most of what we need to know to get by in the real world is prescientific (knowledge that is too commonsensical to be included in reference books; for example, animals live for a single solid interval of time, nothing can be in two places at once, animals don’t like pain), dynamic (scripts and rules of thumb for solving problems) and metaknowledge (how to fill in gaps in the knowledge base, how to keep it organized, how to monitor and switch among problem-solving methods, and so on).

Perhaps the hardest truth to face, one that AI has been trying to wriggle out of for 34 years, is that there is probably no elegant, effortless way to obtain this immense knowledge base. Rather, the bulk of the effort must (at least initially) be manual entry of assertion after assertion.

Half a decade ago, we introduced (Lenat, Prakash, and Shepherd 1986) our research plans for Cyc, a decade-long, two person-century effort we had recently begun at MCC to manually construct such a knowledge base. We have come a long way in this time, and this article presents some of the lessons learned and a description of where we are and briefly discusses our plans for the coming five years. We chose to focus on technical issues in representation, inference, and ontology rather than infrastructure issues such as user interfaces, the training of knowledge enterers, or existing collaborations and applications of Cyc.

The Evolution of the Cyc Methodology

For two decades, AI research has been polarized into neats and scruffies (roughly corresponding to theoretical versus experimental approaches). After an initial strongly scruffy approach, we seem to have settled on a middle ground that combines the insights and power of each.

On the one hand, we realized that a number of mistakes made in the project’s initial years would have been avoided by a more formal approach (especially in regard to the construction of the representation language). We also realized that philosophy had a lot to contribute, especially when it came to deciding on issues of ontology (Quine 1969).

On the other hand, however, there are a number of areas where we found the empirical approach more fruitful. The areas are typically still open research issues for the formalists or have not even been addressed by them, for example, codifying the most fundamental types of goals that people have.

Therefore, our approach is to largely carry out empirical research and be driven by looking at lots of examples but to keep this work supported on a strong theoretical foundation. Further, we have been driven to adopt a kind of tool-kit orientation: Assemble a collection of partial solutions to the various difficult problems Cyc has to handle, and add new tools as required. That is, for a number of problems (time, causality, inference, user interface, and so on), there aren’t any known general-purpose, simple, efficient solutions, but we can make do with a set of modules that enable us to easily handle the most common cases.

The bulk of the effort is currently devoted to identifying, formalizing, and entering microtheories of various topics (such as shopping, containers, emotions). We follow a process that begins with a statement, in English, of the microtheory. On the way to our goal, an axiomatization of the microtheory, we identify and make precise those Cyc concepts necessary to state the knowledge in axiomatic form. To test that the topic has been adequately covered, stories that deal with the topic are represented in Cyc; we then pose questions that any reader ought to be able to answer after having read the story.

One of the unfortunate myths about Cyc is that its aim is to be a sort of electronic encyclopedia. We hope that this article lays this misconception to rest. If anything, Cyc is the complement of an encyclopedia. The aim is that one day Cyc ought to contain enough commonsense knowledge to support natural language understanding capabilities that enable it to read through and assimilate any encyclopedia article, that is, to be able to answer the sorts of questions that you or I could after having just read any article, questions that neither you nor I nor Cyc could be expected to answer beforehand.

Our hope and expectation is that around the mid-1990s, we can transition more and more from manual entry of assertions to (semi-) automated entry by reading online texts; the role of humans in the project would transition from the brain surgeons to tutors, answering Cyc’s questions about the difficult sentences and passages. This radical change is what it means for Cyc to have a decade-long projected lifespan.

The Evolution of the Representation Language

CycL is the language in which the Cyc knowledge base is encoded. In 1984, our representation was little more than frames. Although a significant fraction of knowledge can be conveniently handled using just frames, this approach soon proved awkward or downright inadequate for expressing various assertions we wanted to make: disjunctions, inequalities, existentially quantified statements, metalevel propositions about sentences, and so on. At least occasionally, therefore, we required a framework of greater expressive power.

[...] kept modifying and tweaking such mechanisms, and often, this method forced us to go back and redo parts of the knowledge base so that they corresponded to the new way the inference engine worked. As the size of the knowledge base increased, this process became intolerable. We came to realize that having a clean semantics for the knowledge base was vital, declaratively expressing the meaning of inheritance, TheSetOf, default rules, automatic classification, and so on, so that we wouldn’t have to change the knowledge base when we altered the implementation of one of the mechanisms.

As late as 1987, the only inferencing in Cyc was done using these few mechanisms: inheritance along instances (IS-A) links, rigid toCompute definitions of one slot in terms of others plus the running of demons (opaque lumps of Lisp code) and expert system–like production rules. The results were inefficiencies (because of the overuse of the most general mechanisms), abstraction breaking (often resorting to raw Lisp code escapes), and inadequacies (for example, given a rule “If A Then B,” and ¬B, Cyc couldn’t conclude ¬A).

For efficiency’s sake, we developed dozens of specialized inference procedures, with special truth maintenance system–related (TMS-related) bookkeeping facilities for each (Doyle 1987). Then, to recoup usability, we developed a mechanical translator so that one can now input general predicate calculus–like assertions, and Cyc can convert them from this epistemological level into the form required by these efficient heuristic-level, special-purpose mechanisms (see The Current State of the Representation Language).

Originally, Cyc handled defaults in an ad hoc and frequently inadequate way. In the last two years, we have moved to a powerful and principled way of handling them. As we discuss in the section Epistemological Level and Default Reasoning, Cyc constructs and compares arguments for and against a proposition, using explicit rules to decide when an argument is invalid or when one argument is to be preferred over another.

Early on, we allowed each assertion in the knowledge base to have a numeric certainty factor (cf), but this approach led to its own set of increasingly severe difficulties. For example, one knowledge enterer might assert A and assert B and assign them cfs of 95 and