Toward Memory-Based Reasoning
Total Page:16
File Type:pdf, Size:1020Kb
SPECIAL ISSUE Articles TOWARD MEMORY-BASED REASONING The intensive use of memory to recall specific episodes from the past rather than rules-should be the foundation of machine reasoning. CRAIG STANFILL and DAVID WALTZ The traditional assumption in artificial intelligence reasoning warrants further experimentation. Two (AI) is that most expert knowledge ·is encoded in the bodies of evidence have led us to support this form of rules. We consider the phenomenon of rea hypothesis. The first is the study of cognition: It is soning from memories of specific episodes, however, difficult to conceive of thought without memory. to be the foundation of an intelligent system, rather The second is the inability of AI to achieve success than an adj unct to some other reasoning method. in any broad domain or to effectively capture the This theory contrasts with much of the current notion of "common sense," much of which, we be work in similarity-based learning, which tacitly lieve, is based on essentially undigested memories of assumes that learning is equivalent to the automatic past experience. generation of rules, and differs from work on We will primarily discuss the application of our "explanation-based" and "case-based" reasoning in hypothesis to the limited case of similarity-based in that it does not depend on having a strong domain duction. The goal of similarity-based induction is to model. make decisions by looking for patterns in data. This With the development of new parallel architec .approach has been studied extensively in the con tures, specifically the Connection Machine@ system, text of "similarity-based learning," which uses ob the operations necessary to implement this approach served patterns in the data to create classification . to reasoning have become sufficiently fast to allow rules. The memory-based version of this method experimentation. This article describes MBRtalk, an eliminates the rules, solving the problem by direct experimental memory-based reasoning system that reference to memory. For example, given a large set has been implemented on the Connection Machine, of medical records, a similarity-based induction pro-· as well as the application of memory-based reason gram should be able to diagnose new patients [clas ing to other domains. sify them as having some known disease), using only the information contained in the database. The two THE MEMORY-BASED REASONING approaches differ in their use of the database: Rule HYPOTHESIS induction operates by inferring rules that reflect reg Although we do not reject the necessity of other ularities in the data, whereas memory-based reason forms of reasoning, including those that are cur ing works directly from the database. rently modeled by rules or by the induction of rules, we believe that the theory behind memory-based Primitive Operations of Memory-Based Reasoning The memory-based reasoning hypothesis has not The Connection Machine is a registered trademark of Thinking Machines Corporation. been extensively studied in the past because von Neumann machines do not support it well. © 1986 ACM 0001-0782/86/1200-1213 75¢ There have been systems, such as Samuel's Check- December 1986 Volume 29 Number 12 Communications of the ACM 1213 Special Issue ers player [38, 39], which worked from memory, but count the number of data items having both features they have required an exact match between an item A and B, and divide.2The counting operation can be in memory and the current situation, and a non done for every symptom-disease combination simul general search algorithm (e.g., hashing or indexing) taneously, in log2 n time. (For a discussion of the to retrieve items from memory. In the real world principles underlying such algorithms, see [12].) there may be no exact matches, so the "best match" The amount of hardware needed to build a Con is called for. Unfortunately, there is no general way nection Machine system is O(n log n) in the number to search memory for the best match without exam of processors.3 The time needed to perform memory ining every element of memory. On a von Neumann based reasoning is O(log2 n). Therefore, memory machine, this makes the use of large memories im based reasoning scales to arbitrarily large databases, practical: Experiments might take weeks to run. On as neither the quantity of hardware nor the time a fine-grained parallel machine, such as the Connec required for processing grows at a prohibitive rate. tion Machine system, this problem does not begin to arise until the size of the database exceeds the size Memory-Based Reasoning in More Detail of the machine's memory, which pushes the prob Although the basic idea of memory-based reasoning lem back three or four orders of magnitude. (For is simple, its implementation is somewhat more another application of the Connection Machine sys complex: We must separate important features from tem to a data-retrieval problem, see "Parallel Free unimportant ones; moreover, what is important is Text Search on the Connection Machine System" usually context-sensitive. One cannot assign a single [43] on page 1229 of this issue.) weight to each feature and expect it to hold true for The currently available Connection Machine sys all time. For example, suppose a patient reports ab tem [11] is a massively parallel SIMD computer with dominal pain, which happens to match both records 16 up to 65,536 (2 ) processing elements, each having of pregnant patients and patients with ulcers.· In 4,096 bits of memory and a 1-bit-wide ALU. 1 The the case of ulcers, gender is irrelevant in judging architecture can (in principle) be scaled upward to the similarity of the record to that of the patient, an arbitrarily large number of processors. Every whereas it is highly relevant in the case of preg processor executes the same program, though pro nancy. However, this fact can orily be known to the cessors can selectively ignore program steps. Pro system if it computes the statistical correlations be cessors are connected to each other via a hypercube tween gender and ulcers on one hand, and gender based routing network. On a Connection Machine and pregnancies on the other. Thus, we calinot as system with n processors, any processor can send a sign a single weight to gender ahead of time; instead, message to any other processor in log n time. it must be calculated with respect to the task at Accessing memory, in our current paradigm, re hand.4 Various schemes for accomplishing this will quires four basic operations. The first is counting the be presented below. number of times various features or combinations of The operation of memory-based reasoning systems features occur. For example, in the context of medi can be made clearer through another example. Sup cal applications, ifwe had a patient with a high pose that we have a large relational database of med fever, we would want to find out how often a high ical patient records, with such predictor features as . r fever occurred in combination with various diseases. "age," "sex," "symptoms," and "test results for each The second is using these feature counts to produce patient." Memory-based reasoning operates by pro a metric (a topic discussed in detail in the section on posing plausible values for goal features, such" "as the basic algorithm). The third step is calculating the "diagnosis," "treatment," and "outcome," which are dissimilarity between each item in memory and the known for the patients in the database, but are un current case (patient), and the last is retrieving the known for the patient being treated. A program best matches. The first operation-counting-will be can fill in the "diagnosis" slot for a new patient by described here briefly as an illustration of how the parallelism of the Connection Machine system is 2This method would be sufficient to support Bayesian reasoning [2,3,21] in utilized. real time, working directly from a database. One term in our metrics is the probability of a 3 We are counting the number of wires in the hypercube. A hypercube with 2t vertices (processors) will have l/:zk2t wires. Setting n = 2t , the number of particular "disease" A given some "symptom" B (the wires is 1f211 log2 II. The hardware needed to construct the processors is, of conditional probability A IB). To compute this, we course, linear in the number of processors. count the number of data items having feature B, <lOne can imagine saving weights for various situations rather than always calculating them on the fly, but this puts heavy demands on memory and requires another matching process between the current.!situation and 14 1 Machines with 2 and 2'5 processors are also available. prestored weights for a record. 1214 Communications of the ACM December 1986 Volume 29 Number 12 Special Issue (1) broadcasting to each processor the values for using these counts to produce a metric, using the each of the new patient's predictors; (2) having each metric to find the dissimilarity between the current processor compute a numerical "distance" and a nu problem and every item in memory, and retrieving merical "weight" for each predictor in its record as a the best matches. When implemented on the Con function of the new patient's value for that predic nection Machine system, this takes lot n time. Since tor;5 (3) having each processor use these weights and neither the hardware nor the processing time re distances to compute a total distance measure from quired grows at a prohibitive rate, memory-based each record to that of the new patient; and (4) select reasoning scales to arbitrarily large databases.