Lecture Notes in Computer Science

Edited by G. Goos and J. Hartmanis

232

Fundamentals of Artificial Intelligence An Advanced Course

Edited by W. Bibel and Ph. Jorrand

Editorial Board: D. Barstow, W. Brauer, P. Brinch Hansen, D. Gries, D. Luckham, C. Moler, A. Pnueli, G. Seegmüller, J. Stoer, N. Wirth

Prof. Dr. J. Stoer, Institut für Angewandte Mathematik und Statistik, Am Hubland, 87 Würzburg

Springer-Verlag Berlin Heidelberg New York London Paris Tokyo

Editors

Wolfgang Bibel
Institut für Informatik, Technische Universität München
Postfach 202420, D-8000 München 2

Philippe Jorrand
LIFIA-IMAG
BP 68, F-38402 St. Martin d'Hères Cedex

CR Subject Classifications (1985): F.4.1, I.2.3, I.2.4, I.2.6, D.1.3

ISBN 3-540-16782-X Springer-Verlag Berlin Heidelberg New York ISBN 0-387-16782-X Springer-Verlag New York Berlin Heidelberg

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically those of translation, reprinting, re-use of illustrations, broadcasting, reproduction by photocopying machine or similar means, and storage in data banks. Under § 54 of the German Copyright Law where copies are made for other than private use, a fee is payable to "Verwertungsgesellschaft Wort", Munich. © Springer-Verlag Berlin Heidelberg 1986. Printed in Germany. Printing and binding: Beltz Offsetdruck, Hemsbach/Bergstr. 2145/3140-543210

PREFACE

The expectations in Artificial Intelligence - or Intellectics - have never been as high as they are today. Clearly, they are too high given the current state of the art in this fascinating field. Although we have seen some remarkable systems performing extremely well, there is no doubt for those who understand how they work that this performance reflects just a beginning in our understanding of the fundamentals that might be required in systems to perform in a truly intelligent way.

One of the basic paradigms in Artificial Intelligence has always been that experiment needs to be complemented with theory, or vice versa. During the current wave of experimentation in AI throughout the world, we feel that some emphasis on the more theoretical side might be appropriate - and necessary for progress in this understanding of the fundamentals of AI.

This sort of reflection motivated us when we took the initiative in organizing the first Advanced Course in Artificial Intelligence that was held in Vignieu, France, in July 1985. Seven well-known AI researchers were asked to cover basic topics that might be of relevance for the fundamentals of our field. The present volume comprises the elaborated and harmonized versions of their lectures. Most of them have been written in the form of a tutorial, so that the book provides a most valuable guide to the more fundamental aspects of AI.

One might be inclined to say that intelligence is the capability to acquire, memorize, and process knowledge in a way that appropriate accommodation in a changing world is achieved. In any case, the concept of knowledge and its representation clearly are among the fundamental issues in AI. The book begins with this topic, discussed in the contribution by Delgrande and Mylopoulos, in order to give the reader a feel for the variety of aspects that have to be taken into account for the later issues raised in the book.

The subsequent four articles deal with the second focus of this book, which is the processing of knowledge. Obviously, knowledge cannot be processed unless it is adequately represented; on the other hand, an appropriate representation is determined by the way of processing needed. This is to say that these two issues are intimately related to each other. It is therefore not accidental that some of the aspects raised in PART ONE recur in a different context here in PART TWO (such as in the contribution by Bibel).

For some, processing of knowledge would be synonymous with computation; for others, with inference or deduction. We see in the remarkable contribution by Huet that both are just different aspects of the same phenomenon, which is studied in great depth in this contribution. The reader may find it rewarding to overcome the difficulty of studying such a formal and concise text.

One of the most successful tools for processing knowledge represented on a logical level of language is resolution. The chapter written by Stickel gives a thorough and knowledgeable introduction to the most important aspects of deduction by some form of resolution. It may be regarded as the basis for the more advanced forms of reasoning discussed in the subsequent two papers, but also for the programming language PROLOG.

The deductive forms of reasoning captured by resolution in its pure form by no means exhaust the kinds of knowledge processing known from human experience and studied in Artificial Intelligence. A particularly important one is the kind of reasoning that is involved in inductive inference, problem solving (or programming) from examples, and in learning. It is covered in the contribution by Biermann, which the reader will also like for its style of presentation.

There are many more forms of knowledge processing and inferencing than those discussed in the previous three papers. The tutorial by Bibel covers the more important among those that might play a significant role in common-sense reasoning. It takes, however, the position that the basic deductive tools like resolution or the connection method are essential for these forms of inference as well.

The third part of the book focuses on the more advanced programming tools for implementing the kind of systems that are envisaged in the topics discussed before. Logic programming and functional programming are known as the main styles for AI programming. Both lend themselves to a parallel treatment.

In this part of the book, Jorrand takes the approach that the semantic elegance and the mathematical properties of functional programming languages can be preserved within a language where computations or inferences can also be described as networks of cooperating parallel processes. This is shown in the language FP2, where term rewriting forms the basis for the semantics of both functional and parallel programming.

By its nature, PROLOG lends itself to parallel processing. To some extent the programmer might wish to control such parallel processes in PROLOG programs without compromising PROLOG's elegance as a descriptive language. Concurrent PROLOG, developed by Shapiro and his group, provides such features. His introduction to this language provides a fitting finale to the whole book.

There is an obvious lack of more good textbooks in Artificial Intelligence. One reason is that in a rapidly progressing field like AI it is nearly impossible for a researcher to actively contribute to the field's progress and at the same time be able to survey large portions of this developing area, not to mention the time needed for working out an appropriate presentation. A volume like the present one certainly cannot be a substitute for a textbook. But nevertheless we feel that it may be regarded as a good compromise for the time being. In this sense we hope that the book not only refreshes the memories of those who attended the course, but serves as a unique source of valuable information for graduate classes and individuals with interest in the more fundamental and advanced topics of this exciting area of research.

München and Grenoble, April 1986

W. Bibel and Ph. Jorrand

CONTENTS

PART ONE: KNOWLEDGE REPRESENTATION

Knowledge Representation: Features of Knowledge J.P. Delgrande and J. Mylopoulos ......

PART TWO: KNOWLEDGE PROCESSING

Deduction and Computation G. Huet ...... 39

An Introduction to Automated Deduction M.E. Stickel ...... 75

Fundamental Mechanisms in Machine Learning and Inductive Inference A.W. Biermann ...... 133

Methods of Automated Reasoning W. Bibel ...... 171

PART THREE: KNOWLEDGE PROGRAMMING

Term Rewriting as a Basis for the Design of a Functional and Parallel Programming Language. A case study: the language FP2 Ph. Jorrand ...... 221

Concurrent PROLOG: A Progress Report E. Shapiro ...... 277

PART ONE

Knowledge Representation

Knowledge Representation: Features of Knowledge*

James P. Delgrande** and John Mylopoulos***
Department of Computer Science, University of Toronto, Canada

1. Introduction

It is by now a cliché to claim that knowledge representation is a fundamental research issue in Artificial Intelligence (AI) underlying much of the research, and the progress, of the last fifteen years. And yet, it is difficult to pinpoint exactly what knowledge representation is, does, or promises to do. A thorough survey of the field by Ron Brachman and Brian Smith [Brachman & Smith 80] points out quite clearly the tremendous range in viewpoints and methodologies of researchers in knowledge representation. This paper is a further attempt to look at the field in order to examine the state of the art and provide some insights into the nature of the research methods and results. The distinctive mark of this overview is its viewpoint: that propositions encoded in knowledge bases have a number of important features, and these features serve, or ought to serve, as a basis for guiding current interest and activity in AI. Accordingly, the paper provides an account of some of the issues that arise in studying knowledge, belief, and conjecture, and discusses some of the approaches that have been adopted in formalizing and using some of these features in AI. The account is intended primarily for the computer scientist with little exposure to AI and Knowledge Representation, and who is interested in understanding some of the issues. As such, the paper concentrates on raising issues and sketching possible approaches to solutions. More technical details can be found in the work referenced throughout the paper.

Naively, and circularly, knowledge representation is concerned with the development of suitable notations for representing knowledge. The reason for its importance in AI is that the current paradigm for building "intelligent" systems assumes that such systems must have access to domain-specific knowledge and must be capable of using it in performing their intended task (hence the term knowledge based systems). This paradigm is in sharp contrast to the approaches used in the sixties, when the emphasis was on general-purpose search techniques. The aim of the earlier approaches, which are termed power-oriented, was to construct general, domain-independent problem-solving systems [Goldstein & Papert 77]. This goal was generally found to be unrealistic for nontrivial tasks since, with undirected search, the number of alternatives that needs to be explored grows exponentially with the size of the problem to be solved. * Reprinted from Fundamentals in Man-Machine Communication: Speech, Vision and Natural Language, Jean-Paul Haton (Ed.), 1986, with permission from Cambridge University Press. ** Current address: Department of Computing Science, Simon Fraser University, Burnaby, BC.

*** Senior fellow, Canadian Institute for Advanced Research.

Current attitudes towards "intelligent" system building can be accurately summarized by the slogan "knowledge is power".

According to popular wisdom, a knowledge based system includes a knowledge base which can be thought of as a data structure assumed to represent propositions about a domain of discourse (or world). A knowledge base is constructed in terms of a knowledge representation scheme which, ideally, provides a means for interpreting the data structure with respect to its intended subject matter and for manipulating it in ways which are consistent with its intended meaning. There already exist several surveys of Knowledge Representation which describe the field mostly from the point of view of current practice; these include [Hayes 74], [McDermott 78], [Barr & Davidson 80] and [Mylopoulos & Levesque 84]. In addition, there have been several fine collections of research papers focusing on Knowledge Representation, such as [McCalla & Cercone 83] and [Brachman & Levesque 85]. The newcomer to the area may also be interested in key papers such as [McCarthy & Hayes 69], [Minsky 75], [Woods 75], [Hayes 77], [Reiter 78], [Newell 81], and [Brachman & Levesque 82] which have raised fundamental research issues and have influenced the direction of research.

We already noted the difficulty of characterizing Knowledge Representation as a research area in terms of a coherent set of goals or methodologies. In preparation for the discussion, however, we need to adopt at least working definitions for the terms "knowledge" and "representation". By knowledge we will mean justified true belief, following traditional philosophical literature. While there are shortcomings to such a working definition, as [Dretske 81] points out, it is adequate for our purpose. By representation we will understand an encoding into a data structure. Intuitively then, knowledge representation means the encoding of justified true beliefs into suitable data structures. This though is a little rigid for our purposes. For example we will want to consider also encodings where the information is only thought to be true or maybe even is known to be false or inconsistent. So we will on occasion want to deal with encodings where the information may not be knowledge per se.

Preference for one knowledge representation scheme over another depends heavily on the nature of the formal system adopted as a formalization of knowledge. However the preference for one scheme over another depends also on the suitability of the data structures offered, i.e., on how direct the mapping is from the components of the data structures used into their intended interpretations. This paper is concerned primarily with the nature of knowledge and its formalizations, rather than its representation. A companion paper [Kramer & Mylopoulos 85] attempts to examine the knowledge representation issues by surveying organizational structures that have been proposed for knowledge bases.

For the purposes of the discussion in the remainder of the paper, a knowledge base KB is a pair ⟨KB0, ⊢L⟩ where KB0 is a collection of statements in the language of some logic L, for example:

KB0 = {Student(John), Supervisor(John,Mary)} and ⊢L is the derivability relation in L, i.e., it specifies what can be derived from the axioms, given the rules of inference of L. Then

α ∈ KB iff KB0 ⊢L α

(Adoption of this view implies that knowledge bases are essentially treated here as theories in Mathematical Logic, with KB0 playing the role of a set of proper axioms.) Thus the example knowledge base contains not just the statements in KB0 but also others that can be derived from them in L. So KB may contain statements such as

Student(John) ∨ Professor(Joe)

¬¬Student(John)

depending on our choice of L.
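The view of a knowledge base as explicit axioms closed under a derivability relation can be loosely sketched as follows. This is an illustrative toy only (not from the text): statements are strings, and a single hand-supplied rule stands in for or-introduction, since a real ⊢L would come from the inference rules of the logic L.

```python
def closure(kb0, rules):
    """Return KB: everything derivable from kb0 under the given rules.

    Each rule pairs a set of premise strings with a conclusion string;
    rules are applied to a fixed point, mimicking the derivability
    relation of the logic L in a purely syntactic way.
    """
    kb = set(kb0)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= kb and conclusion not in kb:
                kb.add(conclusion)
                changed = True
    return kb

KB0 = {"Student(John)", "Supervisor(John,Mary)"}
# One hard-wired rule standing in for or-introduction in L.
RULES = [({"Student(John)"}, "Student(John) v Professor(Joe)")]
KB = closure(KB0, RULES)
# KB now contains the proper axioms plus the derived disjunction.
```

The knowledge base KB is thus strictly larger than KB0 whenever the rules of L license further conclusions, exactly as in the example above.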

Statements in a knowledge base can be assigned a truth value (usually either true or false) given a world or domain of discourse. The assignment of truth values to statements is carried out in terms of a semantic function. A standard method for doing so, due to Alfred Tarski, treats a knowledge base interpretation as a 3-tuple ⟨D, R, F⟩ where D is the set of individuals in the domain of discourse, and R and F are respectively the relations and functions between individuals that hold in the world. Tarskian semantics assumes the availability of semantic functions that map constant symbols in L, such as John, onto individuals in D, predicate symbols in L, such as Student, onto relations in R, and function symbols in L onto functions in F. From these functions, the notion of truth in L can be made explicit. So, for example, Student(John) is true (roughly) if John (the object in D) satisfies the property of studenthood. A consequence of Student(John) being true in the interpretation is that (in most logics)

Student(John) ∨ Student(Mary)

will also be true. An interpretation is said to be a model of a knowledge base if and only if all sentences in the knowledge base come out true in the interpretation.
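A minimal sketch of such an interpretation and the model test, with the domain, relation contents, and constant mapping invented here for illustration (functions F are omitted for brevity):

```python
# Interpretation <D, R, F>: a domain, relations over it, and a semantic
# function mapping constant symbols of L onto individuals in D.
D = {"john", "mary"}                      # domain of individuals
R = {"Student": {("john",)}}              # relations that hold in the world
const = {"John": "john", "Mary": "mary"}  # constant symbols -> D

def true_atom(pred, args):
    """Atomic truth: Pred(c1,..,cn) holds iff the mapped tuple is in R[Pred]."""
    return tuple(const[a] for a in args) in R.get(pred, set())

def models(atomic_sentences):
    """The interpretation is a model iff every sentence comes out true."""
    return all(true_atom(p, a) for p, a in atomic_sentences)
```

Under this interpretation Student(John) is true, so the disjunction Student(John) ∨ Student(Mary) is true as well, even though Student(Mary) is false.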

Of course, interpretations are only idealizations of the "real" worlds of students, ships, and bombs with respect to which we interpret a knowledge base. Nevertheless, a formal semantics, Tarskian or other, can be extremely valuable as long as the structure of the interpretation captures our intuitions about the world or domain in question.

The remainder of the paper consists of two parts. The first discusses the basic nature of knowledge, belief, and hypothesis, and introduces a number of important concepts and methods for their study. The second part points out a number of features that information in knowledge bases has, such as incomplete- ness, inconsistency, inaccuracy, and uncertainty, and provides a brief overview of methods that have been used in attempting to deal with these features within a representational framework.

2. On the Nature of Knowledge

The main concern of this section is the relationship between the information contained in a knowledge base, and the state of the world or domain of discourse which the knowledge base is intended to describe. First we discuss the commitment that is made with respect to the truth of a statement. While we have restricted ourselves so far to knowledge per se, many systems treat weaker notions such as belief or hypothesis. This commitment may be called the epistemic status of a statement. Second we consider the assertional status of a statement, i.e., the confidence in the assertion represented by a statement. For example, a statement may be regarded as holding absolutely and without exception, or alternatively as only being usually true. Lastly we review semantic theories that have been proposed for assigning meaning to encodings of knowledge.

2.1. Knowledge, Belief, and Hypothesis

Knowledge was defined as "true, justified belief". In this section we develop this notion further by exploring the terms "true", "justified", and "belief". The notion of truth can be discharged by a standard Tarskian account. However we are still left with the terms "justified" and "belief". Let us look at "belief" first.

Belief can be defined in a surprisingly simple way. Given a knowledge-based system (or agent) A, A believes a sentence P just when P appears in A's knowledge base (or "language of thought"). Belief then, so construed, consists of literally anything that can be represented. So does this make the term vacuous? Not quite: belief may be taken as distinguishing genuine cognitive systems from simple processors of information, such as television sets [Dretske 81]. However, since belief is what is attributed to cognitive systems, it is clear that a general unconstrained "believing" system is unacceptable: one would also want to ensure that beliefs are coherent, consistent, and (in a nutshell) "reasonable". So, for example, given that

Student(John)

(x)[Student(x) ⊃ HardWorker(x)]

are believed to be true, one may want to require that

HardWorker(John)

also be believed to be true. Similarly it seems reasonable to stipulate that it not be the case that

¬Student(John)

be believed. Typically then one would want beliefs, although possibly counterfactual, to have properties similar to knowledge.
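These two requirements on a "reasonable" believer, closure under modus ponens and the absence of open contradictions, can be rendered as a toy sketch. The string encoding (a leading "~" for negation, implications as pairs) is an assumption for illustration, not a claim about any real system:

```python
def coherent_beliefs(facts, implications):
    """Close a belief set under modus ponens and reject contradictions.

    implications: pairs (p, q) read as 'p implies q'; negation is written
    with a leading '~'.  A crude stand-in for requiring beliefs to be
    coherent and consistent rather than arbitrary sentences.
    """
    beliefs = set(facts)
    changed = True
    while changed:
        changed = False
        for p, q in implications:
            if p in beliefs and q not in beliefs:
                beliefs.add(q)          # modus ponens
                changed = True
    for b in beliefs:
        if "~" + b in beliefs:
            raise ValueError("incoherent: believes both %s and ~%s" % (b, b))
    return beliefs

bs = coherent_beliefs({"Student(John)"},
                      [("Student(John)", "HardWorker(John)")])
# bs now contains HardWorker(John); adding ~Student(John) would raise.
```

This mirrors the example above: believing Student(John) and the implication forces belief in HardWorker(John), while believing ¬Student(John) alongside Student(John) is rejected.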

Logical systems of knowledge and belief typically deal with only one of knowledge or belief. For such systems standard first-order logic is usually augmented with a sentential operator K, where Kα may be read as "α is known (believed) to be true". Whether the informal interpretation of K actually corresponds to knowledge, or instead to belief, though usually depends only on whether the axiom

Kα ⊃ α

is present. This axiom has the informal reading "if α is believed to be true, then α is true". If the axiom is present, then whatever is in the knowledge base is in fact true, and the notion corresponds to knowledge; otherwise it corresponds to belief. Any other axioms of the system apply both to knowledge and belief. The fact that a system deals with knowledge (say) rather than belief then has very little effect on the characteristics of the system. Given this, the work of Moore on reasoning about knowledge and action ([Moore 80]) and of Fagin and his co-workers on multi-agent reasoning ([Fagin et al 84]) deals with knowledge, while that of Levesque on incomplete knowledge bases ([Levesque 81]) deals with belief. Konolige, in his dissertation research [Konolige 84], examines both notions from the point of view of a set of agents. Clearly many systems of default reasoning are not knowledge-preserving and thus deal with belief. [Halpern & Moses 85] provides a general introduction to logics of knowledge and belief, while [Hughes & Cresswell 68] is an excellent introduction to Modal Logic. However matters do not end here. For example, if a sentence is belief only, then it is possible that the sentence may later be discovered to not in fact hold. In this case, other beliefs based on the erroneous belief would have to be re-examined and perhaps modified or retracted. This leads to the question of which beliefs should be introduced or held, or, more broadly, how one may justify a belief. Let us call a justified belief that is not known to be true a hypothesis. [Quine & Ullian 78] is a good introduction to issues surrounding this notion, while [Scheffler 81] provides a more thorough exposition. Under this view, it is the established truth of a sentence that separates knowledge per se from hypothesis and belief. The justification of a sentence, on the other hand, separates knowledge and hypothesis from belief. In this latter case, a known sentence may be regarded as being absolutely justified.
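The effect of the axiom Kα ⊃ α can be made concrete in a one-line sketch: of the sentences an agent believes, only those actually true in the world count as knowledge. The set-of-strings encoding is an illustrative assumption:

```python
def filter_knowledge(believed, world):
    """Knowledge vs belief via the axiom K(a) > a: a believed sentence
    counts as knowledge only when it is in fact true in the world."""
    return {s for s in believed if s in world}

world = {"Student(John)"}
believed = {"Student(John)", "Professor(Joe)"}
known = filter_knowledge(believed, world)
# Professor(Joe) is belief only; Student(John) qualifies as knowledge.
```

Without the truth axiom the filter disappears and the two notions coincide, which is why, as the text notes, the choice between them has little effect on the rest of such a system.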

It would be going too far afield to survey justification in any depth. It is instructive though to consider forms of reasoning that can be used to introduce justified belief into a knowledge base. For purposes of illustration we will make use of the following classical form for deductive reasoning.

(x)[P(x) ⊃ Q(x)]    (1)

P(a)    (2)

Q(a)    (3)

The inference from (1) and (2) to (3) is of course absolutely justified. However, the schema can also be used as a template for introducing justified belief. Some default logics for example may be regarded as automating a weaker form of the above deduction. Thus if one knows that most elephants are grey, and that Clyde is an elephant, then lacking information to the contrary, one may feel justified in concluding that Clyde is grey. (These considerations are discussed further in the section on nonmonotonicity.) Strictly speaking, such an inference would introduce a hypothesis that Clyde is grey. Justification would depend on pragmatic factors, such as the number of elephants seen, knowledge of albinism, etc.
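The default pattern, conclude Q(a) from P(a) unless contrary information is present, can be sketched as a toy. The predicate names and the string encoding (a leading "~" for negation) are inventions for illustration, not any particular default logic:

```python
def default_conclude(defaults, facts):
    """Apply defaults of the form 'if P(x), and ~Q(x) is not known,
    hypothesize Q(x)' -- a crude rendering of 'most elephants are grey'.
    The conclusions are hypotheses, blocked by contrary information.
    """
    hypotheses = set()
    for p, q in defaults:                       # e.g. ("Elephant", "Grey")
        for fact in facts:
            if fact.startswith(p + "("):
                individual = fact[len(p) + 1:-1]
                contrary = "~%s(%s)" % (q, individual)
                if contrary not in facts:       # lacking info to the contrary
                    hypotheses.add("%s(%s)" % (q, individual))
    return hypotheses

print(default_conclude([("Elephant", "Grey")], {"Elephant(Clyde)"}))
# prints {'Grey(Clyde)'}
```

With "~Grey(Clyde)" among the facts the default is blocked and nothing is concluded, which is exactly the nonmonotonic behaviour: adding information can retract a conclusion.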

However, the schema may be employed in quite different ways for introducing hypotheses. Consider first situations where we have instances of (1) and (3). We can then claim that rule (1), together with conclusion (3), suggests a cause, namely (2). Thus for example if we have that "All people with colds have runny noses" and "John has a runny nose", we can propose the hypothesis "John has a cold". If we knew further that people with colds had elevated temperature, and that John had this symptom, then our faith in the hypothesis that John had a cold would be strengthened. This type of reasoning is known as abductive inference [Pople 73]. Abduction provides a mechanism for reasoning from effect to possible cause. It provides a model of reasoning that has been found useful in the development of medical diagnosis systems (in particular) and expert systems (in general). The inferencing components of many production rule systems, as perhaps best exemplified by MYCIN [Shortliffe 76], can be viewed as implementing particular forms of abductive reasoning. Abduction can also be associated with default reasoning. Thus if we knew that people with colds typically had an elevated temperature, then, again, if John had this symptom, we could propose that John had a cold. The question of how to determine one's faith in such a diagnosis in a non ad hoc fashion is, of course, very difficult.
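The core of abduction, from an instance of (3) and rules of the form (1), propose an instance of (2), fits in a few lines. The rule set below, including the competing HayFever explanation, is invented for illustration:

```python
def abduce(rules, observation):
    """Reasoning from effect to possible cause: return every antecedent
    whose rule could explain the observation.  rules: (cause, effect) pairs."""
    return {cause for cause, effect in rules if effect == observation}

RULES = [("Cold(John)", "RunnyNose(John)"),
         ("Cold(John)", "ElevatedTemp(John)"),
         ("HayFever(John)", "RunnyNose(John)")]

explanations = abduce(RULES, "RunnyNose(John)")
# Both Cold(John) and HayFever(John) explain the symptom; observing a second
# symptom (elevated temperature) would favour the Cold hypothesis, since only
# that cause explains both observations.
```

Note that abduction is not sound inference: each returned cause is a hypothesis to be weighed, which is precisely the "faith in the diagnosis" problem the text raises.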

Returning to our schema for reasoning, consider the third alternative where we have instances of (2) and (3) -- for example, a large collection of ravens, all of which happen to be black. In this case we might hypothesise the general statement "All ravens are black". This is known as inductive reasoning. An inductive mechanism provides a means whereby general conjectures may be formed from simple facts or ground atomic formulas. However the general problems of justifying induction and explicating the notion of confirmation are known from philosophy to be extremely difficult [Goodman 79], [Scheffler 81]. In AI, inductive inference programs typically assume that the domain of application is governed by some underlying grammar. [Angluin & Smith 82] provides a thorough survey of efforts in this area, while [Shapiro 81] presents a particularly elegant treatment of some of the problems.
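Naive enumerative induction over ground atomic formulas, conjecture (1) from many instances of (2) and (3), can be sketched as follows. This toy has none of the philosophical safeguards the text warns about; the string encoding is an assumption for illustration:

```python
def induce(facts, kind, prop):
    """Conjecture '(x)[kind(x) > prop(x)]' when every observed individual
    of the kind also has the property -- naive enumerative induction."""
    individuals = {f[len(kind) + 1:-1]
                   for f in facts if f.startswith(kind + "(")}
    if individuals and all("%s(%s)" % (prop, i) in facts for i in individuals):
        return "(x)[%s(x) > %s(x)]" % (kind, prop)
    return None    # no individuals observed, or a counterexample exists

facts = {"Raven(a)", "Raven(b)", "Black(a)", "Black(b)"}
print(induce(facts, "Raven", "Black"))
# prints (x)[Raven(x) > Black(x)]
```

A single non-black raven among the facts makes the conjecture fail, but nothing here addresses confirmation, i.e., how much a further black raven should strengthen the hypothesis.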

This breakdown into knowledge, hypothesis, and belief gives us a means of characterising the epistemic status of a statement. If a statement is considered to be knowledge, then presumably one would be unwilling to allow that it can be anything but true. Thus any mathematical or definitional statement would be treated as knowledge. There are certainly other sentences though that one would wish to treat as knowledge. For example whales, which were once regarded as fish, now are recognised as being mammals. However, while this demonstrates that "all whales are mammals" isn't knowledge per se, it would be a rare knowledge base that didn't treat it as such. Knowledge then, pragmatically viewed, consists of those sentences that are taken for granted, i.e. that one is unwilling to give up. This suggests that a particular set of formulas may or may not be taken as knowledge, depending on one's viewpoint. For example, much of current astronomy is conducted under the assumption that the theory of relativity is true. Yet relativity certainly isn't knowledge as such (since the theory, like most of its predecessors, could be incorrect) and so, at a lower level, this theory itself is subject to experimentation and confirmation.

2.2. Assertional Status

The previous section dealt with the epistemic status of a statement -- that is, the presumed truth of a sentence. In this section we turn to the assertional status of a (general) sentence, that is, the strength of the claim being made by a sentence. This notion is best introduced by means of an example. Consider the statement

"Elephants are grey".

There are at least three readings:

(1) "All elephants are grey". While this seems intuitively reasonable, strictly speaking it is false, since there are, among other things, albino elephants.

(2) "Typically elephants are grey". This has the related reading "an elephant is grey with confidence or probability p".

(3) "Elephants are grey. However we acknowledge possible exceptional individuals". In this case the intention is that greyness is in some sense associated with elephanthood (although it is not clear exactly how).

These possibilities lead to three different approaches to specifying the meaning of a term. Consider the first case. The claim that a term may be exactly identified with a collection of properties has been called the traditional theory of meaning [Schwartz 77]. Under this theory, "all elephants have four legs" (if true) would be analytic (i.e. would be true purely by virtue of meaning and independently of collateral information), and the meaning of "elephant" could be laid out by specifying enough of these properties. Squares being equilateral rectangles and bachelors being unmarried males are, in most accounts, examples of analytic truths.

However, it is certainly not the case that all elephants have four legs, nor is it the case that every elephant is grey or has a trunk. In fact it seems that elephants may have no commonplace exceptionless properties, and that, barring assertions such as "all elephants are mammals" and logical truths, any general statement concerning elephants may have exceptions. Clearly a similar argument can be applied to other common nouns, such as "lemon", "gold", "water", and so on. Thus for example a lemon need not be yellow, nor bitter, nor necessarily oblong. Such terms are examples of natural kind terms. These terms may be characterised as being of explanatory importance, but whose normal distinguishing characteristics are explained by deep-lying mechanisms [Putnam 75]. Hilary Putnam, in the reference just cited, argues persuasively that natural kind terms have no knowable defining conditions, and no non-trivial exceptionless properties. Thus he takes the position that a statement such as "elephants are mammals" may be falsified; this seems not unreasonable if one considers that "whales are fish" was once thought to be true.

There are of course terms that may be precisely defined or specified. For example, a square is defined to be an equilateral rectangle -- a three-sided square makes no sense whatsoever. Also, if we define "uncle" to mean a brother-in-law of a parent, then someone who fulfills the latter conditions cannot fail to be an uncle, and an uncle cannot but be a brother-in-law of a parent. These definitions clearly do not allow for exceptions. The notions of definitional terms and terminology moreover are key in the design of many knowledge representation systems, and in particular semantic network formalisms such as KL-ONE [Brachman 79]. So the notions of analyticity and the traditional theory of meaning, while inapplicable to natural kind terms, are nonetheless necessary for terminology and definition.

In the second reading of "elephants are grey", where we have "typically elephants are grey", a term is identified with a description of a typical member. This is the essence of prototype theory [Rosch 78]. In AI, prototype theory provides the foundation for many frame-based reasoning systems. Frame-based reasoning systems are commonly used for recognizing stereotypical situations. For such applications, where an individual, situation, etc. is identified on the basis of a description, prototype theory seems perfectly adequate. Thus to recognise an elephant, we might look for a trunk, grey colouring, four legs, and so on. If any of these features are missing, it doesn't mean that the object isn't an elephant, although it may make us less certain that it in fact is. For general reasoning systems however prototype theory has drawbacks. Foremost is the fact that for reasoning with prototypes, one is forced to use a probabilistic or default theory of reasoning. In contrast, standard first-order logic can be employed for reasoning with analytic (definitional) statements. Also, as is pointed out in [Israel & Brachman 81], one cannot form complex concepts strictly within prototype theory. A prototype system has to be told for example that (the concept) "four-legged elephant" subsumes both "four-legged" and "elephant". In summary then prototype theory appears too weak to be used as a medium for the general representation of knowledge. However, it has been found useful in representing descriptions of natural kind terms.
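The recognition behaviour described above, missing features reduce confidence rather than ruling a classification out, can be sketched with a simple matching score. The feature set for "elephant" is invented here for illustration:

```python
def prototype_match(prototype, observed):
    """Score an individual against a prototype: the fraction of the typical
    features that are present.  A missing feature lowers confidence but,
    unlike a definition, never rules the classification out."""
    present = sum(1 for feature in prototype if feature in observed)
    return present / len(prototype)

ELEPHANT = {"trunk", "grey", "four-legged", "large"}
print(prototype_match(ELEPHANT, {"trunk", "grey", "four-legged"}))
# prints 0.75
```

Contrast this with a definitional term such as "square", where a single missing defining condition makes the classification simply false; this graded behaviour is exactly why reasoning with prototypes pushes one toward probabilistic or default methods.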

The third case attempts to maintain a general sentence such as "elephants have four legs" while admitting exceptions to the sentence at the same time. This approach lacks a precise and complete formalisation; however it can be motivated by means of a naive view of scientific theory formation. Consider the scientific hypothesis that water boils at 373°K. In testing this hypothesis by examining a particular sample of water, one verifies not just the statement in question, but also a host of underlying assumptions [Putnam 79], [Quine & Ullian 78]. Thus a test of the statement "water boils at 373°K" presumes that the water is pure, that atmospheric pressure is 760mm, that the thermometer is accurate, that the act of measurement does not affect the boiling point, etc. The failure of a sample to boil at 373°K then does not necessarily falsify the hypothesis, but rather falsifies the conjunction of the hypothesis and the underlying assumptions. The original conjecture can be maintained by claiming that some assumption has been falsified, even though the particular assumption may not be specified, nor even known. Similar remarks apply to four-leggedness and elephanthood: a three-legged instance may be discharged by appealing to some (possibly unknown) underlying assumption. This is not to say though that defining conditions for natural kind terms may not be hypothesised. For example we may entertain the hypothesis that water is H2O. In this case no exceptions are permitted; the radical OH for example is simply not water. On the other hand, the hypothesis that water is H2O may be used to account for (at least in principle) the notion of boiling point, and to account for any exceptions.

2.3. Semantic Theories of Knowledge

A knowledge representation scheme is usually intended as a vehicle for conveying meanings about some domain of discourse. To be at all useful there must be an external account of the way particular configurations expressed in terms of the scheme correspond to particular arrangements in the domain. That is, there must be an associated semantic theory. Simply put, this means that knowledge bases must be about something, and a formal account of this aboutness constitutes a semantic theory.

As indicated in the introduction, the standard starting point for semantic theories is Tarskian semantics, which is relatively straightforward, well understood, and well accepted by now. The question arises though as to how this treatment can be extended to deal with knowledge (or belief -- since the points made in this section apply equally to both notions, we will use them interchangeably). The main difficulty in providing a semantic theory for knowledge is that the truth value of statements may or may not be known, independently of their actual truth value in the domain of discourse. Thus for example one may not know whether it is raining in North Bay at present, although it certainly either is or is not raining there now. Moreover if we allow that knowledge can be explicitly referred to, there arise questions concerning the extent to which one can have knowledge about one's own knowledge or knowledge about one's ignorance. If we introduce a new monadic operator K for "knows", then these questions concern the status of sentences such as KKα or K¬Kα. In our review, we consider three semantic theories that have been proposed for knowledge and belief. Following [Levesque 84], we refer to them as the possible worlds, syntactic, and situational approaches.

For possible worlds semantics, [Hintikka 62] is the seminal work. Within AI, [Moore 80] and [Levesque 84] present formalisations of knowledge or belief based on a possible-worlds semantics. To illustrate this approach, consider the following knowledge base:

Teacher(John)

Teacher(Bill) V Teacher(Mary)

(x)[Teacher(x) ⊃ SchoolEmployee(x)].

This knowledge base may be regarded as specifying what is known about the world. Thus it constrains the way the world is thought to be; for example, under the intended interpretation, John is a teacher and at least one of Bill and Mary is a teacher. However it also underconstrains the world. If Lou is an individual, then according to what's known, she may or may not be a teacher. That is, the actual world may be such that Lou teaches, or it may be such that she does not. We can say then that there are, according to the knowledge base, possible worlds in which Lou does teach, and others in which she does not. On the other hand, there are no possible worlds compatible with the knowledge base in which John doesn't teach; and in each possible world compatible with the knowledge base at least one of Bill or Mary teaches.

Now each such possible world may be characterised using a Tarskian framework. Thus if Teacher(Lou) is true in a possible world, so are ¬¬Teacher(Lou) and Teacher(Lou) ∨ ¬Teacher(John). So a knowledge base can be characterised semantically as a set of possible worlds. A system knows a sentence α just when α is true in all worlds that are possible according to the system's knowledge base. Thus, from our previous example, the system knows not just that John is a teacher, but also that John is a school employee. Depending on how the notion of "possible" is defined, one can stipulate, for example, that if something is known, then it is known to be known, and that if something is not known, then it is known to be not known -- that is, whether Kα implies KKα, or ¬Kα implies K¬Kα respectively. A drawback to approaches of this type for modelling knowledge is that they imply logical omniscience; that is, all logical consequences of beliefs must also be believed. Thus all valid sentences must be believed. This entails for example that such a system knows the outcome of an optimal strategy in chess or the truth of Fermat's last theorem. Furthermore, if a sentence and its negation are believed, then so must be every sentence. Neither restriction seems particularly realistic. The first is computationally unreasonable and, for the second, most people would happily admit to the possibility of harbouring inconsistent beliefs, without thereby believing everything. There are however formulations of possible worlds that do not necessarily lead to logical omniscience, notably those presented in [Lewis 73] and [Montague 74]. [Hadley 85] presents a critique of the aforementioned approaches to knowledge and suggests a solution to these difficulties based on the Lewis/Montague approach.
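The possible-worlds account can be sketched concretely over a finite domain. The following is an illustrative brute-force enumeration, not part of the text: the people, the atom encoding, and the exhaustive search over truth assignments are all assumptions made for the example.

```python
from itertools import product

# A "world" assigns a truth value to Teacher(p) and SchoolEmployee(p) for each
# person in a small hypothetical domain.
PEOPLE = ["John", "Bill", "Mary", "Lou"]
ATOMS = [(pred, p) for pred in ("Teacher", "SchoolEmployee") for p in PEOPLE]

def satisfies_kb(world):
    # The three sentences of the example knowledge base.
    return (world[("Teacher", "John")]
            and (world[("Teacher", "Bill")] or world[("Teacher", "Mary")])
            and all(world[("SchoolEmployee", p)]
                    for p in PEOPLE if world[("Teacher", p)]))

# The set of possible worlds compatible with the knowledge base.
worlds = [dict(zip(ATOMS, vals))
          for vals in product([True, False], repeat=len(ATOMS))
          if satisfies_kb(dict(zip(ATOMS, vals)))]

def known(atom):
    """An atom is known just when it holds in every compatible world."""
    return all(w[atom] for w in worlds)

print(known(("Teacher", "John")))          # True: stated directly
print(known(("SchoolEmployee", "John")))   # True: known by consequence
print(known(("Teacher", "Lou")))           # False: the world is underconstrained
```

Note how the consequence SchoolEmployee(John) comes out known without being stated, which is exactly the closure property that leads to logical omniscience in the general case.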

An alternative to the possible worlds approach, which may be called the syntactic approach, is to have the model structure contain, or be isomorphic to, an explicit set of sentences. [Moore & Hendrix 79] and [Konolige 84] are both advocates of this approach. Given our example knowledge base then, all that would be known would be the three original sentences. It would not necessarily be known that SchoolEmployee(John), since this sentence doesn't appear explicitly in the knowledge base. However this is not unreasonable: one cannot in general know all consequences of one's beliefs -- this after all is the problem with logical omniscience. This alternative also has some intuitive support. Certainly when people acquire beliefs, they usually seem to do so without markedly altering their prior set of beliefs. Conceivably belief acquisition, in most instances, consists of little more than adding a belief to an existent set. The approach also avoids the problem of logical omniscience, since everything believed is explicitly represented.

However this approach seems to make too "fine-grained" a distinction with respect to the form of a belief. The sentence

Teacher(Bill) ∨ Teacher(Mary)

is in the belief set, and so is believed. However

Teacher(Mary) ∨ Teacher(Bill)

is not in the belief set, and so is not believed. Yet this is counterintuitive: a disjunction, α∨β, may be informally read as "α or β (or both) are true" -- a reading that is independent of any ordering on α and β. So it would seem that whenever α∨β is believed then β∨α should be also. In general then any knowledge representation scheme using the syntactic approach must also (presumably) specify what beliefs follow from a given set.
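The fine-grainedness objection is easy to exhibit with a naive syntactic belief set, sketched here as a set of sentence strings (the string encoding is an illustrative assumption):

```python
# A naive "syntactic" belief set: beliefs are exactly the sentence strings present.
beliefs = {
    "Teacher(John)",
    "Teacher(Bill) v Teacher(Mary)",
    "(x)[Teacher(x) -> SchoolEmployee(x)]",
}

def believed(sentence):
    # Believed iff the exact string is present -- no closure, no normalisation.
    return sentence in beliefs

print(believed("Teacher(Bill) v Teacher(Mary)"))  # True
print(believed("Teacher(Mary) v Teacher(Bill)"))  # False: same disjunction, other order
print(believed("SchoolEmployee(John)"))           # False: a consequence, not explicit
```

The second query is the counterintuitive case discussed above; the third shows the flip side, namely that logical omniscience is avoided.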

A third possibility, presented in [Levesque 84], generalises the notion of a possible world to that of a situation. The general idea is that while a possible world fixes the truth value for all sentences, a situation may support the truth of some sentences, the falsity of others, and neither the truth nor falsity of yet other sentences. Phrased slightly differently, a knowledge base is relevant to (the truth value of) some sentences and is irrelevant to others. So our example knowledge base supports the truth of John being a teacher and of at least one of Bill or Mary being a teacher. On the other hand it supports neither the truth nor falsity of Lou being a teacher.

The definition of a "support" relation specifies what beliefs are held, given that others are held. Roughly speaking, the definition extends the standard possible worlds model structure by replacing the notion of a possible world, where the truth value of all sentences is specified, by the notion of a situation, where the truth value of a sentence may or may not be specified. The definition of the support relation also ensures that desired relations among sentences hold. Thus a situation supports the truth of α∨β if and only if it supports the truth of either α or β, and a situation supports the falsity of α∨β if and only if it supports the falsity of both α and β. Thus if

Teacher(Bill) ∨ Teacher(Mary)

is believed, then so is

Teacher(Mary) ∨ Teacher(Bill).

In fact, in some sense these statements may be regarded as being the same belief. Unlike the possible worlds approach though, logical omniscience is avoided. In particular, a valid sentence need not be (explicitly) believed, beliefs need not be closed under implication, and beliefs can be inconsistent without every sentence being believed. Thus given our example knowledge base it may or may not be the case that either of

SchoolEmployee(John)

Teacher(Lou) ∨ ¬Teacher(Lou)

is believed. Finally, unlike the syntactic approach, the semantics of belief are ultimately based on the (Tarskian) conception of truth, rather than on restrictions to a set of sentences.

The situational approach also permits a distinction between what may be called explicit belief and implicit belief. The former deals with what an agent actually holds to be the case, while the latter deals with the way that the world would be, assuming that the agent's beliefs are in fact true. In this view then, implicit belief is the "limiting" case of explicit belief. This fits in well with the semantic view, where a possible worlds semantics may be regarded as the "limiting" case of a situational semantics, wherein either the truth or falsity of all (rather than some) sentences is supported.
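The recursive clauses of the support relation can be sketched directly. This is a minimal propositional fragment only; the tuple encoding of sentences and the particular situation used are illustrative assumptions, not Levesque's formal definition.

```python
# A situation assigns each atom True, False, or nothing at all (no commitment).
# Sentences are atoms (strings) or nested ("not", s) / ("or", s1, s2) tuples.
def supports_true(sit, s):
    if isinstance(s, str):                  # atom
        return sit.get(s) is True
    if s[0] == "not":
        return supports_false(sit, s[1])
    if s[0] == "or":                        # a v b true iff some disjunct supported true
        return supports_true(sit, s[1]) or supports_true(sit, s[2])

def supports_false(sit, s):
    if isinstance(s, str):
        return sit.get(s) is False
    if s[0] == "not":
        return supports_true(sit, s[1])
    if s[0] == "or":                        # a v b false iff both disjuncts supported false
        return supports_false(sit, s[1]) and supports_false(sit, s[2])

sit = {"Teacher(John)": True, "Teacher(Bill)": True}   # silent about Lou

# Disjunction support is order-independent: a v b and b v a are the same belief.
print(supports_true(sit, ("or", "Teacher(Bill)", "Teacher(Mary)")))  # True
print(supports_true(sit, ("or", "Teacher(Mary)", "Teacher(Bill)")))  # True

# A classically valid sentence about Lou need not be supported either way.
lou = ("or", "Teacher(Lou)", ("not", "Teacher(Lou)"))
print(supports_true(sit, lou), supports_false(sit, lou))             # False False
```

The last query illustrates the escape from logical omniscience: the tautology about Lou is valid, yet the situation supports neither its truth nor its falsity.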

There is a second distinction, separate from semantic theories of knowledge, that may be profitably discussed at this point. This distinction concerns how knowledge is to be formulated in a knowledge representation scheme. There are at present two major approaches. The first extends a logic, typically classical propositional or predicate logic, by adding the sentential operator K mentioned previously, where a sentence Kα may be read as "α is known to be true". Thus the following statements

"John is a student and Mary is known to be a student. '~ "It is known that the only students are the known students," "There is a student apart from the known students."

might be respectively represented

Student(John) ∧ KStudent(Mary)

K(x)[Student(x) ⊃ KStudent(x)]

(∃x)[Student(x) ∧ ¬KStudent(x)]

The operator B, for "believes", is sometimes used instead of K; [Konolige 84] uses [Si]α to mean "agent Si believes α". [Levesque 81], [Levesque 84], [Konolige 84], and [Fagin et al 84] are all examples of approaches that extend first-order logic.

The second approach is to formulate a theory of knowledge within first-order logic; [McCarthy 79] and [Moore 80] are both examples of this approach. The idea is that one introduces a predicate "Know", and then provides axioms to govern this predicate. Thus Moore represents the import of "Know" by reduc- ing it to the notions of truth in possible worlds, and of worlds possible according to what is known. His "fundamental axiom of knowledge" is

T(w1, Know(a, p)) ≡ (w2)[K(a, w1, w2) ⊃ T(w2, p)]

which can be read as "a person a knows the facts p that are true in every world w2 that is possible according to what he knows". Further axioms of course are required to pin down the predicates K and T: these axioms amount to encoding expressions in the object language (which talks about known facts) into expressions of first-order logic that talk about possible worlds.

So, is there any reason to favour one approach over the other? Or, more to the point, is there anything that one buys you that the other does not? First of all, the second approach has the advantage that it embeds the characteristics of knowledge within a well-understood formal framework. This also means that one can take an existing, off-the-shelf theorem prover (say) for deriving sentences from a knowledge base phrased in such terms. With the first approach, inference procedures implementing the system must be developed.

However this advantage isn't conclusive. With the second approach we have, after all, encoded a language within the meta-language, first-order logic, and need to express explicitly how one may reason with knowledge. Thus for example, if someone knows that p∧q is true, it doesn't automatically follow that one knows that p is true. Thus, one way or another, one must state that something like

Know(a, "pAq") D Know(a. "p") 14

holds. So it is not clear that an automatic computational advantage obtains.

A potential disadvantage to the second approach is that it posits entities that may not be directly useful or applicable to the task of representing knowledge, and moreover may lead to problems of their own. Thus, taking Moore's work as an example again, possible worlds are recognised as real entities in the language (in the sense that they appear in the range of quantifiable variables). The first approach doesn't make this explicit commitment. However once we allow possible worlds into our language, one is forced to deal with these entities. Questions arise as to how one possible world differs from another, how individuals may differ across worlds, and how, given an individual in one world, it can be identified in another. For these reasons the first approach, where an existing logic is extended, is generally favoured for reasoning with knowledge.

3. The "Ins", "Uns", and "Nons" of Knowledge

To understand a phenomenon, such as cars, hearts, or knowledge, one needs to study more than just its textbook definition. In particular, one needs to examine dimensions in terms of which the phenomenon can be characterized, and to study the allowable variations of the phenomenon along each dimension. This section surveys some such dimensions for encodings of knowledge and describes relevant research issues and results.

3.1. Incompleteness

When a query is evaluated with respect to a database to find out for example if John Smith is a student, it is customary to assume that the database contains complete information about students. Thus failure to find information in the database is interpreted as negative information. In this case if John Smith's status was not found in the database, it would be concluded not that his status was unknown, but that he was not a student. This hidden assumption was pointed out and examined in [Reiter 78] and has been labelled the closed world assumption. In general, however, this assumption is not justified and cannot be used. For anything but idealized microworlds, a knowledge base will have to be an incomplete account of the domain of discourse. Given this state of affairs, we want to be able, first, to express our lack of information and, second, to ask questions about it.
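The contrast between closed-world and open-world query evaluation can be sketched in a few lines. The database relation and names below are hypothetical:

```python
# Sketch of query evaluation under the closed world assumption: failure to
# find a fact is interpreted as its negation.
students = {"Mary Jones", "Bill Lee"}          # hypothetical database relation

def is_student_cwa(name):
    # CWA: absence from the relation means "not a student".
    return name in students

def is_student_open(name):
    # Open world: absence only means the status is unknown.
    return True if name in students else None

print(is_student_cwa("John Smith"))    # False: a definite (negative) conclusion
print(is_student_open("John Smith"))   # None: status simply unknown
```

The difference is exactly the hidden assumption discussed above: the closed-world evaluator turns silence into negation, while the open-world one merely reports ignorance.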

Before discussing some proposals for dealing with incompleteness, it is instructive to examine some of its sources. The most obvious source is lack of information about the domain of discourse. Thus, an incomplete knowledge base may only know two students when in fact there are many more. Moreover, it may be the case that the knowledge base knows that there are other, unknown students. A second important source of incompleteness has to do with the derivability relation ⊢L which defines what can be derived from given facts in the knowledge base. In particular, this relation may be "weak" in the sense that there are statements whose truth would seem to be implicit in the given facts (the set KB0 discussed in the introduction), and yet are not in the knowledge base because they are not derivable through ⊢L. For example, a knowledge base may contain

Student(John) (4)

Student(Mary) (5)

and use the empty derivability relation (i.e., there are no inference rules). Such a knowledge base does not

contain

Student(John) ∨ Student(Joe)

even though this is clearly true in every possible world described by the knowledge base.

It may seem to the reader that this is a pathological example and that, in fact, "reasonable" knowledge bases will always have a sufficiently strong derivability relation to eliminate such examples. It turns out, however, that there are several reasons why a derivability relation may be weak, either by necessity or design [Lakemeyer 84]. Firstly, weak derivability relations may make much smaller demands on computational resources, and thus may be desirable from a computational point of view (see [Brachman & Levesque 84] for a discussion of such issues). In addition, Gödel's incompleteness theorem establishes that there are inherent limits to the completeness of a knowledge base when the knowledge representation scheme is sufficiently powerful.

Expressing incompleteness involves a number of capabilities, including saying that something has a property without identifying the thing with that property, saying that everything in a class has a property without saying what is in the class, and allowing the possibility that two nonidentical expressions name the same object [Moore 80]. First-order logic provides facilities for handling these situations through the use of logical connectives, quantifiers, and terms. Thus, easily and trivially, we can state:

(∃x)[Teach(x) ∧ PlaceOfResidence(x, Paris)]

(x)[Teacher(x) ⊃ Erudite(x)]

MorningStar = EveningStar.

However difficulties arise in a first-order logic setting when one attempts to deal with the closed world assumption or its converse, the open world assumption. Suppose for example that we want to state that there is an unknown student in a knowledge base which includes statements (4) and (5). Thus:

(∃x)[Student(x) ∧ ¬(x=John) ∧ ¬(x=Mary)] (6)

One drawback of this formulation is that the length of such formulas could be proportional to the size of the knowledge base. A more important drawback of (6) comes into the picture if we try to use it as a query, asking of the knowledge base whether there exists an unknown student. To express such a query the user will have to know all the known students. An alternative, explored in [Levesque 84], is to use the modal operator K where Kα means α is known. Then, stating that there is an unknown student can be expressed by

(∃x)[Student(x) ∧ ¬KStudent(x)]

and a similar formulation can be used to ask if the knowledge base knows all students. Note that this statement, unlike (4), (5), or (6), is a statement about the knowledge of the knowledge base (or lack of it) rather than about the domain of discourse (students).

A complementary approach to Levesque's is proposed in [Moore 80] which focuses on a knowledge base's knowledge about other agents, rather than on self-knowledge. To say that John knows a French-speaking teacher might be expressed as

Know(John. "( 3 x )[Teacher( x ) h Fr enchS peaking( x ) ]" ) whereas the statement that there is a French-speaking individual that John knows is a teacher might be represented as

(∃x)[FrenchSpeaking(x) ∧ Know(John, "Teacher(x)")]

As discussed earlier, Moore's work is also distinguished by the fact that it formulates its theory within first-order logic. In both Levesque's and Moore's approaches, possible world semantics serve as the basis for a semantic theory.

An alternative approach to those described so far is presented in [Konolige 84] where each agent in a multi-agent environment is assumed to have its own set of facts and its own (possibly weak) derivability relation. Thus, stating that John knows that Sue is a teacher is expressed as

[John] Teacher(Sue)

and the facts derivable from this are determined by the derivability relation associated with agent John. A similar proposal is outlined in [Bibel 83].

Yet another treatment of incompleteness is described in [Belnap 75], which proposes a four-valued logic where the extra two values can be read as "unknown" and "inconsistent". This approach has been used by [Shapiro & Bechtel 76] in the development of a semantics for a semantic network formalism and by [Vassiliou 80] in accounting for "null values" in databases.
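One common way to picture Belnap's four values is as the subsets of {true, false} that a statement has been "told": empty = unknown, both = inconsistent. The class and method names below are illustrative assumptions, not Belnap's formalism:

```python
# Sketch of Belnap's four values: each statement carries the set of classical
# values reported for it.  {} = unknown, {t}, {f}, {t,f} = inconsistent.
T, F = "t", "f"

class KB4:
    def __init__(self):
        self.val = {}                       # statement -> subset of {T, F}

    def tell(self, stmt, value):
        # Accumulate evidence; conflicting reports yield {T, F}.
        self.val.setdefault(stmt, set()).add(value)

    def ask(self, stmt):
        v = frozenset(self.val.get(stmt, set()))
        return {frozenset(): "unknown", frozenset({T}): "true",
                frozenset({F}): "false", frozenset({T, F}): "inconsistent"}[v]

kb = KB4()
kb.tell("Student(John)", T)                 # one source says true
print(kb.ask("Student(John)"))              # true
kb.tell("Student(John)", F)                 # a conflicting source
print(kb.ask("Student(John)"))              # inconsistent -- but still usable
print(kb.ask("Student(Mary)"))              # unknown
```

Crucially, the conflicting report about John does not poison the rest of the knowledge base: queries about Mary still return a sensible answer.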

3.2. Nonmonotonicity

If we view a knowledge base as a first-order theory, additional facts invariably lead to additional knowledge. For instance, if we have a knowledge base which is given (4) and (5), and add

Student(Jane)

we now know -- in addition to everything that logically follows from (4) and (5) -- formulas such as

Student(John) ∧ Student(Jane)

Student(Jane) ∨ Married(Bill)

that were not known previously. More formally, if KB and KB′ are knowledge bases and

KB = <KB0, ⊢L>

KB" = < KB0 U ~, ~-L > then

KB ⊆ KB′.

This property makes first-order and most other "conventional" logics monotonic. Unfortunately, monotonicity is not a property of commonsense knowledge and reasoning with respect to such knowledge [Minsky 75]. Indeed, there are many situations where monotonicity leads to problems. Here are some, noted in [Reiter 78]:

Default assignments. Default rules are used to assign values to properties in the absence of specific infor- mation. Two examples are:

"Unless you know otherwise, assume that a person's city of residence is Toronto." "Unless you know otherwise, assume that an elephant is grey,"

Knowledge Incompleteness. The closed world assumption discussed in the previous section can be expressed with statements of the form

"Unless you know otherwise, assume that an object is not a student."

which amounts to saying that all students are assumed to be known.

Default Inheritance. Consider a prototypical description of birds which states that birds fly. Of course, this can be false, either for particular birds (Tweety) or classes of birds (penguins). It can then be understood as a rule of the form

"Unless you know otherwise, assume that a bird flies."

This is a classical example of "default inheritance" used in semantic networks (e.g., [Fahlman 79]) where a "flies" attribute is associated with the concept of bird, and is then inherited by instances or specializations of the concept if there isn't information to the contrary.

So far we have seen the need to introduce assumptions into the knowledge base while reasoning in order to deal with ignorance (incompleteness) or with knowledge that only provides an approximate account of the world (e.g., prototypical descriptions). Nonmonotonic reasoning is brought about by the introduction of such assumptions. If at some time an assumption is introduced in the knowledge base, say

¬Student(Sue)

because of lack of information, and it is later discovered that Sue is in fact a student, we must remove the assumption concerning Sue's student status, or face the prospect of an inconsistent knowledge base. Thus in this situation, the addition of facts to the knowledge base leads to some (former) conclusions no longer being derivable. This is the feature that renders reasoning systems that use "unless otherwise" rules nonmonotonic.

Versions of nonmonotonic reasoning were used in semantic network and procedural representation languages such as PLANNER [Hewitt 71] before any attempts were made within AI to formalize and study it. [Reiter 78] and subsequently [Reiter 80] offered a formalization based on default logics. These are logics which include first-order logic, but in addition can have domain-specific inference rules of the form

α(x1, ..., xn) : Mβ(x1, ..., xn)
--------------------------------
γ(x1, ..., xn)

These rules can be read informally as "If for particular values of x1,...,xn, α is true and β can be assumed consistently, then assume γ". For example,

Person(x) : MLives(x, Toronto)
------------------------------
Lives(x, Toronto)

states that if someone is a person and it can be consistently assumed that he lives in Toronto (i.e., it cannot be derived from the knowledge base that this someone doesn't live in Toronto) then assume that he lives in Toronto. Likewise, the closed world assumption for students can be approximated by the inference rule

Person(x) : M¬Student(x)
------------------------
¬Student(x)

With this machinery, a default theory consists of a set of axioms and a set of default inference rules. Its theorems include those that logically follow from the axioms using not only first-order logic inference rules, but also "assumptions" generated by the default inference rules. It proves to be the case that a default theory can have more than one possible set of theorems, depending on the order of application of its default inference rules [Reiter 80]. Each of these sets can be viewed as an acceptable set of beliefs that one can entertain with respect to a default theory.
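The order-dependence of default rule application can be sketched in a propositional setting. This is a simplification of Reiter's definitions (consistency is checked only against literals already present), and the "Nixon diamond" rules used to exhibit two extensions are a standard illustration, not an example from the text:

```python
# Propositional sketch of applying defaults "alpha : M beta / gamma": if alpha
# is among the facts and the negation of beta is not, add gamma.
def neg(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def extension(facts, defaults):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for pre, just, concl in defaults:
            if pre in facts and neg(just) not in facts and concl not in facts:
                facts.add(concl)
                changed = True
    return facts

# Two conflicting defaults about the same individual.
d1 = ("Quaker(Nixon)", "Pacifist(Nixon)", "Pacifist(Nixon)")
d2 = ("Republican(Nixon)", "~Pacifist(Nixon)", "~Pacifist(Nixon)")
facts = ["Quaker(Nixon)", "Republican(Nixon)"]

e1 = extension(facts, [d1, d2])   # d1 tried first: its conclusion blocks d2
e2 = extension(facts, [d2, d1])   # d2 tried first: its conclusion blocks d1
print("Pacifist(Nixon)" in e1, "~Pacifist(Nixon)" in e1)   # True False
print("Pacifist(Nixon)" in e2, "~Pacifist(Nixon)" in e2)   # False True
```

Each of e1 and e2 is internally consistent; they are precisely the multiple "acceptable sets of beliefs" mentioned above.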

Another approach to default reasoning, first proposed in [McCarthy 80], is the notion of circumscription. Intuitively, circumscription can be thought of as a rule of conjecture that allows one to jump to certain conclusions in the absence of knowledge. This is achieved by stating that all objects that can be shown to have property P, given some set of facts, are in fact the only objects that satisfy P. Consider, for example, a blocks world situation with two (known) blocks A and B:

Block(A)

Block(B).

One way of circumscribing with respect to the predicate Block amounts to saying that the known blocks are the only blocks. To achieve this we pick the following as the circumscription of Block, written C(Block):

C(Block) ≡ Block(A) ∧ Block(B)

and substitute this in the formula schema

C(Φ) ∧ (x)[Φ(x) ⊃ P(x)] ⊃ (x)[P(x) ⊃ Φ(x)]

(from [McCarthy 80]). This schema can be regarded as stating that the only objects that satisfy P are those that have to, assuming the sentence C. C(Φ) is the result of replacing all occurrences of P (here Block) in C by some predicate expression Φ. Thus it states that Φ satisfies the conditions satisfied by P. The second conjunct, (x)[Φ(x) ⊃ P(x)], states that entities satisfying Φ also satisfy P, while the conclusion states that Φ and P are then equivalent. In our example, if we then pick

Φ(x) ≡ (x=A ∨ x=B)

and we substitute back in the circumscriptive schema and simplify, we end up with the circumscriptive inference

(Block(A) ∧ Block(B)) ⊢C (x)[Block(x) ⊃ (x=A ∨ x=B)].
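The circumscriptive inference can be checked model-theoretically: circumscribing Block keeps only the models in which the extension of Block is minimal. The three-element universe below is an illustrative assumption used to make the enumeration finite:

```python
from itertools import chain, combinations

# Over a small hypothetical universe, a candidate model of Block(A) & Block(B)
# is determined by the extension it gives to Block.
UNIVERSE = ["A", "B", "C"]

def models():
    subsets = chain.from_iterable(combinations(UNIVERSE, r)
                                  for r in range(len(UNIVERSE) + 1))
    # Keep exactly the extensions satisfying Block(A) & Block(B).
    return [set(s) for s in subsets if {"A", "B"} <= set(s)]

# Circumscription keeps the models with no strictly smaller Block extension.
minimal = [m for m in models() if not any(n < m for n in models())]

print(minimal)                                 # the single extension {A, B}
print(all(m <= {"A", "B"} for m in minimal))   # True
```

In every minimal model the only blocks are A and B, which is exactly the conclusion (x)[Block(x) ⊃ (x=A ∨ x=B)] of the circumscriptive inference above.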

Note that not every possible choice of a circumscription for Block leads to reasonable conclusions (see, for example, the discussion in [Papalaskaris & Bundy 84]). [Etherington et al 85] study the power and limitations of circumscription, while [McCarthy 84] provides a more recent account of circumscription, including a more general formulation of this important method of nonmonotonic reasoning.

Yet another approach to nonmonotonic reasoning is described in [McDermott and Doyle 80] and subsequently in [McDermott 82] and [Moore 83]. In the first paper, the authors present a logic consisting of standard first-order logic augmented with a sentential operator M whose informal interpretation is "is consistent". Thus the statement that a bird is assumed to fly, given that such an assumption is consistent, is expressed by the axiom:

(x)[Bird(x) ∧ MFlies(x) ⊃ Flies(x)].

If we know that Tweety is a bird then, barring information to the contrary, we would conclude that Tweety flies. The system is nonmonotonic since additional axioms may block previous inferences. For example, if the axioms

(x)[Penguin(x) ⊃ ¬Flies(x)]

Penguin(Tweety)

were added, we would conclude that ¬Flies(Tweety). This conclusion blocks the original inference since MFlies(Tweety) is no longer true.
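The blocking behaviour can be sketched procedurally: the monotonic penguin rule is applied first, and the default fires only if MFlies(x) remains consistent, i.e. ¬Flies(x) has not been derived. The string-based fact encoding is an illustrative assumption:

```python
# Sketch of the "M is consistent" reading of the Tweety axioms.
def conclusions(facts):
    facts = set(facts)
    # (x)[Penguin(x) -> ~Flies(x)] as an ordinary (monotonic) rule.
    for f in list(facts):
        if f.startswith("Penguin("):
            facts.add("~Flies(" + f[len("Penguin("):])
    # (x)[Bird(x) & MFlies(x) -> Flies(x)]: the default, applied afterwards.
    for f in list(facts):
        if f.startswith("Bird("):
            x = f[len("Bird("):]
            if "~Flies(" + x not in facts:     # MFlies(x): no contrary info
                facts.add("Flies(" + x)
    return facts

print("Flies(Tweety)" in conclusions(["Bird(Tweety)"]))                     # True
print("Flies(Tweety)" in conclusions(["Bird(Tweety)", "Penguin(Tweety)"]))  # False
```

Adding the axiom Penguin(Tweety) removes a previous conclusion, which is precisely the nonmonotonic behaviour described above.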

A difficulty with this particular approach is that the notion of consistency provided is quite weak. Specifically, there is no relation between the truth values of a sentence p and Mp. This means that the pair {Mp, ¬p} may not necessarily be inconsistent. [McDermott 82] examines stronger versions of this logic arrived at by adding axioms to the original. [Moore 83] develops this line of work further by pointing out that there are at least two types of default reasoning that work such as McDermott and Doyle's addresses. The first, which he calls default reasoning, deals with facts concerning the external world. As an example, if most Quebecois are French-speaking and Pierre is from Quebec, then if there is no information to the contrary, one may reasonably conclude that Pierre speaks French. The second, called autoepistemic reasoning, concerns reasoning about one's own beliefs. An example is the sentence "I know that I don't know whether it's raining in Paris at present". This is nonmonotonic since if I was told that the sun is presently shining in Paris, I would withdraw the previous sentence. Moore addresses this latter type of reasoning by means of a propositional logic of belief, much like those presented in the previous section. It is a point of interest then that this particular line of research, beginning with [McDermott and Doyle 80], led from strictly nonmonotonic concerns to logics of knowledge and belief as are used to address problems of incompleteness.

A second point of interest that should be evident from this brief survey is that there is no clear agreement as to what constitutes the preferable approach (if indeed there is one) to the problems of nonmonotonicity. Solutions proposed range from the (circumscriptive) addition of formulas, to extending first-order logic with sentential operators, to adding default rules of inference to first-order logic. The current diversity, interest, and activity of the area is also borne out by a recent workshop in this area [AAAI 84].

3.3. Inconsistency

In principle, one of the advantages of "conventional" logics (e.g., standard first-order predicate calculus) is the availability of a notion of consistency which determines, for example, that the knowledge base

Canadian(John) ∨ Canadian(Mary)

¬Canadian(John)

¬Canadian(Mary)

is inconsistent, i.e., there is no interpretation that is a model for this knowledge base. Unfortunately, however, it is a fact of life that large knowledge bases are inherently inconsistent, in the same way large programs are inherently buggy. Moreover, within a conventional logic, the inconsistency of a knowledge base has the catastrophic consequence that everything is derivable from the knowledge base.

From the point of view of knowledge representation, dealing with inconsistency involves two issues. The first concerns the assimilation of inconsistent information, i.e., the ability to include in a knowledge base inconsistent information without rendering the knowledge base useless. The second issue concerns the accommodation of inconsistent knowledge, i.e., the modification of the knowledge base to restore consistency. It must be stressed that both issues are important and should be seen as opposite sides of the same coin. Indeed, a knowledge base should be able to behave like a body of scientific knowledge consisting of observations and general laws. Inconsistencies can exist at any time, but there are also mechanisms for rationalising inconsistencies and for introducing new general laws that account for observational knowledge and at the same time eliminate or reduce the inconsistencies.

A common solution to assimilation employed by many early semantic network formalisms was to constrain the order in which inferences are tried. Thus, given the knowledge base:

Is-a(Penguin, Bird)

Attribute(Bird, Flies)

Attribute(Penguin, ¬Flies)

Instance-of(Opus, Penguin)

with the obvious informal interpretation, if we wanted to determine whether Opus flies, then attributes associated with penguinhood would be tried before those of the (superclass) birdhood. Thus it would be concluded that Opus doesn't fly. The (inconsistent) assertion that Opus flies, which could potentially be derived from the fact that Opus is also a bird, simply isn't inferred. The difficulty with such an approach of course is that its semantics isn't at all clear. Given such a scheme, it is by no means obvious just what can or cannot be inferred. The problems of recasting the issue of inheritance of such default properties however have been recently addressed in [Etherington and Reiter 83], using Reiter's default logic, and in [Touretzky 84].
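The inference-ordering strategy amounts to walking the Is-a hierarchy from the most specific class upwards and returning the first attribute value found. The dictionary encoding below is an illustrative assumption:

```python
# Sketch of ordered inference in a semantic-network hierarchy: attributes of
# the most specific class are tried first, so the contradiction is never derived.
isa = {"Penguin": "Bird", "Bird": None}           # Is-a links
attrs = {"Bird": {"Flies": True}, "Penguin": {"Flies": False}}
instance_of = {"Opus": "Penguin"}

def lookup(individual, attribute):
    cls = instance_of[individual]
    while cls is not None:                        # walk upwards, specific first
        if attribute in attrs.get(cls, {}):
            return attrs[cls][attribute]          # first hit wins; search stops
        cls = isa[cls]
    return None

print(lookup("Opus", "Flies"))                    # False: Penguin shadows Bird
```

Because the search stops at the first hit, the conflicting "Flies" attribute on Bird is simply never consulted for Opus, which is exactly why the semantics of such schemes is hard to pin down.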

With respect to conventional logics, the source of difficulty with inconsistency can be traced to the so-called paradoxes of implication, such as

A ⊃ (B ⊃ A)

which can be paraphrased as "anything implies a true proposition", or

(A ∧ ¬A) ⊃ B,

"a contradiction implies anything". One way to eliminate these undesirables is to modify the axiom set and revise the notion of proof so that a proof of B from hypotheses Ai, A 2, ... ,A n is well-formed only if it 21

actually uses each hypothesis in some step. Thus proofs are well-formed, according to this proposal, only if each hypothesis is relevant to the conclusion of the proof. [Anderson & Belnap 75] provide a thorough study of such relevance logics. As mentioned earlier, Levesque's formulation of a situational semantics for belief uses similar ideas and ends up with a notion of entailment that is the same as that of one of the relevance logics.

A novel proposal for treating inconsistency is described in [Borgida & Imielinski 84]. A knowledge base is treated as a collection of viewpoints held by members of a committee, where each viewpoint includes a consistent collection of facts. Derivability with respect to the knowledge base is then determined by means of a committee decision rule. Some examples of alternative derivation rules are:

KB ⊢ p   if in each viewpoint V, V ⊢ p

KB ⊢ p   if in at least one viewpoint V, V ⊢ p, and in no viewpoint V, V ⊢ ¬p

Note that the first is a very conservative definition of derivability. Both definitions allow conflicting viewpoints among committee members without leading to contradictory knowledge bases. The proposal is shown to be capable of handling a variety of nonmonotonic phenomena, including default rules and database updates.
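In a toy model where each viewpoint is just a consistent set of literals, the two decision rules can be sketched as follows (the encoding, with "~" marking negation, is invented; in [Borgida & Imielinski 84] derivability is with respect to full theories):

```python
# Committee-style derivability over viewpoints, each a consistent set
# of literals. "~" marks negation; the encoding is invented.

def neg(p):
    return p[1:] if p.startswith("~") else "~" + p

def derives_unanimous(viewpoints, p):
    """Conservative rule: derivable iff every viewpoint derives p."""
    return all(p in v for v in viewpoints)

def derives_unopposed(viewpoints, p):
    """More liberal rule: some viewpoint derives p and none derives ~p."""
    return (any(p in v for v in viewpoints)
            and not any(neg(p) in v for v in viewpoints))

committee = [{"Student(John)", "Flies(Opus)"},
             {"Student(John)", "~Flies(Opus)"}]

print(derives_unanimous(committee, "Student(John)"))  # -> True
print(derives_unopposed(committee, "Flies(Opus)"))    # -> False: opposed
```

Under either rule the committee as a whole never derives both p and ~p, even though individual members disagree.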

Another approach to the problem of assimilation of inconsistent information is the previously-cited four-valued logic described in [Belnap 75]. In this logic, besides having values for true and false, there are also values for unknown and inconsistent. So if a system was told by one source that Student(John) was true, and later informed by another source that ¬Student(John) was true, the system could assign "inconsistent" as the truth value of the statement. The approach thus allows the explicit representation of, and hence the ability to reason with, inconsistent information.
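This assimilation behaviour can be sketched by recording, per proposition, the polarities the system has been told (the representation below is invented; Belnap's logic also defines connectives over the four values, which this sketch omits):

```python
# Four-valued bookkeeping: a proposition is unknown, true, false, or
# inconsistent depending on which polarities sources have asserted.

told = {}  # proposition -> set of asserted polarities

def tell(prop, polarity):
    told.setdefault(prop, set()).add(polarity)

def value(prop):
    seen = told.get(prop, set())
    if seen == {True}:
        return "true"
    if seen == {False}:
        return "false"
    if seen == {True, False}:
        return "inconsistent"
    return "unknown"

tell("Student(John)", True)   # one source asserts Student(John)
tell("Student(John)", False)  # another asserts its negation
print(value("Student(John)"))  # -> inconsistent
print(value("Teacher(Mary)"))  # -> unknown
```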

Turning to the issue of accommodation, one way to resolve the problem, indeed eliminate it altogether, is to treat "suspect" formulas (i.e. formulas that might be contradicted) as hypotheses. This point of view is adopted by [Delgrande 85], where it is assumed that facts in a knowledge base are of three different kinds: ground atomic formulas such as

Student(John)

Supervisor(John, Mary)

hypothesised general statements such as

"Elephants are hypothesised to be mammals"

"An uncle is hypothesised to be a brother or husband of a sibling of a parent"

and arbitrary sentences presumed to be beyond refutation. Given this assumption, three issues are explored: first, how to generate and maintain consistency among the hypothesised general statements, given ground atomic formulas and other statements; second, how to formally prescribe the set of general statements that may be hypothesised; and last, how such a hypothesis formation system might interact with a system for deductively reasoning with hypotheses and knowledge.

The problem of forming conjectures and maintaining consistency is treated less as an inductive inference problem and more as a deductive, consistency-restoration problem. Simplistic criteria are used to form general hypotheses on the basis of the ground facts; these criteria however are not strong enough to ensure that standard relations hold among hypotheses. However it is shown how consistency may be deductively restored by means of determining the truth value of knowable but unknown ground instances, and reapplying the simplistic criteria to the expanded set of ground instances.

Thus, for example, if it was known that, in some university department, instructors with their Master's degree could supervise M.Sc. students, we might hypothesise that these two groups are equivalent, say

H(x)[HasMSc(x) ≡ MScSup(x)]. (7)

In another department it may turn out that it is not inconsistent with what is known that supervisors of M.Sc. students can also supervise Ph.D. students:

H(x)[MScSup(x) ≡ PhDSup(x)]. (8)

However there is, as yet, no reason (i.e. common satisfying individuals) to conjecture the transitive equivalence

H(x)[HasMSc(x) ≡ PhDSup(x)]. (9)

Clearly though, if the individuals known to satisfy (7) were determined to satisfy PhDSup(x), then we would have reason to hypothesise (9). If one of these individuals was determined not to satisfy PhDSup(x), then this individual would also falsify (8), and so we would obtain

H(x)[PhDSup(x) ⊃ MScSup(x)]

H(x)[PhDSup(x) ⊃ HasMSc(x)]

and so in any case consistency would be restored.

Another approach to the issue of accommodation, due to [Borgida 85], takes the view that general laws are useful and should be available in a knowledge representation framework, along with a mechanism for accommodating exceptions. Consider, for example, a statement such as

"Before admission to the hospital, a person must present his hospital insurance number"

This statement cannot be treated as a default rule because then it has no force. At the same time, it cannot be treated as a universally quantified constraint, because it is obvious that it will be violated in individual cases (e.g., during the admission of a VIP to the hospital) as well as in whole classes of cases (e.g., in emergency situations where the person being admitted is in no position to worry about his hospital insurance number). Borgida's proposal treats the introduction of an exception as a composite operation which includes a modification of the general statement. Thus if John is admitted to the hospital under emergency conditions and it is decided to delay enforcement of the constraint specified above, the constraint is revised to read

"Before admission to the hospital, a person must present his hospital insurance number or the person is John"

Thus at any one time, a general formula is thought of as having a given number of exceptions which were introduced in the knowledge base after permission was granted. One desirable feature of this mechanism is that reasoning can be done in first-order logic.

In a somewhat different vein, truth maintenance systems or reason maintenance systems have been proposed for revising sets of beliefs. In this approach the reasons or justifications for holding a particular belief are recorded along with the belief. If the belief is later found to be false, then the justifications can be examined in an attempt to restore consistency. An early implementation of these ideas is [Doyle 79]. The notion of recording which formulas are relevant in deriving a proof sounds much like the approach taken in relevance logics, and it would seem that such work may be useful for belief revision systems. This indeed is the case, and recent work by Martins and Shapiro, described in [Martins and Shapiro 84], uses a relevance logic of Anderson and Belnap as the basis of a formal framework for a belief revision system.

3.4. Inaccuracy

A knowledge base may contain information which is assumed to be true when in fact it is not. For instance, we may have

Student(John)

Supervisor(John, Mary)

when actually John isn't a student (he was last year) and/or he is not supervised by Mary (the data entry clerk made a typing error in entering this fact in the knowledge base). We call such a knowledge base inaccurate since it does not provide an accurate description of the world being modelled. Inaccuracy, like incompleteness and inconsistency, has to be treated as a fact of life in any large knowledge base, either because external sources of information may be incorrect, or because unintended coding errors may occur in adding the information to the knowledge base, or because the knowledge base isn't updated properly. It is important to note that inaccuracy is a different notion from inconsistency. For example, if a knowledge base knows that a person's age is between 0 and 120, and it is asserted that John's age is 134, when in fact it is 34, then the resultant knowledge base will be both inconsistent and inaccurate. But claiming that John's age was 43, when in fact it is 34, leads to an inaccurate but consistent knowledge base.

What can be done about inaccuracy? Well, as with other features of knowledge, at the very least we would like to be able to talk about it. We would like to be able to state, for instance,

(1) "An entity asserted to be a student in the knowledge base is, in fact, a student in the domain of discourse", which asserts that the knowledge base has accurate knowledge with respect to studenthood, or

(2) "A person's age as recorded in the knowledge base may be off by up to two years".

One way to achieve this is through the use of a modal operator, such as the operator K discussed earlier. Thus, to assert (1) we can write

(x)[KStudent(x) ⊃ Student(x)]

while (2) can be asserted with

(x)(y)[K(age(x)=y) ⊃ |realage(x) − y| ≤ 2].

Note that if we further assume that

(3) Kα ⊃ α

then an inaccurate knowledge base must also be inconsistent. Thus (3) has the undesirable consequence that accuracy of the knowledge base is legislated, and so cannot be discussed, constrained, or asserted, in contrast to (1). This point is discussed further in [Levesque 81, pp 2-5].

The careful reader will note that there is nothing about the K operator that is specific to (in)accuracy. This operator simply allows us to talk about statements in the knowledge base, and this capability makes it useful in the treatment of incompleteness, inaccuracy, and other features of encodings of knowledge. This suggests that other mechanisms which allow one to talk about statements in the knowledge base should also be suitable for talking about inaccuracy. This is indeed the case, and we'll discuss one such mechanism due to [McCarthy 79], which provides capabilities comparable to those provided by the K operator, but in a first-order logical setting.

McCarthy's point of departure is to treat concepts such as that of John or Mary as objects in a first-order theory, thus bringing them into the domain of discourse. In order to relate a concept (e.g., John) to the entity denoted by the concept (e.g., the real person john), McCarthy uses a denotation function denot so that

denot(John) = john.

Assuming that symbols beginning with a capital letter denote concepts, or functions or predicates over concepts, while symbols beginning with lower-case letters denote entities, or functions or predicates over entities, in the domain of discourse, we can now write

(X)[Student(X) ⊃ student(denot(X))]

to assert that student concepts in the knowledge base denote students in the domain of discourse, while

(X)(Y)[Age(X)=Y ⊃ |realage(denot(X)) − denot(Y)| ≤ 2]

asserts that the age of a person stored in the knowledge base may be inaccurate by up to two years.
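With concrete (invented) data for both the knowledge base and the domain, such accuracy statements become directly checkable constraints relating the two via denot; a sketch:

```python
# Checking McCarthy-style accuracy statements: concepts are related to
# domain entities by the denotation function denot. All data invented.

kb_students = {"John", "Mary"}           # concepts X with Student(X)
kb_age = {"John": 36}                    # Age(X) = Y in the KB
denot = {"John": "john", "Mary": "mary"}
world_students = {"john", "mary"}        # actual students in the domain
realage = {"john": 34}

# (X)[Student(X) => student(denot(X))]: studenthood is accurate
accurate_students = all(denot[x] in world_students for x in kb_students)

# (X)(Y)[Age(X)=Y => |realage(denot(X)) - Y| <= 2]: ages off by <= 2
accurate_ages = all(abs(realage[denot[x]] - y) <= 2
                    for x, y in kb_age.items())

print(accurate_students, accurate_ages)  # -> True True
```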

A comparable approach is used in [Konolige 81] to describe the contents of a relational database. Here the role of the knowledge base is played by the relational database, and Konolige uses a predicate DB which takes as arguments encodings of statements with respect to the database, and returns true or false depending on whether the statement is true or false with respect to the database. For example, if f is the encoding of the statement

(t/SHIPR)[sname(t) = LAFAYETTE ∨ length(t) > 300]

then DB(f) is true if and only if every tuple in the relation SHIPR has its sname attribute equal to LAFAYETTE or its length attribute greater than 300. Of course, since DB represents truth in the database, it has to satisfy axioms such as

(f)[DB(¬f) ≡ ¬DB(f)]

(f)(g)[DB(f ∧ g) ≡ DB(f) ∧ DB(g)]

etc. In addition, a denotation function comparable to McCarthy's is used to talk about the denotations of database terms. It is shown that this machinery is adequate for answering questions about the domain of discourse, given a database and a set of axioms that describe its semantics, and also for the expression of incompleteness in the database.

3.5. Relativity

Yet another important feature of knowledge and belief is that it is relative to an agent. Different agents have different, possibly inconsistent, beliefs about a domain of discourse. Moreover, they have beliefs about each other's knowledge and belief, as well as their own. Consider for example the Wise Man Puzzle, as given in [Konolige 82]:

A king, wishing to know which of his three wise men is the wisest, paints white dots on each of their foreheads, tells them that at least one spot is white, and asks each to determine the colour of his own spot. After a while the wisest announces that his spot is white, reasoning as follows: "Suppose my spot were black. The second wisest of us would then see a black and a white and would reason that if his spot were black, the least wise would see two black spots and would conclude that his spot is white on the basis of the king's assurance. He would have announced it by now, so my spot must be white."

Konolige formalises the problem in a propositional modal logic consisting of the propositional calculus together with a set Sp of modal operators, where for S ∈ Sp the intended meaning of [S]α is that agent S knows α. If pi is the proposition asserting that the i-th wise man has a white spot on his forehead, and [S]!p abbreviates [S]p ∨ [S]¬p, then some of the initial conditions of the puzzle are:

(1) p1 ∧ p2 ∧ p3

(2) [O](p1 ∨ p2 ∨ p3)

(3) [O]([S1]!p2 ∧ [S1]!p3 ∧ [S2]!p1 ∧ [S2]!p3 ∧ [S3]!p1 ∧ [S3]!p2)

The first axiom states the actual situation. The second states that it is common knowledge that someone has a white spot on his forehead. The third axiom says that it is common knowledge that each can see the spots of the others. Further axioms are used to fully specify the problem, including a circumscriptive axiom stating that S1 has sufficient knowledge to solve the problem. From this formulation it is proved that S1 knows the colour of his spot.

Konolige's approach also allows a certain flexibility in representing nested beliefs and belief systems [Konolige 84]. For example, John may have one set of beliefs and deduction rules, while Mary has another. John, however, may also have beliefs about Mary's beliefs and rules of inference. Such belief systems and belief subsystems may be of varying power and capabilities. So if John is reasoning about Mary's beliefs, his reasoning is "filtered" through Mary's perceived beliefs and deduction rules.

Fagin and his co-workers present a general model for reasoning about this sort of knowledge for a set of agents [Fagin et al 84]. Instead of an extension to possible-world semantics for an agent's belief, as might be expected, they present a model based on a notion of "knowledge levels". Each level corresponds to an iteration of the "knows" operator, or to a level of meta-knowledge. For example, assume that level zero, where the domain itself is described, contains just the sentence

Student(Bill) (10)

The first level gives each agent's knowledge about the domain. So perhaps John knows that (10) is true while Mary does not:

KJohn Student(Bill)

¬KMary Student(Bill) ∧ ¬KMary ¬Student(Bill).

The second level contains each agent's knowledge about the other agent's knowledge about the domain. So Mary may know that John knows the truth value of (10), while John may not know whether Mary knows whether (10) is true or false:

KMary(KJohn Student(Bill) ∨ KJohn ¬Student(Bill))

¬KJohn(KMary Student(Bill) ∨ KMary ¬Student(Bill)).

Since an agent's self-knowledge is assumed to be complete and accurate, we also obtain sentences such as

KJohn KJohn Student(Bill),

stating that John knows that he knows that Bill is a student.

McCarthy's proposal for treating knowledge, belief, and related concepts from within first-order logic, discussed in the previous section, also applies to reasoning about other agents' beliefs. So,

Know(A,P)

is a proposition meaning that agent A knows the value of concept P, while

true Know(A, P)

asserts the truth of the proposition. Thus we can assert that John knows whether Mary knows Bill's telephone number by:

true Know(John, Know(Mary, Telephone Bill)).

It may seem to the reader that knowledge about other agents is simply a generalisation of self-knowledge. Surprisingly, this is not quite true. Consider an example (paraphrased from [Levesque 81]) where the knowledge base is told that:

(∃x)Know(John, "Teacher(x)")

which asserts that there is an individual known by John to be a teacher. Replacing John by the agent itself, we have

(∃x)Know(KB, "Teacher(x)")

which is either trivially true (if there is, in fact, a teacher in the knowledge base) or meaningless (if there isn't a teacher in the knowledge base, then one can't tell the knowledge base that it knows otherwise). The conclusion that can be drawn from this example is that there are statements about an agent's knowledge which make sense as long as they are not statements of self-knowledge.

Another way in which knowledge about other agents is not a simple generalisation of self-knowledge is that in some logics it is substantially more complex to reason about a number of agents than it is to reason about a single agent. This topic is addressed in [Halpern and Moses 85]. For logics based on S5 (roughly, where one knows about one's own knowledge and ignorance) the problem of showing whether a formula is satisfiable belongs to the complexity class NP for a single agent, but to the next complexity class, PSPACE, for multiple agents. This means that it is quite likely that the problem of satisfiability for multiple agents is substantially more difficult than for a single agent. Somewhat surprisingly, the problem of satisfiability in the case of multiple agents becomes yet more complex if operators for common knowledge are added. That is, operators E and C are added, where Eα is read as "everyone knows α" and Cα abbreviates Eα ∧ EEα ∧ ... and is read as "α is common knowledge". [Halpern and Moses 85] show that for a general set of logics the problem of satisfiability moves from PSPACE to deterministic exponential time EXP when axioms for E and C are added.

In summary, adoption of a relativist viewpoint means that it is no longer possible to assume that every statement about the domain of discourse is either true or false. Indeed, the notion that there is a unique domain of discourse ("God's point of view", if you like) is abandoned in favour of a subjective reality. It is interesting that "useful" knowledge bases developed so far ignore relativism and assume that for a particular application one can construct an objective account by piecing together personal viewpoints.

3.6. Uncertainty

The next feature of knowledge we will examine is concerned with the degree of confidence an agent has in the truth of a particular fact in its knowledge base. Each fact then has associated certainty information which indicates the degree of this confidence. The notions of "certainty" and "confidence", however, have proved very difficult to formalise, and consequently most of the measures that have been used for the degree of this confidence have been quantitative (rather than qualitative).

The basic idea behind such measures is to provide a function unc from propositions to real numbers such that unc(p) indicates the certainty the KB has in the truth of proposition p. Hence if p is more certain than q then

unc(p) > unc(q).

Any approach to providing such a measure must address two questions: first, how are the measures to be updated in the light of new evidence, and second, how does one choose among the various possibilities, given the propositions and certainty values?

The traditional approach is probability theory which, until recently, has provided the best-developed mathematical framework for dealing with uncertainty. Let Π be a finite set of propositions, closed under negation and conjunction, and assume that 0 and 1 denote respectively the inconsistent and true propositions in the set Π. A probability measure P defined over Π, intended to represent the certainty (or probability or plausibility or credibility) of a proposition, is a function from Π to [0,1] such that

(1) P(0) = 0

(2) P(1) = 1

(3) P(p ∨ q) = P(p) + P(q) if p ∧ q = 0

Traditional probability theory has been perceived, however, as having a number of difficulties. As we shall see though, there is no general consensus as to the validity of these claims. One such drawback is that it is very difficult in general to establish a P function for a particular set of propositions. A second perceived drawback is that the above formulation of uncertainty has the property that

P(q) + P(¬q) = 1 (11)

which means that whatever uncertainty is missing with respect to a proposition q must be attached to its complement ¬q. It follows that there is no room in this framework for ignorance about the certainty of a proposition and its complement. Thus, even if we know nothing about John being or not being a millionaire, for

q = "John is a millionaire",

we are forced to have (11) hold.

A fundamental issue that needs to be addressed in selecting an uncertainty function for a set of propositions is how the uncertainty function should be constrained by propositions that are logically or probabilistically related. A solution to this issue is provided by Bayes's rule:

P(H | E1, E2, ..., En) = P(H) × P(E1, E2, ..., En | H) / P(E1, E2, ..., En)

where P(H | E1, ..., En) is the conditional probability of H given E1, ..., En, i.e., the probability that H is true given that E1, ..., En are all true. Unfortunately, this formula is highly impractical to use in a realistic setting because P(E1, ..., En) and P(E1, ..., En | H) are usually very difficult to determine; moreover the formula leads to severe combinatorial problems. (Consider for example the number of P values that would have to be calculated, somehow, for n = 10 and each E taking two possible values.) In order to overcome these problems, two simplifying assumptions are usually made. Firstly, the events Ei are assumed to be statistically independent, in which case

P(E1, ..., En) = P(E1) × P(E2) × ... × P(En).

This drastically reduces the number of P values that need to be estimated. Unfortunately, however, the assumption of statistical independence is usually false. Secondly, it is assumed that statistical independence between the Ei continues to hold given H, i.e.,

P(E1, ..., En | H) = P(E1 | H) × ... × P(En | H).

An appealing result of these simplifications is that the conditional certainty in H, given i pieces of evidence, is a linear function of the certainty in H given (i−1) pieces of evidence and the certainty in the i-th piece of evidence:

P(H | E1, ..., En) = P(H | E1, ..., En−1) × [P(En | H) / P(En)].

This method of calculating uncertainty forms the basis of the reasoning component for PROSPECTOR [Duda et al 78] and provides evidence that simplifying assumptions can sometimes be a positive step towards building practical systems.
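Under the two independence assumptions, updating thus reduces to multiplying the current certainty in H by P(En|H)/P(En) as each new piece of evidence arrives; a sketch with invented numbers:

```python
# Sequential Bayesian update under the naive independence assumptions:
# each (P(E|H), P(E)) pair rescales the current certainty in H.
# All probabilities below are invented for illustration.

def update(p_h, evidence):
    """evidence: iterable of (P(E|H), P(E)) pairs, applied in order."""
    for p_e_given_h, p_e in evidence:
        p_h *= p_e_given_h / p_e
    return p_h

prior = 0.3
posterior = update(prior, [(0.8, 0.5), (0.9, 0.6)])
print(round(posterior, 2))  # -> 0.72
```

Note that with incoherent inputs this scheme can push the certainty above 1, one symptom of the independence assumptions being false in practice.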

Many alternatives to the above formulation of uncertainty relax condition (3) so that

(3′) P(p) + P(¬p) ≤ 1.

The Dempster-Shafer theory ([Dempster 68]) proposes such an alternative. Here the basic means for assigning and manipulating degrees of belief is a mass function M which represents a basic probability assignment to all possible propositions. From this basic assignment, the support of a proposition p is defined by

sup(p) = Σ M(q) over {q | q ⊃ p}.

Thus the mass of every proposition that implies p contributes to the support of p. The plausibility of p is defined by:

pls(p) = 1 − sup(¬p).

It is easily shown that sup(p) ≤ pls(p). The confidence in a proposition p is then defined as the interval given by the support and plausibility:

conf(p) = [sup(p), pls(p)].

Thus the proposition p denoting "John is a millionaire" and its negation might each be assigned confidence [0,1] if nothing is known about the matter. If, on the other hand, it is certain that John is not a millionaire, then

conf(p) = [0,0]

while

conf(¬p) = [1,1].

The Dempster-Shafer theory also indicates how two individual mass functions (engendered perhaps from disparate evidence reports) can be combined to yield a single mass function. [Dubois & Prade 85] examine this rule for combining mass functions in typical situations. They show, for example, that suitably restricted, the rule is equivalent to that used in the MYCIN system [Shortliffe 76] for combining two non-conflicting pieces of evidence. They show also, though, that in some cases the rule may be quite sensitive to small changes in the values of the evidence reports: in some cases a change from a claim that p is impossible (i.e. conf(p) = [0,0]) to one that p is highly improbable (i.e. perhaps conf(p) = [.0001, .9999]) can yield quite different results when combined with other evidence reports. This problem (if indeed it is a problem at all), however, is one common to all such approaches, including traditional probability theory.
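Taking propositions to be subsets of a frame of possibilities (so that q ⊃ p becomes set inclusion q ⊆ p), support, plausibility, and confidence can be sketched directly; the two-world frame and mass assignment below are invented:

```python
# Dempster-Shafer support, plausibility, and confidence over a toy
# two-world frame. A proposition is a subset of the frame; q implies p
# exactly when q is a subset of p. Frame and masses are invented.

frame = frozenset({"millionaire", "not_millionaire"})
mass = {frozenset({"not_millionaire"}): 1.0}  # certain: not a millionaire

def sup(p):
    """Total mass of the propositions implying p."""
    return sum(m for q, m in mass.items() if q <= p)

def pls(p):
    return 1 - sup(frame - p)

def conf(p):
    return (sup(p), pls(p))

p = frozenset({"millionaire"})
print(conf(p))          # certainly false: support 0, plausibility 0
print(conf(frame - p))  # certainly true: support 1, plausibility 1
```

Total ignorance would instead be the assignment mass = {frame: 1.0}, giving conf(p) = [0, 1] for both p and its negation, as in the text.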

The theory has also been extended to allow for the calculation of the confidence of logical combinations of propositions [Lowrence 82]. For example, if p ⊃ q then

conf(p) = [0, pls(q)]

conf(q) = [sup(p), 1].

Such rules offer an alternative to Bayes' rule and essentially replace statistical dependence concerns with logical dependence ones among the evidence and the conceivable hypotheses for a given setting.

A difficulty with the Dempster-Shafer approach is that, unlike Bayesian analysis, it is not clear how one may arrive at a decision about which proposition to hold [Thompson 85]. While support and plausibility values may be used as bounds on the probability of a statement, there is no accepted mechanism for selecting among propositions that may have overlapping intervals or intervals differing in size.

However here, as elsewhere, there is no clear agreement as to which of these approaches is preferable, or even whether recent proposals are tending in the right direction. So while there has been much recent activity and interest in approaches such as Dempster-Shafer, there is still much interest in investigating more traditional avenues. [Cheesman 85], for example, presents a forceful and spirited defense of standard, classical probability theory. In fact, he claims that much of the present work in AI, including default and nonmonotonic reasoning systems, fuzzy logics, and the inferential apparatus of many expert systems, is founded on misinterpretations of probability theory. For Cheesman, the crucial concept is that of the conditional probability of a proposition, which is the measure of an entity's belief in that proposition, given a particular set of evidence. This notion differs sharply from the commonly held frequency definition of probability, as the ratio of the number of occurrences in which an event is true to the total number of such occurrences.

With respect to uncertainty, Cheesman argues that "extended" approaches, such as those of Dempster-Shafer or Lowrence, can be handled classically. An example cited is that of a box that simply outputs a string of decimal digits, and where we are asked the probability that the next number is 7. With no further information we would say .1. If we subsequently examined 1,000,000 digits, and 100,000 were 7 and in no apparent order, we would still say .1. The difference of course is that we would be much more confident in the second prediction. However to deal with such notions of confidence, extended notions such as significance, plausibility, etc. aren't necessarily required; rather the changed expectation can be captured simply as the standard deviation or, more generally, with a probability density function.

Finally, some AI systems, notably expert systems, employ techniques for updating certainties that differ significantly from those just described. The major difference is that such systems can use and take advantage of large quantities of domain-specific information. A good example is given in [Tsotsos 81]. Here the idea is that certainty information is attached to complex objects, called hypotheses. Hypotheses are arranged by means of their conceptual adjacency. In updating certainties, a particular hypothesis will be supported by some hypotheses while others will conflict with it. Thus, for example, the hypothesis that

"The object under consideration is a station wagon"

supports the hypothesis that

"The object under consideration is a car",

but conflicts with

"The object under consideration is a bicycle".

Certainty values are updated simultaneously using relaxation labelling [Zucker 78]. If a hypothesis has a high certainty value, then it will tend to increase the certainty value of those that it supports, and decrease the certainty value of those that it conflicts with. To provide for a damped convergence to a solution, techniques from control theory are also employed. While this approach provides a feasible means of dealing with a collection of certainty values, as mentioned, it does require a priori knowledge of the domain under consideration. A related approach is also described in [Khan and Jain 85].
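The flavour of such relaxation updating can be conveyed in a few lines (the weights, damping factor, and update rule below are invented simplifications of the schemes in [Tsotsos 81] and [Zucker 78]):

```python
# Damped relaxation of certainty values: each hypothesis nudges up the
# certainties of hypotheses it supports and nudges down those it
# conflicts with. Weights and damping are invented.

certainty = {"station_wagon": 0.9, "car": 0.5, "bicycle": 0.5}
# (source, target, weight): positive supports, negative conflicts
links = [("station_wagon", "car", +0.3),
         ("station_wagon", "bicycle", -0.3)]

def relax(certainty, links, rounds=10, damping=0.5):
    for _ in range(rounds):
        delta = {h: 0.0 for h in certainty}
        for src, dst, w in links:
            delta[dst] += w * certainty[src]
        for h in certainty:  # simultaneous, damped, clipped update
            certainty[h] = min(1.0, max(0.0, certainty[h] + damping * delta[h]))
    return certainty

result = relax(certainty, links)
print(result["car"], result["bicycle"])  # car driven up, bicycle down
```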

3.7. Imprecision

Apart from uncertainty in the truth value of a proposition, there is also the issue of the contents of a proposition being imprecise. For instance, asserting that

"John was born in 1956"

is imprecise in that we are not told exactly when in 1956 John was born. Likewise,

"John is very young"

"Most Swedes are blond"

"George is bald"

are imprecise with respect to the age of John, the proportion of blond Swedes [Prade 85], and the degree of George's baldness. It is important to emphasize that imprecision and uncertainty are orthogonal notions. We may be absolutely certain that John is young but have only imprecise information about how young he is. Conversely, we may be uncertain about very precise propositions such as "the area of a circle is π times the square of its radius".

A popular way of dealing with imprecision involves the notion of fuzzy sets [Zadeh 75]. These are sets defined by a membership function μ which ranges over the full interval [0, 1] instead of being just binary. A proposition of the form

"X is A"

(e.g., "George is bald") is thought of as describing X's membership in a fuzzy set SA. For example, the fuzzy set SBALD may have a membership function μBALD such that

μBALD(George) = 0.9

μBALD(Mary) = 0.05.

In [Zadeh 83], this simple account is extended to show how one can represent the meaning of statements such as "Most Swedes are blond", given a (fuzzy) world which includes information on the hair colour of Swedes, the nature of blondness as a function of hair colour, and the ratio of true instances of "Swedes are blond" which would satisfy the quantifier "Most". Zadeh calls his method test score semantics and argues that it constitutes a generalization of other types of semantics such as the Tarskian and possible worlds semantics discussed earlier.
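A membership function of this kind is simple to sketch; the particular curve below (baldness as one minus fractional hair coverage) is invented, and only its range over [0, 1] matters:

```python
# A fuzzy membership function for the set S_BALD: degrees of
# membership in [0, 1] rather than the binary {0, 1}. The curve shape
# and the hair-coverage figures are invented for illustration.

def mu_bald(hair_coverage):
    """Degree of baldness given fractional hair coverage in [0, 1]."""
    return max(0.0, min(1.0, 1.0 - hair_coverage))

print(round(mu_bald(0.10), 2))  # George, mostly bald -> 0.9
print(round(mu_bald(0.95), 2))  # Mary, a full head of hair -> 0.05
```

The text's μBALD(George) = 0.9 corresponds here to evaluating the curve at George's (invented) hair coverage of 0.10.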

In addition to fuzzy sets, probability functions (or probability distributions) can also be used to represent imprecision. For instance, we can think of

"John is very young"

as defining a probability function, π, for the age of John. Then if SAGE is the set of all age values, π(s) specifies the probability of John's age being s. Presumably, π must assign larger values to younger ages and, in addition,

Σ π(s) = 1, the sum taken over all s ∈ SAGE.

[Prade 85] provides a thorough account of the use of this machinery for the representation of imprecision. The methods of fuzzy sets have been extended in a number of directions, and attempts have been made to apply them in other ways in representing common-sense knowledge. Despite these attempts, there does not appear to be strong support for the use of fuzzy sets in the representation of anything but imprecision in measure spaces [Hayes 79]. [Osherson & Smith 81] is also a critique of accepted views of imprecision as they bear on intuitions concerned with the combination of concepts to form complex concepts.

4. Conclusions

Clearly there is no single, complete set of features of knowledge. In this paper we have attempted to identify some features that are of interest to researchers in Knowledge Representation and to sketch some of the approaches that have been used to formalize and study them. There are other features one may want to examine. We know, for example, that informal specifications, including comments attached to a program, graphical sketches of the overall structure of a system, and natural language accounts of requirements for a piece of software, are all time-honoured and accepted practices for representing knowledge about a program. Would we have a more powerful knowledge representational framework if it could handle informality? Likewise, we may want to be able to talk about the significance or insignificance of an item in a knowledge base, or its relevance or irrelevance to the knowledge base. The reader may want to add his own list of features of knowledge to what has been presented or mentioned so far.

If there is a common direction or theme to the work reviewed here, it is the continuing concern with formality. This is indicated by the emphasis on formal logics, both as a tool for representing knowledge and as a tool for the analysis of knowledge. While the paper itself has emphasised formal approaches, it nonetheless appears that many researchers are concerned with investigating the fundamental, foundational properties of knowledge. This certainly is to be expected, given that many of the issues are only now beginning to be fully understood and explored in AI.

It is interesting that in the discussion of mechanisms for handling the different features of knowledge, we turned several times to the same mechanisms for help. Nonmonotonic reasoning, modal operators, and the availability of a metalanguage that allows one to treat propositions in the knowledge base as entities within the domain of discourse, are three such mechanisms. It is significant that by and large there has been little interest in embedding such mechanisms in a representational framework, but understandably so since these mechanisms are still under development.

This paper provides an admittedly brief and subjective overview of some issues concerning the nature of knowledge. There is no claim that either the list of issues or the list of references given for each one is exhaustive. We do hope, however, that we have helped the reader with background in Computer Science but little in Artificial Intelligence appreciate some of the deeper issues that need to be addressed if one is to call the information handled by his system "knowledge" and the data structures storing this information "knowledge bases".

References

American Association for Artificial Intelligence, Non-Monotonic Reasoning Workshop, New Paltz, New York, Oct. 1984.

A.R. Anderson and N.D. Belnap Jr., Entailment: The Logic of Relevance and Necessity, Vol. I, Princeton University Press, 1975.

D. Angluin and C.H. Smith, "A Survey of Inductive Inference: Theory and Methods", Technical Report 250, Department of Computer Science, Yale University, 1982.

A. Barr and J. Davidson, "Representation of Knowledge", Stanford Heuristic Programming Project, Memo HPP-80-3, Stanford University, 1980.

N.D. Belnap, "A Useful Four-Valued Logic" in Modern Uses of Multiple-Valued Logic, J.M. Dunn and G. Epstein eds., D. Reidel Pub. Co., 1975.

W. Bibel, "First-Order Reasoning About Knowledge and Belief", ATP-21-IX-83, Technical University of Munich, 1983.

A. Borgida and T. Imielinski, "Decision Making in Committees -- A Framework for Dealing with Inconsistency and Non-Monotonicity", Workshop on Non-Monotonic Reasoning, New Paltz, 1984.

A. Borgida, "Language Features For Flexible Handling Of Exceptions In Information Systems", Transactions on Database Systems, to appear.

R.J. Brachman, "On the Epistemological Status of Semantic Networks", in Associative Networks: Representation and Use of Knowledge by Computers, N.V. Findler (ed.), Academic Press, 1979, pp 3-50.

R.J. Brachman and H.J. Levesque, "Competence in Knowledge Representation", Proc. AAAI-82, Pittsburgh, 1982, pp 189-192.

R.J. Brachman and H.J. Levesque, "The Tractability of Subsumption in Frame-Based Description Languages", Proc. AAAI-84, Austin, 1984, pp 34-37.

R.J. Brachman and H.J. Levesque (eds.), Readings in Knowledge Representation, Morgan Kaufmann Publishers, Inc., 1985.

R.J. Brachman and B.C. Smith (eds.), Special Issue on Knowledge Representation, SIGART Newsletter No. 70, Feb. 1980.

P. Cheeseman, "In Defense of Probability", Proc. IJCAI-85, Los Angeles, 1985, pp 1002-1009.

J.P. Delgrande, "A Foundational Approach to Conjecture and Knowledge in Knowledge Bases", Ph.D. Thesis, Department of Computer Science, University of Toronto, 1985.

A.P. Dempster, "A Generalization Of Bayesian Inference", Journal of the Royal Statistical Society, Vol. 30, pp 205-247, 1968.

J. Doyle, "A Truth Maintenance System", Artificial Intelligence 12, 1979, pp 231-272.

J. Doyle and P. London, "A Selected Descriptor-Indexed Bibliography to the Literature on Belief Revision", SIGART Newsletter #71, Apr. 1980, pp 7-23.

F.I. Dretske, Knowledge and the Flow of Information, Bradford Books, The MIT Press, 1981.

D. Dubois and H. Prade, "Combination and Propagation of Uncertainty with Belief Functions", Proc. IJCAI-85, Los Angeles, 1985, pp 111-113.

R.O. Duda, P.E. Hart, N.J. Nilsson, and G.L. Sutherland, "Semantic Network Representations in Rule-Based Inference Systems", in Pattern-Directed Inference Systems, D.A. Waterman and F. Hayes-Roth eds., Academic Press, 1978.

D.W. Etherington, R.E. Mercer, and R. Reiter, "On the Adequacy of Predicate Circumscription for Closed-World Reasoning", Computational Intelligence, Vol. 1, No. 1, 1985, pp 11-15.

D.W. Etherington and R. Reiter, "On Inheritance Hierarchies with Exceptions", Proc. AAAI-83, 1983, pp 104-108.

R. Fagin, J.Y. Halpern, and M.Y. Vardi, "A Model-Theoretic Analysis of Knowledge: Preliminary Report", Proceedings of the Twenty-Fifth IEEE Symposium on Foundations of Computer Science, Florida, 1984.

S.E. Fahlman, NETL: A System for Representing and Using Real-World Knowledge, MIT Press, 1979.

I. Goldstein and S. Papert, "Artificial Intelligence, Language, and the Study of Knowledge", Cognitive Science, Vol. 1, No. 1, 1977.

N. Goodman, Fact, Fiction and Forecast, 3rd ed., Hackett Publishing Co., 1979.

R.F. Hadley, "Two Solutions to Logical Omniscience: A Critique with an Alternative", TR 85-21, School of Computing Science, Simon Fraser University, B.C., 1985.

J.Y. Halpern and Y.O. Moses, "A Guide to the Modal Logics of Knowledge and Belief: Preliminary Draft", Proc. IJCAI-85, Los Angeles, 1985.

P.J. Hayes, "Some Problems and Non-Problems in Representation Theory", Proceedings AISB Summer Conference, 1974, pp 63-79.

P.J. Hayes, "In Defense of Logic", Proc. IJCAI-77, Cambridge, 1977, pp 559-565.

P.J. Hayes, "The Naive Physics Manifesto", Machine Intelligence 9, D. Michie (ed.), Edinburgh University Press, 1979, pp 243-270.

C. Hewitt, "PLANNER: A Language for Proving Theorems in Robots", Proceedings IJCAI-71, London, 1971.

J. Hintikka, Knowledge and Belief: An Introduction to the Logic of the Two Notions, Cornell University Press, 1962.

G.E. Hughes and M.J. Cresswell, An Introduction to Modal Logic, Methuen and Co., 1968.

D.J. Israel and R.J. Brachman, "Distinctions and Confusions: A Catalogue Raisonné", Proceedings of the Seventh International Conference on Artificial Intelligence, Vancouver, B.C., 1981, pp 252-259.

N.A. Khan and R. Jain, "Uncertainty Management in a Distributed Knowledge Based System", Proc. IJCAI-85, Los Angeles, 1985, pp 318-320.

K. Konolige, "A Metalanguage Representation of Relational Databases for Deductive Question-Answering Systems", Proceedings of the Seventh International Conference on Artificial Intelligence, Vancouver, B.C., 1981, pp 496-503.

K. Konolige, "Circumscriptive Ignorance", Proc. AAAI-82, Pittsburgh, 1982

K. Konolige, "A Deductive Model of Belief", Ph.D. Thesis, Department of Computer Science. Stanford University, 1984.

B. Kramer and J. Mylopoulos, "Knowledge Representation: Knowledge Organization", to appear.

G. Lakemeyer, Internal Memo, Department of Computer Science, University of Toronto, 1984.

H.J. Levesque, "A Formal Treatment of Incomplete Knowledge Bases", Ph.D. Thesis, Department of Computer Science, University of Toronto, 1981.

H.J. Levesque, "A Logic of Implicit and Explicit Belief", Proc. AAAI-84, Austin, 1984.

D. Lewis, Counterfactuals, Harvard University Press, 1973.

J.D. Lowrance, "Dependency-Graph Models of Evidence Support", COINS Technical Report 82-26, University of Massachusetts at Amherst, 1982.

G. McCalla and N. Cercone (eds.), IEEE Computer (Special Issue on Knowledge Representation), Vol. 16, No. 10, October 1983.

J. McCarthy, "First Order Theories of Individual Concepts and Propositions", in Machine Intelligence 9, D. Michie (ed.), Edinburgh University Press, 1979, pp 129-147.

J. McCarthy, "Circumscription -- A Form of Non-Monotonic Reasoning", Artificial Intelligence 13, pp 27-39, 1980.

J. McCarthy, "Applications of Circumscription to Formalizing Common Sense Knowledge", Non-Monotonic Reasoning Workshop, New Paltz, New York, 1984, pp 295-324.

J. McCarthy and P.J. Hayes, "Some Philosophical Problems from the Standpoint of Artificial Intelligence", in Machine Intelligence 4, D. Michie and B. Meltzer (eds.), Edinburgh University Press, 1969, pp 463-502.

J.P. Martins and S.C. Shapiro, "A Model for Belief Revision", Non-Monotonic Reasoning Workshop, New Paltz, 1984.

D. McDermott, "The Last Survey of Representation of Knowledge", Proceedings AISB/GI Conference, 1978, pp 206-221.

D. McDermott, "Nonmonotonic Logic II: Nonmonotonic Modal Theories", JACM 29, 1, 1982, pp 33-57.

D. McDermott and J. Doyle, "Non-Monotonic Logic I", Artificial Intelligence 13, 1980, pp 41-72.

M. Minsky, "A Framework for Representing Knowledge" in The Psychology of Computer Vision, P.H. Winston (ed.), McGraw-Hill, 1975, pp 211-277.

R. Montague, Formal Philosophy, Yale University Press, 1974.

R.C. Moore, "Reasoning About Knowledge and Action", Technical Note 284, Artificial Intelligence Centre, SRI International, 1980.

R.C. Moore, "Semantical Considerations on Nonmonotonic Logic", Proc. IJCAI-83, Karlsruhe, 1983, pp 272-279.

R.C. Moore and G. Hendrix, "Computational Models of Beliefs and the Semantics of Belief-Sentences", Technical Note 187, SRI International, Menlo Park, 1979.

J. Mylopoulos and H.J. Levesque, "An Overview of Knowledge Representation" in On Conceptual Modelling, M.L. Brodie, J. Mylopoulos, and J.W. Schmidt (eds.), Springer-Verlag, 1984.

A. Newell, "The Knowledge Level", AI Magazine 2(2), 1981, pp 1-20.

D.N. Osherson and E.E. Smith, "On the Adequacy of Prototype Theory as a Theory of Concepts", Cognition 9, 1981, pp 35-58.

M. Papalaskaris and A. Bundy, "Topics for Circumscription", Non-Monotonic Reasoning Workshop, New Paltz, New York, 1984, pp 355-362.

H.E. Pople, "On the Mechanisation of Abductive Logic", Proceedings of the Third International Conference on Artificial Intelligence, Stanford, Ca., 1973, pp 147-152.

H. Prade, "A Computational Approach to Approximate and Plausible Reasoning with Applications to Expert Systems", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 7, No. 3, May 1985.

H. Putnam, "Is Semantics Possible?" in Mind, Language and Reality: Philosophical Papers Volume II, Cambridge University Press, 1975, pp 215-271.

H. Putnam, "The 'Corroboration' of Theories", in Mathematics, Matter, and Method: Philosophical Papers Volume I, 2nd ed., Cambridge University Press, 1979, pp 250-269.

W.V.O. Quine and J.S. Ullian, The Web of Belief, 2nd ed., Random House, 1978.

R. Reiter, "On Closed World Data Bases", in Logic and Databases, H. Gallaire and J. Minker eds., Plenum Press, 1978.

R. Reiter, "A Logic for Default Reasoning", Artificial Intelligence 13, 1980, pp 81-132.

E. Rosch, "Principles of Categorisation" in Cognition and Categorisation, E. Rosch and B.B. Lloyd eds., Lawrence Erlbaum Associates, 1978.

I. Scheffler, The Anatomy of Inquiry: Philosophical Studies in the Theory of Science, Hackett Publishing Co., 1981.

S.P. Schwartz (ed.), Naming, Necessity, and Natural Kinds, Cornell University Press, 1977.

E.Y. Shapiro, "Inductive Inference of Theories from Facts", Research Report 192, Department of Computer Science, Yale University, 1981.

S. Shapiro and R. Bechtel, "The Logic of Semantic Networks", TR-47, Department of Computer Science, Indiana University, 1976.

E.H. Shortliffe, Computer-Based Medical Consultation: MYCIN, American Elsevier, 1976.

T.R. Thompson, "Parallel Formulation of Evidential-Reasoning Theories", Proc. IJCAI-85, Los Angeles, 1985, pp 321-327.

D.S. Touretzky, "Implicit Ordering of Defaults in Inheritance Systems", Proc. AAAI-84, Austin, Texas, 1984, pp 322-325.

J.K. Tsotsos, "Temporal Event Recognition: An Application to Left Ventricular Performance", Proc. IJCAI-81, Vancouver, 1981, pp 900-905.

Y. Vassiliou, "A Formal Treatment of Imperfect Information in Database Management", Ph.D. Thesis, Department of Computer Science, University of Toronto, 1980.

W.A. Woods, "What's in a Link: Foundations for Semantic Networks" in Representation and Understanding, D.G. Bobrow and A. Collins eds., Academic Press, 1975.

L.A. Zadeh, "Fuzzy Logic and Approximate Reasoning", Synthese 30, 1975, pp 407-428.

L.A. Zadeh, "Commonsense Knowledge Representation Based on Fuzzy Logic", IEEE Computer, Vol. 16, No. 10, October 1983, pp 61-66.

S.W. Zucker, "Production Systems with Feedback", in Pattern-Directed Inference Systems, D.A. Waterman and F. Hayes-Roth eds., Academic Press, 1978.

PART TWO

Knowledge Processing

Deduction and Computation

Gérard Huet

INRIA and CMU

We present in a unified framework the basic syntactic notions of deduction and computation.

1 Terms and types

1.1 General notations

We assume known elementary set theory and algebra. N is the set {0, 1, ...} of natural numbers, N+ the set of positive natural numbers. We shall identify the natural number n with the set {0, ..., n - 1}, and thus 0 is also the empty set ∅. Every finite set S is isomorphic to n, with n the cardinal of S, denoted n = |S|. If A and B are sets, we write A → B, or sometimes B^A, for the set of functions with domain A and codomain B.

1.2 Languages, concrete syntax

Let Σ be a finite alphabet. A string u of length n is a function in n → Σ. The set of all strings over Σ is

Σ* = ⋃_{n ∈ N} Σ^n.

We write |u| for the length n of u. We write u_i for u(i - 1), when i ≤ n. The null string, unique element of Σ^0, is denoted Λ. The unit string mapping 1 to a ∈ Σ is denoted 'a'. The concatenation of strings u and v, defined in the usual fashion, is denoted u ^ v, and when there is no ambiguity we write e.g. 'abc' for 'a' ^ 'b' ^ 'c'. When u ∈ Σ* and a ∈ Σ, we write u.a for u ^ 'a'. We define an ordering ≤ on Σ*, called the prefix ordering, by

u ≤ v ⇔ ∃w  v = u ^ w.

If u ≤ v, the residual w is unique, and we write w = v/u. We say that occurrences u and v are disjoint, and we write u|v, iff u and v are unrelated by the partial ordering ≤. Finally we let u < v iff u ≤ v and u ≠ v.

Concatenation satisfies:

Ass: (u ^ v) ^ w = u ^ (v ^ w)

IdL: Λ ^ u = u

IdR: u ^ Λ = u

Actually, Σ* is the free monoid generated by Σ.

Examples.
1. Σ = 0. We get Σ* = 1.
2. Σ = 1. We get Σ* = N. Strings are here natural numbers in unary notation, and concatenation corresponds to addition.
3. Σ = 2 = {0, 1} (the Booleans). The set Σ* is the set of all binary words.
4. Σ = N+. We call the elements of Σ* occurrences. When u = w.m and v = w.n, with m < n, we say that u is left of v, and write u ≪ v.
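As a minimal sketch of these definitions, occurrences (strings over Σ = N+) can be modelled as Python tuples; the function names here are ours, not the text's:

```python
# Sketch: prefix ordering, residual and disjointness on occurrences,
# modelled as tuples of positive naturals. Names are illustrative.
def is_prefix(u, v):            # u <= v  iff  v = u ^ w for some w
    return v[:len(u)] == u

def residual(u, v):             # v/u, defined when u <= v; the unique w with v = u ^ w
    assert is_prefix(u, v)
    return v[len(u):]

def disjoint(u, v):             # u | v : u and v unrelated by the prefix ordering
    return not is_prefix(u, v) and not is_prefix(v, u)

u, v = (1, 2), (1, 2, 1)
assert is_prefix(u, v) and residual(u, v) == (1,)
assert disjoint((1,), (2, 1))
```

Concatenation u ^ v is then plain tuple concatenation `u + v`, and Λ is the empty tuple.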

1.3 Terms: abstract syntax

We first define a tree domain as a subset D of N+* closed under ≤ and ≪:

u ∈ D ∧ v ≪ u ⇒ v ∈ D,

u ∈ D ∧ v ≤ u ⇒ v ∈ D.

We say that M is a Σ-tree iff M ∈ D → Σ, for some tree domain D. We write D = D(M), and we say that D is the set of occurrences in M. M is said to be finite whenever D is. We shall now use occurrences to designate nodes of a tree, and the subtree starting at that node. If u ∈ D(M), we define the Σ-tree M/u as mapping occurrence v to M(u ^ v). We say that M/u is the subtree of M at occurrence u. If N is also a Σ-tree, we define the graft M[u ← N] as the Σ-tree mapping v to N(w) whenever v = u ^ w with w ∈ D(N), and to M(v) if v ∈ D(M) and not u ≤ v. We need one auxiliary notion, that of the width of a tree. If M is a Σ-tree, we define the (top) width of M as ||M|| = max{n | 'n' ∈ D(M)}. We shall now consider Σ a graded alphabet, that is, given with an arity function a in Σ → N. We then say that M is a Σ-term iff M is a Σ-tree verifying the supplementary consistency condition:

∀u ∈ D(M)  ||M/u|| = a(M(u))

That is, every subtree of M is of the form F(M1, M2, ..., Mn), with n = a(F). We write T(Σ) for the set of Σ-terms. If M1, M2, ..., Mn ∈ T(Σ) and F ∈ Σ, with a(F) = n, then M = F(M1, M2, ..., Mn) is easily defined as a Σ-term. This gives T(Σ) the structure of a Σ-algebra. Since conversely the decomposition of M is uniquely determined, we call T(Σ) the completely free Σ-algebra.

Example. With Σ = {+, S, 0}, a(+) = 2, a(S) = 1, a(0) = 0, the following structure represents a Σ-term:

        +
       / \
      +   S
     / \   \
    0   S   S
        |   |
        0   0

The following proposition is easy to prove by induction. All occurrences are supposed to be universally quantified in the relevant tree domain.

Proposition 1.
Embedding: M[u ← N]/(u ^ v) = N/v
Associativity: M[u ← N][u ^ v ← P] = M[u ← N[v ← P]]
Persistence: M[u ← N]/v = M/v  (u|v)
Commutativity: M[u ← N][v ← P] = M[v ← P][u ← N]  (u|v)
Distributivity: M[u ← N]/v = (M/v)[u/v ← N]  (v ≤ u)
Dominance: M[u ← N][v ← P] = M[v ← P]  (v ≤ u)

We define the length |M| of a (finite) term M recursively by:

|F(M1, ..., Mn)| = 1 + Σ_{i=1}^{n} |M_i|
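The notions above can be sketched concretely for the running example alphabet Σ = {+, S, 0}; the (symbol, children) encoding and the function names are ours:

```python
# Sketch of Sigma-terms over the example graded alphabet:
# a(+) = 2, a(S) = 1, a(0) = 0. A term is a (symbol, children) pair.
ARITY = {'+': 2, 'S': 1, '0': 0}

def term(f, *args):
    assert len(args) == ARITY[f]       # consistency condition ||M/u|| = a(M(u))
    return (f, list(args))

def subterm(m, u):                     # M/u for an occurrence u (1-based, as in the text)
    for i in u:
        m = m[1][i - 1]
    return m

def graft(m, u, n):                    # the graft M[u <- N]
    if not u:
        return n
    f, args = m
    i = u[0] - 1
    return (f, args[:i] + [graft(args[i], u[1:], n)] + args[i + 1:])

zero = term('0')
# The example term (0 + S(0)) + S(S(0)):
m = term('+', term('+', zero, term('S', zero)), term('S', term('S', zero)))
assert subterm(m, (2, 1))[0] == 'S'
# Persistence: M[u <- N]/v = M/v when u | v
assert subterm(graft(m, (1,), zero), (2,)) == subterm(m, (2,))
```

The last assertion checks one clause of Proposition 1 on this term.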

1.4 Parsing

It is well-known that the term in the example above can be represented unambiguously as a Σ-string, for instance in prefix polish notation, here: + + 0 S 0 S S 0. This result is not very interesting: such strings are neither good notations for humans, nor good representations for computers, since the graft operation necessitates unnecessary copying. We shall discuss later good machine representations, using binary graphs. As far as human readability is concerned, we assume known parsing techniques. This permits us to represent terms, on an extended alphabet with parentheses and commas, in a way closer to standard mathematical practice. Also, infix notation and indentation help keep some of the tree structure apparent in the string. We shall not make explicit the exact representation grammar, and allow ourselves to write freely for instance (0 + S(0)) + S(S(0)). Note that we avoid explicit quotes as well, which permits us to mix meta-variables freely with object structures, as in S(M), where M is a meta-variable denoting a Σ-term.

1.5 Terms with variables, substitution

The idea is to internalize the notation S(M) above as a term S(x) over an extended alphabet containing special symbols of arity 0 called variables. Let V be a denumerable set disjoint from Σ. We define the set of terms with variables, T(Σ, V), in exactly the same way as T(Σ ∪ V), extending the arity function so that a(x) = 0 for every x in V. The only difference between the variables and the constants (symbols of arity 0) is that a constant has an existential import: it denotes a value in the domain we are modelling with our term language, whereas a variable denotes a term. The difference is important only when there are no constants in Σ, since then T(Σ) is empty. All of the notions defined for terms extend to terms with variables. We define the set V(M) of variables occurring in M as:

V(M) = {x ∈ V | ∃u ∈ D(M)  M(u) = x}

and we define the number of distinct variables in M as ν(M) = |V(M)|. We shall now formalize the notion of substitution of terms for variables in a term containing variables. From now on, the sets Σ and V are fixed, and we use T to denote T(Σ, V). A substitution σ is a function in V → T, identity almost everywhere. That is, the set D(σ) = {x ∈ V | σ(x) ≠ x} is finite. We call it the domain of σ. Substitutions are extended to morphisms over T by

σ(F(M1, ..., Mn)) = F(σ(M1), ..., σ(Mn))

Bijective substitutions are called permutations. When U ⊆ V, we write σ_U for the restriction of substitution σ to U. It is easy to show that, for all σ, M and U: V(M) ⊆ U ⇒ σ(M) = σ_U(M).
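A sketch of substitution as a morphism over T, with variables encoded as Python strings and non-variable terms as (symbol, arguments) pairs (the encoding is ours):

```python
# A substitution sigma: V -> T, identity almost everywhere, is represented as a
# finite dict holding only its domain D(sigma); it extends to a morphism over terms.
def is_var(t):
    return isinstance(t, str)

def apply_subst(sigma, m):
    if is_var(m):
        return sigma.get(m, m)                  # identity outside D(sigma)
    f, args = m
    return (f, [apply_subst(sigma, a) for a in args])

zero = ('0', [])
sigma = {'x': zero}
assert apply_subst(sigma, ('S', ['x'])) == ('S', [zero])   # sigma(S(x)) = S(0)
assert apply_subst(sigma, 'y') == 'y'                      # y is outside the domain
```

Restricting the dict to a set U of variables implements σ_U directly.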

Alternatively, we can define the replacement M[x ← N] as

M[u1 ← N] ... [un ← N] where {u1, ..., un} = {u | M(u) = x}

and then σ(M) = M[x ← σ(x) | x ∈ V(M)] with an obvious notation. We now define the quasi-ordering ≤ of matching in T by:

M ≤ N ⇔ ∃σ  N = σ(M)

It is easy to show that if such a σ exists, σ_{V(M)} is unique. We shall call it the match of N by M, and denote it by N/M. We define M ≡ N ⇔ M ≤ N ∧ N ≤ M. When M ≡ N, we say that M and N are isomorphic. This is equivalent to saying that M = σ(N) for some permutation σ. Note that M ≡ N implies |M| = |N|. Finally, we define M > N ⇔ N ≤ M ∧ ¬(M ≤ N).

Proposition. > is a well-founded ordering on T.
Proof. We show that M > N implies #(M) > #(N), with #(M) = |M| - ν(M).

Let φ be any bijection between T × T and V. We define a binary operation ⊓ in T by:

F(M1, ..., Mn) ⊓ F(N1, ..., Nn) = F(M1 ⊓ N1, ..., Mn ⊓ Nn)

M ⊓ N = φ(M, N) in all other cases.

M ⊓ N is uniquely determined from φ and, for distinct φ's, is unique up to ≡.

Proposition. M ⊓ N is a g.l.b. of M and N under the match quasi-ordering.

Let T̄ be the quotient set T/≡, completed with a maximum element ⊤. From the propositions above we conclude:

Theorem. T̄ is a complete lattice.

Corollary. If two terms M and N have an upper bound, i.e. a common instance σ(M) = σ'(N), they have a l.u.b. M ⊔ N, which is a most general such instance: every common instance of M and N is an instance of M ⊔ N. The term M ⊔ N is unique modulo ≡ and may be found by the unification algorithm.

Proposition.
D(σ(M)) = D(M) ∪ ⋃_{u | M(u) ∈ V} {u ^ v | v ∈ D(σ(M(u)))}
For all u ∈ D(M):
M(u) ∈ V ⇒ σ(M)/(u ^ v) = σ(M(u))/v  (v ∈ D(σ(M(u))))
M(u) ∈ Σ ⇒ σ(M)/u = σ(M/u)
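The efficient unification algorithms are those cited in section 1.6; as a sketch only, here is a naive version over the same (symbol, arguments) encoding as before (names and encoding ours, no claim of efficiency):

```python
# Naive first-order unification: computes a substitution giving a most
# general common instance of two terms, or None when no upper bound exists.
def is_var(t):
    return isinstance(t, str)

def walk(t, s):                       # follow variable bindings in s
    while is_var(t) and t in s:
        t = s[t]
    return t

def occurs(x, t, s):                  # occur check: x occurs in t under s
    t = walk(t, s)
    if is_var(t):
        return t == x
    return any(occurs(x, a, s) for a in t[1])

def unify(m, n, s=None):
    s = dict(s or {})
    m, n = walk(m, s), walk(n, s)
    if is_var(m):
        if m != n:
            if occurs(m, n, s):
                return None           # no common instance (infinite term)
            s[m] = n
        return s
    if is_var(n):
        return unify(n, m, s)
    if m[0] != n[0] or len(m[1]) != len(n[1]):
        return None                   # clash: no upper bound
    for a, b in zip(m[1], n[1]):
        s = unify(a, b, s)
        if s is None:
            return None
    return s

# F(x, S(0)) and F(0, y) have the common instance F(0, S(0)).
zero = ('0', [])
s = unify(('F', ['x', ('S', [zero])]), ('F', [zero, 'y']))
assert s == {'x': zero, 'y': ('S', [zero])}
```

This naive version can be exponential on shared subterms, which is exactly why the quasi-linear, congruence-class-based algorithms mentioned in 1.6 are preferred in practice.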

1.6 Graph representations, dags

We represent trees by binary graphs of adr pairs. An adr consists of one tag bit, and one byte field interpreted either as an address in the graph memory, or as a natural number. In this last case, the natural 0 is reserved for nil, the empty list of trees. Symbols from Σ are coded as positive naturals. If tree M is represented at the graph address adr1 and the list L is represented at address adr2, then the list M . L is represented by the graph node (M . L). Finally, the tree F(L) is represented by (F . L). This is the standard way of representing trees and lists in the language LISP. A precise description of the memory allocation implementation of such schemes is beyond the scope of these notes. Terms are of course represented as trees. A global table holds the arity function. There are several possibilities for the representation of variables. They may be represented as symbols. But then the scope structure must be computed by an algorithm, rather than being implicit in the structure. Also a global scanning of the term is necessary to determine its set of variables, and substitution involves copying of the substituted term. For these reasons, variables are often represented instead as integer offsets in stacks of bindings. Such "structure sharing" representations are now standard for PROLOG implementations. A precise account of the various representation schemes for term structures, and of the accompanying algorithms, is outside the scope of these notes. It should be borne in mind that the crucial problem is memory utilization: the trade-off between copying and sharing is often the deciding factor for an implementation. Languages with garbage-collected structures, such as LISP, are ideal for programming "quick and dirty" prototypes. But serious implementation efforts should aim at good algorithmic performance on realistic-size applications.
The crucial algorithms in formula and proof manipulation are matching, unification, substitution and grafting. First-order unification has been especially well studied. A linear algorithm is known [122], but in practice quasi-linear algorithms based on congruence-class operations are preferred [99,100]. Furthermore, these algorithms extend without modification to unification of infinite rational terms represented by finite graphs [64]. Implementation methods may be partitioned into two families. Some depend on logical properties (e.g. sharing subterms in dags arising from substitution to a term containing several occurrences of the same variable). Some are purely statistical (e.g. sharing structures globally through hash-coding techniques). Particular applications require a careful analysis of the optimal trade-off between logical and statistical techniques. There is no comprehensive survey on implementation issues. Some partial aspects are described in [8,140,101,99,163,158,115,40,1,32,42,19,45,144,159].

2 Inference rules

We shall now study inference systems, defined by inference rules. The general form of an inference rule is:

         P1  P2  ...  Pn
    IR:  ---------------
                Q

where the Pi's and Q are propositions belonging to some formal language. We shall here regard these propositions as types, and the inference rule as the description of the signature of IR considered as a typed operator. More precisely, IR has arity n, Pi is the type of its i-th argument, and Q is the type of its result. Well-typed terms composed of inference operators are called the proofs defined by the inference system. Let us now examine a few familiar inference systems.

2.1 The trivial homogeneous case: Arities

A graded alphabet Σ may be considered as the simplest inference system, where types are reduced to arities. I.e., the set of propositions is 1, and an operator F of arity n is an inference rule

        0  0  ...  0
    F:  ------------
             0

(with n zeros in the numerator). A Σ-proof corresponds to our Σ-terms above.

2.2 Finite systems of types: Sorts

The next level of inference systems consists of choosing a finite set S of elementary propositions, usually called sorts. For instance, with S = {int, bool}, and Σ defined by:

0: int
S: int → int
true: bool
false: bool
if: bool, int, int → int

where we use the alternative syntax P1, ..., Pn → Q for an inference rule, the term if(true, 0, S(0)) is of sort int, i.e. it is a proof of proposition int. As another example, consider the puzzle "Missionaries and Cannibals". We call configuration any triple ⟨b, m, c⟩ ∈ 2 × 4 × 4. The boolean b indicates the position of the boat, m (resp. c) is the number of missionaries (resp. cannibals) on the left bank. The set of states S is the set of legal configurations, those that obey the condition

P(m, c) ≡ m = c or m = 0 or m = 3

There are thus 10 distinct states or sorts. The rules of inference comprise first a constant of type ⟨0, 3, 3⟩, denoting the starting configuration, then the transitions carrying p missionaries and q cannibals from left to right:

L_{m,c,p,q}: ⟨0, m, c⟩ → ⟨1, m-p, c-q⟩   (m ≥ p, c ≥ q, P(m,c), P(m-p,c-q), 1 ≤ p+q ≤ 2)

and finally the transitions R_{m,c,p,q}, which are the inverses of L_{m,c,p,q}. The game consists in finding a proof of ⟨1, 0, 0⟩. This simple example of a finite group of transformations applies to more complex tasks, such as Rubik's cube. All state transition systems can be described in a similar fashion. Examples of such proofs are parse-trees of regular grammars, where the inference rule signatures correspond to a finite automaton transition graph. Slightly more complicated formalisms allow subsorts, i.e. containment relationships between the sorts, i.e. implications between the elementary propositions. These systems reduce to simple sorts by considering dummy transitions corresponding to the implicit coercions.
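The search for such a proof can be sketched with a plain breadth-first search standing in for proof search (the state encoding as tuples and the function names are ours):

```python
# Sketch: the legal configurations and L/R transitions of the puzzle,
# explored by breadth-first search in place of proof search.
from collections import deque

def P(m, c):                       # the legality condition of the text
    return m == c or m == 0 or m == 3

def moves(state):                  # the L (boat left) and R (boat right) transitions
    b, m, c = state
    for p in range(3):
        for q in range(3):
            if 1 <= p + q <= 2:    # the boat carries one or two people
                nm, nc = (m - p, c - q) if b == 0 else (m + p, c + q)
                if 0 <= nm <= 3 and 0 <= nc <= 3 and P(nm, nc):
                    yield (1 - b, nm, nc)

def solve(start=(0, 3, 3), goal=(1, 0, 0)):
    frontier, seen = deque([[start]]), {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt in moves(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])

path = solve()
assert path[0] == (0, 3, 3) and path[-1] == (1, 0, 0)
```

Each element of the returned path is a legal sort, and each step corresponds to one L or R inference rule; the path itself is (the frontier of) a proof of ⟨1, 0, 0⟩.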

2.3 Types as terms: standard proof trees

We shall here describe our types as terms formed over an alphabet Φ of type operators, which we shall call functors. For the moment, we shall assume that we have just one category of such propositions, i.e. the functors have just an arity. The alphabet Σ of inference rules determines the legal proof trees.

Example. Combinatory logic. We take as functors a set Φ0 of constants, plus a binary operator →, which we shall write in infix notation. We call functionality a term in T(Φ). We have three families of rules in Σ. In the following, the meta-variables A, B, C denote arbitrary functionalities. The operators of the K and S families are of arity 0, the operators of the App family are binary.

K_{A,B}: A → (B → A)

S_{A,B,C}: (A → (B → C)) → ((A → B) → (A → C))

              A → B    A
App_{A,B}:  -------------
                  B

Here is an example of a proof. Let A and B be any functionalities, C = B → A, D = A → C, E = A → A, F = A → (C → A), G = D → E. The term

App_{D,E}(App_{F,G}(S_{A,C,A}, K_{A,C}), K_{A,B})

has type E, i.e. it gives a proof of the proposition A → A. We express formally that proof M proves proposition P in the inference system Σ as: Σ ⊢ M : P. That is, we think of a theorem as the type of its proof tree. Proof-checking is identified with type-checking. Here this is a simple consistency check; that is, if operator F is declared in Σ as F: P1, ..., Pn → Q, and if Σ ⊢ Mi : Pi for 1 ≤ i ≤ n, then Σ ⊢ F(M1, ..., Mn) : Q.
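This consistency check can be sketched directly; the encoding of functionalities as nested tuples ('->', a, b) is ours:

```python
# Proof-checking as type-checking for the combinatory example.
def arrow(a, b):
    return ('->', a, b)

def K(a, b):                       # K_{A,B} : A -> (B -> A)
    return arrow(a, arrow(b, a))

def S(a, b, c):                    # S_{A,B,C} : (A->(B->C)) -> ((A->B)->(A->C))
    return arrow(arrow(a, arrow(b, c)),
                 arrow(arrow(a, b), arrow(a, c)))

def app(f, x):                     # App: from A -> B and A, conclude B
    assert f[0] == '->' and f[1] == x, "ill-typed application"
    return f[2]

A, B = 'A', 'B'
C = arrow(B, A)                    # C = B -> A
# The example proof App(App(S_{A,C,A}, K_{A,C}), K_{A,B}) has type A -> A:
assert app(app(S(A, C, A), K(A, C)), K(A, B)) == arrow(A, A)
```

The `assert` inside `app` is exactly the consistency check of the text: an ill-typed proof tree is rejected.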

2.4 Polymorphism: Rule schemas

The next level of generality consists in authorizing variables in the propositional terms. This is very natural, since it internalizes the meta-variables used to index families of inference rules as propositional variables. The rules of inference thus become polymorphic operators, whose types are expressions containing free variables. This is the traditional notion of schematic inference rule from mathematical logic.

Example. The example from the previous section is more naturally expressed in this polymorphic formalism. We replace the set Φ0 by a set of variables V, and now we have just 3 rules of inference: K, S and App. The types can be completely dispensed with, since a well-typed term possesses a most general type, called its principal type. For instance, in the example above, the proof App(App(S, K), K) has a principal type A → A, with A ∈ V. This term is usually written I = SKK in combinatory logic, where the concrete syntax convention is to write combinator strings to represent sequences of applications associated to the left. The notion of principal type, first discovered by Hindley in the combinatory logic context, and independently by Milner for ML type-checking [111], is actually completely general:

Theorem. Let Σ be any signature of polymorphic operators over a functor signature Φ. Let M be a legal proof term. Then M possesses a principal type τ ∈ T(Φ, V). That is, Σ ⊢ M : τ, and for all τ' ∈ T(Φ, V), Σ ⊢ M : τ' implies τ ≤ τ'.

Proof. This is an easy application of the unification theorem.

By now we have developed enough formalism to make sense of our "types as propositions" paradigm. Actually, the example we have developed above is the fragment of propositional logic known as "minimal logic". When regarding the functor → as (intuitionistic) implication, and App as the usual inference rule of Modus Ponens, K and S are the two axioms of minimal logic presented as a Hilbert calculus. Combinatory logic is thus the calculus of proofs in minimal logic [37]. Actually combinators don't just have a type, they have a value. They can be defined with definition equations in terms of application. Using the concrete syntax mentioned above, we get for instance K and S defined by the following equations:

Def_K: K x y = x

Def_S: S x y z = x z (y z).
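Read as left-to-right rewrite rules, Def_K and Def_S let us compute with proofs; a small sketch (the tuple encoding (f, a) for an application f a is ours):

```python
# Def_K and Def_S as rewrite rules on combinator terms.
# An application f a is the tuple (f, a); K, S and variables are plain strings.
def reduce1(t):
    """One leftmost reduction step; None if t is in normal form."""
    if isinstance(t, tuple):
        f, a = t
        if isinstance(f, tuple) and f[0] == 'K':                 # K x y -> x
            return f[1]
        if isinstance(f, tuple) and isinstance(f[0], tuple) \
                and f[0][0] == 'S':                              # S x y z -> x z (y z)
            x, y, z = f[0][1], f[1], a
            return ((x, z), (y, z))
        r = reduce1(f)
        if r is not None:
            return (r, a)
        r = reduce1(a)
        if r is not None:
            return (f, r)
    return None

def normalize(t):
    while (r := reduce1(t)) is not None:
        t = r
    return t

# I = S K K behaves as the identity: S K K x -> K x (K x) -> x
skk = (('S', 'K'), 'K')
assert normalize((skk, 'x')) == 'x'
```

This is exactly the proof-normalization reading of the two equations developed below: each rewrite step removes a redundancy from the proof.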

Exercise. Verify that the two equations above, when seen as unification constraints, define the expected principal types for K and S.

This point of view of considering equality axiomatizations of the proof structures corresponds to what the proof-theorists call cut elimination. That is, the two equations above can be used as rewrite rules in order to eliminate redundancies corresponding to useless detours in the proofs. We shall develop more completely this point of view of computation as proof normalization in section 4.4 below. The current formalism of inference rules typed by terms with variables corresponds to proof theory's intuitionistic sequents, and to automated reasoning's Horn clauses. For instance, a PROLOG [24] interpreter may be seen in this framework as a proof synthesis method. Given an alphabet Σ of polymorphic inference rules (usually called definite clauses), and a proposition τ over functor alphabet Φ, it returns a proof term M such that M is a legal Σ-proof term of principal type τ', an instance of τ: Σ ⊢ M : τ' ≥ τ. With σ = τ'/τ, we say that σ is a PROLOG answer to the query τ. Of course this explanation is incomplete; we have to explain that PROLOG finds all such instances by a backtrack procedure constructing proofs in a bottom-up left-to-right fashion, using operators from Σ in a specific order (the order in which clauses are declared); this last requirement leads to incompleteness, since PROLOG may loop with recursively composable operators, whereas a different order might lead to termination of the procedure. Also, PROLOG may be presented several goals together, and they may share certain variables, but this may be explained by a simple extension of the above proof-synthesis explanation. We claim that this explanation of PROLOG is more faithful to reality than the usual one with Horn clauses. In particular, our explanation is completely constructive, and we do not have to explain the processes of conjunctive normalization and Skolemization.
Furthermore, there is no distinction in Φ between predicate and function symbols, consistently with most PROLOG implementations.
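This proof-synthesis reading of PROLOG can be made concrete with a toy interpreter (a sketch of ours, not the text's; the term encoding and the example clauses are invented): definite clauses are tried in declared order, with unification and depth-first backtracking, so the interpreter exhibits exactly the incompleteness discussed above when a recursive clause is tried too early.

```python
# Toy PROLOG sketch: terms are ('var', name) or (functor, arg...); a clause is
# (head, [body goals]). Depth-first, clauses in declared order, no occurs-check.
import itertools

fresh = itertools.count()

def walk(t, s):
    while t[0] == 'var' and t[1] in s:
        t = s[t[1]]
    return t

def unify(a, b, s):
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if a[0] == 'var':
        return {**s, a[1]: b}
    if b[0] == 'var':
        return {**s, b[1]: a}
    if a[0] != b[0] or len(a) != len(b):
        return None
    for x, y in zip(a[1:], b[1:]):
        s = unify(x, y, s)
        if s is None:
            return None
    return s

def rename(t, m):
    """Give each use of a clause a fresh copy of its variables."""
    if t[0] == 'var':
        if t[1] not in m:
            m[t[1]] = ('var', next(fresh))
        return m[t[1]]
    return (t[0],) + tuple(rename(a, m) for a in t[1:])

def solve(goals, clauses, s):
    """Yield every substitution proving all goals (may loop, like PROLOG)."""
    if not goals:
        yield s
        return
    for head, body in clauses:
        m = {}
        s2 = unify(goals[0], rename(head, m), s)
        if s2 is not None:
            yield from solve([rename(g, m) for g in body] + goals[1:],
                             clauses, s2)

X, Y, Z = ('var', 'X'), ('var', 'Y'), ('var', 'Z')
clauses = [
    (('edge', ('a',), ('b',)), []),
    (('edge', ('b',), ('c',)), []),
    (('path', X, Y), [('edge', X, Y)]),
    (('path', X, Z), [('edge', X, Y), ('path', Y, Z)]),
]
answers = [walk(('var', 'Q'), s)
           for s in solve([('path', ('a',), ('var', 'Q'))], clauses, {})]
```

The query path(a, Q) enumerates the answers Q = b and Q = c, in the order induced by the clause declarations.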

2.5 Proof terms with variables, natural deduction

The example above demonstrated the difficulty of proofs presented in a Hilbert style. The completely trivial theorem ∀A. A → A had a complicated proof using three axioms and two applications of modus ponens. Of course one could consider adding combinator I as an axiom, but this is only begging the question, since other trivial natural theorems would present similar difficulties. And of course there is no easy way to decide which combinators are well-typed. For instance, Peirce's law:

Peirce : ((A → B) → A) → A

although a propositional tautology easily checkable by the truth-table method, is not intuitionistically valid. The natural proof of A → A consists in, given a proof x of A, returning merely x as a proof of A. That is, the natural proof of A → A is the (polymorphic) identity algorithm. This method of proof usually proceeds through the deduction theorem below.

Deduction theorem. Let Γ be any set of propositions, A and B be two propositions. We have Γ, A ⊢ B iff Γ ⊢ A → B.

The deduction theorem holds in any reasonable system of logic. It can be proved easily in minimal logic, by induction on the size of proofs. Unfortunately, the deduction theorem is a meta-theorem, i.e. a mathematical theorem of the meta-theory analyzing the proof system, as opposed to the theorems, or well-typed proof terms, inside the proof system. We shall see in section 4 that it is easy to internalize deductions as proof terms with variables, called sequents. This point of view will lead to logic presented in natural deduction style, that is to λ-calculus formalisms. Before investigating this next level of expressive power, we consider in the next section a particularly important inference system Σ, that of equational logic.

3 Rewriting inference and equational logic

3.1 The classical presentation

Equational logic is classically presented as a restriction of first-order logic, where the only predicate symbol is =, and the only non-logical axioms are universal equalities between terms containing free variables. For instance, the theory of groups is classically presented over the functor alphabet

Φ = {*, ⁻¹, 1} by the equations:

Idl: 1 * x = x
Invl: x⁻¹ * x = 1
Ass: (x * y) * z = x * (y * z)

and the class of all first-order models of these equations is called the variety of groups. The well-known completeness theorem of Birkhoff states that a universally quantified equation between terms over Φ is valid in the variety iff it can be deduced from the axioms using the rules of substitution and of replacement of equals for equals.

3.2 The proof-theoretic formalization

Here we ignore the abstract notion of model and concentrate on the rules of inference. We assume given a functor alphabet Φ with arity function α, in which we distinguish an atom → given with arity 2. The substitution inference rule disappears, since it is implicit from the polymorphism of other rules. The replacement of equals for equals is decomposed into elementary steps of term replacement rules:

Id_A : A → A   (Reflexivity)

; : A → B, B → C ⊢ A → C   (Transitivity)

which specify that the rewriting arrow → is a quasi-ordering. Now we must state that → is compatible with the rest of the Φ-structure. That is, for every functor F in Φ − {→} of arity n and for every i ≤ n we take a congruence rule:

Funct_F,i : A → B ⊢ F(A1, ..., Ai−1, A, Ai+1, ..., An) → F(A1, ..., Ai−1, B, Ai+1, ..., An)   (Congruence)

If we add the rule of symmetry we get the theory of equality, where we usually use symbol = instead of →:

Op : A = B ⊢ B = A   (Symmetry)

The non-logical axioms of the variety are then added as so many constants. For instance, over groups, we obtain a proof of the proposition y = x⁻¹ * (x * y) by the term

Op(Funct_*,1(Invl); Idl); Ass : y = x⁻¹ * (x * y)

Exercise: Show a proof of x * 1 = x using the inference system Σ above.

The conclusion we may draw from the example above is that, beyond its apparent simplicity, equational reasoning may indeed be quite complicated. The rule of symmetry is especially hard to use, since it expresses a commutativity of →, harder to visualize than the easier monoid structure implicit from the rules Id and ";". It is then natural to ask:
1) Can we eliminate Op?
2) More generally, can we normalize equational proofs?

3.3 The categorical viewpoint

This viewpoint gives a prominent role to the monoid structure of the quasi-ordering →. Simplifying the presentation, we may present a category by a set of objects Obj, which we shall here confuse with the set of (closed) terms over some functor alphabet Φ, and by a set of arrows (or morphisms), which we shall here confuse with the set of (closed) proofs generated from some inference system Σ containing initially the two rules:

Id_A : A → A   (Identity)

; : A → B, B → C ⊢ A → C   (Composition)

Whenever f : A → B, we say that arrow f has domain A and codomain B. Furthermore, it is specified that the proofs are quotiented by a congruence ≡ verifying:

Idl: Id ; f = f
Idr: f ; Id = f
Ass: (f ; g) ; h = f ; (g ; h)

So we see that a category is a structure obtained as a hybrid of quasi-ordering and of monoid, to which it reduces in the two degenerate cases (i.e. |{f : A → B}| ≤ 1, and |Obj| = 1). Note that we have given the same name to axiom Idl as in the axiomatization of groups above, although here the operator ";" is a Σ-operator, and not just a Φ-operator like "*". However the unification theorem allows us to make implicit the type of variable f above, and the overloading of "Idl" may be seen as a reflection principle.

If A and B are two categories, a functor F from A to B associates to every object A of A an object F(A) of B, and to every arrow f : A → B an arrow F(f) : F(A) → F(B), such that the following functorial conditions hold:

F(Id) = Id

F(f ; g) = F(f) ; F(g)

We see a great analogy between the notion of rewriting inference system and the main categorical notions. Actually, the categorical viewpoint is richer in that the functors have sorts themselves (i.e., the categories), and poorer in that they do not yet have arities (i.e. we just have monadic functors so far). In order to build in arities we shall need products, and a full categorical account of minimal logic is obtained by a further adjunction, namely exponentiation. But we shall defer this explanation until we develop natural deduction in section 4. We have given this elementary development of category theory essentially to justify our terminology. The congruence rule of term formation explains a functoriality condition on the object part, and the functoriality condition on the arrow part of the functor expresses the congruence property for rewriting. Substitutivity in rewrite rules is expressed by defining them as natural transformations between the functors denoted by the two sides of the rule. That is, a natural transformation τ between functors F and G (both from category A to category B) is a mapping associating to every object A of A an arrow τ_A : F(A) → G(A) such that, for every f : A → B,

τ_A ; G(f) = F(f) ; τ_B

And if we consider equations rather than simply rewrite rules, the symmetry inference rule is interpreted as the existence of inverses to arrows. Equations are thus defined as natural isomorphisms. Category theory is explained in Mac Lane [94]. The categorical viewpoint for algebra has been developed by Lawvere and others [97]. Its application to proof theory is explained (in a somewhat complicated form) in Szabo [151].

3.4 Confluence and Termination

We come back to the problem of eliminating the symmetry rule. Let now → be any binary relation over some set S, →* be its reflexive-transitive closure, ↔* be its equivalence closure. We say that → verifies the Church-Rosser condition iff

x ↔* y  ⟹  ∃z. x →* z ∧ y →* z

It is easy to show that this condition is equivalent to confluence, i.e.

u →* x ∧ u →* y  ⟹  ∃z. x →* z ∧ y →* z

that is, diagrammatically:

        u
      */  \*
     x      y
      *\  /*
        z

When → is a confluent relation, normal forms (i.e. terminal elements) are unique whenever they exist, and equality (i.e. ↔*) may be decided by rewriting: that is, deduction may be replaced by computation, and symmetry is eliminated in all but one instance. For instance, the following set of 10 rewrite rules defines a confluent term rewriting system for group theory:

1 * x → x
x⁻¹ * x → 1
(x * y) * z → x * (y * z)
x * 1 → x
x * x⁻¹ → 1
(x⁻¹)⁻¹ → x
1⁻¹ → 1
x * (x⁻¹ * y) → y
x⁻¹ * (x * y) → y
(x * y)⁻¹ → y⁻¹ * x⁻¹
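These ten rules can be executed directly. Here is a small Python sketch (ours; the term encoding is invented) that normalizes group terms by naive pattern matching and rule application, deciding for instance y = x⁻¹ * (x * y) by reducing both sides to the same normal form.

```python
# Sketch: group terms are '1', free constants 'a', 'b', 'c', or the tuples
# ('*', s, t) for products and ('i', t) for inverses.
# Rule variables are the strings 'x', 'y', 'z'.
VARS = {'x', 'y', 'z'}

RULES = [
    (('*', '1', 'x'), 'x'),
    (('*', ('i', 'x'), 'x'), '1'),
    (('*', ('*', 'x', 'y'), 'z'), ('*', 'x', ('*', 'y', 'z'))),
    (('*', 'x', '1'), 'x'),
    (('*', 'x', ('i', 'x')), '1'),
    (('i', ('i', 'x')), 'x'),
    (('i', '1'), '1'),
    (('*', 'x', ('*', ('i', 'x'), 'y')), 'y'),
    (('*', ('i', 'x'), ('*', 'x', 'y')), 'y'),
    (('i', ('*', 'x', 'y')), ('*', ('i', 'y'), ('i', 'x'))),
]

def match(pat, t, s):
    """Extend substitution s so that s(pat) = t, or return None."""
    if isinstance(pat, str):
        if pat in VARS:
            if pat in s:
                return s if s[pat] == t else None
            return {**s, pat: t}
        return s if pat == t else None        # constant such as '1'
    if not isinstance(t, tuple) or t[0] != pat[0] or len(t) != len(pat):
        return None
    for p, u in zip(pat[1:], t[1:]):
        s = match(p, u, s)
        if s is None:
            return None
    return s

def subst(t, s):
    if isinstance(t, str):
        return s.get(t, t)
    return (t[0],) + tuple(subst(a, s) for a in t[1:])

def step(t):
    """One rewrite step anywhere in t, or None if t is in normal form."""
    for lhs, rhs in RULES:
        s = match(lhs, t, {})
        if s is not None:
            return subst(rhs, s)
    if isinstance(t, tuple):
        for i in range(1, len(t)):
            r = step(t[i])
            if r is not None:
                return t[:i] + (r,) + t[i + 1:]
    return None

def normalize(t):
    while (r := step(t)) is not None:
        t = r
    return t
```

For instance, normalize(('*', ('i', 'a'), ('*', 'a', 'b'))) returns 'b', so the equation y = x⁻¹ * (x * y) proved above is decided by mere rewriting.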

The rewrite relation associated with such a term rewriting system R is defined by M →_R N iff there exists a rule α → β in R and an occurrence u in D(M) such that M/u = σ(α) for some substitution σ, and N = M[u ← σ(β)]. It is clear that the group axioms are decided by the system R above. Conversely, all rewrite rules in R may be shown to be valid equations in group theory (see the exercise above). What is less obvious is to decide the confluence of R. We shall see in the next section that it is easy to show that it is locally confluent, in the sense that:

u → x ∧ u → y  ⟹  ∃z. x →* z ∧ y →* z

that is, diagrammatically:

        u
       /  \
     x      y
      *\  /*
        z

However, local confluence is not enough to prove confluence, as shown by the following counter-examples:
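The classical counter-example can be checked mechanically. In the four-element relation sketched below (our own rendering of the standard example: a cycle a ⇄ b with exits a → c and b → d), every local peak closes through the cycle, yet c and d have no common reduct.

```python
# a <-> b with exits a -> c and b -> d: locally confluent but not confluent
# (and not Noetherian, because of the infinite chain a -> b -> a -> ...).
R = {('a', 'b'), ('b', 'a'), ('a', 'c'), ('b', 'd')}
S = 'abcd'

def succs(x):
    return {y for (u, y) in R if u == x}

def reach(x):
    """Reflexive-transitive closure: every y with x ->* y."""
    seen, stack = {x}, [x]
    while stack:
        u = stack.pop()
        for v in succs(u) - seen:
            seen.add(v)
            stack.append(v)
    return seen

def joinable(y, z):
    return bool(reach(y) & reach(z))

locally_confluent = all(joinable(y, z)
                        for x in S for y in succs(x) for z in succs(x))
confluent = all(joinable(y, z)
                for x in S for y in reach(x) for z in reach(x))
```

Here locally_confluent comes out True while confluent comes out False, witnessed by the peak c ←* a →* d.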

3.5 The Noetherian case: Knuth and Bendix

The problem encountered with the above counter-examples is that the rewriting relation possessed infinite chains. Let us say that relation → is Noetherian iff there is no infinite chain x1 → x2 → ... (Then its transitive closure →⁺ is a well-founded ordering.) We remark that → is Noetherian over S iff every non-empty subset of S admits a minimal element with respect to →⁺. Now let us say that a predicate P over S is →-hereditary iff

∀x ∈ S. [∀y. x →⁺ y ⟹ P(y)] ⟹ P(x).

Now we may state an important induction principle.

Principle of Noetherian Induction: Let → be a Noetherian relation over S. Then for every →-hereditary predicate P we have ∀x ∈ S. P(x).

It is easy to validate this induction principle using the above remark, by considering the set of all x's such that not P(x). And now we may show that local confluence implies confluence for Noetherian relations.

Newman's lemma. A Noetherian relation is confluent iff it is locally confluent.

Proof: Noetherian induction on the predicate P defined as:

P(u) ≡ ∀x, y. u →* x ∧ u →* y ⟹ ∃z. x →* z ∧ y →* z

We now explain the Knuth-Bendix decision procedure for the confluence of Noetherian term rewriting systems. First let us give an algorithm.

Superposition algorithm. Let α1 → β1 and α2 → β2 be two rewrite rules in R, let u ∈ D(α1) and M = α1/u be such that M is a non-variable term unifiable with α2. Let N = σ1(M) = σ2(α2) be a principal instance, with V(N) ∩ V(α1) = ∅. We say that the superposition of α2 → β2 on α1 → β1 at u determines the critical pair ⟨P, Q⟩, with P = σ1(α1)[u ← σ2(β2)] and Q = σ1(β1).

Examples

• G(B, x) → K(x) superposes on F(x, G(x, A)) → H(x) at '2' to give P = F(B, K(A)) and Q = H(B).
• H(H(x)) → K(x) superposes on itself at '1' to give P = H(K(y)) and Q = K(H(y)).
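The superposition algorithm can be sketched directly (encoding ours; no occurs-check, the two rules are assumed renamed apart, and the trivial root overlap of a rule with a copy of itself is also enumerated):

```python
# Terms: ('v', name) for variables, otherwise (functor, arg...).
def walk(t, s):
    while t[0] == 'v' and t[1] in s:
        t = s[t[1]]
    return t

def unify(a, b, s):
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if a[0] == 'v':
        s[a[1]] = b
        return s
    if b[0] == 'v':
        s[b[1]] = a
        return s
    if a[0] != b[0] or len(a) != len(b):
        return None
    for x, y in zip(a[1:], b[1:]):
        s = unify(x, y, s)
        if s is None:
            return None
    return s

def apply(t, s):
    t = walk(t, s)
    if t[0] == 'v':
        return t
    return (t[0],) + tuple(apply(a, s) for a in t[1:])

def subterms(t, p=()):
    """All pairs (occurrence, subterm) of t."""
    yield p, t
    if t[0] != 'v':
        for i, a in enumerate(t[1:], 1):
            yield from subterms(a, p + (i,))

def replace(t, p, n):
    if not p:
        return n
    i = p[0]
    return t[:i] + (replace(t[i], p[1:], n),) + t[i + 1:]

def critical_pairs(rule1, rule2):
    """Superpose rule2 on the non-variable subterms of rule1's left side."""
    a1, b1 = rule1
    a2, b2 = rule2
    for p, m in subterms(a1):
        if m[0] == 'v':
            continue
        s = unify(m, a2, {})
        if s is not None:
            yield apply(replace(a1, p, b2), s), apply(b1, s)

x, y = ('v', 'x'), ('v', 'y')
# G(B, y) -> K(y) superposed on F(x, G(x, A)) -> H(x):
cps1 = list(critical_pairs(
    (('F', x, ('G', x, ('A',))), ('H', x)),
    (('G', ('B',), y), ('K', y))))
# H(H(x)) -> K(x) superposed on a renamed copy of itself:
cps2 = list(critical_pairs(
    (('H', ('H', x)), ('K', x)),
    (('H', ('H', y)), ('K', y))))
```

Both critical pairs of the examples above are recovered: ⟨F(B, K(A)), H(B)⟩ in the first case and ⟨H(K(y)), K(H(y))⟩ in the second.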

The Knuth-Bendix theorem. The relation →_R is locally confluent iff for every critical pair ⟨P, Q⟩ there exists a term N such that P →* N and Q →* N.

The above theorem may be used to check the local confluence of the 10 group rewrite rules above. Assuming termination, this shows that any group equality may be decided by mere rewriting.

Actually, the method may be extended to systems of rules that fail the test. If for some critical pair ⟨P, Q⟩ we reduce P and Q to two distinct irreducible terms P' and Q', we have generated an interesting lemma P' = Q', which is an equational consequence of the rules considered as equations. It may be possible to give an orientation to this new equality for forming an extended term rewriting system, while preserving the finite termination property. This is the basis of the Knuth-Bendix completion method, which attempts to complete a term rewriting system to a confluent one. This method may be considered a way of compiling a canonical form algorithm from an equational specification. We cannot describe the method fully here. The main ideas are that unresolved critical pairs are kept as new rewrite rules, and that all rules are kept inter-reduced. The procedure may stop with a canonical system, it may fail because termination is impossible to establish, or it may loop. Whenever it does not fail, it gives a semi-decision procedure for the original equational theory, as explained in Huet [66]. More detailed expositions of the method may be found in [84,65,71].

Failure may result from some permutative consequence such as commutativity. The method has been extended in various ways in order to consider rewritings modulo such permutative axioms. For instance, Peterson and Stickel [127] have shown that it was possible to extend the method to complete equational presentations where one or several functors were assumed to be associative and commutative, using Stickel's associative-commutative unification algorithm [150,43]. This method has been extended by Jouannaud and Kirchner [73].
Various other extensions of the Knuth-Bendix procedure have been proposed, for handling constructors (free functors) [69] and for solving word problems in finitely presented algebras [90]. The Knuth-Bendix completion procedure and its extensions give a general framework to simplification techniques. As an example of a canonical term rewriting system we give distributive lattices. Here ∩ and ∪ are assumed to be associative and commutative. The canonical set consists in the following four rules:

x ∩ (x ∪ y) → x
x ∪ (y ∩ z) → (x ∪ y) ∩ (x ∪ z)
x ∪ x → x
x ∩ x → x

Exercise. Show that the other distributivity law is a consequence of the above rules.

Finally, we show the canonical system for Boolean algebras. Now the connectives ∧ and ⊕ (exclusive or) are assumed to be associative and commutative.

x ∧ 1 → x
x ∧ 0 → 0
x ∧ x → x
x ⊕ 0 → x
x ⊕ x → 0
x ∧ (y ⊕ z) → (x ∧ y) ⊕ (x ∧ z)

This canonical set can be used to decide propositional calculus, using the following translations:

¬x → x ⊕ 1
x ∨ y → x ⊕ y ⊕ (x ∧ y)
x ⇒ y → x ⊕ (x ∧ y) ⊕ 1

The resulting decision method is basically the method of Venn diagrams, as the following example demonstrates. With three propositional letters a, b and c, the proposition

(a ∧ ¬b) ∨ (b ∧ ¬c) ∨ (c ∧ ¬a) reduces to its canonical form:

a ⊕ b ⊕ c ⊕ (a ∧ b) ⊕ (b ∧ c) ⊕ (c ∧ a)

which can easily be "seen" as a disjoint union of regions in the following Venn diagram:

This example also shows that disjunctive normal form is not a canonical form, since the above proposition possesses another d.n.f., or, as Quine puts it, a formula may have distinct minimal sets of prime implicants.
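This canonical-form arithmetic is easy to machine-check. In the sketch below (encoding ours), a proposition is represented directly as its ⊕-sum of ∧-monomials, i.e. a set of sets of letters, so the rules x ⊕ x → 0, x ⊕ 0 → x and x ∧ x → x become set operations.

```python
# Each proposition is a frozenset of monomials; a monomial is a frozenset of
# letters. The empty monomial is the constant 1; the empty sum is 0.
ZERO = frozenset()
ONE = frozenset({frozenset()})

def var(name):
    return frozenset({frozenset({name})})

def xor(p, q):                # x (+) x -> 0 and x (+) 0 -> x: symmetric diff.
    return p ^ q

def conj(p, q):               # distribute /\ over (+); x /\ x -> x via union
    out = frozenset()
    for m1 in p:
        for m2 in q:
            out ^= frozenset({m1 | m2})   # duplicate monomials cancel mod 2
    return out

def neg(p):
    return xor(p, ONE)                    # not x = x (+) 1

def disj(p, q):
    return xor(xor(p, q), conj(p, q))     # x \/ y = x (+) y (+) (x /\ y)

def imp(p, q):
    return xor(xor(conj(p, q), p), ONE)   # x => y = x (+) (x /\ y) (+) 1

a, b, c = var('a'), var('b'), var('c')
form = disj(disj(conj(a, neg(b)), conj(b, neg(c))), conj(c, neg(a)))
```

Here form comes out as the six monomials a ⊕ b ⊕ c ⊕ ab ⊕ bc ⊕ ca, the canonical form computed above; and a proposition is a classical tautology iff its canonical form is ONE, which holds for instance for Peirce's law from section 2.5.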

3.6 Sequential computations

We now consider term rewriting systems with two constraints:
(a) left linearity: for every α → β in R, every variable of α occurs exactly once in α;
(b) non-ambiguity: there are no critical pairs.

As we shall see, these systems are always confluent; their termination is unnecessary. Functional programming languages, and more generally operational semantics rules, can usually be expressed as such systems of rewrite rules [58]. As a very simple example, consider the system of two rules DefK and DefS defining the combinators S and K.

We shall here define the main notions of computation using rewrite rules. The full theory is given in Huet-Lévy [70]. We call redex in term M an occurrence u ∈ D(M) such that α ≤ M/u for some left-hand side α of a rule in R. We define the reduction relation →_R associated with R in the same way as in the preceding section. We shall assume R fixed from now on, and write simply → for reduction.

Let M → N at redex occurrence u ∈ D(M), using rule α → β ∈ R. Let now v be any redex in M. We define the set v\u of residuals of v as a set of redexes in N, defined as follows. If v = u, then v\u = ∅. If v < u or v|u, then v\u = {v}. Finally, if v > u, this means, by non-overlapping, that v is below some variable x of α. By linearity, x has a unique occurrence in α, which we shall denote by x as well. That is, v = u·x·w for some w. Now let X be the set of occurrences of variable x in β. We define v\u = {u·y·w | y ∈ X}. Thus redex v may have zero, one or more residuals in N. Intuitively, these residuals are the places where one must reduce in N in order to effect the computation step consisting in reducing at redex v in M. Actually, in the natural dag implementation all the occurrences of v\u denote the same shared node of the dag representing N. Symmetrically the same holds of u\v. And as expected we have a local confluence diagram, where the single steps u and v converge using all the steps in v\u (resp. u\v).
However, this is not sufficient, since we do not want to require → to be Noetherian. However, it is easy to notice that all the redexes in v\u are mutually disjoint, and that any residual of some redex is always disjoint from any residual of some other disjoint redex. Thus it is natural to extend the reduction relation → to parallel reduction of a set of mutually disjoint redexes, a relation we shall write ⇒. If M ⇒ N using a set of redexes U, then for every set V of mutually disjoint redexes in M, we define the residuals of V by U as: V\U = {w ∈ v\u | u ∈ U ∧ v ∈ V}. And now we have a strong confluence property:

M ⇒_U N ∧ M ⇒_V P  ⟹  N ⇒_{V\U} Q ∧ P ⇒_{U\V} Q for some Q

which extends easily to multi-step derivations A and B, yielding:

The parallel moves theorem. Let A and B be two co-initial derivations. Define A ⊔ B as A; B\A. Then A ⊔ B ≡ B ⊔ A, in the sense that these two derivations are co-final, and preserve residuals.

The categorical viewpoint. The category whose objects are terms, and whose arrows from M to N are parallel derivations, quotiented by the equivalence ≡, admits pushouts.

Corollary. The reduction relation → has the Church-Rosser property.

Beware! The lattice structure given by the parallel moves theorem is on derivations, and not on terms. For instance, if we consider the system R consisting solely of the rules I(x) → x and J(x) → x, the following derivations diagram shows that the terms I(J(A)) and J(I(A)) do not possess a g.l.b.

              I(J(I(A)))
             /     |     \
       I(J(A))  I(I(A))  J(I(A))
             \     |     /
                 I(A)
                   |
                   A

Note that this phenomenon may be traced to the existence of two non-equivalent derivations between I(I(A)) and I(A). This shows that the categorical viewpoint is the right one here: we need to talk in terms of arrows, not just relations between terms.

The standardization theorem. It is always possible to compute in an outside-in manner.

We do not have the space here to explain in a rigorous manner what outside-in exactly means. We just remark that this may be more complicated than merely reducing the leftmost-outermost redex, i.e. the redex minimum in the total ordering on occurrences in which u precedes v iff u is a prefix of v or u is to the left of v.

4 Natural deduction and λ-calculus

4.1 Proofs with variables; sequents

We now come back to the general theory of proof structures. We saw earlier that the Hilbert presentation of minimal logic was not very natural, in that the trivial theorem A → A necessitated a complex proof S K K. The problem is that in practice one does not use just proof terms, but deductions of the form Γ ⊢ A, where Γ is a set of (hypothetical) propositions. Deductions are exactly proof terms with variables. Naming these hypothesis variables and the proof term, we write:

{x1 : A1, ..., xn : An} ⊢ M : A

with V(M) ⊆ {x1, ..., xn}. Such formulas are called sequents. Since this point of view is not very well known, let us emphasize this observation:

Sequents represent proof terms with variables.

Note that so far our notion of proof construction has not changed: Γ ⊢_Σ M : A iff ⊢_{Σ∪Γ} M : A, i.e. the hypotheses from Γ are used as supplementary axioms, in the same way that in the very beginning we defined T(Φ, V) as T(Φ ∪ V).

4.2 The deduction theorem

This theorem, fundamental for doing proofs in practice, gives an equivalence between proof terms with variables and functional proof terms:

Γ ∪ {A} ⊢ B  ⟺  Γ ⊢ A → B

That is, in our notations:
a) Γ ⊢ M : A → B ⟹ Γ ∪ {x : A} ⊢ (M x) : B. This direction is immediate, using App, i.e. Modus Ponens.
b) Γ ∪ {x : A} ⊢ M : B ⟹ Γ ⊢ [x]M : A → B, where the term [x]M is given by the following algorithm.

Schönfinkel's abstraction algorithm:

[x]x = I (= S K K)
[x]M = K M (x ∉ V(M))
[x](M N) = S [x]M [x]N

Note that this algorithm motivates the choice of combinators S and K (and optionally I). Again we stress a basic observation:

Schönfinkel's algorithm is the essence of the proof of the deduction theorem.

Now let us consider the rewriting system R defined by the rules DefK and DefS, optionally supplemented by:

DefI: I x = x

and let us write ▷ for the corresponding reduction relation.

Fact. ([x]M N) ▷* M[x ← N].

We leave the proof of this very important property to the reader. The important point is that the abstraction operation, together with the application operator and the reduction ▷, defines a substitution machinery. We shall now use this idea more generally, in order to internalize the deduction theorem in a basic calculus of functionality. That is, we forget the specific combinators S and K, in favor of abstraction seen now as a new term constructor.
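The Fact can be tested on small instances with a sketch of Schönfinkel's algorithm together with the DefI/DefK/DefS reduction rules (encoding ours: atoms are strings, application (M N) is the Python pair (M, N)).

```python
# Combinator terms: atoms 'S', 'K', 'I'; other strings are variables or
# constants; application is a 2-tuple.
def free(x, m):
    if isinstance(m, tuple):
        return free(x, m[0]) or free(x, m[1])
    return m == x

def abstract(x, m):
    """Schoenfinkel's [x]M."""
    if m == x:
        return 'I'
    if not free(x, m):
        return ('K', m)
    return (('S', abstract(x, m[0])), abstract(x, m[1]))

def step(t):
    """One reduction step with I x > x, K x y > x, S x y z > x z (y z)."""
    if isinstance(t, tuple):
        f, a = t
        if f == 'I':
            return a
        if isinstance(f, tuple):
            if f[0] == 'K':
                return f[1]
            if isinstance(f[0], tuple) and f[0][0] == 'S':
                return ((f[0][1], a), (f[1], a))
        for i, sub in enumerate(t):
            r = step(sub)
            if r is not None:
                return (r, a) if i == 0 else (f, r)
    return None

def normalize(t):
    while (r := step(t)) is not None:
        t = r
    return t
```

For instance, normalize applied to S K K 'a' gives 'a', confirming that S K K behaves as I; and ([x](f x) 'a'), built with abstract, reduces to (f 'a'), an instance of the Fact.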

4.3 λ-calculus

Here we give up T-terms in general, in favor of λ-terms constructed by 3 elementary operations:

x       variable
(M N)   application
[x]M    abstraction

This last case is usually written λx.M, whence the name λ-notation. The λ-notation is first a non-ambiguous notation for expressions denoting functions. For instance, the function of two arguments which computes sin of its first argument and adds it to cos of its second is written

[x] [y] sin(x) + cos(y)

The variables x and y are bound variables, that is they are dummies and their names do not matter, as long as there are no clashes. This defines a congruence of renaming of bound variables usually called α-conversion. Another method is to adopt de Bruijn's indexes, where variable names disappear in favor of positive natural numbers [15]. We define recursively the sets Λn of λ-expressions valid in a context of length n ≥ 0 as follows:

k (1 ≤ k ≤ n)  |  (M N) (M, N ∈ Λn)  |  []M (M ∈ Λn+1).

Thus integer k refers to the variable bound by the k-th abstraction above it. For instance, the expression [](1 [](1 2)) corresponds to [x](x [y](y x)). This example shows that, although more rigorous from a formal point of view, the de Bruijn naming scheme is not fit for human understanding, and we shall now come back to the more usual concrete notation with variable names.
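The translation to de Bruijn indexes is short enough to state precisely as code (encoding ours; index 1 denotes the innermost enclosing binder):

```python
# Named terms: variable 'x', application ('app', f, a), abstraction
# ('lam', x, body). De Bruijn terms: ('idx', k), ('app', f, a), ('lam', body).
def debruijn(t, env=()):
    if isinstance(t, str):
        return ('idx', env.index(t) + 1)      # 1-based, innermost binder first
    if t[0] == 'app':
        return ('app', debruijn(t[1], env), debruijn(t[2], env))
    return ('lam', debruijn(t[2], (t[1],) + env))

named = ('lam', 'x', ('app', 'x', ('lam', 'y', ('app', 'y', 'x'))))
```

Here debruijn(named) yields the encoding of [](1 [](1 2)), matching the example above.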

The fact observed above is now stated as a computation rule, usually called β-reduction. Let > be the smallest relation on λ-expressions compatible with application and abstraction and such that: ([x]M N) > M[x ← N]. We call λ-calculus the λ-notation equipped with the β-reduction computation rule >. λ-calculus is the basic calculus of substitution, and β-reduction is the basic computation mechanism of functional programming languages. Here is an example of computation:

([x](x u) [w][z](z u)) > ([w][z](z u) u) > [z](z u)

We briefly sketch here the syntactic properties of λ-calculus. Similarly to the theory developed above, the notion of residual can be defined. However, the residuals of a redex may not always be disjoint, and thus the theory of derivations is more complex. However the parallel moves lemma still holds, and thus the Church-Rosser property is also true. Finally, the standardization theorem holds, and here it means that it is possible to compute in a leftmost-outermost fashion. These results, and more details, in particular the precise conditions under which β-reduction simulates combinatory logic calculus, are precisely stated in Barendregt [4]. We finally remark that λ-calculus computations may not always terminate. For instance, with Δ = [u](u u) and ⊥ = (Δ Δ), we get ⊥ > ⊥ > ... A more interesting example is given by

Y = [f] ([u](f (u u)) [u](f (u u)))

since (Y f) >* (f (Y f)) shows that Y defines a general fixpoint operator. This shows that (full) λ-calculus is inconsistent with logic: what could (fix ¬) mean? As usual with such paradoxical situations, it is necessary to introduce types in order to stratify the definable notions in a logically meaningful way. Thus, the basic inconsistency of Church's λ-calculus, shown by Rosser, led to Church's theory of types [22]. On the other hand, λ-calculus as a pure computation mechanism is perfectly meaningful, and Strachey prompted Scott to develop the theory of reflexive domains as a model theory for full λ-calculus. But let us first investigate the typed universe.

4.4 Gentzen's system N of natural deduction

The idea of λ-notation proofs underlies Gentzen's natural deduction inference rules [48], where App is called →-elim and Abs is called →-intro. The role of variables is taken by the base sequents:

Axiom_A : A ⊢ A

together with the structural thinning rule:

Thinning : Γ ⊢ B  ⟹  Γ ∪ {A} ⊢ B

which expresses that a proof may not use all of the hypotheses. Gentzen's remaining rules give types to proofs according to propositions built as functor terms, each functor corresponding to a propositional connective. The main idea of his system is that inference rules should not be arbitrary, but should follow the functor structure, in explaining in a uniform fashion how to introduce a functor, and how to eliminate it. For instance, minimal logic is obtained with Φ = {→}, and the rules of →-intro and →-elim, that is:

Abs : Γ ∪ {A} ⊢ B  ⟹  Γ ⊢ A → B
App : Γ ⊢ A → B,  Δ ⊢ A  ⟹  Γ ∪ Δ ⊢ B

Now, the β-reduction of λ-calculus corresponds to cut-elimination, i.e. to proof-simplification. Reducing a redex corresponds to eliminating a detour in the demonstration, using an intermediate lemma. But now we have termination of this normalization process, that is, the relation ▷ is Noetherian on valid proofs. This result is usually called strong normalization in proof theory. A full account of this theory is given in Stenlund [149]. Minimal logic can then be extended by adding more functors and corresponding inference rules. For instance, conjunction ∧ is taken into account by the intro rule:

Pair : Γ ⊢ A,  Δ ⊢ B  ⟹  Γ ∪ Δ ⊢ A ∧ B

which, from the types point of view, may be considered as product formation, and by the two elim rules:

Fst : Γ ⊢ A ∧ B  ⟹  Γ ⊢ A
Snd : Γ ⊢ A ∧ B  ⟹  Γ ⊢ B

corresponding to the two projection functions. This corresponds to building in a λ-calculus with pairing. Generalizing the notion of redex (cut) to the configuration of a connective intro immediately followed by elim of the same connective, we get new computation rules:

Fst(Pair(x, y)) ▷ x
Snd(Pair(x, y)) ▷ y

and the Noetherian property of ▷ still holds. We shall not develop Gentzen's system further. We just remark:

(a) More connectives, such as disjunction, can be added in a similar fashion. It is also possible to give rules for quantifiers, although we prefer to defer this topic until we consider dependent bindings.

(b) Gentzen originally considered natural deduction systems for meta-mathematical reasons, namely to prove their consistency. He considered another presentation of sequent inference rules, the L system, which possesses the subformula property (i.e. the result type of every operator is formed of subterms of the argument types), and is thus trivially consistent. Strong normalization in this context was the essential technical tool to establish the equivalence of the L and the N systems. Of course, according to Gödel's theorem, this does not establish absolute consistency of the logic, but relativizes it to a carefully identified troublesome point, the proof of termination of some reduction relation. This has the additional advantage of providing a hierarchy of strength of inference systems, classified according to the ordinal necessary to consider for the termination proof.

(c) All this development concerns so-called intuitionistic logic, where operators (inference rules) are deterministic. It is possible to generalize the inference systems to classical logic, using a generalized notion of sequent Γ ⊢ Δ, where the right part Δ is also a set of propositions. It is possible to explain the composition of such non-deterministic operators, which leads to Gentzen's systems NK and LK (Klassical logic!). Remark that the analogue of the unification theorem above gives then precisely Robinson's resolution principle for general clauses [139].

(d) The categorical viewpoint fits these developments nicely. This point of view is completely developed in Szabo [151]. The specially important connections between λ-calculus, natural deduction proofs and cartesian closed categories are investigated in [98,121,87,142,35,68]. Further readings on natural deduction proof theory are Prawitz [130] and Dummett [41]. The connection with recursion theory is developed in Kleene [82], and an algebraic treatment of these matters is given in Rasiowa-Sikorski [133].

4.5 Programming languages, recursion

The design of programming languages such as ALGOL 60 was greatly influenced by λ-calculus. In 1966 Peter Landin wrote a landmark article setting the stage for coherent design of powerful functional languages in the λ-calculus tradition [89]. The core language of his proposal, ISWIM (If you See What I Mean!), meant λ-calculus, with syntactically sugared versions of the β-redex ([x]M N), namely let x = N in M and M where x = N respectively. His language followed the static binding rules of λ-calculus. For instance, after the declarations:

let f x = x + y where y = 1;

let y = 2;

the evaluation (reduction) of expression (f 1) leads to value 2, as expected. Note that in contrast, languages such as LISP [107], although bearing some similarity with the λ-notation, rather implement dynamic binding, which in the example above would result in the incorrect result 3. This discrepancy has led to heated debates which we want to avoid here, but we remark that static binding is generally considered safer and leads to more efficient implementations where compilation is consistent with interpretation.

However, ISWIM is not completely faithful to λ-calculus in one respect: its implementation does not follow the outside-in normal order of evaluation corresponding to the standardization theorem. Instead it follows the inside-out applicative order of evaluation, demanding the arguments to be evaluated before a procedure is called. In the ALGOL terminology, ISWIM follows call by value instead of call by name.

The development of natural deduction as typed λ-calculus fits the development of an ISWIM-based language with a type discipline. We shall call this language ML, which stands for "meta-language", in the spirit of LCF's ML [54,53]. For instance, we get a core ML0 by considering minimal logic, with → interpreted as functionality, and further constant functors added for basic types such as triv, bool, int and string. Adding products we get a language ML1 where types reflect an intuitionistic propositional calculus with → and ∧. We may define functions on a pattern argument formed by pairing, such as:

let fst(x, y) = x

and the categorical analogues are the so-called cartesian closed categories (CCCs). Adding sums leads to Bi-CCCs with co-products. The corresponding ML2 primitives are inl, inr, outl, outr and isl, with obvious meaning. So far all computations terminate, since the corresponding reduction relations are Noetherian.

However such a programming language is too weak for practical use, since recursion is missing. Adding recursion operators may be done in a stratified manner, as presented in Gödel's system T [51], or in a completely general way in ML3, where we allow a "letrec" construct permitting arbitrary recursive definitions, such as:

letrec fact n = if n = 0 then 1 else n * (fact (n-1))

But then we loose the termination of computations, since it is possible to write un-founded definitions such as letrec absurd x = absurd x. Furthermore, because ML follows the applicative order of evaluation we may get looping computa- tions in cases where a A-calculus normal form exists, such as for

let f x = 0 in f (absurd x).
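Python shares ISWIM's applicative order, so the phenomenon is easy to reproduce; wrapping the argument in a thunk simulates call by name. In this sketch (absurd and f_by_name are illustrative names of our own), the call-by-value version would loop forever, while the call-by-name version returns 0 without ever evaluating its argument.

```python
def absurd(x):
    return absurd(x)              # letrec absurd x = absurd x: never terminates

def f_by_name(z_thunk):
    return 0                      # the thunk is never forced, so absurd never runs

# Under applicative order, the direct call f(absurd(0)) would first evaluate
# absurd(0) and loop; passing an unevaluated thunk avoids this.
result = f_by_name(lambda: absurd(0))
print(result)                     # prints 0
```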

4.6 Polymorphism

We have polymorphic operators (inference rules) at the meta level. It seems a good idea to push polymorphism to the object level, for functions defined by the user as λ-expressions. To this end, we introduce bindings for type variables. This idea of type quantification corresponds to allowing proposition quantifiers in our propositional logic. First we allow a universal quantifier in prenex position. That is, with T0 = T(→, V), we now introduce type schemas in T1 = T0 ∪ {∀α.τ | τ ∈ T1, α ∈ V}. A (type) term in T1 has thus both free and bound variables, and we write FV(M) and BV(M) for the sets of free (respectively bound) variables.

We now define generic instantiation. Let τ = ∀α1...αm.τ0 ∈ T1 and τ′ = ∀β1...βn.τ0′ ∈ T1. We define τ′ ≥ τ iff τ0′ = σ(τ0) with D(σ) ⊆ {α1,...,αm} and βi ∉ FV(τ) (1 ≤ i ≤ n). Remark that σ acts on FV whereas ≥ acts on BV. Also note that τ′ ≥ τ implies σ(τ′) ≥ σ(τ). We now present the Damas-Milner inference system for polymorphic λ-calculus [39]. In what follows, a sequent hypothesis A is assumed to be a list of specifications xi : τi, with τi ∈ T1, and we write FV(A) = ∪i FV(τi).

TAUT : A ⊢ x : τ   (x : τ ∈ A)

INST : from A ⊢ M : σ infer A ⊢ M : σ′   (σ ≤ σ′)

GEN : from A ⊢ M : τ infer A ⊢ M : ∀α.τ   (α ∉ FV(A))

APP : from A ⊢ M : τ′ → τ and A ⊢ N : τ′ infer A ⊢ (M N) : τ

ABS : from A ∪ {x : τ′} ⊢ M : τ infer A ⊢ [x]M : τ′ → τ

LET : from A ⊢ M : τ′ and A ∪ {x : τ′} ⊢ N : τ infer A ⊢ let x = M in N : τ

For instance, it is an easy exercise to show that

⊢ let i = [x]x in (i i) : α → α.

The above system may be extended without difficulty by other functors such as product, and by other ML constructions such as letrec. Actually every ML compiler contains a typechecker implementing implicitly the above inference system. For instance, with the unary functor list and the following ML primitives: [] : (list α), cons : α × (list α) → (list α) (written infix as a dot), hd : (list α) → α and tl : (list α) → (list α), we may define recursively the map functional as:

letrec map f l = if l = [] then [] else (f (hd l)) . (map f (tl l))

and we get as its type: ⊢ map : (α → β) → (list α) → (list β). Of course the ML compiler does not implement directly the inference system above, which is non-deterministic because of rules INST and GEN. It uses unification instead, and thus computes deterministically a principal type, which is minimal with respect to the generic instantiation ordering.
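To give a feel for that unification step, here is a minimal first-order unification sketch in Python (the term representation and function names are our own; the occurs check is omitted for brevity, so this is only an illustration, not a full typechecker). Type variables are strings; constructed types are tuples such as ('->', a, b) or ('list', a).

```python
def walk(t, subst):
    # Follow variable bindings; variables are strings, constructors are tuples.
    while isinstance(t, str) and t in subst:
        t = subst[t]
    return t

def unify(a, b, subst):
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if isinstance(a, str):                    # bind variable a to b
        return {**subst, a: b}
    if isinstance(b, str):                    # bind variable b to a
        return {**subst, b: a}
    if a[0] != b[0] or len(a) != len(b):      # constructor clash: no unifier
        return None
    for x, y in zip(a[1:], b[1:]):
        subst = unify(x, y, subst)
        if subst is None:
            return None
    return subst

# Unifying a -> b against int -> (list int) instantiates a and b:
s = unify(('->', 'a', 'b'), ('->', ('int',), ('list', ('int',))), {})
print(s)   # {'a': ('int',), 'b': ('list', ('int',))}
```

The computed substitution is most general: any other unifier of the two terms factors through it, which is what makes principal types possible.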

4.7 The limits of ML's polymorphism

Consider the following ML definition:

letrec power n f u = if n = 0 then u else f (power (n - 1) f u)

of type nat → (α → α) → (α → α). This function, which associates to natural n the polymorphic iterator mapping function f to the n-th power of f, may be considered a coercion operator between ML's internal naturals and Church's representation of naturals in pure λ-calculus [23]. Let us recall briefly this representation. Integer 0 is represented as the projection term [f] [u] u. Integer 1 is [f] [u] (f u). More generally, n is represented as the functional n̄ iterating a function f to its n-th power: n̄ = [f] [u] (f (f ... (f u) ...)) and the arithmetic operators may be coded respectively as:

n + m = [f] [u] (n̄ f (m̄ f u))
n × m = [f] (n̄ (m̄ f))

For instance, with 2̄ = [f] [u] (f (f u)), we check that 2̄ × 2̄ converts to its normal form 4̄. We would like to consider a type NAT = ∀α. (α → α) → (α → α) and be able to type the operations above as functions of type NAT → NAT → NAT. However the notion of polymorphism found in ML does not support such a type; it allows only the weaker

∀α. ((α → α) → (α → α)) → ((α → α) → (α → α)) → ((α → α) → (α → α))

which is inadequate, since it forces the same generic instantiation of NAT in the two arguments.
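Church's coding is easy to transcribe into an untyped language; the following Python sketch (our own illustrative transcription, not part of the original notes) builds numerals as iterators and checks the addition and multiplication combinators given above.

```python
def church(n):
    """The Church numeral of n: [f] [u] (f (f ... (f u) ...))."""
    def num(f):
        def iterate(u):
            for _ in range(n):
                u = f(u)
            return u
        return iterate
    return num

def to_int(num):
    # Recover an ordinary integer by iterating the successor function on 0.
    return num(lambda k: k + 1)(0)

plus  = lambda n, m: lambda f: lambda u: n(f)(m(f)(u))   # n + m
times = lambda n, m: lambda f: n(m(f))                   # n * m

print(to_int(plus(church(2), church(3))))    # 5
print(to_int(times(church(2), church(2))))   # 4
```

Python's dynamic typing sidesteps exactly the difficulty discussed above: plus and times apply both numeral arguments at the same "type", which ML's prenex polymorphism cannot express.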

Warning. These preliminary notes are very sketchy from now on. A future version will cover the topics below in greater depth.

4.8 Girard's second-order λ-calculus

The example above suggests using the universal type quantifier inside type formulas. We thus consider a functor alphabet based on one binary → constructor and one quantifier ∀. We shall now consider a λ-calculus with such types, which we shall call second-order λ-calculus, owing to the fact that the type language is now a second-order propositional logic, with propositional variables explicitly quantified. Such a calculus was proposed by J.Y. Girard [49,50], and independently discovered by J. Reynolds [135]. Girard proved the main properties of the calculus:

Girard's theorem. Second-order λ-calculus admits strong normalization.

Corollary. Second-order natural deduction is consistent.

Girard used this last result to show the consistency of analysis.

Second-order λ-calculus is a very powerful language. Most usual data structures may be represented as types. Furthermore, it captures a large class of total recursive functions (precisely, all the functions provably total in second-order arithmetic). It may seriously be considered as a candidate for the foundations of powerful programming languages, where recursion is replaced by iteration. But the price we pay by extending polymorphism in this drastic fashion is that the notion of principal type is lost. Type synthesis is possible only in easy cases, and thus in general the programmer has to specify the types of the data. Further discussions on the second-order λ-calculus may be found in [108,46,91,7].

5 Dependent types

5.1 Quantification

So far we have dealt only with types as propositions of some (intuitionistic) propositional logic. We shall now consider stronger logics, where it is possible to have statements depending upon variables that are λ-bound. We shall continue our identification of propositions and types, and thus consider a first-order statement such as ∀x ∈ E · P(x) as a product-forming type Πx∈E P(x). We shall call such types dependent, in that it is now possible to declare a variable of a type which depends on the binding of some previously bound variable. Let us first of all remark that such types are absolutely necessary for practical programming purposes. For instance, a matrix manipulation procedure should have a declaration prefix of the type:

[n : nat] [matrix : array(n)]

where the second type depends on the dimension parameter. PASCAL programmers know that the lack of such declarations in the language is a serious hindrance. We shall not develop first-order notions here, and shall rather jump directly to calculi based on higher-order logic.

5.2 Martin-Löf's Intuitionistic Theory of Types

P. Martin-Löf has been developing for the last 10 years a higher-order intuitionistic logic based on a theory of types, allowing dependent sums and products [104,105,106]. His theory is not explicitly based on λ-calculus, but it is formulated in the spirit of natural deduction, with introduction and elimination rules for the various type constructors. Consistency is inferred from semantic considerations, with a model theory giving an analysis of the normal forms of elements of a type, and of the equality predicate for each type. Martin-Löf's system has been advocated as a good candidate for the description and validation of computer programs, and is an active topic of research by the Göteborg Programming Methodology group [117,119,120]. A particularly ambitious implementation of Martin-Löf's system and extensions is under way at Cornell University, under the direction of R. Constable [25,26,132].

5.3 de Bruijn's AUTOMATH languages

The mathematical language AUTOMATH has been developed and implemented by the Eindhoven group, under the direction of Prof. N.G. de Bruijn [14,16,18]. AUTOMATH is a λ-calculus with types that are themselves λ-expressions. It is based on the natural idea that λ-binding and universal instantiation are similar substitution operations. Thus in AUTOMATH there is only one binding operation, used both for parameter abstraction and product instantiation. The meta-theory of the various languages of the AUTOMATH family is investigated in [113,38,75]. The most notable success of the AUTOMATH effort has been the translation and mechanical validation of Landau's Grundlagen [74].

5.4 A Calculus of Constructions

AUTOMATH established the correct linguistic foundations for higher-order natural deduction. Unfortunately, it did not allow Girard's second-order types, and probably for this reason was never considered under the programming language aspect. Th. Coquand showed that a slight extension of the notation allowed the incorporation of Girard's types into AUTOMATH in a natural manner [27]. Coquand showed by a strong normalization theorem that the formalism is consistent. Experiments with an implementation of the calculus showed that it is well adapted to expressing naturally and concisely mathematical proofs and computer algorithms [29]. Variations on this calculus are under development [30,31].

Conclusion

We have presented in these notes a uniform account of logic and computation theory, based on proof theory notions, and most importantly on the Curry-Howard isomorphism between propositions and types [37,59]. These notes are based on a course given at the Advanced School of Artificial Intelligence, Vignieu, France, in July 1985. An extended version is in preparation.

References

[1] A. Aho, J. Hopcroft, J. Ullman. "The Design and Analysis of Computer Algorithms." Addison-Wesley (1974).
[2] P. B. Andrews. "Resolution in Type Theory." Journal of Symbolic Logic 36,3 (1971), 414-432.
[3] P. B. Andrews, D. A. Miller, E. L. Cohen, F. Pfenning. "Automating higher-order logic." Dept of Math, Carnegie-Mellon University (Jan. 1983).
[4] H. Barendregt. "The Lambda-Calculus: Its Syntax and Semantics." North-Holland (1980).
[5] E. Bishop. "Foundations of Constructive Analysis." McGraw-Hill, New York (1967).
[6] E. Bishop. "Mathematics as a numerical language." Intuitionism and Proof Theory, Eds. J. Myhill, A. Kino and R.E. Vesley, North-Holland, Amsterdam, (1970) 53-71.
[7] C. Böhm, A. Berarducci. "Automatic Synthesis of Typed Lambda-Programs on Term Algebras." Unpublished manuscript (June 1984).
[8] R.S. Boyer, J Moore. "The sharing of structure in theorem proving programs." Machine Intelligence 7 (1972) Edinburgh U. Press, 101-116.
[9] R. Boyer, J Moore. "A Lemma Driven Automatic Theorem Prover for Recursive Function Theory." 5th International Joint Conference on Artificial Intelligence (1977) 511-519.
[10] R. Boyer, J Moore. "A Computational Logic." Academic Press (1979).
[11] R. Boyer, J Moore. "A mechanical proof of the unsolvability of the halting problem." Report ICSCA-CMP-28, Institute for Computing Science, University of Texas at Austin (July 1982).
[12] R. Boyer, J Moore. "Proof Checking the RSA Public Key Encryption Algorithm." Report ICSCA-CMP-33, Institute for Computing Science, University of Texas at Austin (Sept. 1982).
[13] R. Boyer, J Moore. "Proof checking theorem proving and program verification." Report ICSCA-CMP-35, Institute for Computing Science, University of Texas at Austin (Jan. 1983).

[14] N.G. de Bruijn. "The mathematical language AUTOMATH, its usage and some of its extensions." Symposium on Automatic Demonstration, IRIA, Versailles, 1968. Printed as Springer-Verlag Lecture Notes in Mathematics 125 (1970) 29-61.
[15] N.G. de Bruijn. "Lambda-Calculus Notation with Nameless Dummies, a Tool for Automatic Formula Manipulation, with Application to the Church-Rosser Theorem." Indag. Math. 34,5 (1972), 381-392.
[16] N.G. de Bruijn. "Automath, a language for mathematics." Les Presses de l'Université de Montréal (1973).
[17] N.G. de Bruijn. "Some extensions of Automath: the AUT-4 family." Internal Automath memo M10 (Jan. 1974).
[18] N.G. de Bruijn. "A survey of the project Automath." In To H. B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism, Eds Seldin J. P. and Hindley J. R., Academic Press (1980).
[19] M. Bruynooghe. "The Memory Management of PROLOG implementations." Logic Programming Workshop. Ed. Tarnlund S.A (July 1980).
[20] L. Cardelli. "ML under UNIX." Bell Laboratories, Murray Hill, New Jersey (1982).
[21] L. Cardelli. "Amber." Bell Laboratories, Murray Hill, New Jersey (1985).
[22] A. Church. "A formulation of the simple theory of types." Journal of Symbolic Logic 5,1 (1940) 56-68.
[23] A. Church. "The Calculi of Lambda-Conversion." Princeton U. Press, Princeton N.J. (1941).
[24] A. Colmerauer, H. Kanoui, R. Pasero, Ph. Roussel. "Un système de communication homme-machine en français." Rapport de recherche, Groupe Intelligence Artificielle, Faculté des Sciences de Luminy, Marseille (1973).
[25] R.L. Constable, J.L. Bates. "Proofs as Programs." Dept. of Computer Science, Cornell University (Feb. 1983).
[26] R.L. Constable, J.L. Bates. "The Nearly Ultimate Pearl." Dept. of Computer Science, Cornell University (Dec. 1983).
[27] Th. Coquand. "Une théorie des constructions." Thèse de troisième cycle, Université Paris VII (Jan. 85).
[28] Th. Coquand, G. Huet. "A Theory of Constructions." Preliminary version, presented at the International Symposium on Semantics of Data Types, Sophia-Antipolis (June 84).
[29] Th. Coquand, G. Huet. "Constructions: A Higher Order Proof System for Mechanizing Mathematics." EUROCAL 85, Linz, Springer-Verlag LNCS 203 (1985).
[30] Th. Coquand, G. Huet. "Concepts Mathématiques et Informatiques Formalisés dans le Calcul des Constructions." Colloque de Logique, Orsay (Juil. 1985).
[31] Th. Coquand, G. Huet. "A Calculus of Constructions." To appear, JCSS (1986).

[32] J. Corbin, M. Bidoit. "A Rehabilitation of Robinson's Unification Algorithm." IFIP 83, Elsevier Science (1983) 909-914.
[33] G. Cousineau, P.L. Curien and M. Mauny. "The Categorical Abstract Machine." In Functional Programming Languages and Computer Architecture, Ed. J. P. Jouannaud, Springer-Verlag LNCS 201 (1985) 50-64.
[34] P.L. Curien. "Combinateurs catégoriques, algorithmes séquentiels et programmation applicative." Thèse de Doctorat d'Etat, Université Paris VII (Dec. 1983).
[35] P. L. Curien. "Categorical Combinatory Logic." ICALP 85, Nafplion, Springer-Verlag LNCS 194 (1985).
[36] P.L. Curien. "Categorical Combinators, Sequential Algorithms and Functional Programming." Pitman (1986).
[37] H. B. Curry, R. Feys. "Combinatory Logic Vol. I." North-Holland, Amsterdam (1958).
[38] D. Van Daalen. "The language theory of Automath." Ph.D. Dissertation, Technological Univ. Eindhoven (1980).
[39] Luis Damas, Robin Milner. "Principal type-schemas for functional programs." Edinburgh University (1982).
[40] P.J. Downey, R. Sethi, R. Tarjan. "Variations on the common subexpression problem." JACM 27,4 (1980) 758-771.
[41] M. Dummett. "Elements of Intuitionism." Clarendon Press, Oxford (1977).
[42] F. Fages. "Formes canoniques dans les algèbres booléennes et application à la démonstration automatique en logique de premier ordre." Thèse de 3ème cycle, Univ. de Paris VI (Juin 1983).
[43] F. Fages. "Associative-Commutative Unification." Submitted for publication (1985).
[44] F. Fages, G. Huet. "Unification and Matching in Equational Theories." CAAP 83, l'Aquila, Italy. In Springer-Verlag LNCS 159 (1983).
[45] P. Flajolet, J.M. Steyaert. "On the Analysis of Tree-Matching Algorithms." In Automata, Languages and Programming, 7th Int. Coll., Lecture Notes in Computer Science 85, Springer-Verlag (1980) 208-219.
[46] S. Fortune, D. Leivant, M. O'Donnell. "The Expressiveness of Simple and Second-Order Type Structures." Journal of the Assoc. for Comp. Mach. 30,1 (Jan. 1983) 151-185.
[47] G. Frege. "Begriffsschrift, a formula language, modeled upon that of arithmetic, for pure thought." (1879). Reprinted in From Frege to Gödel, J. van Heijenoort, Harvard University Press, 1967.
[48] G. Gentzen. "The Collected Papers of Gerhard Gentzen." Ed. E. Szabo, North-Holland, Amsterdam (1969).

[49] J.Y. Girard. "Une extension de l'interprétation de Gödel à l'analyse, et son application à l'élimination des coupures dans l'analyse et la théorie des types." Proceedings of the Second Scandinavian Logic Symposium, Ed. J.E. Fenstad, North Holland (1970) 63-92.
[50] J.Y. Girard. "Interprétation fonctionnelle et élimination des coupures dans l'arithmétique d'ordre supérieur." Thèse d'Etat, Université Paris VII (1972).
[51] K. Gödel. "Über eine bisher noch nicht benützte Erweiterung des finiten Standpunktes." Dialectica, 12 (1958).
[52] W. D. Goldfarb. "The Undecidability of the Second-order Unification Problem." Theoretical Computer Science, 13 (1981) 225-230.
[53] M. Gordon, R. Milner, C. Wadsworth. "A Metalanguage for Interactive Proof in LCF." Internal Report CSR-16-77, Department of Computer Science, University of Edinburgh (Sept. 1977).
[54] M. J. Gordon, A. J. Milner, C. P. Wadsworth. "Edinburgh LCF." Springer-Verlag LNCS 78 (1979).
[55] W. E. Gould. "A Matching Procedure for Omega Order Logic." Scientific Report 1, AFCRL 66-781, contract AF19 (628)-3250 (1966).
[56] J. Guard. "Automated Logic for Semi-Automated Mathematics." Scientific Report 1, AFCRL (1964).
[57] J. Herbrand. "Recherches sur la théorie de la démonstration." Thèse, U. de Paris (1930). In: Ecrits logiques de Jacques Herbrand, PUF Paris (1968).
[58] C. M. Hoffmann, M. J. O'Donnell. "Programming with Equations." ACM Transactions on Programming Languages and Systems, 4,1 (1982) 83-112.
[59] W. A. Howard. "The formulae-as-types notion of construction." Unpublished manuscript (1969). Reprinted in To H. B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism, Eds Seldin J. P. and Hindley J. R., Academic Press (1980).
[60] G. Huet. "Constrained Resolution: a Complete Method for Type Theory." Ph.D. Thesis, Jennings Computing Center Report 1117, Case Western Reserve University (1972).
[61] G. Huet. "A Mechanization of Type Theory." Proceedings, 3rd IJCAI, Stanford (Aug. 1973).
[62] G. Huet. "The Undecidability of Unification in Third Order Logic." Information and Control 22 (1973) 257-267.
[63] G. Huet. "A Unification Algorithm for Typed Lambda Calculus." Theoretical Computer Science, 1.1 (1975) 27-57.
[64] G. Huet. "Résolution d'équations dans des langages d'ordre 1,2,...,ω." Thèse d'Etat, Université Paris VII (1976).
[65] G. Huet. "Confluent Reductions: Abstract Properties and Applications to Term Rewriting Systems." J. Assoc. Comp. Mach. 27,4 (1980) 797-821.

[66] G. Huet. "A Complete Proof of Correctness of the Knuth-Bendix Completion Algorithm." JCSS 23,1 (1981) 11-21.
[67] G. Huet. "Initiation à la Théorie des Catégories." Polycopié de cours de DEA, Université Paris VII (Nov. 1985).
[68] G. Huet. "Cartesian Closed Categories and Lambda-Calculus." Category Theory Seminar, Carnegie-Mellon University (Dec. 1985).
[69] G. Huet, J.M. Hullot. "Proofs by Induction in Equational Theories With Constructors." JACM 25,2 (1982) 239-266.
[70] G. Huet, J.J. Lévy. "Call by Need Computations in Non-Ambiguous Linear Term Rewriting Systems." Rapport Laboria 359, IRIA (Aug. 1979).
[71] G. Huet, D. Oppen. "Equations and Rewrite Rules: a Survey." In Formal Languages: Perspectives and Open Problems, Ed. Book R., Academic Press (1980).
[72] J.M. Hullot. "Compilation de Formes Canoniques dans les Théories Equationnelles." Thèse de 3ème cycle, U. de Paris Sud (Nov. 80).
[73] Jean-Pierre Jouannaud, Hélène Kirchner. "Completion of a set of rules modulo a set of equations." (April 1984).
[74] L.S. Jutting. "A translation of Landau's "Grundlagen" in AUTOMATH." Eindhoven University of Technology, Dept of Mathematics (Oct. 1976).
[75] L.S. van Benthem Jutting. "The language theory of Λ∞, a typed λ-calculus where terms are types." Unpublished manuscript (1984).
[76] G. Kahn, G. Plotkin. "Domaines concrets." Rapport Laboria 336, IRIA (Déc. 1978).
[77] J. Ketonen, J. S. Weening. "The language of an interactive proof checker." Stanford University (1984).
[78] J. Ketonen. "EKL - A Mathematically Oriented Proof Checker." 7th International Conference on Automated Deduction, Napa, California (May 1984). Springer-Verlag LNCS 170.
[79] J. Ketonen. "A mechanical proof of Ramsey theorem." Stanford Univ. (1983).
[80] S.C. Kleene. "Introduction to Meta-mathematics." North Holland (1952).
[81] S.C. Kleene. "On the interpretation of intuitionistic number theory." J. Symbolic Logic 10 (1945).
[82] S.C. Kleene. "On the interpretation of intuitionistic number theory." J. Symbolic Logic 10 (1945).
[83] J.W. Klop. "Combinatory Reduction Systems." Ph. D. Thesis, Mathematisch Centrum Amsterdam (1980).

[84] D. Knuth, P. Bendix. "Simple word problems in universal algebras." In: Computational Problems in Abstract Algebra, J. Leech Ed., Pergamon (1970) 263-297.

[85] D.E. Knuth, J. Morris, V. Pratt. "Fast Pattern Matching in Strings." SIAM Journal on Computing 6,2 (1977) 323-350.
[86] G. Kreisel. "On the interpretation of non-finitist proofs, Part I, II." JSL 16 (1951), 17 (1952).
[87] J. Lambek. "From Lambda-calculus to Cartesian Closed Categories." In To H. B. Curry: Essays on Combinatory Logic, Lambda-calculus and Formalism, Eds. J. P. Seldin and J. R. Hindley, Academic Press (1980).
[88] J. Lambek and P. J. Scott. "Aspects of Higher Order Categorical Logic." Contemporary Mathematics 30 (1984) 145-174.
[89] P. J. Landin. "The next 700 programming languages." Comm. ACM 9,3 (1966) 157-166.
[90] Philippe Le Chenadec. "Formes canoniques dans les algèbres finiment présentées." Thèse de 3ème cycle, Univ. d'Orsay (Juin 1983).
[91] D. Leivant. "Polymorphic type inference." 10th ACM Conference on Principles of Programming Languages (1983).
[92] D. Leivant. "Structural semantics for polymorphic data types." 10th ACM Conference on Principles of Programming Languages (1983).
[93] J.J. Lévy. "Réductions correctes et optimales dans le λ-calcul." Thèse d'Etat, U. Paris VII (1978).
[94] S. MacLane. "Categories for the Working Mathematician." Springer-Verlag (1971).
[95] D. MacQueen, G. Plotkin, R. Sethi. "An ideal model for recursive polymorphic types." Proceedings, Principles of Programming Languages Symposium, Jan. 1984, 165-174.
[96] D. B. MacQueen, R. Sethi. "A semantic model of types for applicative languages." ACM Symposium on Lisp and Functional Programming (Aug. 1982).
[97] E.G. Manes. "Algebraic Theories." Springer-Verlag (1976).
[98] C. Mann. "The Connection between Equivalence of Proofs and Cartesian Closed Categories." Proc. London Math. Soc. 31 (1975) 289-310.
[99] A. Martelli, U. Montanari. "Theorem proving with structure sharing and efficient unification." Proc. 5th IJCAI, Boston (1977) p 543.
[100] A. Martelli, U. Montanari. "An Efficient Unification Algorithm." ACM Trans. on Prog. Lang. and Syst. 4,2 (1982) 258-282.
[101] William A. Martin. "Determining the equivalence of algebraic expressions by hash coding." JACM 18,4 (1971) 549-558.
[102] P. Martin-Löf. "A theory of types." Report 71-3, Dept. of Mathematics, University of Stockholm, Feb. 1971, revised (Oct. 1971).
[103] P. Martin-Löf. "About models for intuitionistic type theories and the notion of definitional equality." Paper read at the Orléans Logic Conference (1972).

[104] P. Martin-Löf. "An Intuitionistic Theory of Types: predicative part." Logic Colloquium 73, Eds. H. Rose and J. Shepherdson, North-Holland (1974) 73-118.
[105] P. Martin-Löf. "Constructive Mathematics and Computer Programming." In Logic, Methodology and Philosophy of Science 6 (1980) 153-175, North-Holland.
[106] P. Martin-Löf. "Intuitionistic Type Theory." Studies in Proof Theory, Bibliopolis (1984).
[107] J. McCarthy. "Recursive functions of symbolic expressions and their computation by machine." CACM 3,4 (1960) 184-195.
[108] N. McCracken. "An investigation of a programming language with a polymorphic type structure." Ph.D. Dissertation, Syracuse University (1979).
[109] D.A. Miller. "Proofs in Higher-order Logic." Ph. D. Dissertation, Carnegie-Mellon University (Aug. 1983).
[110] D.A. Miller. "Expansion tree proofs and their conversion to natural deduction proofs." Technical report MS-CIS-84-6, University of Pennsylvania (Feb. 1984).
[111] R. Milner. "A Theory of Type Polymorphism in Programming." Journal of Computer and System Sciences 17 (1978) 348-375.
[112] R. Milner. "A proposal for Standard ML." Report CSR-157-83, Computer Science Dept., University of Edinburgh (1983).
[113] R.P. Nederpelt. "Strong normalization in a typed λ-calculus with λ-structured types." Ph. D. Thesis, Eindhoven University of Technology (1973).
[114] R.P. Nederpelt. "An approach to theorem proving on the basis of a typed λ-calculus." 5th Conference on Automated Deduction, Les Arcs, France. Springer-Verlag LNCS 87 (1980).
[115] G. Nelson, D.C. Oppen. "Fast decision procedures based on congruence closure." JACM 27,2 (1980) 356-364.
[116] M.H.A. Newman. "On Theories with a Combinatorial Definition of "Equivalence"." Annals of Math. 43,2 (1942) 223-243.
[117] B. Nordström. "Programming in Constructive Set Theory: Some Examples." Proceedings of the ACM Conference on Functional Programming Languages and Computer Architecture, Portsmouth, New Hampshire (Oct. 1981) 141-154.
[118] B. Nordström. "Description of a Simple Programming Language." Report 1, Programming Methodology Group, University of Göteborg (Apr. 1984).
[119] B. Nordström, K. Petersson. "Types and Specifications." Information Processing 83, Ed. R. Mason, North-Holland (1983) 915-920.
[120] B. Nordström, J. Smith. "Propositions and Specifications of Programs in Martin-Löf's Type Theory." BIT 24 (1984) 288-301.
[121] A. Obtulowicz. "The Logic of Categories of Partial Functions and its Applications." Dissertationes Mathematicae 241 (1982).

[122] M.S. Paterson, M.N. Wegman. "Linear Unification." J. of Computer and Systems Sciences 16 (1978) 158-167.
[123] L. Paulson. "Recent Developments in LCF: Examples of structural induction." Technical Report No 34, Computer Laboratory, University of Cambridge (Jan. 1983).
[124] L. Paulson. "Tactics and Tacticals in Cambridge LCF." Technical Report No 39, Computer Laboratory, University of Cambridge (July 1983).
[125] L. Paulson. "Verifying the unification algorithm in LCF." Technical Report No 50, Computer Laboratory, University of Cambridge (March 1984).
[126] L. C. Paulson. "Constructing Recursion Operators in Intuitionistic Type Theory." Tech. Report 57, Computer Laboratory, University of Cambridge (Oct. 1984).
[127] G.E. Peterson, M.E. Stickel. "Complete Sets of Reductions for Equational Theories with Complete Unification Algorithms." JACM 28,2 (1981) 233-264.
[128] T. Pietrzykowski, D.C. Jensen. "A complete mechanization of ω-order type theory." Proceedings of ACM Annual Conference (1972).
[129] T. Pietrzykowski. "A Complete Mechanization of Second-Order Type Theory." JACM 20 (1973) 333-364.
[130] D. Prawitz. "Natural Deduction." Almqvist and Wiksell, Stockholm (1965).
[131] D. Prawitz. "Ideas and results in proof theory." Proceedings of the Second Scandinavian Logic Symposium (1971).
[132] PRL staff. "Implementing Mathematics with the NUPRL Proof Development System." Computer Science Department, Cornell University (May 1985).
[133] H. Rasiowa, R. Sikorski. "The Mathematics of Metamathematics." Monografie Matematyczne tom 41, PWN, Polish Scientific Publishers, Warszawa (1963).
[134] J. C. Reynolds. "Definitional Interpreters for Higher Order Programming Languages." Proc. ACM National Conference, Boston (Aug. 72) 717-740.
[135] J. C. Reynolds. "Towards a Theory of Type Structure." Programming Symposium, Paris. Springer-Verlag LNCS 19 (1974) 408-425.
[136] J. C. Reynolds. "Types, abstraction, and parametric polymorphism." IFIP Congress '83, Paris (Sept. 1983).
[137] J. C. Reynolds. "Polymorphism is not set-theoretic." International Symposium on Semantics of Data Types, Sophia-Antipolis (June 1984).
[138] J. C. Reynolds. "Three approaches to type structure." TAPSOFT Advanced Seminar on the Role of Semantics in Software Development, Berlin (March 1985).
[139] J. A. Robinson. "A Machine-Oriented Logic Based on the Resolution Principle." JACM 12 (1965) 32-41.

[140] J. A. Robinson. "Computational Logic: the Unification Computation." Machine Intelligence 6, Eds B. Meltzer and D. Michie, American Elsevier, New York (1971).
[141] D. Scott. "Constructive validity." Symposium on Automatic Demonstration, Springer-Verlag Lecture Notes in Mathematics, 125 (1970).
[142] D. Scott. "Relating Theories of the Lambda-Calculus." In To H. B. Curry: Essays on Combinatory Logic, Lambda-calculus and Formalism, Eds. J. P. Seldin and J. R. Hindley, Academic Press (1980).
[143] J.R. Shoenfield. "Mathematical Logic." Addison-Wesley (1967).
[144] R.E. Shostak. "Deciding Combinations of Theories." JACM 31,1 (1984) 1-12.
[145] J. Smith. "Course-of-values recursion on lists in intuitionistic type theory." Unpublished notes, Göteborg University (Sept. 1981).
[146] J. Smith. "The identification of propositions and types in Martin-Löf's type theory: a programming example." International Conference on Foundations of Computation Theory, Borgholm, Sweden (Aug. 1983), Springer-Verlag LNCS 158.
[147] R. Statman. "Intuitionistic Propositional Logic is Polynomial-space Complete." Theoretical Computer Science 9 (1979) 67-72, North-Holland.
[148] R. Statman. "The typed Lambda-Calculus is not Elementary Recursive." Theoretical Computer Science 9 (1979) 73-81.
[149] S. Stenlund. "Combinators, λ-terms, and proof theory." Reidel (1972).
[150] M.E. Stickel. "A Complete Unification Algorithm for Associative-Commutative Functions." JACM 28,3 (1981) 423-434.
[151] M.E. Szabo. "Algebra of Proofs." North-Holland (1978).
[152] W. Tait. "A non constructive proof of Gentzen's Hauptsatz for second order predicate logic." Bull. Amer. Math. Soc. 72 (1966).
[153] W. Tait. "Intensional interpretations of functionals of finite type I." J. of Symbolic Logic 32 (1967) 198-212.
[154] W. Tait. "A Realizability Interpretation of the Theory of Species." Logic Colloquium, Ed. R. Parikh, Springer Verlag Lecture Notes 453 (1975).
[155] M. Takahashi. "A proof of cut-elimination theorem in simple type theory." J. Math. Soc. Japan 19 (1967).

[156] G. Takeuti. "On a generalized logic calculus." Japan J. Math. 23 (1953).
[157] G. Takeuti. "Proof theory." Studies in Logic 81, Amsterdam (1975).
[158] R. E. Tarjan. "Efficiency of a good but non linear set union algorithm." JACM 22,2 (1975) 215-225.

[159] R. E. Tarjan, J. van Leeuwen. "Worst-case Analysis of Set Union Algorithms." JACM 31,2 (1984) 245-281.
[160] A. Tarski. "A lattice-theoretical fixpoint theorem and its applications." Pacific J. Math. 5 (1955) 285-309.
[161] D.A. Turner. "Miranda: A non-strict functional language with polymorphic types." In Functional Programming Languages and Computer Architecture. Ed. J. P. Jouannaud, Springer-Verlag LNCS 201 (1985) 1-16.
[162] R. de Vrijer. "Big Trees in a λ-calculus with λ-expressions as types." Conference on λ-calculus and Computer Science Theory, Rome, Springer-Verlag LNCS 37 (1975) 252-271.
[163] D. Warren. "Applied Logic - Its use and implementation as a programming tool." Ph.D. Thesis, University of Edinburgh (1977).

An Introduction to Automated Deduction

Mark E. Stickel
Artificial Intelligence Center
SRI International
Menlo Park, California 94025

Contents

1 Introduction

2 Resolution
2.1 Elimination of Tautologies
2.2 Purity
2.3 Subsumption
2.4 Set of Support
2.5 P1 and N1 Resolution
2.6 Hyperresolution
2.7 Unit Resolution
2.8 Unit-Resulting Resolution
2.9 Input Resolution
2.10 Prolog
2.11 Linear Resolution
2.12 Model Elimination
2.13 Prolog Technology Theorem Prover
2.14 Connection-Graph Resolution
2.15 Nonclausal Resolution
2.16 Connection Method
2.17 Theory Resolution
2.18 Krypton

3 Unification
3.1 Unification in Equational Theories
3.2 Commutative Unification
3.3 Associative Unification
3.4 Associative-Commutative Unification
3.5 Many-Sorted Unification

4 Equality Reasoning
4.1 Equality Axiomatization
4.2 Demodulation
4.3 Paramodulation
4.4 Resolution by Unification and Equality
4.5 E-Resolution
4.6 Knuth-Bendix Method

References

1 Introduction

In this chapter, we present an informal introduction to many of the methods currently used in automated deduction. The principal method for theorem proving that we discuss is resolution, but we are also substantially concerned with extending the resolution framework to reason more efficiently about particular theories. The chapters by Gérard Huet and Wolfgang Bibel complement this one. In this chapter, we treat classical logic and classical (if resolution, developed in 1963, can be considered classical!) methods of theorem proving. Huet considers systems that merge the notions of computation and deduction, and Bibel extends classical reasoning to nonmonotonic reasoning, metalevel reasoning, and reasoning about uncertainty. Increasingly many good books present various parts of the material that we informally give here. Following are brief descriptions of some of them and their strengths. Chang and Lee [12] was the first textbook for resolution, paramodulation, and unification, and it is still very useful. Loveland [52] and Bibel [5] are newer texts that are exceptionally strong in the areas of linear refinements of resolution and the connection method, which they have developed, respectively. Wos et al. [88] is written for a wider audience and reflects their practical experience in theorem proving; it is especially strong in the areas of deciding how to formalize problems and to select strategies for their solution. Kowalski [39] emphasizes the important connections between automated deduction and logic programming. Manna and Waldinger [56] (unfortunately, only Volume 1 is available so far) and Gallier [22] are new textbooks in symbolic logic that are oriented toward computer science and automated deduction. Some of the topics in this chapter are too new or specialized to be included in any textbooks, but references are included at the end of the chapter for those who want to learn more.

2 Resolution

One of the most important procedures for automated deduction is resolution [66]. Its application to propositional calculus theorem proving will be examined first. The language of the propositional calculus includes a set of propositional symbols P, Q, R, and the like, the logical connectives ¬, ∨, ∧, ⊃, and ≡, the logical constants true and false, and parentheses. A formula of the propositional calculus is one of

• The atomic formula or atom P, where P is a propositional symbol

• The negation ¬A, where A is a formula of the propositional calculus

• The disjunction (A ∨ B), where A and B are formulas of the propositional calculus

• The conjunction (A ∧ B), where A and B are formulas of the propositional calculus

• The implication (A ⊃ B), where A and B are formulas of the propositional calculus

• The equivalence (A ≡ B), where A and B are formulas of the propositional calculus.

Since ∨, ∧, and ≡ are associative, they are often treated as n-ary operators for arbitrary n so that, for example, (A ∨ (B ∨ C)) and ((A ∨ B) ∨ C) can both be written as (A ∨ B ∨ C). The connectives ¬, ∨, ∧, ⊃, and ≡ are ordered by declining operator precedence. For convenience, parentheses can be omitted where precedence can be used to determine the correct reading. Thus, for example, ((A ∧ B) ⊃ C) can also be expressed by A ∧ B ⊃ C.

Subformulas can be classified as occurring with positive polarity (positively) or with negative polarity (negatively). A subformula occurs positively in a formula if it is embedded in an even number of explicit or implicit negations (equivalences and left-hand sides of implications implicitly negate formulas). A subformula occurs negatively in a formula if it is embedded in an odd number of explicit or implicit negations. Thus, for example, A occurs positively in A, A ∨ B, A ∧ B, B ⊃ A, and A ≡ B and negatively in ¬A, A ⊃ B, and A ≡ B. Note that A and B and their subformulas occur both positively and negatively in A ≡ B.

An interpretation of a formula is an assignment of the truth values true or false to each propositional symbol in the formula. The value of a formula in an interpretation can be computed using the following truth table:

A      B      ¬A     ¬B     A∨B    A∧B    A⊃B    A≡B
true   true   false  false  true   true   true   true
true   false  false  true   true   false  false  false
false  true   true   false  true   false  true   false
false  false  true   true   false  false  true   true

Given a formula of the propositional calculus and an interpretation of it, the value of the formula in the interpretation can be computed by replacing propositional symbols in the formula by their values in the interpretation and reducing the formula by means of the truth table to either true or false.

An interpretation satisfies a formula and is a model of it if the formula is true in the interpretation. A formula is valid if and only if every interpretation is a model and unsatisfiable if and only if no interpretation is a model.

The process of determining validity of a formula by the truth-table method is exponential in the worst case, requiring determination of the value of the formula in each of 2^n interpretations, where n is the number of propositional symbols appearing in the formula. Although no subexponential algorithm for determining validity is known (the complementary satisfiability problem is NP-complete), methods such as resolution generally yield better performance than truth-table evaluation, as well as being more readily extended to theorem proving in the first-order predicate calculus.

Resolution is a refutation procedure. Instead of determining the validity of a formula directly, it determines the unsatisfiability of its negation. Thus, the first step in the use of the resolution procedure is to negate the formula to be proved valid. In the case where it is intended to prove a theorem from a set of axioms, i.e., the formula is of the form A₁ ∧ ⋯ ∧ Aₙ ⊃ B, where the Aᵢ are axioms and B is the theorem, negating the formula results in formation of the conjunction
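The truth-table method just described is easy to mechanize. The following Python sketch is illustrative only: the nested-tuple representation of formulas and the function names `evaluate`, `symbols`, and `is_valid` are choices made here, not part of the text. It evaluates a formula under an interpretation and decides validity by enumerating all 2^n interpretations:

```python
from itertools import product

# Formulas as nested tuples: atoms ('P',), ('not', A), ('or', A, B),
# ('and', A, B), ('implies', A, B), ('equiv', A, B).

def evaluate(formula, interpretation):
    """Value of a formula under an interpretation (dict: symbol -> bool)."""
    op = formula[0]
    if op == 'not':
        return not evaluate(formula[1], interpretation)
    if op == 'or':
        return evaluate(formula[1], interpretation) or evaluate(formula[2], interpretation)
    if op == 'and':
        return evaluate(formula[1], interpretation) and evaluate(formula[2], interpretation)
    if op == 'implies':
        return (not evaluate(formula[1], interpretation)) or evaluate(formula[2], interpretation)
    if op == 'equiv':
        return evaluate(formula[1], interpretation) == evaluate(formula[2], interpretation)
    return interpretation[op]            # atom: op is the propositional symbol

def symbols(formula):
    """Propositional symbols appearing in a formula."""
    if formula[0] == 'not':
        return symbols(formula[1])
    if formula[0] in ('or', 'and', 'implies', 'equiv'):
        return symbols(formula[1]) | symbols(formula[2])
    return {formula[0]}

def is_valid(formula):
    """Truth-table method: valid iff true in all 2^n interpretations."""
    syms = sorted(symbols(formula))
    return all(evaluate(formula, dict(zip(syms, values)))
               for values in product([True, False], repeat=len(syms)))
```

For example, `is_valid` confirms the modus ponens schema (P ∧ (P ⊃ Q)) ⊃ Q and rejects the non-valid P ∨ Q, but at the exponential cost the text notes.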

A₁ ∧ ⋯ ∧ Aₙ ∧ ¬B, i.e., only the theorem needs to be negated. For most forms of resolution, the formula must then be transformed into clause form. A literal is either a propositional symbol (e.g., P) or the negation of a propositional symbol

(e.g., ¬P). The former are positive literals and the latter are negative literals. A clause is a disjunction L₁ ∨ ⋯ ∨ Lₙ of literals. The logical constant false is sometimes referred to as the empty clause, because it can be viewed as the disjunction of zero literals. A unit clause is a clause with exactly one literal. More generally, an n-clause for n = 1, 2, 3, … is a clause with exactly n literals. A clause with at least two literals is called a nonunit clause. A positive clause is a clause all of whose literals are positive. A negative clause is a clause all of whose literals are negative. (The empty clause can be considered both positive and negative.)

A mixed clause is a clause which is neither positive nor negative, i.e., it has at least one positive and at least one negative literal. A Horn clause is a clause with at most one positive literal. A pair of literals is complementary if one is positive and the other is negative and their propositional symbols are the same.

A formula is in clause form if it is a conjunction C₁ ∧ ⋯ ∧ Cₙ of clauses Cᵢ. Given that conjunction and disjunction are associative, commutative, and idempotent, a formula is often regarded as a set of clauses, with each clause being a set of literals. A formula can be transformed to clause form by application of the following rewrites until the formula cannot be rewritten any further:

(A ≡ B) → ((¬A ∨ B) ∧ (¬B ∨ A))
(A ⊃ B) → (¬A ∨ B)
¬¬A → A
¬(A ∨ B) → (¬A ∧ ¬B)
¬(A ∧ B) → (¬A ∨ ¬B)
(A ∨ (B ∧ C)) → ((A ∨ B) ∧ (A ∨ C))
((B ∧ C) ∨ A) → ((B ∨ A) ∧ (C ∨ A))
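As a rough illustration, the rewrites can be staged as three passes: eliminate ≡ and ⊃, push negations inward, and distribute ∨ over ∧. The tuple representation and the function names below are hypothetical conveniences, not from the text:

```python
# Formulas are nested tuples: atoms ('P',), ('not', A), ('and', A, B),
# ('or', A, B), ('implies', A, B), ('equiv', A, B).  Clauses come out as
# frozensets of literals (symbol, sign), e.g. ('P', False) for a negative literal.

def eliminate(f):
    """Rewrite equivalences and implications away (first two rules)."""
    op = f[0]
    if op == 'equiv':
        a, b = eliminate(f[1]), eliminate(f[2])
        return ('and', ('or', ('not', a), b), ('or', ('not', b), a))
    if op == 'implies':
        return ('or', ('not', eliminate(f[1])), eliminate(f[2]))
    if op in ('and', 'or'):
        return (op, eliminate(f[1]), eliminate(f[2]))
    if op == 'not':
        return ('not', eliminate(f[1]))
    return f

def nnf(f):
    """Push negations inward: the double-negation and De Morgan rules."""
    if f[0] == 'not':
        g = f[1]
        if g[0] == 'not':
            return nnf(g[1])
        if g[0] == 'and':
            return ('or', nnf(('not', g[1])), nnf(('not', g[2])))
        if g[0] == 'or':
            return ('and', nnf(('not', g[1])), nnf(('not', g[2])))
        return f                         # negated atom: already in normal form
    if f[0] in ('and', 'or'):
        return (f[0], nnf(f[1]), nnf(f[2]))
    return f

def cnf(f):
    """Distribute disjunction over conjunction (last two rules)."""
    if f[0] == 'and':
        return ('and', cnf(f[1]), cnf(f[2]))
    if f[0] == 'or':
        a, b = cnf(f[1]), cnf(f[2])
        if a[0] == 'and':
            return ('and', cnf(('or', a[1], b)), cnf(('or', a[2], b)))
        if b[0] == 'and':
            return ('and', cnf(('or', a, b[1])), cnf(('or', a, b[2])))
        return ('or', a, b)
    return f

def clause_form(f):
    """Read the rewritten formula off as a set of clauses."""
    def literals(g):
        if g[0] == 'or':
            return literals(g[1]) | literals(g[2])
        if g[0] == 'not':
            return {(g[1][0], False)}
        return {(g[0], True)}
    def split(g):
        if g[0] == 'and':
            return split(g[1]) | split(g[2])
        return {frozenset(literals(g))}
    return split(cnf(nnf(eliminate(f))))
```

Representing clauses as sets makes the idempotence, commutativity, and associativity of ∨ automatic, as the text's set-of-clauses view suggests.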

Note that the clause form of a formula is not necessarily unique. For example, (P ≡ Q) ∧ (Q ≡ R) ∧ (R ≡ P) is equivalent to both (¬P ∨ Q) ∧ (¬Q ∨ R) ∧ (¬R ∨ P) and (¬Q ∨ P) ∧ (¬R ∨ Q) ∧ (¬P ∨ R).

The resolution rule of inference states that the resolvent clause A ∨ B can be derived from the parent clauses P ∨ A and ¬P ∨ B, where P is a propositional symbol and A and B are arbitrary clauses. Thus, false can be obtained by resolving the clauses P and ¬P, Q can be obtained by resolving the clauses P and ¬P ∨ Q, and Q ∨ R can be obtained by resolving the clauses P ∨ R and ¬P ∨ Q. The order of the literals in the clauses is unimportant. For example, P ∨ A denotes any clause that is the disjunction of P and the literals, if any, of A.

Clauses that are derived from other clauses are referred to as derived clauses. Other clauses, i.e., those that were given as inputs to the deduction system, are referred to as input clauses. The resolution rule of inference is an extension of the standard modus ponens rule in logic, which permits the derivation of Q from P and P ⊃ Q, whose clause form is ¬P ∨ Q.

A set of clauses is unsatisfiable if and only if the empty clause false is derivable from the set of clauses by resolution; for this reason, resolution is said to be refutation complete, or simply complete. Following is a resolution proof of the unsatisfiability of {P ∨ Q, P ∨ ¬Q, ¬P ∨ Q, ¬P ∨ ¬Q} (note that idempotence of ∨ is used to automatically replace clauses of the form P ∨ P ∨ C by the equivalent P ∨ C):

1. P ∨ Q
2. P ∨ ¬Q
3. ¬P ∨ Q
4. ¬P ∨ ¬Q

5. P          resolve 1 and 2
6. ¬P         resolve 3 and 4
7. false      resolve 5 and 6
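A refutation such as this can be found mechanically by a minimal saturation loop. In the sketch below (an illustrative toy, not an efficient prover), clauses are frozensets of (symbol, sign) literals, so idempotence of ∨ is automatic, and tautologous resolvents are discarded as Section 2.1 permits:

```python
def resolve(c1, c2):
    """All resolvents of two propositional clauses (frozensets of literals)."""
    out = set()
    for (sym, sign) in c1:
        if (sym, not sign) in c2:                     # complementary pair
            out.add(frozenset((c1 - {(sym, sign)}) | (c2 - {(sym, not sign)})))
    return out

def refute(clauses):
    """Saturate under resolution; True iff the empty clause is derived."""
    clauses = {frozenset(c) for c in clauses}
    while True:
        new = set()
        for c1 in clauses:
            for c2 in clauses:
                for r in resolve(c1, c2):
                    if not r:
                        return True                   # derived false
                    if any((s, not b) in r for (s, b) in r):
                        continue                      # discard tautologies
                    new.add(r)
        if new <= clauses:
            return False                              # saturated, satisfiable
        clauses |= new
```

Run on the four clauses above, the loop derives P and ¬P and then the empty clause, mirroring steps 5 through 7.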

The following sections will be concerned with various refinements of resolution. These refinements will be described in terms of resolution applied to the propositional calculus, but can readily be extended to apply to the first-order predicate calculus. But first consider the general requirements for resolution in the first-order predicate calculus.

The first-order predicate calculus includes variables and n-ary predicate and function symbols. Propositional symbols are essentially 0-ary predicate symbols. Constant symbols are 0-ary function symbols. The first-order predicate calculus also adds the ∀ and ∃ quantifiers. A term of the first-order predicate calculus is one of

• A variable symbol

• f(t₁, …, tₙ), where f is an n-ary function symbol and t₁, …, tₙ are terms.

A formula of the first-order predicate calculus is one of:

• The atomic formula or atom P(t₁, …, tₙ), where P is an n-ary predicate symbol and t₁, …, tₙ are terms

• The universal quantification (∀xA), where x is a variable and A is a formula of the first-order predicate calculus

• The existential quantification (∃xA), where x is a variable and A is a formula of the first-order predicate calculus

• The negation, disjunction, conjunction, implication, and equivalence of first-order predicate calculus formulas, defined analogously to formulas of the propositional calculus.

The intuitive interpretation of the quantified formulas is that ∀xA means that A is true for every value of x and ∃xA means that A is true for some value of x. ∀xA is equivalent to ¬∃x¬A and ∃xA is equivalent to ¬∀x¬A. The quantifier ∀x or ∃x binds the variable x (and x is bound by the quantifier) in ∀xA or ∃xA. Resolution operates on unquantified formulas, so it is necessary to remove quantifiers from quantified formulas by skolemization. The skolemized formula is unsatisfiable if and only if the original formula is unsatisfiable.

The concept of quantifier force is used to deal with the fact that a universal quantifier behaves like an existential quantifier, and vice versa, if it appears inside a negation. If A is a formula and ∀xB is a subformula of A, then ∀x has universal force in A if ∀xB occurs positively in A and existential force in A if it occurs negatively in A. Similarly, if A is a formula and ∃xB is a subformula of A, then ∃x has existential force in A if ∃xB occurs positively in A and universal force in A if it occurs negatively in A.

Let A be a formula to be tested for unsatisfiability. Assume that A has no unbound variables and that each quantifier binds a different variable. These conditions can be achieved by adding universal quantifiers to the beginning of the formula for unbound variables and renaming variables.

Assume further that every quantifier in A is of universal force or existential force, but not both, i.e., no quantifier appears inside an equivalence. If some quantifier appears inside an equivalence B ≡ C, the equivalence must first be replaced by an equivalent formula such as (B ⊃ C) ∧ (C ⊃ B).

Let QxB be a subformula of A where Qx is a quantifier of existential force. Let Q₁x₁A₁, …, QₙxₙAₙ (n ≥ 0) be the successively smaller quantified subformulas of A that contain QxB (each Aᵢ contains Qᵢ₊₁xᵢ₊₁Aᵢ₊₁), where each Qᵢxᵢ is a quantifier of universal force. Then replace QxB in A by the formula B with every occurrence of x replaced by the term c if n = 0 or f(x₁, …, xₙ) if n ≥ 1, where c is a new Skolem constant or f a new Skolem function, i.e., one that does not already appear in the formula. This process is repeated until no quantifiers of existential force remain, at which point all remaining quantifiers can be removed, leaving an unquantified formula.

Skolemization is often described as if it applied only to formulas in prenex form, i.e., those of the form Q₁x₁ ⋯ QₙxₙA where A contains no quantifiers. However, this restriction is unnecessary and has the disadvantage that skolemizing a formula after its conversion to an equivalent formula in prenex form may lead to Skolem functions having more arguments than necessary.

For example, to prove that John has a father from the statement that everyone has a father, it is necessary to refute the formula ∀x∃yFather(x, y) ∧ ¬∃zFather(John, z). This can be skolemized to Father(x, f(x)) ∧ ¬Father(John, z). A single-step resolution refutation exists with the substitution of John for x and f(John) for z. The Skolem function f can sometimes be intuitively interpreted as the function of its arguments x₁, …, xₙ that computes the value required for the containing expression to be true. For example, in Father(x, f(x)), f(x) can be thought of as referring to the father of x.
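For formulas already in negation normal form (negations driven onto atoms, so every quantifier has a definite force), the skolemization procedure above can be sketched as follows. The tuple encodings, the `sk0, sk1, …` Skolem-symbol naming, and the helper names are all illustrative assumptions:

```python
import itertools

_counter = itertools.count()     # fresh Skolem symbols sk0, sk1, ...

# Atoms are ('atom', P, t1, ..., tn); terms are variable strings or
# tuples (f, t1, ..., tn); constants are 0-ary tuples like ('John',).

def subst(f, var, term):
    """Replace every occurrence of variable var in formula f by term."""
    def in_term(t):
        if t == var:
            return term
        if isinstance(t, tuple):
            return (t[0],) + tuple(in_term(a) for a in t[1:])
        return t
    op = f[0]
    if op == 'atom':
        return ('atom', f[1]) + tuple(in_term(t) for t in f[2:])
    if op == 'not':
        return ('not', subst(f[1], var, term))
    if op in ('and', 'or'):
        return (op, subst(f[1], var, term), subst(f[2], var, term))
    if op in ('forall', 'exists'):
        return (op, f[1], subst(f[2], var, term))
    return f

def skolemize(f, universals=()):
    """Remove quantifiers: existential variables become Skolem terms over
    the enclosing universally quantified variables (a constant if none)."""
    op = f[0]
    if op == 'forall':
        return skolemize(f[2], universals + (f[1],))
    if op == 'exists':
        sk = ('sk%d' % next(_counter),) + universals
        return skolemize(subst(f[2], f[1], sk), universals)
    if op in ('and', 'or'):
        return (op, skolemize(f[1], universals), skolemize(f[2], universals))
    if op == 'not':
        return ('not', skolemize(f[1], universals))
    return f
```

On the Father example, because the existential quantifier lies inside one universal quantifier, y is replaced by the one-argument Skolem term sk0(x), matching the f(x) of the text; the z quantifier, being of universal force, is simply dropped.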

An expression is called ground if it contains no variables. A set of ground clauses can be regarded as a syntactic variation of clauses of the propositional calculus. A set of clauses S is unsatisfiable if and only if there is an unsatisfiable set of ground clauses S′ such that each clause in S′ is an instance of a clause in S. Note that a single clause in S may require more than one instance in S′ for S′ to be unsatisfiable. For example, the set of clauses S consisting of P(a) ∨ P(b) and ¬P(x) is unsatisfiable, but S′ contains two instances ¬P(a) and ¬P(b) of ¬P(x). When instantiating clauses in S, it is only necessary to consider replacing variables by terms constructible from symbols occurring in S (the Herbrand universe of S), i.e., no new function or constant symbols need be introduced. An exception is that if S contains variables but no constant symbols, then a single constant symbol is added.

Before resolution was developed, some proof procedures successively formed instantiations of S by replacing variables by terms in the Herbrand universe of S in ascending order of term complexity. The resulting sets S′ were then tested for unsatisfiability. This approach is inefficient because the instantiation process is not well directed to finding the specific instances of variables that lead to the result being unsatisfiable.

Resolution is an important inference procedure for two reasons. First, as described above, it is a single inference rule for determining the unsatisfiability of sets of clauses of the propositional calculus. Second, it instantiates variables in a manner that is more directed to finding an unsatisfiable instantiation. When resolving two clauses of the first-order predicate calculus, two literals are resolved on and the remaining literals are disjoined to form the resolvent, just as for propositional calculus clauses. However, there are two differences.
First, two clauses of the first-order predicate calculus are standardized apart before being resolved, i.e., variables of one or both of the clauses are renamed so that the two clauses have no variables in common. This is valid because a set of clauses whose unsatisfiability is being determined is considered to be the conjunction of a set of universally quantified clauses, and any pair of conjoined formulas ∀xP(x) ∧ ∀xQ(x) is equivalent to a variable-renamed one ∀xP(x) ∧ ∀yQ(y). The set of clauses consisting of P(a, x) and ¬P(x, b) is unsatisfiable, but P(a, x) and P(x, b) have no common instance as is required for a resolution operation. After renaming variables, however, P(a, x) and P(y, b) have the common instance P(a, b), and resolution is possible.

Second, resolution finds by unification a most general substitution that makes a pair of literals complementary. This substitution is then applied to the remaining literals in forming the resolvent. For example, if P(a, x) ∨ Q(x) and ¬P(y, b) ∨ R(y) are resolved, the most general substitution that makes P(a, x) and ¬P(y, b) complementary is the substitution of b for x and a for y, and the resolvent is Q(b) ∨ R(a). By finding most general substitutions to make pairs of literals from pairs of clauses complementary, resolution progressively finds instantiations of clauses that might lead to a ground refutation.
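A standard unification sketch follows, with variables as strings and compound terms (and atoms) as tuples; the function names and the dict-based substitution are illustrative choices. The occurs check prevents binding a variable to a term that contains it:

```python
def walk(t, s):
    """Follow variable bindings in substitution s to a representative term."""
    while isinstance(t, str) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    """Occurs check: does variable v appear in term t under s?"""
    t = walk(t, s)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, x, s) for x in t[1:])

def unify(a, b, s=None):
    """Most general unifier of two terms, as a dict, or None on failure."""
    if s is None:
        s = {}
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if isinstance(a, str):               # a is an unbound variable
        if occurs(a, b, s):
            return None
        s[a] = b
        return s
    if isinstance(b, str):
        return unify(b, a, s)
    if a[0] != b[0] or len(a) != len(b):  # functor or arity clash
        return None
    for x, y in zip(a[1:], b[1:]):
        s = unify(x, y, s)
        if s is None:
            return None
    return s
```

Unifying P(a, x) with P(y, b), written as tuples with constants as 0-ary terms, yields exactly the substitution of the text: b for x and a for y.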

Completeness arguments for resolution for the first-order predicate calculus generally rely on lifting theorems. These show how a resolution refutation of S′, whose clauses are ground instances of clauses in S, can be imitated by a resolution refutation of S. The fact that two or more literals in a clause can be collapsed into a single literal in a ground instance is a complication. For example, the set of clauses consisting of P(x) ∨ P(y) and ¬P(u) ∨ ¬P(v) is unsatisfiable because ground instances P(a) and ¬P(a) of the clauses are contradictory. However, resolving P(x) ∨ P(y) and ¬P(u) ∨ ¬P(v) only leads to resolvents like P(y) ∨ ¬P(v), P(y) ∨ ¬P(u), and the like. Resolving these resolvents among themselves and with the original clauses also yields no progress toward a refutation. In fact, every resolvent has two literals, and the empty clause can never be derived.

There are two solutions to this difficulty. One is to model resolution for general clauses directly on resolution for ground instances so that there is a general resolution step corresponding to each step in the refutation of the ground instances. To accomplish this, it is necessary to resolve on possibly more than one literal of each clause simultaneously. For example, the set of clauses consisting of P(x) ∨ P(y) and ¬P(u) ∨ ¬P(v) has ground instances P(a) and ¬P(a) with a single-step resolution refutation. The general resolution operation will then have to find a most general substitution that makes all of P(x), P(y), P(u), and P(v) identical (for example, by substitution of x for y, u, and v). The second solution entails the addition of the factoring operation. The resolution rule for general clauses resolves on only a single pair of literals as in the case of ground clauses, but the additional factoring operation adds clause instances (factors) that result from instantiating two or more literals of a clause so that they are identical.
Thus, P(x) is a factor of P(x) ∨ P(y) and ¬P(u) is a factor of ¬P(u) ∨ ¬P(v). The factors can then be resolved so that they result in the empty clause. When more than a single pair of literals must be collapsed to one in a factor (e.g., two separate pairs of literals must each be collapsed to single literals, or three literals must all be collapsed to a single literal), all the factors can be generated by successively applying the factoring operation to single pairs of literals.

Although resolution is complete, it is not very efficient when measured in terms of the size of the search space for a resolution refutation. Since the development of resolution, many refinements have improved its efficiency. Some, such as elimination of tautologies and subsumption, discard useless or redundant results. Many restrict which pairs of clauses are allowed to resolve with each other. Some of these restrictions, such as set of support, preserve completeness, while others, such as unit resolution, are complete for only some sets of clauses.

2.1 Elimination of Tautologies

A clause that contains both a literal and its negation is a tautology that can, in most resolution procedures, be discarded. The exceptions among the procedures discussed here are model elimination, which uses chains instead of clauses, but may require retaining chains with complementary literals, and some forms of theory resolution. In general, the rationale for being able to discard tautologies is that they can be evaluated as true by truth-functional rules and, thus, cannot contribute to the falsity of a conjunction of clauses.

2.2 Purity

A literal whose complement does not appear in a set of clauses is called pure. Because a pure literal can never be resolved on and thus eliminated, any clause containing a pure literal can never appear in the derivation of the empty clause. Thus, all clauses containing pure literals can be safely deleted.

2.3 Subsumption

A clause C subsumes a clause D if C's literals are a subset of D's literals. Deriving the empty clause from C requires no more work than deriving it from D if C subsumes D. Thus, D can be eliminated if C subsumes D. Two forms of subsumption can be employed: forward subsumption, the discarding of a newly derived clause that is subsumed by a clause that is already present, i.e., an input clause or a previously derived clause, and backward subsumption, the discarding of clauses already present that are subsumed by a newly derived clause. Normally, when a clause is derived, it should be tested for elimination by forward subsumption before being used to eliminate other clauses by backward subsumption. If these operations are performed in the opposite order, then some clause necessary to a refutation may be continually derived and then eliminated shortly thereafter by backward subsumption without ever being used, because the search strategy may order inference operations partially based on the age of the clause.
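For propositional clauses represented as sets of literals, subsumption is just the subset test, and the recommended forward-then-backward order can be sketched as below (the function names are illustrative):

```python
def subsumes(c, d):
    """C subsumes D iff C's literals are a subset of D's."""
    return frozenset(c) <= frozenset(d)

def add_clause(kept, new):
    """Add a newly derived clause to the kept set:
    forward subsumption first, then backward subsumption."""
    if any(subsumes(c, new) for c in kept):
        return kept                                   # new clause discarded
    kept = {c for c in kept if not subsumes(new, c)}  # backward subsumption
    kept.add(frozenset(new))
    return kept
```

Testing forward subsumption first means a redundant newcomer is discarded before it can evict older clauses, matching the ordering advice above.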

2.4 Set of Support

Resolution is often used to prove a theorem from a set of axioms that is known to be satisfiable.

However, unrestricted resolution does not distinguish between clauses that are created from axioms (axiom clauses) and those created from the negation of the theorem (theorem clauses); all are treated alike. A refutation cannot be found from the satisfiable set of axiom clauses alone; a refutation must depend upon the theorem clauses. The set of support restriction was developed to take advantage of this necessary dependency and make resolution more goal-directed.

The set of support restriction [90] is a complete restriction of resolution that requires division of the total set of clauses S into disjoint subsets T and S - T such that S - T is a satisfiable set of clauses. The set of clauses created from the theorem is typically designated as the set of support T. Axiom clauses would then comprise S - T.

The set of support restriction allows two clauses to be resolved only if at least one of the clauses is supported by T, i.e., is in T or has an ancestor clause in T. This can substantially reduce the size of the search space and makes the procedure more goal-directed because every derived clause is derived, directly or indirectly, from a theorem clause.

An alternative definition of the set of support restriction is that it allows two clauses to be resolved only if it is not the case that both are in S - T. When all the theorem clauses are designated as the set of support T, this means that the only unallowed resolution operations are those between axiom clauses.

Even if a problem is not posed in terms of a satisfiable set of axioms and a theorem so that the theorem clauses can be designated as the set of support, the set of support restriction can still be used. Syntactic criteria can be used to designate a set of support.

Any unsatisfiable set of clauses must include at least one positive clause and at least one negative clause. The interpretation that assigns false (resp., true) to every atom is a model for a set of clauses that contains no positive (resp., negative) clauses. Thus, the set of all positive clauses or the set of all negative clauses can be designated as the set of support because it is guaranteed that the set of remaining clauses is satisfiable.

Note that the set of support restriction is only complete if the set of clauses outside the set of support is satisfiable. The set of clauses {P, ¬P, ¬Q} cannot be refuted if only ¬Q is in the set of support. Therefore, Q cannot be proved from P and ¬P if only the negated theorem is used as the set of support.
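The restriction can be grafted onto a saturation loop by resolving only pairs in which at least one clause is supported; since every resolvent then has a supported parent, resolvents simply join the support set. A propositional sketch (representation and names are illustrative, not from the text):

```python
def refute_sos(axioms, support):
    """Resolution under the set-of-support restriction: never resolve two
    clauses that both lie outside the (growing) supported set."""
    axioms = {frozenset(c) for c in axioms}      # S - T, assumed satisfiable
    support = {frozenset(c) for c in support}    # T
    while True:
        new = set()
        for c1 in support:                       # one parent must be supported
            for c2 in axioms | support:
                for (sym, sign) in c1:
                    if (sym, not sign) in c2:
                        r = (c1 - {(sym, sign)}) | (c2 - {(sym, not sign)})
                        if not r:
                            return True          # empty clause derived
                        new.add(frozenset(r))
        if new <= support:
            return False                         # supported saturation reached
        support |= new                           # resolvents have an ancestor in T
```

Run with axioms {P, ¬P} and support {¬Q}, it fails to refute, reproducing the incompleteness example above when the outside set is unsatisfiable.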

Logic is sometimes criticized as being unsuitable for artificial-intelligence applications because anything, e.g., Q, can be proved from an inconsistency, e.g., P and ¬P. Although it is hard to argue that an inconsistent set of axioms is desirable, the critics claim that large collections of axioms about the real world may inadvertently be inconsistent, and that it would be undesirable to conclude irrelevant statements from an inconsistency in the axioms. The set of support restriction provides some protection from this problem. Its failure to prove Q from P and ¬P shows that, for a set of support refutation to succeed, the inconsistency must be connected to the theorem via resolution operations and, hence, must in some sense be relevant to the conclusion. It is worth considering whether there is a relationship between the set of support restriction and the logic of entailment or relevant implication.

2.5 P1 and N1 Resolution

P1 resolution [67] is the restriction of resolution that requires that one of the parent clauses in each resolution operation be a positive clause. P1 resolution can be viewed as an extension of the set of support restriction and is also complete. Using the set of support restriction, it is legitimate to designate the set of all positive clauses as the set of support. Resolution operations between input clauses will then require one parent to be a positive clause, as desired. However, with just the set of support restriction, any derived clause can be resolved with any other clause, and the intended restriction that one of the parent clauses in each resolution operation be positive will not be obeyed. After each resolution operation, the resulting set of clauses is unsatisfiable provided the initial set of clauses is unsatisfiable. Thus, the set of support restriction (with the set of all positive clauses designated as the set of support) can be applied to each set of clauses resulting after performing a resolution operation, and not just to the initial set of clauses, effectively imposing the desired restriction that one parent clause of each resolution operation be a positive clause. The primary importance of P1 resolution is its relation to hyperresolution.

N1 resolution is the restriction of resolution that requires that one of the parent clauses to each resolution operation must be a negative clause. It is defined analogously to P1 resolution and has similar properties.

2.6 Hyperresolution

Hyperresolution is a more efficient version of P1 resolution. Although ordinary resolution operations take two clauses as arguments, hyperresolution is the first of several operations to be discussed that may require an arbitrary number of arguments. Each hyperresolution operation takes a single mixed or negative clause, termed the nucleus, as one of its arguments and as many positive clauses, termed electrons, as there are negative literals in the nucleus as the other arguments, and produces a positive clause as its result. Each negative literal of the nucleus is resolved with a literal in one of the electrons. The hyperresolvent consists of all the positive literals of the nucleus disjoined with those literals of the electrons that were not resolved on.
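A single propositional hyperresolution step can be sketched as follows; for simplicity, the negative literals of the nucleus are taken in sorted order and the electrons must be supplied in matching order, which is an illustrative convention of this sketch, not part of the definition:

```python
def hyperresolve(nucleus, electrons):
    """One propositional hyperresolution step.  nucleus: a mixed or negative
    clause (set of (symbol, sign) literals); electrons: one positive clause
    per negative literal of the nucleus, in the sorted order of those
    literals.  Returns the positive hyperresolvent, or None if some electron
    is not positive or does not contain the needed literal."""
    nucleus = frozenset(nucleus)
    negatives = sorted(l for l in nucleus if not l[1])
    if len(negatives) != len(electrons):
        return None
    result = {l for l in nucleus if l[1]}        # positive literals of nucleus
    for (sym, _), electron in zip(negatives, electrons):
        electron = frozenset(electron)
        if any(not sign for (_, sign) in electron):
            return None                          # electron must be positive
        if (sym, True) not in electron:
            return None                          # must resolve this literal
        result |= electron - {(sym, True)}       # literals not resolved on
    return frozenset(result)
```

For example, the nucleus ¬P ∨ ¬Q ∨ R with electrons P ∨ S and Q yields the positive hyperresolvent R ∨ S in one step, where P1 resolution would need two steps and an intermediate mixed clause.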

The completeness of hyperresolution can be used to prove the claim that an unsatisfiable set of Horn clauses never needs to contain more than one negative clause (if a set of Horn clauses has more than one negative clause, then at least one of the negative clauses alone is unsatisfiable with the positive and mixed clauses). Results of hyperresolution operations are always positive clauses. Thus, any negative clause can only be a parent to the empty clause in a hyperresolution operation. But, because the empty clause needs to be derived only once in a refutation, it is unnecessary for more than one negative clause to be used. Negative hyperresolution is exactly the same as hyperresolution except it is an efficient version of N1 instead of P1 resolution and thus derives negative instead of positive clauses.

2.7 Unit Resolution

Unit resolution [11] is the restriction of resolution that requires that at least one of the parent clauses in each resolution operation be a unit clause. This is an appealing restriction when considered from the point of view of implementation and efficiency because a resolvent always has fewer literals than its longer parent clause. Because the goal in resolution theorem proving is to derive the empty clause, shorter clauses are "closer" to the goal than longer clauses. Thus, unit resolution always appears to be making progress toward the goal.

Unit resolution is obviously incomplete because not every unsatisfiable set of clauses contains a unit clause; importantly, however, it is complete for sets of Horn clauses. The completeness of unit resolution for Horn clauses is easily shown. P1 resolution is complete for arbitrary sets of clauses and can thus be used to refute sets of Horn clauses. Because in sets of Horn clauses all positive clauses are also unit clauses, it is apparent that every P1 resolution operation is also a unit resolution operation, and thus a P1 resolution refutation is also a unit resolution refutation.
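A propositional sketch of the restriction follows: every step resolves a unit clause against some other clause. Run on the four-clause example of Section 2, it gives up immediately, since that set contains no unit clause, which illustrates the incompleteness noted above (the representation and names are illustrative):

```python
def unit_refute(clauses):
    """Unit resolution: every resolution step has a unit-clause parent.
    Incomplete in general, complete for sets of Horn clauses."""
    clauses = {frozenset(c) for c in clauses}
    while True:
        units = {c for c in clauses if len(c) == 1}
        new = set()
        for u in units:
            (sym, sign), = u
            for c in clauses:
                if (sym, not sign) in c:          # unit resolves into c
                    r = c - {(sym, not sign)}     # strictly shorter resolvent
                    if not r:
                        return True               # empty clause derived
                    new.add(r)
        if new <= clauses:
            return False                          # no progress possible
        clauses |= new
```

On a Horn set such as {P, ¬P ∨ Q, ¬Q}, the derived units accumulate until the empty clause appears.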

2.8 Unit-Resulting Resolution

Unit-resulting resolution (UR-resolution) [58] is a more efficient version of unit resolution. The unit-resulting resolution operation, like the hyperresolution operation, takes an arbitrary number of arguments. Where hyperresolution operates on a single mixed or negative clause and a set of positive clauses and produces a positive or empty clause as its output, unit-resulting resolution operates on a single nonunit clause and a set of unit clauses and produces a unit or empty clause as its output.

In addition to the ultimate goal of deriving the empty clause, unit resolution can be seen to have as an intermediate goal the derivation of additional unit clauses, for only unit clauses can participate freely in resolution operations. The sole purpose of nonunit clauses is their role in deriving additional unit clauses, because they cannot be resolved with each other. Deriving a unit clause requires a nonunit clause in the initial set of clauses, all but one of whose literals are successively resolved away by either input or derived unit clauses. Unit-resulting resolution implements this process of resolving away by unit resolution all but one of the literals of a nonunit clause more directly. A unit-resulting resolution operation takes as its input n unit clauses and a single n-clause or (n+1)-clause and uses the n unit clauses to resolve n distinct literals away simultaneously in the nonunit clause, resulting in the empty clause or a derived unit clause. This eliminates the need to form and store derived nonunit clauses; they are handled implicitly within the unit-resulting resolution operation.

2.9 Input Resolution

Input resolution [11] is the restriction of resolution that requires that one of the parent clauses in each resolution operation be an input clause, i.e., not a derived clause. Input resolution is incomplete. For example, {P ∨ Q, P ∨ ¬Q, ¬P ∨ Q, ¬P ∨ ¬Q} cannot be refuted by input resolution. Input resolution can derive P and ¬P (and Q and ¬Q) but cannot resolve them with each other because neither is an input clause as required.

Input resolution, like unit resolution, is complete for sets of Horn clauses. N1 resolution is complete for arbitrary sets of clauses and can thus be used to refute sets of Horn clauses. Because in sets of Horn clauses no clause has more than one positive literal, it is apparent that every N1 resolution operation results in a negative clause, and thus no two derived clauses can be resolved with each other and an N1 refutation is also an input refutation. This demonstration of the completeness of input resolution for Horn clauses also shows that it is unnecessary to resolve arbitrary pairs of input clauses with each other, because it is sufficient to take only those pairs of input clauses that include a negative clause. More generally, input resolution is compatible with the set of support restriction. Thus, input resolution can be restricted without further loss of completeness so that every derived clause is supported by the set of support T, where T can be selected arbitrarily so long as S − T is satisfiable. For example, because an unsatisfiable set of Horn clauses never needs to contain more than one negative clause, it is possible to refute sets of Horn clauses using a single negative clause as the set of support. It is interesting that unit and input resolution, despite their substantial operational differences, are both incomplete procedures that are complete for sets of Horn clauses.
Actually, unit and input resolution are capable of solving exactly the same class of problems, i.e., if a unit resolution refutation exists, then an input resolution refutation also exists, and vice versa. There is a constructive proof of this fact that can be used to transform one kind of refutation into the other

[11]. Input resolution bears a strong resemblance to the problem-reduction method. In the problem-reduction method, the inputs are a set of primitively solvable goals, a set of rules stating that if a set of antecedent goals can be solved then the consequent goal can be solved, and a goal to be solved. Solution of the goal is accomplished by backward chaining. To solve a goal, one asks if it is primitively solvable. If it is not, then rules whose consequent goals are the same as the goal are used and solution of the antecedent goals is attempted. Such problems are easily encoded as input resolution problems. Primitively solvable goals can be represented by positive unit clauses. Rules of the form "if goals P1, ..., Pn are solvable then goal Q is solvable" can be represented by the clause Q ∨ ¬P1 ∨ ... ∨ ¬Pn. The problem goal can be represented by a negative unit clause (a set of problem goals to be simultaneously solved could be represented by a negative nonunit clause). These are all Horn clauses, so input resolution is applicable. With only the negative clause in the set of support, input resolution implements backward chaining.

Ordered input resolution is a further restriction of input resolution. In ordinary input resolution, a supported n-literal clause can be used in a derivation of the empty clause with the literals resolved away in any order. Even if each literal can be resolved away in only one way, there are n! derivations of the empty clause. This inefficiency is eliminated in ordered input resolution by not treating the disjunction connective ∨ as a commutative operator for which order does not matter and by requiring that literals be resolved away in some fixed order, e.g., strictly left to right.

2.10 Prolog

Prolog [13,39], currently the most widely used logic programming language, is based on ordered input resolution and relies upon input resolution's resemblance to the problem-reduction method. A Prolog program consists of a set of unit assertions P and nonunit assertions Q ← P1, ..., Pn. The latter represents the clause Q ∨ ¬P1 ∨ ... ∨ ¬Pn. Prolog can then be asked to evaluate queries with respect to the assertions. A query is represented by the Prolog clause ← Q1, ..., Qm. The ← connective can be interpreted as the ordinary implication connective except that the arguments are reversed. The literals on the right-hand side are conjoined. If it is converted to ordinary clause form, the literal on the left-hand side of a nonunit assertion will be positive; all the literals on the right-hand side of an assertion or query will be negative. This allows representation of all Horn clauses. Because there is no negation connective in Prolog, every clause has either one (in the case of assertions) or no (in the case of queries) positive literal.

Prolog program execution performs ordered input resolution to refute the query clause using the assertions. The query clause is designated as the single clause in the set of support, and the leftmost literal of a derived clause is always resolved with the leftmost literal of an assertion. When the literal of a derived clause is resolved on, it is removed and, in the case of resolution with a nonunit assertion, the literals on the right-hand side of the assertion appear in its place, in the same order as they appeared in the assertion. Let ← Q1, ..., Qm be the current derived clause. Then resolution with the unit clause Q1 will result in ← Q2, ..., Qm, and resolution with the nonunit clause Q1 ← P1, ..., Pn will result in ← P1, ..., Pn, Q2, ..., Qm. As always, derivation of the empty clause completes the refutation.
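The inference step just described can be sketched for the propositional case in a few lines of Python. The pair representation of assertions, the function name, and the depth cap (real Prolog search is unbounded) are our own illustrative choices:

```python
# A propositional Prolog sketch: assertions are (head, [body...]) pairs,
# tried in program order; goals in a derived clause are solved left to right.

def solve(goals, program, depth=25):
    """Refute the query clause <- goals by ordered input resolution."""
    if depth == 0:                 # crude guard; real Prolog search is unbounded
        return False
    if not goals:                  # empty clause derived: refutation complete
        return True
    first, rest = goals[0], goals[1:]
    for head, body in program:     # earlier assertions are tried first
        if head == first:
            # The selected literal is replaced by the assertion's body, in order.
            if solve(body + rest, program, depth - 1):
                return True
    return False                   # backtrack: try another assertion higher up

# q <- p, r.    p.    r.
program = [('q', ['p', 'r']), ('p', []), ('r', [])]
print(solve(['q'], program))       # True
print(solve(['s'], program))       # False
```

The recursion is exactly the ordered input resolution step: the leftmost goal Q1 is removed and the body P1, ..., Pn is prefixed, giving ← P1, ..., Pn, Q2, ..., Qm.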

To facilitate its use as a programming language as well as a deductive system, Prolog is much more precise than most deductive systems about the order in which inference operations are performed. It uses ordered input resolution with left-to-right resolution on literals. The assertions in a Prolog program are also ordered. Assertions that appear earlier in the list of assertions that comprise a Prolog program are tried before later ones. The control strategy is depth-first search with backtracking on failure. If the current derived clause is ← Q1, ..., Qm and Q1 is resolved away by the unit assertion Q1 or the nonunit assertion Q1 ← P1, ..., Pn, all ways of refuting the derived clause ← Q2, ..., Qm or ← P1, ..., Pn, Q2, ..., Qm are explored before any other method of resolving away Q1 (by a later assertion in the Prolog program) is tried. When Prolog is blocked, i.e., the current derived clause is ← Q1, ..., Qm but Q1 cannot be resolved away by any assertion not already tried, the most recent resolution operation is undone and the next alternative is tried.

Prolog was analyzed above from the narrow perspective of deduction. For general-purpose theorem proving, Prolog is inadequate mainly because its inference system for Horn clauses omits general disjunction and negation and its unbounded depth-first search strategy is incomplete. A further problem is that many Prolog systems employ unification without the occurs check. This will be discussed in the section on unification. However, Prolog is much more than a deduction system: it is a programming language with many attractive features. Prolog programs can often be viewed as having logical and procedural interpretations. The logical interpretation has been discussed above. The procedural interpretation considers collections of clauses with the same predicate symbol in the head to comprise a procedure. Execution of the procedure matches the procedure-call literal by trying alternative clauses in top-to-bottom order and satisfying subgoals of nonunit clauses in left-to-right order, backtracking to find alternative solutions as required. This supports the notion that algorithms should be viewed as a combination of logic and control components [38,39]. Prolog efficiently implements an important subset of the features, including unification and backtracking, proposed for earlier artificial-intelligence languages such as PLANNER [24] and QA4 [68]. These earlier languages generally had less complete (relative to their specification) and less efficient implementations.
Unification is used as a uniform mechanism for composing and decomposing data types, represented as first-order predicate calculus terms. Prolog provides a smooth interface to built-in predicates for arithmetic, input/output, and the like, as well as user-defined predicates with logical interpretation. The cut operation provides additional control capability.

Prolog's restriction to sets of Horn clauses has the natural justification that only for Horn clauses are all answers to queries certain to be definite. The inexpressibility of facts such as P(a) ∨ P(b) in Horn clauses makes it unnecessary to consider whether ∃xP(x) has the answer true, with x being either a or b, but not knowing which. Sets of ground unit clauses can be regarded naturally as containing the same information as a file in a relational database. Virtual relations can be defined by nonunit clauses. Assert and retract operations permit additions or deletions of clauses by a running Prolog program. The greater expressiveness of Prolog makes it a logical generalization of relational databases. Prolog provides a form of negation, though not the standard one, termed negation as failure [49,33] that supports reasoning with the closed-world assumption. The closed-world assumption asserts that for some predicate the given instances of the predicate comprise the entire set of instances of the predicate. Failure to prove a formula then implies its negation. This topic is covered more deeply in the chapter by Bibel.
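A minimal sketch of negation as failure makes the closed-world behavior concrete. The propositional interpreter and the `('not', g)` goal encoding below are our own illustrative devices: a negated goal succeeds exactly when the attempt to prove the positive goal finitely fails.

```python
def prove(goal, program):
    """Depth-first proof of a propositional goal; ('not', g) is negation as failure."""
    if isinstance(goal, tuple) and goal[0] == 'not':
        return not prove(goal[1], program)   # failure to prove implies negation
    for head, body in program:
        if head == goal and all(prove(g, program) for g in body):
            return True
    return False

# flies :- bird, not abnormal.   (propositionalized for a single individual)
program = [
    ('bird_tweety', []),
    ('flies_tweety', ['bird_tweety', ('not', 'abnormal_tweety')]),
]
print(prove('flies_tweety', program))   # True: abnormal_tweety is not provable,
                                        # so its negation is assumed to hold
```

Under the closed-world assumption the program's failure to derive `abnormal_tweety` licenses the conclusion that it is false, which is exactly why the query succeeds.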

2.11 Linear Resolution

Input resolution and its derivatives (including Prolog) are incomplete. Linear resolution [50,53] is an extension of input resolution that requires that at least one of the parent clauses of each resolution operation be either an input clause or an ancestor clause of the other parent.

Linear resolution is complete. It can be further restricted while preserving completeness; in particular, linear resolution, like input resolution, is compatible with set of support and ordering restrictions.

2.12 Model Elimination

The model elimination procedure [51,52] is isomorphic (in the propositional case) to a highly restricted form of linear resolution and is complete. It incorporates the set of support restriction, an ordering restriction on literals, and a requirement that earlier clauses in a derivation not subsume later ones. A procedure very similar to model elimination is called SL-resolution [40].

The restriction of model elimination or SL-resolution to Horn clauses is basically ordered input resolution, i.e., the inference system employed by Prolog. This accounts for Prolog's inference system frequently being referred to as SLD-resolution (SL-resolution for definite clauses, where definite clauses are another name for Horn clauses).

Model elimination is technically not a form of resolution at all, because it operates on chains instead of clauses. A chain differs from a clause in that its literals are ordered and there are two types of literals: A-literals and B-literals. The ordinary literals used in clauses in resolution are B-literals in the model elimination procedure. The literal that is resolved on in the model elimination procedure is saved in the result as an A-literal. A-literals are used in instances where, in linear resolution, a clause is resolved with an ancestor clause.

There are two inference operations in model elimination: extension and reduction. Let Qm, ..., Q1 be a chain whose last literal Q1 is a B-literal. The literal indices are written in descending order to facilitate comparison with the Prolog inference rule stated previously. Model elimination consistently operates on the rightmost literal of a chain, while Prolog operates on the leftmost literal of its derived clauses. Let ¬Q1 ∨ P1 ∨ ... ∨ Pn be an input clause. Then the chain Qm, ..., Q2, [Q1], Pi1, ..., Pin is the result of applying the model-elimination extension operation. In the derived chain, literals Qm, ..., Q2 are A-literals or B-literals according to their status in the parent chain; Q1 is an A-literal; Pi1, ..., Pin are all B-literals, with i1, ..., in being some permutation of 1, ..., n. (A-literals are marked by enclosing them in brackets.) Any permutation of P1, ..., Pn can be used in the result; it is unnecessary to derive additional chains with different permutations of these literals.

Again, let Qm, ..., Q1 be a chain whose last literal Q1 is a B-literal. If Q1 is complementary to some earlier A-literal Qi, then the chain Qm, ..., Q2 can be derived by the model-elimination reduction operation. In the derived chain, all the literals are A-literals or B-literals according to their status in the parent chain.

If the clause ¬Q1 ∨ P1 ∨ ... ∨ Pn used in the extension operation is represented by the Prolog assertion Q1 ← ¬P1, ..., ¬Pn (this is possible precisely if ¬Q1 is a positive literal and P1, ..., Pn are all negative literals, so that Q1, ¬P1, ..., ¬Pn are all atoms) and the permutation n, ..., 1 is used in forming the result of the extension operation, then the resulting chain's B-literals are exactly the same literals, but in reverse order, as the literals in the result of a Prolog inference operation. Because in Prolog all literals in a derived clause are negative, there can never be a case of an A-literal being followed by a complementary B-literal. Thus, no reduction operations are possible, and retaining the A-literals is unnecessary.

Some other aspects of the model elimination procedure need to be mentioned. Both the extension and reduction operations require the last literal of the chain to be a B-literal, but extension by a unit clause results in a chain with a terminal A-literal. The solution to this difficulty is that terminal A-literals are simply removed from the chain.

Certain chains can be rejected without loss of completeness. If the chain contains (a) an A-literal followed later in the chain by an identical A-literal or B-literal, (b) an A-literal followed later in the chain by a complementary A-literal, or (c) a B-literal followed later in the chain by a complementary B-literal, where the two literals are not separated by an A-literal, then the chain can be rejected. These tests can be performed on the chain before terminal A-literals are removed. Tests (a) and (b), in particular, may reject a chain with terminal A-literals that would be acceptable if the terminal A-literals were removed.

The rationale for Test (b) is that if the chain contains an A-literal followed by a complementary A-literal, the second A-literal has a B-literal ancestor in an earlier chain in the derivation. This literal could have been removed by reduction.

The rationale for Test (c) is that complementary B-literals unseparated by A-literals must come from the same input clause. This clause must then be a tautology, and it is unnecessary to use tautologous clauses.

Test (a) is more difficult to justify, but its effect is to eliminate loops. Rejecting chains on the basis of Test (a) precludes the refutation of a literal being a subtask of the refutation of that same literal.

Following is a model elimination proof of the unsatisfiability of {P ∨ Q, P ∨ ¬Q, ¬P ∨ Q, ¬P ∨ ¬Q}:

1. P, Q                a chain from P ∨ Q
2. P, [Q], P           extend by P ∨ ¬Q
3. P, [Q], [P], ¬Q     extend by ¬P ∨ ¬Q
4. P, [Q], [P]         reduce ¬Q by [Q]
5. P                   delete terminal A-literals [Q], [P]
6. [P], Q              extend by ¬P ∨ Q
7. [P], [Q], ¬P        extend by ¬P ∨ ¬Q
8. [P], [Q]            reduce ¬P by [P]
9. (empty chain)       delete terminal A-literals [Q], [P]

The model elimination procedure takes its name from the fact that it systematically tries to construct a model for a set of clauses. When all such attempts fail, the set of clauses is determined to be unsatisfiable. As the procedure attempts to construct a model, the A-literal [P] or [¬P] marks the assignment of true or false, respectively, to the atom P in the interpretation. In the above proof, the procedure tries to make each of the literals P and Q of the first chain an A-literal, because at least one of P and Q must be true in any model in order to satisfy the clause P ∨ Q. Each assignment ultimately leads to a contradiction, so the set of clauses is unsatisfiable.
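The extension and reduction operations, terminal A-literal deletion, and a proof like the one above can be reproduced by a small depth-bounded search. This Python sketch omits the acceptance tests (a)-(c), relying on a depth bound to cut loops instead, and its chain encoding is our own:

```python
# Chains are lists of (kind, literal) with kind 'A' or 'B'; literals are
# integers, -n being the complement of n. Clauses are lists of literals.

def me_refute(chain, clauses, depth):
    """Depth-bounded model elimination on the rightmost literal of the chain."""
    while chain and chain[-1][0] == 'A':      # delete terminal A-literals
        chain = chain[:-1]
    if not chain:                             # empty chain: refutation found
        return True
    if depth == 0:
        return False
    q = chain[-1][1]                          # rightmost literal, a B-literal
    # reduction: q is complementary to an earlier A-literal
    if any(k == 'A' and l == -q for k, l in chain[:-1]):
        if me_refute(chain[:-1], clauses, depth - 1):
            return True
    # extension: use an input clause containing the complement of q
    for c in clauses:
        if -q in c:
            new = chain[:-1] + [('A', q)] + [('B', l) for l in c if l != -q]
            if me_refute(new, clauses, depth - 1):
                return True
    return False

# {P ∨ Q, P ∨ ¬Q, ¬P ∨ Q, ¬P ∨ ¬Q} with P = 1, Q = 2, starting from P ∨ Q.
clauses = [[1, 2], [1, -2], [-1, 2], [-1, -2]]
print(me_refute([('B', 1), ('B', 2)], clauses, 10))    # True: unsatisfiable
print(me_refute([('B', 1), ('B', 2)], [[1, 2]], 10))   # False: satisfiable set
```

The textbook proof above uses six extension/reduction steps, so it lies comfortably within the depth bound of 10; a production implementation would of course add the rejection tests rather than rely on the bound.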

2.13 Prolog Technology Theorem Prover

Despite Prolog's logical deficiencies, it is quite interesting from a deduction standpoint because of its very high speed as compared with conventional deduction systems. The objective of a Prolog technology theorem prover (PTTP) [80] is to remedy Prolog's deficiencies while retaining to the fullest extent possible the high performance of well-engineered Prolog systems. To achieve completeness for non-Horn clauses, an inference system other than Prolog's input resolution must be adopted. However, an arbitrarily chosen complete inference system is unlikely to be as efficiently implementable as Prolog. The fact that Prolog employs input resolution is crucial to its high performance. No Prolog operation acts on two derived clauses at once. The use of input resolution and depth-first search implies there is only one active derived clause at a time, represented on the stack, that is resolved with input clauses that can be compiled. The model elimination procedure is also an input procedure, but is complete. It can be seen as Prolog-style ordered input resolution plus one additional inference rule, the model-elimination reduction operation. The reduction operation, phrased in terms meaningful to Prolog, states that, if the current goal is complementary to an ancestor goal, then the current goal is treated as if it were solved (nonground goals may have to be unified for the rule to apply). It is a form of reasoning by contradiction. Consider proving C from A ⊃ C, B ⊃ C, and A ∨ B. C has the subgoal A (by A ⊃ C), which has the subgoal ¬B (by A ∨ B), which has the subgoal ¬C (by the contrapositive of B ⊃ C). ¬C is complementary to the higher goal C, so it can be treated as solved, and thus so can ¬B, A, and C. The reasoning is: the goal ¬C is either true or false; if ¬C is true, then C must be true by the chain of inferences, a contradiction because ¬C and C cannot both be true; thus, ¬C must be false and C must be true.
Note that this reasoning says nothing about the value of the intermediate subgoals A and ¬B.

Another major concern is the incompleteness of Prolog's unbounded depth-first search strategy. It cannot be replaced by an arbitrary complete search strategy, like breadth-first or merit-ordered search, without sacrificing performance. If depth-first search were not used, it would be necessary for more than one derived clause to be simultaneously represented and for variables to have more than a single value simultaneously, i.e., different values in different clauses. This implies the need for a more complex and less efficient representation for variable bindings than Prolog's. In addition, depth-first search allows all state information to be kept on the stack with a minimum of memory required. Breadth-first search would need an additional amount of memory that would grow exponentially with increasing depth.

Therefore, depth-first search continues to be a good choice of search strategy, but for completeness it must be bounded. That leaves the problem of selecting the depth bound. In an exponential search space, searching with a higher-than-necessary search bound can result in an enormous amount of wasted effort before the solution is found. The cost of searching level n in an exponential search space is generally large compared with the cost of searching earlier levels. This makes it practical to perform consecutively bounded depth-first search. The depth bound is set successively at 1, 2, 3, and so on, until a solution is found. If a constant branching factor b is assumed, this method results in only a factor of about b/(b - 1) more inference operations being performed than breadth-first search to the same depth [82]. The effect is similar to performing breadth-first search.
However, instead of retaining the results from earlier levels, these results are recomputed, with the efficiency of Prolog-style variable-binding representation possible for depth-first search only.

There are two important optimizations of this iteratively bounded depth-first search procedure that reduce the number of inference operations. The first optimization follows from the observation that, if the depth of the current goal plus the number of pending goals exceeds the depth bound, then no solution within the depth bound can be found from this clause, and so another solution should be sought. The second optimization is concerned with (1) recording the minimum value by which the depth of the current goal plus the number of pending goals exceeds the depth bound, and (2) using this minimum value to increment the depth bound instead of always incrementing it by 1. This technique can also be used to recognize that the search space is finite and that the search itself can be abandoned: if, upon the completion of searching at some level, no cutoff has occurred because the depth of the current goal plus the number of pending goals exceeded the depth bound, then searching with a higher depth bound will result in no additional inferences being made.

A Prolog technology theorem prover has several advantages as compared to ordinary theorem provers. It can perform inferences at a very high rate approaching Prolog's. It is complete and easy to use. Many conventional theorem provers rely upon user selection of features and parameter values that control behavior and limit completeness. A Prolog technology theorem prover, like Prolog, requires little memory and has facilities for procedural attachment (built-in functions) and control (the ordering of literals in clauses and of clauses in the database, the cut operation).
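Consecutively bounded depth-first search with both optimizations can be sketched on the propositional Horn interpreter from Section 2.10. The depth accounting below (depth of the current goal approximated by the number of inferences so far) and all names are our own simplifications:

```python
# Consecutively bounded depth-first search: the bound is raised between
# iterations by the minimum amount it was exceeded, and the search stops
# when a level completes with no cutoff (the space is finite).

def bounded_solve(goals, depth, bound, program, stats):
    if not goals:
        return True
    if depth + len(goals) > bound:            # optimization 1: cutoff test
        excess = depth + len(goals) - bound
        stats['min_excess'] = min(stats.get('min_excess', excess), excess)
        return False
    first, rest = goals[0], goals[1:]
    for head, body in program:
        if head == first and bounded_solve(body + rest, depth + 1,
                                           bound, program, stats):
            return True
    return False

def idfs(goal, program):
    bound = 1
    while True:
        stats = {}
        if bounded_solve([goal], 0, bound, program, stats):
            return True
        if 'min_excess' not in stats:         # no cutoff: space exhausted
            return False
        bound += stats['min_excess']          # optimization 2: jump the bound

program = [('q', ['p', 'r']), ('p', []), ('r', [])]
print(idfs('q', program))    # True, found at the second bound
print(idfs('s', program))    # False: finite space detected, search abandoned
```

Note that for an unprovable goal with an infinite search space this sketch, like the procedure it illustrates, would not terminate; the finite-space test only helps when some level completes without any cutoff.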

2.14 Connection-Graph Resolution

In principle, ordinary resolution operates on just a set of clauses as its only data structure. Connection-graph resolution [37,3], on the other hand, operates on a derived data structure called a connection graph. A connection graph is a graph containing clauses with links between complementary pairs of literals. The connection-graph-resolution operation resolves on a link in the connection graph, forms the resolvent, and adds it to the connection graph. Literals in the resolvent acquire their links to other literals in the connection graph by inheritance. They are linked only to those literals to which their parent literals were linked.

An advantage of connection-graph resolution is its explicit representation by the links of what resolution operations are possible. This makes retrieval of matching literals easier and encourages graph searching as a method for selecting inference operations or finding proofs. Although the immediate access to matching literals via links is an often cited advantage of connection-graph resolution, it is also possible to achieve efficient access to matching literals by using term indexes [61,23], at least for ordinary resolution. When the inference operations are more difficult to discover and compute, as when theory resolution or unification in equational theories is used, inheriting links may be more efficient than rediscovering and recomputing possible inference operations.

Besides suggesting encoding inference operations in a connection-graph data structure, connection-graph resolution is a restriction of resolution because connection-graph resolution specifies that the link that is resolved on be deleted from the connection graph. The effect of this link deletion is that if literals L and L′ are resolved, then L or a literal descended from L can never again be resolved with L′ or a literal descended from L′.
This has the beneficial effect of reducing the size of the search space in much the same way that ordering restrictions in input and linear resolution do. For example, ordinary resolution can discover two refutations of the set of clauses consisting of P, Q, and ¬P ∨ ¬Q:

1. P
2. Q
3. ¬P ∨ ¬Q
4. ¬Q        resolve 1 and 3
5. false     resolve 2 and 4

and

1. P
2. Q
3. ¬P ∨ ¬Q
4. ¬P        resolve 2 and 3
5. false     resolve 1 and 4

Either of these refutations can be discovered by connection-graph resolution, but not both in the same execution of a connection-graph-resolution theorem prover. The only resolution operations possible initially are to resolve on P in clauses 1 and 3 or on Q in clauses 2 and 3. Suppose clauses 1 and 3 are resolved first. Then the link between P in clause 1 and ¬P in clause 3 is deleted, and neither literal will have any links. Thus, if clauses 2 and 3 are later resolved, the resolvent ¬P will have no links (because there were none to inherit) and a refutation cannot be completed.

Note that, aside from the initial set of links, links are acquired only by inheritance. Thus, once a literal has no links, the clause containing it can be removed, because the linkless literal cannot be resolved away. This is an extension of the purity rule for ordinary resolution. An occasional characteristic of connection-graph resolution is the dramatic collapse of the connection graph when links are deleted. Deletion of a link may make a literal pure in the graph, causing its clause to be deleted, which leads to yet more pure literals.
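The link-deletion behavior just described can be traced with a small Python sketch; the `Lit` class and helper names are ours. Resolving clauses 2 and 3 after clauses 1 and 3 yields a linkless ¬P (a dead end), while the link inherited by the first resolvent still completes the refutation:

```python
# Connection-graph resolution sketch: links join complementary literal
# occurrences; resolving on a link deletes it, and resolvent literals
# inherit only their parents' remaining links.

class Lit:
    def __init__(self, val):
        self.val = val
        self.links = set()          # linked complementary Lit occurrences

def link(a, b):
    a.links.add(b); b.links.add(a)

def resolve_on_link(c1, l1, c2, l2):
    """Resolve clauses c1 and c2 on linked literals l1, l2."""
    l1.links.discard(l2); l2.links.discard(l1)     # the resolved link is deleted
    resolvent = []
    for parent, on in ((c1, l1), (c2, l2)):
        for lit in parent:
            if lit is on:
                continue
            child = Lit(lit.val)
            for other in lit.links:                # inheritance, nothing else
                link(child, other)
            resolvent.append(child)
    return resolvent

# Clauses P, Q, ¬P ∨ ¬Q with the two initial links.
p, q, np, nq = Lit('P'), Lit('Q'), Lit('-P'), Lit('-Q')
c1, c2, c3 = [p], [q], [np, nq]
link(p, np); link(q, nq)

r1 = resolve_on_link(c1, p, c3, np)    # resolvent ¬Q inherits the link to Q
r2 = resolve_on_link(c2, q, c3, nq)    # resolvent ¬P inherits nothing: dead end
print([l.val for l in r2], [o.val for l in r2 for o in l.links])   # ['-P'] []
empty = resolve_on_link(r1, r1[0], c2, q)
print(empty)                           # []: the empty clause, refutation done
```

The dead-end resolvent ¬P would be removed by the extended purity rule, illustrating how link deletion collapses the graph.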

The complexity of connection-graph resolution compared to other restrictions of resolution and its noncommutative behavior (i.e., inference operations cannot be freely reordered, because inferences can be blocked by the absence of links depending on the order in which operations are performed) have made the procedure's completeness a difficult issue, although it can be shown complete under some restrictions [72,4,74].

2.15 Nonclausal Resolution

One of the most widely criticized aspects of resolution theorem proving is its use of clause form. Besides generally being considered difficult to read and not human-oriented, one criticism of clause form is that conversion of a formula to clause form may eliminate pragmatically useful information encoded in the choice of logical connectives. For example, ¬P ∨ Q may suggest a case-analysis approach, while the logically equivalent P ⊃ Q may suggest a chaining approach to deduction. The use of clause form may also result in a large number of clauses being needed to represent a formula, as well as in substantial redundancy in the search space.

An example of when conversion to clause form results in a substantial increase in the size of the formula is the conversion of A ≡ B. If A and B are literals, the equivalent clause form is (¬A ∨ B) ∧ (A ∨ ¬B), which has two instances each of the atoms of A and B. In the worst case, when A and B are formed using the equivalence connective, conversion of the single formula A ≡ B may result in a number of clauses that is an exponential function of the size of the formula. Another problematical example is the formula (A1 ∧ ... ∧ Am) ∨ (B1 ∧ ... ∧ Bn). Even in the simple case when A1, ..., Am, B1, ..., Bn are literals, if this formula is converted to the m × n clauses A1 ∨ B1, ..., Am ∨ Bn, each Ai occurs n times and each Bj occurs m times, instead of once as in the original formula.

It is possible to extend the resolution rule to nonclausal formulas [60,55,78]. Although ordinary clausal resolution resolves on clauses containing complementary literals, nonclausal resolution resolves on general formulas containing subformulas occurring with opposite polarity. In clausal resolution, the literals resolved on are deleted and the remaining literals disjoined to form the resolvent. In nonclausal resolution, all occurrences of the subformula resolved on are replaced by false (true) in the formula in which it occurs positively (negatively).
The resulting formulas are disjoined and simplified by truth-functional reductions such as (A ∨ true) → true and (A ∧ true) → A that eliminate embedded occurrences of true and false, and optionally by simplifications such as (A ∧ ¬A) → false. More precisely, if A and B are formulas and C is an atom occurring positively in A and negatively in B, then the result of simplifying A(C ← false) ∨ B(C ← true), where X(Y ← Z) denotes the result of replacing every occurrence of Y in X by Z, is a nonclausal resolvent of A and B.

It is clear that nonclausal resolution reduces to clausal resolution when the formulas are restricted to being clauses. In the general case, however, nonclausal resolution has some novel characteristics as compared with clausal resolution. It is possible to derive more than one resolvent from the same pair of formulas, even when resolving on the same atom, if the atom occurs both positively and negatively in both formulas. Likewise, it is possible to resolve a formula with itself.
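The replacement-and-simplify rule can be rendered directly in Python. The tuple encoding and function names below are illustrative only, and the simplifier handles just the truth-functional reductions mentioned above:

```python
# Formulas are atoms (strings) or tuples ('not', f), ('and', f, g), ('or', f, g).

def subst(f, atom, val):
    """Replace every occurrence of atom in f by the truth value val."""
    if f == atom:
        return val
    if isinstance(f, tuple):
        return (f[0],) + tuple(subst(g, atom, val) for g in f[1:])
    return f

def simplify(f):
    """Eliminate embedded occurrences of true and false."""
    if not isinstance(f, tuple):
        return f
    op, args = f[0], [simplify(g) for g in f[1:]]
    if op == 'not':
        a = args[0]
        if a is True:  return False
        if a is False: return True
        return ('not', a)
    a, b = args
    if op == 'and':
        if a is False or b is False: return False
        if a is True:  return b
        if b is True:  return a
        return ('and', a, b)
    if op == 'or':
        if a is True or b is True: return True
        if a is False: return b
        if b is False: return a
        return ('or', a, b)

def nc_resolve(A, B, atom):
    """Nonclausal resolvent of A (atom occurs positively) and B (negatively)."""
    return simplify(('or', subst(A, atom, False), subst(B, atom, True)))

print(nc_resolve('P', ('or', ('not', 'P'), 'Q'), 'P'))    # Q
print(nc_resolve(('and', 'P', 'Q'), ('not', 'Q'), 'Q'))   # False: a refutation
```

The second call shows the nonclausal analogue of deriving the empty clause: resolving P ∧ Q against ¬Q on Q simplifies all the way down to false.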

The elimination of clause form and the use of nonclausal resolution has disadvantages as well as advantages. Most operations on nonclausal formulas are more complex than the corresponding operations on clauses. The result of a nonclausal resolution operation is less predictable than the result of a clausal resolution operation. This is an important point when a theorem-proving system selects what operation to perform next on the basis of the expected result (e.g., how many literals are in a derived clause). Clauses can be easily represented as lists of literals; sublists are appended to form the resolvent. Pointers can be used to share lists of literals between parent and resolvent [6]. With simplification being performed during the formation of a nonclausal resolvent, the appearance of a resolvent may differ substantially from its parents, making structure sharing more difficult.

In clausal resolution, every literal in a clause must be resolved on for the clause to participate in a refutation. Thus, if a clause contains a literal that is pure (cannot be resolved with a literal in any other clause), the clause can be deleted. This is not the case for nonclausal resolution; not all atom occurrences are essential in the sense that they must be resolved on for the formula to participate in a refutation. For example, {P ∧ Q, ¬Q} is an unsatisfiable set of formulas, one of which contains the pure atom P. Only formulas containing pure atoms that are essential should be deleted for purity reasons. The subsumption operation must also be redefined for nonclausal resolution to take account of such facts as the subsumption of A by A ∧ B, as well as the clausal subsumption of A ∨ B by A.

The nonclausal resolution procedure gains additional power and complexity from allowing resolution on nonatomic formulas as well as atoms. For example, P ∨ Q and (P ∨ Q) ⊃ R could be resolved to obtain R. This can result in shorter, more natural proofs. However, the extension to nonatomic formulas is difficult in some respects. It may be difficult to recognize complementary formulas. For example, P ∨ Q occurs positively in Q ∨ R ∨ P, ¬P ⊃ Q, and ¬(P ≡ Q). Also, the effect of resolving on nonatomic subformulas can be achieved by multiple resolution operations on atoms. Resolution on atomic constituents of nonatomic formulas that are also resolved on can lead to redundant derivations and inefficiency.

2.16 Connection Method

The connection method [4,5] or generalized matings [1] is not a form of resolution, but it has some relationships to connection-graph resolution and nonclausal resolution, among others. Clause form is often referred to as conjunctive normal form (CNF) because it is a conjunction of disjunctions of literals. A dual form, called disjunctive normal form (DNF), is a disjunction of conjunctions of literals. One is easily obtained from the other by rewriting a formula in CNF by

(A ∧ (B ∨ C)) → ((A ∧ B) ∨ (A ∧ C))
((B ∨ C) ∧ A) → ((B ∧ A) ∨ (C ∧ A))

Another way of forming the DNF of a formula in CNF is to enumerate n conjunctions of literals, where n is the product of the numbers of literals in the clauses and each conjunction is composed of one literal from each clause. For example, the CNF formula

(P ∨ Q) ∧ (P ∨ ¬Q) ∧ (¬P ∨ Q) ∧ (¬P ∨ ¬Q)

is equivalent to the unsimplified DNF formula

(P ∧ P ∧ ¬P ∧ ¬P) ∨
(P ∧ P ∧ ¬P ∧ ¬Q) ∨
(P ∧ P ∧ Q ∧ ¬P) ∨
(P ∧ P ∧ Q ∧ ¬Q) ∨
...
(Q ∧ ¬Q ∧ Q ∧ ¬Q)

The interesting thing about this formula is that every conjunction contains a complementary pair of literals. It is clear that this property holds for the DNF of any unsatisfiable formula. If a conjunction did not contain a complementary pair of literals, then that conjunction and, thus, the whole formula could be satisfied.

This is the logical basis for the connection method. However, it does not actually form the DNF of a formula. Instead, it does graph searching of the formula, enumerating its paths, where a path consists of one literal from each clause. If every path contains a complementary pair of literals, then the formula is unsatisfiable. Because a single complementary pair of literals often appears in more than one path, it is possible by clever search to avoid explicit enumeration of all of the paths. A connection graph is often used as an auxiliary data structure in the connection method.

The connection method is applicable to formulas that are not in clause form. It is only necessary to refine the definition of a path through the formula. Consider the case of formulas that are in negation normal form (NNF). A formula is in NNF if its only connectives are conjunction, disjunction, and negation, and only atomic formulas are arguments of negation. (The connection method applies to formulas more general than NNF, but NNF is especially convenient to discuss, because the restriction of negation to atomic subformulas means that, for example, a conjunction is really a conjunction, not a disjunction in disguise because it is negated.) A path through a formula that is a single literal consists of that single literal. Any path through one of the disjuncts is a path through a disjunction. Any concatenation of paths through all of the conjuncts is a path through a conjunction.

For example, the formula

P ∨ (Q ∧ (R ∨ ¬S))

has the paths (P), (Q,R), and (Q,¬S). The principal strategic concerns for the connection method applied to ground formulas are the efficient enumeration of pairs of complementary literals and paths, so that not all paths need to be individually checked, and the reduction of formulas to equivalent ones. Many reduction methods are similar to methods used in resolution, such as elimination of tautologies, subsumption, and purity. For formulas that are not ground, there is the additional strategic concern of how many instances of subformulas will be needed for a single substitution to exist such that every path contains a complementary pair of literals. For example, in refuting ¬P(x) ∧ (P(a) ∨ P(b)), two instances ¬P(a) and ¬P(b) of ¬P(x) are required. Bibel's chapter also includes discussion of the connection method.
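The path definition above is easy to operationalize for ground formulas. The following sketch is ours, not anything from the connection-method literature: the tuple representation of NNF formulas and the function names are assumptions. It enumerates the paths through a formula and applies the unsatisfiability test.

```python
from itertools import product

def paths(f):
    """Enumerate the paths through a ground NNF formula.
    A literal is a pair (atom, sign); compound formulas are
    ('and', [subformulas]) or ('or', [subformulas])."""
    if isinstance(f, tuple) and f[0] in ('and', 'or'):
        op, subs = f
        if op == 'or':
            # any path through one of the disjuncts
            for s in subs:
                yield from paths(s)
        else:
            # any concatenation of paths through all of the conjuncts
            for combo in product(*(list(paths(s)) for s in subs)):
                yield [lit for part in combo for lit in part]
    else:
        yield [f]  # a path through a single literal

def unsatisfiable(f):
    """Connection-method test for ground formulas: f is unsatisfiable
    iff every path contains a complementary pair of literals."""
    return all(
        any((atom, not sign) in path for (atom, sign) in path)
        for path in paths(f)
    )
```

On P ∨ (Q ∧ (R ∨ ¬S)) this yields exactly the three paths listed above; on the four-clause CNF example, every one of the sixteen paths contains a complementary pair, so the formula is reported unsatisfiable.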

2.17 Theory Resolution

Theory resolution [79,81] is a method of incorporating specialized reasoning procedures in a resolution theorem prover so that the reasoning task is effectively divided into two parts: special cases, such as reasoning about inequalities or about taxonomic information, are handled efficiently by specialized reasoning procedures, while more general reasoning is handled by resolution. The connection between the two reasoning components is made by having the resolution procedure resolve on sets of literals whose conjunction is determined to be unsatisfiable by the specialized reasoning procedure. The objective of research on theory resolution is the conceptual design of deduction systems that combine deductive specialists within the common framework of a resolution theorem prover.

Past criticisms of resolution can often be characterized by their pejorative use of the terms uniform and syntactic. Theory resolution meets these objections head on. In theory resolution, a specialized reasoning procedure may be substituted for ordinary syntactic unification to determine unsatisfiability of sets of literals. Because the implementation of this specialized reasoning procedure is unspecified (to the theorem prover it is a "black box" with prescribed behavior, namely, the ability to determine unsatisfiability in the theory it implements), the resulting system is nonuniform: reasoning within the theory is performed by the specialized reasoning procedure, while reasoning outside the theory is performed by resolution. Theory resolution can also be regarded as not wholly syntactic, because the conditions for resolving on a set of literals are no longer based on their being made syntactically identical, but rather on their being unsatisfiable in a theory; thus resolvability is partly semantic. Reasoning about orderings and other transitive relations is often necessary, but using ordinary resolution for this is quite inefficient.
It is possible to derive an infinite number of consequences from (a < b) and ¬(x < y) ∨ ¬(y < z) ∨ (x < z), despite the obvious fact that a refutation based on just these two formulas is impossible. A solution to this problem is to require that use of the transitivity axiom be restricted to occasions when either there are matches for two of its literals (partial theory resolution) or a complete refutation of the ordering part of the clauses can be found (total theory resolution).

An important form of reasoning in artificial-intelligence applications embodied in knowledge-representation systems is reasoning about taxonomic information and property inheritance. One of the objectives of theory resolution is to be able to take advantage of the efficient reasoning provided by a knowledge representation system by using it as a taxonomy decision procedure in a larger deduction system. For systems like the Krypton knowledge representation system, which comprises terminological and assertional reasoning components, theory resolution provides a theory for connecting different reasoning systems.

Any satisfiable set of formulas that is to be incorporated into the inference process can be regarded as a theory. A T-interpretation is an interpretation that satisfies theory T.

For example, in a theory of partial ordering ORD consisting of ¬(x < x) and (x < y) ∧ (y < z) ⊃ (x < z), the predicate < cannot be interpreted so that (a < a) has value true, or so that (a < c) has value false while (a < b) and (b < c) both have value true. In a taxonomic theory TAX including Boy(x) ⊃ Person(x), Boy(John) cannot have value true while Person(John) has value false. A set of clauses S is T-unsatisfiable if and only if no T-interpretation satisfies S.

Let C1, ..., Cm (m ≥ 1) be a set of nonempty clauses, let each Ci be decomposed as Ki ∨ Li, where Ki is a nonempty clause, and let R1, ..., Rn (n ≥ 0) be unit clauses. Suppose the set of clauses K1, ..., Km, R1, ..., Rn is T-unsatisfiable. Then the clause L1 ∨ ⋯ ∨ Lm ∨ ¬R1 ∨ ⋯ ∨ ¬Rn is a theory resolvent using theory T (T-resolvent) of C1, ..., Cm. It is a total theory resolvent if and only if n = 0; otherwise it is partial. K1, ..., Km is called the key of the theory resolution operation. For partial theory resolvents, R1, ..., Rn is a set of conditions for the T-unsatisfiability of the key. The negation ¬R1 ∨ ⋯ ∨ ¬Rn of the conjunction of the conditions is called the residue of the theory resolution operation. It is a narrow theory resolvent if and only if each Ki is a unit clause; otherwise it is wide. The partial theory resolution procedure permits total as well as partial theory resolution operations. Similarly, the wide theory resolution procedure permits narrow as well as wide theory resolution operations.

For example, a set of unit clauses is unsatisfiable in the theory of partial ordering ORD if and only if it contains a chain of inequalities t1 < ⋯ < tn (n ≥ 2) such that either t1 is the same as tn or ¬(t1 < tn) is also one of the clauses. P is a unary total narrow ORD-resolvent of (a < a) ∨ P. P ∨ Q is a binary total narrow ORD-resolvent of (a < b) ∨ P and (b < a) ∨ Q. P ∨ Q ∨ R ∨ S is a 4-ary total narrow ORD-resolvent of (a < b) ∨ P, (b < c) ∨ Q, (c < d) ∨ R, and ¬(a < d) ∨ S.
This can also be derived incrementally through partial narrow ORD-resolution, i.e., by resolving (a < b) ∨ P and (b < c) ∨ Q to obtain (a < c) ∨ P ∨ Q (¬(a < c) is the condition), resolving that with (c < d) ∨ R to obtain (a < d) ∨ P ∨ Q ∨ R, and resolving that with ¬(a < d) ∨ S to obtain P ∨ Q ∨ R ∨ S.

Suppose the taxonomic theory TAX includes a definition for fatherhood Father(x) ≡ [Man(x) ∧ ∃y Child(x,y)]. Then Father(Fred) is a partial wide theory resolvent of Child(Fred, Pat) ∨ Child(Fred, Sandy) and Man(Fred). Also, false is a total wide theory resolvent of Child(Fred, Pat) ∨ Child(Fred, Sandy), Man(Fred), and ¬Father(Fred).

In narrow theory resolution, only T-unsatisfiability of sets of literals, not clauses, must be decided. Total and partial narrow theory resolution are both possible. In total narrow theory resolution, the literals resolved on (the key) must be T-unsatisfiable. In partial narrow theory resolution, the key must be T-unsatisfiable only under some conditions. The negated conditions are used as the residue in the formation of the resolvent.

The theory matings procedure is another method of incorporating theories. It is similar to the total narrow theory resolution method in the sense of imposing the same requirements on the decision procedure for T, i.e., determining the T-unsatisfiability of sets of literals, but it does not depend on performing resolution inference operations. The theory matings procedure is an extension of the connection method or generalized matings. The statement that if every path through a formula contains a complementary pair of literals, then the formula is unsatisfiable can be generalized to the statement that if every path through a formula contains a set of literals that is unsatisfiable in the theory T, then the formula is T-unsatisfiable.

Theory resolution is a procedure with substantial generality and power.
Thus, it is not surprising that many specialized reasoning procedures can be viewed as instances of theory resolution, perhaps with additional constraints governing which theory resolvents can be inferred.

For example, unification in equational theories can be viewed as a special case of theory resolution for building in equational theories. Inference rules such as paramodulation, resolution by unification and equality, and E-resolution can also be viewed as instances of theory resolution, differing in whether total or partial theory resolution is used and in their selection of key sets of literals to resolve on.
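As an illustration of the "black box" that total narrow ORD-resolution relies on, a decision procedure for T-unsatisfiability of ground unit < literals can be sketched as follows. The representation of literals as (sign, s, t) triples and the function name are assumptions of this sketch, which handles only the ground case.

```python
def ord_unsatisfiable(literals):
    """Decide T-unsatisfiability of a set of ground unit literals over '<'
    in the partial-ordering theory ORD (irreflexivity plus transitivity).
    A literal (True, s, t) means s < t; (False, s, t) means not (s < t)."""
    pos = {(s, t) for sign, s, t in literals if sign}
    neg = {(s, t) for sign, s, t in literals if not sign}
    consts = {c for pair in pos | neg for c in pair}
    # transitive closure of the positive literals (Floyd-Warshall style)
    closure = set(pos)
    for k in consts:
        for i in consts:
            for j in consts:
                if (i, k) in closure and (k, j) in closure:
                    closure.add((i, j))
    # unsatisfiable iff some s < s is derivable (irreflexivity violated)
    # or some derivable s < t is explicitly denied
    return any((c, c) in closure for c in consts) or bool(closure & neg)
```

On the key of the 4-ary example above, the chain a < b < c < d together with ¬(a < d), the procedure reports unsatisfiability, licensing the resolvent P ∨ Q ∨ R ∨ S.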

2.18 Krypton

Krypton [7,8] represents an approach to constructing a knowledge representation system that is composed of two parts: a terminological component (the TBox) that can represent and reason about terminological information and an assertional component (the ABox) that can represent and reason about assertional information. It is an interesting example of the application of theory resolution.

Krypton's TBox provides a language for defining and reasoning about taxonomic relations. It permits definitions of concepts and roles that are associated with unary and binary predicates in the ABox. A concept can be defined as a primitive concept, a conjunction of concepts, or a concept restricted so that all fillers of a particular role are of a certain concept. A role can be defined to be a primitive role or a composition of roles. For example, the following are valid Krypton TBox definitions:

Grandchild ≡ (RoleChain Child Child)
Coed ≡ (ConGeneric Woman Student)
Successful-Grandma ≡ (VRGeneric Woman Grandchild Doctor)

That is, Grandchild(x, y) is true if and only if y is a child of a child of x, Coed(x) is true if and only if x is someone that is both a woman and a student, and Successful-Grandma(x) is true if and only if x is a woman all of whose grandchildren (if any) are doctors.

Krypton's ABox is a resolution theorem prover [78] that uses predicates that have been given TBox definitions. Taxonomic definitions are not provided as assertions to the ABox, however. Instead, they are taken account of by theory resolution operations that use the theory of the defined concepts and roles. Thus, for example, all of the following inferences are single-step theory-resolution operations performable by the ABox: from Student(John) and ¬Coed(John) it is possible to infer ¬Woman(John); Coed(John) and ¬Woman(John) are directly contradictory; and from Successful-Grandma(Marge) and Child(Marge, Hope) it is possible to infer (Child(Hope, x) ⊃ Doctor(x)).

3 Unification

Unification is a bidirectional pattern-matching process, i.e., it is like pattern matching except that values can be assigned to variables in both expressions, not just one. For example, though neither of P(a, x) and P(y, b) is a pattern-matching instance of the other, they are unifiable with most general unifier {x ← b, y ← a}. In general, a substitution is a set of variable assignments. It is convenient to consider only idempotent substitutions [19], where an idempotent substitution is one in which no variable xi that appears in an assignment xi ← ti also occurs inside the term tj of any assignment xj ← tj in the substitution.

The standard unification algorithm scans the two expressions to be unified in left-to-right order, looking for the first disagreement or difference between the two expressions. If one of the two subexpressions located at the first disagreement is a variable and the other is an expression not containing that variable, then the assignment of the subexpression to the variable is added to the substitution being constructed, the two expressions are instantiated by the new assignment, and the process continues. If neither of the two subexpressions is a variable, or if one is a variable and the other subexpression contains that variable, then unification of the two expressions fails. Unification succeeds if no (further) disagreements are found, i.e., the two original expressions have been instantiated to be identical.
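The algorithm just described can be sketched in a few lines. The term representation, with variables as strings beginning with '?', constants as other strings, and compound terms as tuples with the function symbol first, is an assumption of this sketch.

```python
def is_var(t):
    """Variables are strings beginning with '?' (a convention of this sketch)."""
    return isinstance(t, str) and t.startswith('?')

def occurs_in(v, t):
    """The occurs check: does variable v occur in term t?"""
    return v == t or (isinstance(t, tuple) and any(occurs_in(v, a) for a in t))

def substitute(t, subst):
    """Apply a substitution to a term, following chains of bindings."""
    if is_var(t):
        return substitute(subst[t], subst) if t in subst else t
    if isinstance(t, tuple):
        return tuple(substitute(a, subst) for a in t)
    return t

def unify(s, t, subst=None):
    """Return a most general unifier of s and t, or None on failure."""
    subst = {} if subst is None else subst
    s, t = substitute(s, subst), substitute(t, subst)
    if s == t:
        return subst
    if is_var(s):
        return None if occurs_in(s, t) else {**subst, s: t}
    if is_var(t):
        return None if occurs_in(t, s) else {**subst, t: s}
    if isinstance(s, tuple) and isinstance(t, tuple) \
            and len(s) == len(t) and s[0] == t[0]:
        # same function symbol: unify corresponding arguments left to right
        for a, b in zip(s[1:], t[1:]):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None
```

For example, unify(('P', 'a', '?x'), ('P', '?y', 'b')) returns the most general unifier {'?y': 'a', '?x': 'b'}, while unify('?x', ('f', '?x')) fails because of the occurs check.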

The check for whether the variable is contained in the expression that is perhaps to be assigned to it is called the occurs check. The occurs check causes the unification of x and f(x) (or any other term containing x other than x itself) to fail. If the unification were allowed to succeed, it would result in the formation of a circular binding x ← f(x), and the unified expressions would be infinite. A demonstration of the importance of the occurs check for sound inference is the following Prolog program [64] (many Prolog implementations, for the sake of efficiency, do not perform the occurs check):

(1) X < s(X).
(2) 3 < 2 :- s(Y) < Y.
(3) ?- 3 < 2.

Restated in English, the foregoing says that (1) every x is less than the successor of x and that (2) if, for some y, the successor of y is less than y, then 3 is less than 2; it then asks (3) whether 3 is less than 2. Prolog implementations without the occurs check would answer affirmatively, binding X to s(Y) and Y to s(X), thereby creating an infinite term. Moreover, unification without the occurs check may not even terminate, as in the case of unifying the values of X and Y.

The unification algorithm either succeeds and returns a single unifying substitution, a unifier, or fails and returns none. If it succeeds, the unifier returned is a most general unifier, i.e., one such that any other unifier is a variant or instance of it. The most general unifier is not necessarily unique, however. For example, both {x ← y} and {y ← x} are most general unifiers of x and y. As given here, unification can be quite inefficient. In the worst case, its behavior is exponential.

For example, consider the unification of f(u, h(w, w), w, j(y, y)) and f(g(v, v), v, i(x, z), x). This would result in successive assignments of u ← g(v, v), v ← h(w, w), w ← i(x, z), and x ← j(y, y).

The resulting substitution is {x ← j(y,y), w ← i(j(y,y), j(y,y)), v ← h(i(⋯), i(⋯)), u ← g(h(⋯), h(⋯))}. The algorithm would incrementally construct this substitution, instantiating the current substitution by each new variable assignment as it is made, and would also create new instances of the original expressions. There is a linear-time unification algorithm by Paterson and Wegman [62] that requires a directed acyclic graph representation for expressions, and there are also efficient unification algorithms by Huet [26] and Martelli and Montanari [57].

The costliest inefficiency of the standard unification algorithm is its need to instantiate the expressions being unified and the substitution being constructed by the newest variable assignment. This can, as in the example, produce exponential growth in the size of the substitution and the terms being unified. A solution is to use a structure-sharing method [6] during unification. As the two expressions are being scanned for disagreements, if a variable is encountered, its value is looked up in the list of bindings accumulated so far and used in the scanning process (but not substituted into the expression). Variables are also looked up when the occurs check is applied. This process eliminates actual formation of the instantiated terms, though they are implicitly created during the scanning and occurs check processes. If the process is completed successfully with no uneliminatable disagreements, the result is a set of noncircular variable bindings that may depend on each other.

The set of bindings should be converted to an idempotent substitution for it to be used efficiently, e.g., to instantiate the remaining literals of a pair of clauses being resolved. To convert a set of dependent noncircular bindings to an idempotent substitution, topologically sort them so that the binding xi ← ti precedes the binding xj ← tj if xi occurs in tj (an inability to topologically sort the bindings implies an occurs check violation). Let (x1 ← t1, ..., xn ← tn) be a topologically sorted list of noncircular bindings. Then let θ1 be {x1 ← t1} and θi be θi-1 ∪ {xi ← ti θi-1} (1 < i ≤ n). Each θi is an idempotent substitution, with θn being the final result. A more abstract characterization of substitutions and unification can be found in the chapter by Huet.
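The conversion just described might be sketched as follows. The dictionary representation of bindings (variables as strings beginning with '?', compound terms as tuples) and the function names are assumptions of this sketch; the topological sort is done by repeatedly emitting bindings whose terms mention no still-pending bound variable.

```python
def to_idempotent(bindings):
    """Turn noncircular, possibly dependent bindings (a dict var -> term)
    into an idempotent substitution theta_n, per the construction above."""
    def vars_in(t):
        if isinstance(t, str) and t.startswith('?'):
            return {t}
        if isinstance(t, tuple):
            return set().union(set(), *(vars_in(a) for a in t))
        return set()

    def apply(t, theta):
        if isinstance(t, str) and t.startswith('?'):
            return theta.get(t, t)
        if isinstance(t, tuple):
            return tuple(apply(a, theta) for a in t)
        return t

    # topological sort: x_i <- t_i must precede x_j <- t_j if x_i occurs in t_j
    order, pending = [], dict(bindings)
    while pending:
        progress = False
        for x, t in list(pending.items()):
            if not (vars_in(t) & set(pending)):  # all dependencies already placed
                order.append((x, t))
                del pending[x]
                progress = True
        if not progress:
            raise ValueError("circular bindings (occurs check violation)")

    # theta_i = theta_{i-1} united with {x_i <- t_i theta_{i-1}}
    theta = {}
    for x, t in order:
        theta[x] = apply(t, theta)
    return theta
```

For the dependent bindings {x ← f(y), y ← a}, the sort places y ← a first and the result is the idempotent substitution {y ← a, x ← f(a)}.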

3.1 Unification in Equational Theories

Some equational theories occur in theorem-proving applications often enough and have enough impact on overall performance to merit their incorporation directly into unification algorithms [65]. The most pervasively used properties that have been built into unification algorithms are associativity, commutativity, and their combination. If the equality relation is used to represent associativity and commutativity, then associativity and commutativity of the function f can be expressed as f(f(x, y), z) = f(x, f(y, z)) and f(x, y) = f(y, x). Because of the difficulty of using the equality predicate, whether by axiom or special rules of inference like paramodulation, an alternate formulation has often been used. Let P(x, y, z) denote f(x, y) = z. Then associativity can be represented by the pair of clauses ¬P(x,y,u) ∨ ¬P(u,z,w) ∨ ¬P(y,z,v) ∨ P(x,v,w) and ¬P(x,y,u) ∨ ¬P(y,z,v) ∨ ¬P(x,v,w) ∨ P(u,z,w), and commutativity can be represented by the clause ¬P(x, y, z) ∨ P(y, x, z).

There are several difficulties in specifying associativity and commutativity axiomatically, regardless of which representation is used. One is that there are too many representations for the same expression. For example, the expressions f(a, f(b,c)), f(a, f(c,b)), f(b, f(a,c)), f(f(a,c), b), etc., are all equivalent if f is associative and commutative. These multiple representations for equivalent expressions contribute to excessive search-space sizes. Subsumption will not detect and remove formulas that are associative-commutative variants. In addition, verifying that two expressions are associative-commutative variants often involves lengthy deductions. For example, to derive f(f(c, a), b) from f(a, f(b, c)) requires two uses of the commutativity axiom and one of the associativity axiom; this approach requires three paramodulation steps. Even more steps would be necessary if equality axioms or the nonequality formulation were used instead of paramodulation. Theorem provers will often fail to solve difficult problems that involve functions that are associative or commutative or both, because so much effort must be spent on deductions that should be trivial.

Another problem with the axiomatic representation for associativity and commutativity is that the theorem prover will be undiscriminating in what results are derived. If f(a, x) and f(y, b) need to be unified, where f is associative and commutative, an associative-commutative unification algorithm may recognize that a complete set of most general unifiers consists of {x ← b, y ← a} and {x ← f(b,z), y ← f(a,z)}. However, if axioms for associativity and commutativity are used, less general unifiers like {x ← f(b, f(z1,z2)), y ← f(a, f(z1,z2))} and their associative-commutative variants will also be generated, ad infinitum.
Building properties like associativity and commutativity into the unification algorithm eliminates these difficulties and the need for associativity and commutativity axioms. If equality axioms or inference rules are needed only to support inference about these properties, then their use can be eliminated as well. Despite the complexity of special unification algorithms for equational theories and the fact that they are generally much more time consuming than ordinary unification, their use generally pays off when trying to prove nontrivial theorems. When compared with formulating properties such as associativity and commutativity as axioms, special unification is advantageous because it does not return any results that are not implicit in the search space using the axioms (it just computes them more directly), and it often will compute a finite complete set of unifiers, while the axiomatic approach would continue to generate redundant consequences.

If the unification problem is intractable (associative-commutative pattern matching is NP-complete [2]; associative-commutative unification is thus at least that difficult), this difficulty will also be reflected in the number of inferences required in trying to prove theorems with the axioms without using special unification. The most serious problem with using special unification algorithms is the possible occurrence of difficult unification tasks early in a search for a proof, which effectively blocks the discovery of a shallow proof elsewhere because of the resources spent on special unification. In such cases, incomplete or incremental special unification algorithms can be employed. Ideally, an incomplete special unification algorithm will return only some of the simpler unifiers, and an incremental one will return progressively more complex unifiers on successive calls. Actually, the use of axioms for the theory of, say, associativity or commutativity, along with axioms or rules for equality plus ordinary unification, in effect forms a quite inefficient incremental unification algorithm. Many results on special unification can be found in Siekmann [71].

3.2 Commutative Unification

The standard unification algorithm can be easily modified to build in commutativity of functions [73,70]. For example, if f is a commutative function, when unifying the terms f(s1, s2) and f(t1, t2), it is necessary to try to unify the arguments s1 with t1 and s2 with t2 simultaneously, as in the ordinary unification algorithm, and also to try to unify s1 with t2 and s2 with t1 simultaneously. This modification of the ordinary unification algorithm yields an algorithm that is complete for commutative functions.

The commutative unification algorithm given here illustrates two properties of special unification algorithms. One is their added complexity and computational requirements compared to ordinary unification. The second is that, depending on the theory that is incorporated into the unification algorithm, it may no longer be the case that there will be only zero or one most general unifiers. For example, if f(x, y) and f(a, b) are unified, where f is commutative, then {x ← a, y ← b} and {x ← b, y ← a} are both most general unifiers; the two together constitute the complete set of most general unifiers.
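The modification can be sketched by turning an ordinary recursive unifier into a generator that also tries the swapped argument order under a declared commutative symbol. The representation (variables as strings beginning with '?') and the set COMMUTATIVE are assumptions of this sketch; a real implementation would filter duplicate or non-minimal unifiers from the output.

```python
COMMUTATIVE = {'f'}  # binary function symbols assumed commutative

def is_var(t):
    return isinstance(t, str) and t.startswith('?')

def substitute(t, subst):
    if is_var(t):
        return substitute(subst[t], subst) if t in subst else t
    if isinstance(t, tuple):
        return tuple(substitute(a, subst) for a in t)
    return t

def occurs_in(v, t):
    return v == t or (isinstance(t, tuple) and any(occurs_in(v, a) for a in t))

def c_unify(s, t, subst=None):
    """Yield the unifiers of s and t, trying both argument orders
    under a commutative function symbol."""
    subst = {} if subst is None else subst
    s, t = substitute(s, subst), substitute(t, subst)
    if s == t:
        yield subst
    elif is_var(s):
        if not occurs_in(s, t):
            yield {**subst, s: t}
    elif is_var(t):
        if not occurs_in(t, s):
            yield {**subst, t: s}
    elif isinstance(s, tuple) and isinstance(t, tuple) \
            and len(s) == len(t) and s[0] == t[0]:
        def unify_args(pairs, sub):
            if not pairs:
                yield sub
            else:
                for sub2 in c_unify(pairs[0][0], pairs[0][1], sub):
                    yield from unify_args(pairs[1:], sub2)
        yield from unify_args(list(zip(s[1:], t[1:])), subst)
        if s[0] in COMMUTATIVE and len(s) == 3:
            # also try s's arguments against t's arguments swapped
            yield from unify_args(list(zip(s[1:], (t[2], t[1]))), subst)
```

On f(x, y) and f(a, b) this yields both {x ← a, y ← b} and {x ← b, y ← a}, the complete set of most general unifiers from the example above.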

3.3 Associative Unification

Associative unification is more difficult than commutative unification. If the function f is associative (but not commutative), then the terms f(a, x) and f(x, a) have an infinite set of most general unifiers, namely {x ← a}, {x ← f(a, a)}, {x ← f(a, f(a, a))}, and so on. Two other interesting examples, syntactically similar to the first, are the unification of f(a, x) and f(y, b), which has a complete set of most general unifiers consisting of {x ← b, y ← a} and {x ← f(z, b), y ← f(a, z)}, and the unification of f(a, x) and f(x, b), which has no unifiers.

For functions that are associative, it is convenient to drop the distinction between f(x, f(y, z)) and f(f(x, y), z) and represent both by the term f(x, y, z), as if f were an n-ary function for arbitrary n. A complete unification algorithm for associative functions is readily obtained by modifying the standard unification algorithm in the following manner [65,73,47]. Argument lists of two terms headed by the same associative function symbol are scanned in left-to-right order, looking for the first disagreement. If the first disagreement is that one argument list is exhausted before the other, then unification with the current substitution fails. The principal difference from standard unification occurs when the two subexpressions at the first disagreement are arguments of an associative function and one or both of the subexpressions is a variable. Let variable x and term t be such arguments of an associative function f. If t contains x, then unification with the current substitution fails. If t does not contain x, then unification proceeds with the substitution of t for x and also with the substitution of f(t, u) for x, where u is a new variable not occurring elsewhere. If t were also a variable, then it would also be necessary to try the substitution of f(x, v) for t, where v is a new variable. Consider the unification of f(x, y) and f(a, b, c). The first disagreement is x differing from a.
Thus, the assignments x ← a and x ← f(a, u) are tried, leading to the problems of unifying f(a, y) and f(a, u, y) with f(a, b, c), respectively. The first problem is solved by the subsequent assignments y ← f(b, v) and v ← c, and the final unifier is {x ← a, y ← f(b, c)}. The second problem is solved by the subsequent assignments u ← b and y ← c, and the final unifier is {x ← f(a, b), y ← c}. The complete set of unifiers consists of {x ← a, y ← f(b, c)} and {x ← f(a, b), y ← c}.

Although this algorithm is complete, it does not always terminate because, as in the case of unifying f(a, x) and f(x, a), there may be an infinite number of unifiers. Even where there is a finite number of unifiers, as in the case of unifying f(a, x) and f(x, b), which have no unifiers, the algorithm fails to terminate, trying to match the expressions with assignments x ← a, x ← f(a, v1), x ← f(a, a, v2), and so on.

An alternative approach to associative unification that allows better control separates the tasks of assigning terms to variables and creating new variables. This approach uses an incomplete associative unification algorithm that introduces no new variables. It just assigns to a variable one or more of the arguments in the other argument list. For example, in the case of unifying f(x, y) and f(a, b, c), the algorithm may try to assign to x the terms a, f(a, b), and f(a, b, c). Continuing to unify the expressions with each of these assignments results in the unifiers {x ← a, y ← f(b, c)}, {x ← f(a, b), y ← c}, and failure, respectively. This incomplete unification algorithm is combined with a widening [73] or variable splitting [76] process that replaces variables by more complex terms. If the variable x is an argument of the associative function symbol f, then the term containing x can be widened by replacing x by the term f(x1, x2) with new variables x1 and x2.
A complete associative unification algorithm is obtained by collecting the results of unifying, by the incomplete but terminating associative unification algorithm, one expression with all results of widening the other expression. The widening operation may be applied any number of times. In order to compute the infinite number of unifiers of f(a, x) and f(x, a), an infinite number of widening substitutions must be applied. Note that it is sufficient to create widening substitutions for only one of the two expressions. For example, in unifying f(a, x) and f(y, b), the incomplete unification algorithm returns the substitution {x ← b, y ← a}. Widening f(a, x) with the assignment x ← f(x1, x2) results in the unification of f(a, x1, x2) and f(y, b) with unifier {x ← f(x1, b), y ← f(a, x1)}. The completeness of the above incomplete associative unification algorithm in conjunction with widening only one of the two expressions implies the completeness of the incomplete associative unification algorithm for pattern matching.

Makanin [54] proved the decidability of associative unification for the restricted case of terms composed of a single associative function symbol and variables and constants only. However, his algorithm only decides whether a unification problem is solvable; it does not return a unifier, let alone a complete set of unifiers. More recently, Jaffar [34] has developed a minimal and complete unification algorithm for this case. This algorithm computes a minimal complete set of unifiers and, unlike the algorithm described above, is guaranteed to terminate if the complete set is finite. However, this algorithm has not yet been generalized to handle nonvariable, nonconstant arguments, as is required for general use in theorem proving.
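The incomplete algorithm's assignment of argument segments can be sketched for its simplest case: a flattened pattern whose arguments are distinct variables, matched against a flattened ground argument list. The function names and this restriction are assumptions of ours; the full algorithm must also handle mixed variable and nonvariable arguments and interleave with widening.

```python
def segmentations(args, k):
    """Split the tuple args into k contiguous nonempty groups."""
    if k == 1:
        yield (args,)
    else:
        for i in range(1, len(args) - k + 2):
            for rest in segmentations(args[i:], k - 1):
                yield (args[:i],) + rest

def a_match(pattern_vars, ground_args, f='f'):
    """Incomplete associative unification, introducing no new variables:
    each (distinct) pattern variable absorbs one or more contiguous
    arguments of the flattened ground term headed by the associative f."""
    if len(ground_args) < len(pattern_vars):
        return  # one argument list would be exhausted early: failure
    for groups in segmentations(ground_args, len(pattern_vars)):
        yield {v: g[0] if len(g) == 1 else (f,) + g
               for v, g in zip(pattern_vars, groups)}
```

For f(x, y) against f(a, b, c) this enumerates exactly the two unifiers of the example above, {x ← a, y ← f(b, c)} and {x ← f(a, b), y ← c}; the failing assignment x ← f(a, b, c) never arises because only segmentations that leave an argument for every variable are generated.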

3.4 Associative-Commutative Unification

It is possible to develop associative-commutative unification algorithms along the lines of the complete but nonterminating associative unification algorithm and the incomplete associative unification algorithm augmented by widening [76]. However, we can do better. In the case of associativity plus commutativity, there is a finite number of unifiers, and it is possible to devise a complete terminating unification algorithm [75,77,48]. Arguments common to the two terms headed by the same associative-commutative function symbol can be canceled in pairs until no arguments appear in both terms. Thus, for example, the problem of unifying f(x, x, y, a, b, c) and f(b, b, b, c, z) can be replaced by the problem of unifying f(x, x, y, a) and f(b, b, z).

The case of unification of terms headed by an associative-commutative function symbol with only variable arguments will be considered first. For example, consider unifying the terms f(x, x, y, u) and f(v, v, z), where f is an associative-commutative function. What is required of a substitution for variables u, v, x, y, and z for it to be a unifier? Each variable is assigned either a term not headed by the function symbol f or a term headed by f with some arguments. Consider each distinct term t that is either a variable value not headed by the function symbol f or a variable-value argument of a term headed by f. For a substitution to be a unifier, for every such term t, twice the number of t's in x plus the number of t's in y plus the number of t's in u must equal twice the number of t's in v plus the number of t's in z. Thus, unification of the terms f(x, x, y, u) and f(v, v, z) is related to solution of the linear homogeneous diophantine equation 2x + y + u = 2v + z.

In contrast to the usual situation of trying to solve linear homogeneous diophantine equations, associative-commutative unification requires that only nonnegative integral solutions be considered.
A negative value for a variable corresponds to assigning a negative number of terms to a variable in the unification problem. Negative values are considered in extensions of this method to abelian-group-theory unification (associativity plus commutativity plus identity plus inverse); the presence of the inverse operation makes it meaningful to consider the assignment of a negative number of terms to a variable. The set of all nonnegative integral solutions to a linear homogeneous equation can be obtained by addition of elements of a finite basis set of solutions. This finite basis set of solutions is obtained by generating all solutions to the equation in ascending order of the value of 2x + y + u (= 2v + z), discarding solutions that are composable from those previously generated, and terminating when no new noncomposable solutions can be found.

It is necessary to discover some bound on the value of the equation such that no new basis solutions will be found with value higher than the bound. Consider the general problem of finding solutions to the linear homogeneous diophantine equation a1x1 + ⋯ + amxm = b1y1 + ⋯ + bnyn. For each i and j with 1 ≤ i ≤ m and 1 ≤ j ≤ n, there is a basis solution with xi = lcm(ai, bj)/ai, yj = lcm(ai, bj)/bj, and all other variables equal to zero. One of these solutions must be subtractable with nonnegative difference from any solution with value greater than max(m, n) × max_{i,j} lcm(ai, bj), and this is, therefore, a bound on the value of solutions. A lower bound and more effective enumeration method can be found in Huet [27]. The 7 basis solutions for the equation 2x + y + u = 2v + z are given by the table

    x y u v z
1   0 0 1 0 1   z1
2   0 1 0 0 1   z2
3   0 0 2 1 0   z3
4   0 1 1 1 0   z4
5   0 2 0 1 0   z5
6   1 0 0 0 2   z6
7   1 0 0 1 0   z7

Thus, any nonnegative integral solution to the equation can be obtained by assigning nonnegative integers to the variables z1, ..., z7 and computing

x = z6 + z7

y = z2 + z4 + 2z5

u = z1 + 2z3 + z4

v = z3 + z4 + z5 + z7

z = z1 + z2 + 2z6
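The basis above can be checked by brute force. The following Python sketch (an illustration only, not the enumeration procedure of [27]) searches the bounded range within which minimal solutions must lie and extracts the noncomposable ones:

```python
from itertools import product

def minimal_solutions():
    """Brute-force the minimal (noncomposable) nonnegative solutions of
    2x + y + u = 2v + z.  Components of minimal solutions are bounded by
    the lcm-based bound in the text, so a small search range suffices."""
    sols = [s for s in product(range(3), repeat=5)
            if any(s) and 2*s[0] + s[1] + s[2] == 2*s[3] + s[4]]
    # A solution is minimal if no other nonzero solution is <= it componentwise.
    return sorted(s for s in sols
                  if not any(t != s and all(a <= b for a, b in zip(t, s))
                             for t in sols))

basis = minimal_solutions()
# The 7 rows of the table, as (x, y, u, v, z) tuples:
expected = [(0, 0, 1, 0, 1), (0, 0, 2, 1, 0), (0, 1, 0, 0, 1),
            (0, 1, 1, 1, 0), (0, 2, 0, 1, 0), (1, 0, 0, 0, 2),
            (1, 0, 0, 1, 0)]
```

The search confirms that exactly the 7 tabulated solutions are minimal, and any assignment of nonnegative integers to z1, ..., z7 in the displayed formulas satisfies the equation.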

The corresponding substitution {x ← f(z6, z7), y ← f(z2, z4, z5, z5), u ← f(z1, z3, z3, z4), v ← f(z3, z4, z5, z7), z ← f(z1, z2, z6, z6)} is the single most general unifier if f has an identity element. Associative-commutative unification without identity is slightly more complicated. Because without an identity element it is impossible to assign zero terms to some variable zi, it is necessary to consider the 2^n combinations of the n basis solutions, restricted to those such that none of the variables x, y, u, v, or z is assigned zero. There are 69 such solutions, including (denoting a solution by the set of its indices) {2, 3, 6}, {1, 2, 3, 6}, and {4, 6} with corresponding unifying substitutions

{x ← z6, y ← z2, u ← f(z3, z3), v ← z3, z ← f(z2, z6, z6)}

{x ← z6, y ← z2, u ← f(z1, z3, z3), v ← z3, z ← f(z1, z2, z6, z6)}

{x ← z6, y ← z4, u ← z4, v ← z4, z ← f(z6, z6)}

This set of 69 unifiers is a minimal complete set of unifiers of f(x, x, y, u) and f(v, v, z). Associative-commutative unification for more general terms is accomplished by first forming a variable abstraction of the terms. For example, in unifying f(x, x, y, a) and f(b, b, z), variable-only terms f(x, x, y, u) and f(v, v, z) are formed by replacing the distinct nonvariable terms a and b by new variables u and v. The original terms can be obtained from their variable abstraction by applying the substitution {u ← a, v ← b}. The variable-only terms are unified as above. Each unifier of the variable-only terms is then unified with {u ← a, v ← b} [83]. The resulting substitutions are a complete set of unifiers for the original terms. As stated so far, this would seem to entail the unification of each of the 69 unifiers of f(x, x, y, u) and f(v, v, z) with the substitution {u ← a, v ← b}. However, substantially less effort than this is required [75,77]. The generation of the sets of basis solutions can be constrained to take account of the origins of the variables. In particular, the variables u and v of the variable abstraction correspond to the constants a and b in the original terms. Each assignment to a variable x, y, u, v, or z in a unifier of f(x, x, y, u) and f(v, v, z) is either a variable zi or a term f(···). Any unifier that assigns u or v a term of the form f(···) will not be unifiable with the substitution {u ← a, v ← b}. When computing sums of basis solutions, any variable (e.g., u and v) that comes from a nonvariable term in the original problem must be assigned exactly one. Only 6 unifiers of f(x, x, y, u) and f(v, v, z) are discovered when this restriction is imposed. The number can be reduced to 4 by observing the restriction that the use of basis solution number 4 requires the unification of a and b.
The constrained generation of basis sums and unifiers for f(x, x, y, u) and f(v, v, z) yields the sums {1, 5, 6}, {1, 2, 5, 6}, {1, 2, 7}, and {1, 2, 6, 7} with corresponding unifiers:

{x ← z6, y ← f(z5, z5), u ← z1, v ← z5, z ← f(z1, z6, z6)}

{x ← z6, y ← f(z2, z5, z5), u ← z1, v ← z5, z ← f(z1, z2, z6, z6)}

{x ← z7, y ← z2, u ← z1, v ← z7, z ← f(z1, z2)}

{x ← f(z6, z7), y ← z2, u ← z1, v ← z7, z ← f(z1, z2, z6, z6)}

Unification of these with the substitution {u ← a, v ← b} yields the complete set of unifiers of f(x, x, y, a) and f(b, b, z):

{y ← f(b, b), z ← f(a, x, x)}

{y ← f(z2, b, b), z ← f(a, z2, x, x)}

{x ← b, z ← f(a, y)}

{x ← f(z6, b), z ← f(a, y, z6, z6)}

Termination of associative-commutative unification was an open question for a long time, but has now been solved [20]. Termination of standard unification is easy to verify because as each pair of symbols is matched during the scan from left to right for disagreements, either the remaining number of symbols is fewer (when the matched symbols agree) or the number of uninstantiated variables is fewer (when a disagreement is eliminated by assigning a term to a variable). Such a simple termination criterion does not exist in the case of associative-commutative unification because associative-commutative unification can introduce additional variables. It is necessary to show that the recursive calls on the unification algorithm operate on pairs of terms having less complexity than the original terms. For associative-commutative unification, a complexity measure that can be used to prove termination is the ordered pair (ν, τ), where ν is the number of variables that occur as arguments to two different associative-commutative function symbols and τ is the number of distinct nonvariable subterms that appear in the two terms being unified. A unification problem described by (ν, τ) is less complex than one described by (ν′, τ′) if and only if ν < ν′, or ν = ν′ and τ < τ′.

The variable-and-constants-only case of abelian-group-theory unification (associativity, commutativity, identity, and inverse) can be handled by a modification of this method that uses the standard solution of the linear diophantine equations in all integers, not just nonnegative ones [46].

3.5 Many-Sorted Unification

Many-sorted unification [84,86] can be used to reason efficiently with sort information. The universe of discourse is assumed to be divided into objects of different sorts. Constants, functions, and variables may be declared to have particular sorts, and subsort relationships may be declared among sorts.

The types of sort information that can be handled by many-sorted unification include assertions of the form

Man(John)--John is a man

Woman(Mary)--Mary is a woman

Man(father(x))--the father of x is a man

Man(x) ⊃ Person(x)--every man is a person

Woman(x) ⊃ Person(x)--every woman is a person.

These assertions are supplanted by sort declarations:

Man, Woman, and Person are sorts

The constant John is of sort Man

The constant Mary is of sort Woman

The function father is of sort Man

The sort Man is a subsort of the sort Person

The sort Woman is a subsort of the sort Person.

Many-sorted unification uses such declarations to restrict the standard unification algorithm.

Whenever the unification algorithm eliminates a disagreement between two expressions by assigning a term to a variable, the many-sorted unification restriction checks for conformability of the sorts of the variable and the term.

Two cases of many-sorted unification will be distinguished. In the first case, the sort hierarchy is a forest, i.e., a set of trees. No sort C is a subsort of both A and B (unless A is a subsort of B or B is a subsort of A). The second, more general case permits common subsorts and allows sort hierarchies that are graphs. In both cases, a nonvariable term can be assigned to a variable only if the nonvariable term's sort is the same as or is a subsort of the variable's sort.

The other situation in which a disagreement can be successfully eliminated is when the disagreement consists of two distinct variables. In the forest sort hierarchy case, if the variables are of the same sort, either can be assigned to the other. If one's sort is a subsort of the other's, the former variable must be assigned to the latter variable. Thus, in unifying variables x and y, one cannot, as in standard unification, uniformly make the assignment x ← y. We must instead make the assignment y ← x if x's sort is a subsort of y's. If neither variable's sort is a subsort of the other's, then unification simply fails. For example, if x is a variable of sort Person and y a variable of sort Man, then John and y are unifiable with unifier {y ← John}, x and y are unifiable with unifier {x ← y} (but not {y ← x}), and Mary and y are not unifiable. Note that, by use of a technique familiar in logic programming [16], if the sort hierarchy is a forest, many-sorted unification can be simulated by encoding sort information directly in the terms. In this technique, there is a unary function symbol associated with each sort.

Sorted terms are embedded in a sequence of such unary function symbols corresponding to the sequence of sorts from the top of the sort hierarchy to the declared sort of the term. Thus, the man John and the woman Mary are represented by the terms person(man(John)) and person(woman(Mary)), respectively. The arbitrary person x and man y are represented by the terms person(x) and person(man(y)), respectively. For example, similarly to above, person(man(John)) and person(man(y)) are unifiable with unifier {y ← John}, person(x) and person(man(y)) are unifiable with unifier {x ← man(y)}, and person(woman(Mary)) and person(man(y)) are not unifiable.
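The encoding can be exercised with any standard unification routine. The following Python sketch (an ad hoc illustration, not the algorithm of [84,86]; constants are 1-tuples and variables are strings, and the occurs check is omitted for brevity) reproduces the three examples above:

```python
# Terms: variables are strings; compound terms and constants are tuples
# of the form (functor, arg, ...).  Constants are 1-tuples like ('John',).

def walk(t, s):
    """Follow variable bindings in substitution s."""
    while isinstance(t, str) and t in s:
        t = s[t]
    return t

def unify(a, b, s=None):
    """Standard first-order unification (occurs check omitted)."""
    s = dict(s or {})
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if isinstance(a, str):
        s[a] = b
        return s
    if isinstance(b, str):
        s[b] = a
        return s
    if a[0] != b[0] or len(a) != len(b):
        return None                      # clash of function symbols
    for x, y in zip(a[1:], b[1:]):
        s = unify(x, y, s)
        if s is None:
            return None
    return s

# Sort-encoded terms from the text:
john   = ('person', ('man', ('John',)))   # the man John
man_y  = ('person', ('man', 'y'))         # an arbitrary man y
pers_x = ('person', 'x')                  # an arbitrary person x
mary   = ('person', ('woman', ('Mary',))) # the woman Mary
```

Unifying `john` with `man_y` binds y to John, unifying `pers_x` with `man_y` binds x to man(y), and `mary` fails against `man_y` on the clash of man and woman, exactly mirroring the many-sorted behavior.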

When unifying two variables in the graph sort hierarchy case, if the variables are not of the same sort and neither variable's sort is a subsort of the other's, the variables are still unifiable provided their sorts have one or more subsorts in common. For each common subsort, a new variable of that sort is created and assigned to both variables being unified. It is sufficient to consider maximal common subsorts, e.g., if S1 and S2 are the two common subsorts of the sorts of the variables x and y being unified, but S2 is a subsort of S1, then only one unifier need be formed--with a new variable z of sort S1 being assigned to both x and y.

For example, assume the declarations

Animal, Mammal, Lion, Dog, Cat, Fish, Shark, Koi, and Pet are sorts

Mammal, Fish, and Pet are subsorts of Animal

Lion, Dog, and Cat are subsorts of Mammal

Shark and Koi are subsorts of Fish

Dog, Cat, and Koi are subsorts of Pet.
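The maximal common subsorts needed by the graph-case rule can be computed directly from these declarations. The following Python sketch (an illustrative encoding of the hierarchy above, not part of any particular implementation) finds them by a fixed-point closure over the subsort relation:

```python
# Each sort maps to its declared supersorts (from the declarations above).
supersorts = {
    'Mammal': {'Animal'}, 'Fish': {'Animal'}, 'Pet': {'Animal'},
    'Lion': {'Mammal'}, 'Dog': {'Mammal', 'Pet'}, 'Cat': {'Mammal', 'Pet'},
    'Shark': {'Fish'}, 'Koi': {'Fish', 'Pet'}, 'Animal': set(),
}

def subsorts(s):
    """All sorts at or below s, by fixed-point closure."""
    below = {s}
    changed = True
    while changed:
        changed = False
        for t, sups in supersorts.items():
            if t not in below and sups & below:
                below.add(t)
                changed = True
    return below

def maximal_common_subsorts(s1, s2):
    common = (subsorts(s1) & subsorts(s2)) - {s1, s2}
    # discard any common subsort that lies strictly below another one
    return {c for c in common
            if not any(d != c and c in subsorts(d) for d in common)}
```

For Fish and Pet the only (hence maximal) common subsort is Koi, giving one unifier; for Mammal and Pet the maximal common subsorts are Dog and Cat, giving two, as illustrated next.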

Let x_Fish denote the variable x of sort Fish, and the like. Then x_Fish and y_Pet are unifiable with unifier {x ← u_Koi, y ← u_Koi}, and z_Mammal and y_Pet are unifiable with unifiers {z ← v_Dog, y ← v_Dog} and {z ← w_Cat, y ← w_Cat}. Many-sorted unification can be very effective, as experiments with "Schubert's steamroller" puzzle indicate [85]. It blocks formation of terms that are nonsense from the standpoint of the sort structure of the problem. The number of clauses and literals in problems is reduced. Clauses stating sorts of symbols and subsort relationships are eliminated. Because sort qualifier literals are removed from clauses so that, for example, the clause ¬Fox(x) ∨ ¬Bird(y) ∨ Eats(x, y) is replaced by the unit clause Eats(x_Fox, y_Bird), the remaining clauses tend to be shorter, and there are likely to be more unit clauses. A further advantage is the abstract level of proofs using many-sorted unification. Suppose that foxes and birds are animals and that foxes like to eat birds. That some animal likes to eat some animal can be proved in a single resolution step by unifying the atoms of the assertion Eats(x_Fox, y_Bird) and the negated theorem ¬Eats(u_Animal, v_Animal). The instantiation of the variables of the theorem suggests the answer that all foxes like to eat all birds. Without using many-sorted unification, the assertion ¬Fox(x) ∨ ¬Bird(y) ∨ Eats(x, y) could be resolved with the negated theorem ¬Animal(u) ∨ ¬Animal(v) ∨ ¬Eats(u, v). The resulting clause ¬Fox(x) ∨ ¬Bird(y) ∨ ¬Animal(x) ∨ ¬Animal(y) must then be refuted. This requires instantiation of x by some specific fox and y by some specific bird, e.g., the Skolem constants used in asserting the existence of foxes and birds, and the proof will end up mentioning a specific fox and bird.
Worse yet, if there were a large number of assertions specifying that certain things were foxes or birds, there would be a large number of ways of instantiating the clause and thus a large number of proofs that may mention different foxes and birds. There are some important assumptions associated with the use of many-sorted unification. One is the assumption of nonemptiness of the sorts used. P(x) and ¬P(x) are not contradictory if x's sort is empty. More restrictive in practice is the assumption that terms can be assigned their sorts a priori. For example, suppose Tweety is declared to be of sort Animal, of which sort Bird is a subsort. The absence of the characteristic predicate Bird makes Bird(Tweety) inexpressible. Even if the Bird predicate is included, assuming or even proving the formula Bird(Tweety) has no effect on Tweety's declared sort, which is used to restrict the unification algorithm. Thus, there should be in the sort hierarchy only those sorts for which it is unnecessary to assume or prove that some term is of a subsort of its declared sort.

A limitation of this form of many-sorted unification is the lack of polymorphic sort declarations. It is often very useful to declare predicates and functions to have more than one possible set of sorts of arguments and for the sort of a function's value to depend on the sorts of its arguments. More general procedures for reasoning about sorts are being developed [14,15,69].

4 Equality Reasoning

The equality relation is often used in problems to which theorem-proving programs are applied.

Because of its widespread use and the difficulties resulting from simply axiomatizing it, much effort has been devoted to developing special rules of inference for the equality relation.

4.1 Equality Axiomatization

The equality relation = is an equivalence relation, i.e., it is reflexive, symmetric, and transitive. These properties are usually given to theorem-proving programs as the following three assertions:

x = x

¬(x = y) ∨ (y = x)

¬(x = y) ∨ ¬(y = z) ∨ (x = z)

However, this is not the only possible expression of these properties. A smaller set of assertions that conveys the same information is

x = x

¬(x = y) ∨ ¬(z = y) ∨ (x = z)

The symmetry property is obtained from these latter two assertions by resolving on x = x and ¬(x = y) to yield ¬(x = z) ∨ (z = x). The standard transitivity axiom can then be obtained by resolving this with the second assertion. The reduced number of assertions may yield a smaller search space with a lower branching factor, though with sometimes longer proofs. In addition to reflexivity, symmetry, and transitivity, the equality relation possesses substitutivity properties, i.e., terms that are equal to each other can be substituted for each other anywhere in a term or formula. These are expressed by two sets of assertions that specify the predicate-substitutivity and functional-substitutivity axioms. For each n-ary predicate P other than =, there are n predicate-substitutivity axioms of the form:

¬(x1 = x) ∨ ¬P(x1, ..., xn) ∨ P(x, x2, ..., xn)

¬(xi = x) ∨ ¬P(x1, ..., xn) ∨ P(x1, ..., xi-1, x, xi+1, ..., xn)

¬(xn = x) ∨ ¬P(x1, ..., xn) ∨ P(x1, ..., xn-1, x)

For each n-ary function f, there are n functional-substitutivity axioms:

¬(x1 = x) ∨ (f(x1, ..., xn) = f(x, x2, ..., xn))

¬(xi = x) ∨ (f(x1, ..., xn) = f(x1, ..., xi-1, x, xi+1, ..., xn))

¬(xn = x) ∨ (f(x1, ..., xn) = f(x1, ..., xn-1, x))

The problems with using this axiomatic formulation of the equality relation are the large number of axioms and their generality. In particular, the n predicate-substitutivity axioms for the predicate P are always resolvable with any literal with predicate P. The search space is large and contains many useless and redundant results. It is also very laborious to derive even simple consequences of equality. For example, to derive the obvious fact that f(g(h(a))) = f(g(h(b))) from a = b requires three applications of functional-substitutivity axioms. These problems motivated the development of special rules of inference to be used in addition to resolution. These additional rules of inference have been only partially successful. They have largely succeeded in reducing the length of proofs and deriving obvious results like the above in a natural way, but the rules are still sufficiently general that the problem of large search spaces for problems involving equality is not fully solved.

4.2 Demodulation

Demodulation [91], or rewriting or reduction [30], is the process of using a set of equalities to replace terms by equal terms. The equalities are oriented and made into reductions λ → ρ that are used to replace instances of the term λ by the corresponding instance of the term ρ. Thus, for example, the term a + 0 can be reduced to a using the reduction x + 0 → x with substitution {x ← a}. The reduction process repeatedly applies reductions to a term until a term that cannot be further reduced is produced. For this process to terminate, the reductions must be oriented by some well-defined complexity measure so that, for every instance of a reduction, the right-hand side is less complex than the left-hand side. For example, associativity of + can be built into the reduction (x + y) + z → x + (y + z) because, by an appropriate complexity measure, terms parenthesized to the right are simpler, and the reduction process terminates, but commutativity cannot be used in a reduction, because the reduction x + y → y + x can be used to rewrite a + b to b + a to a + b infinitely. Demodulation is used in resolution theorem proving to perform rapid equality inferences on derived terms. It also has the beneficial effect of reducing many equivalent terms to the same form (in the ideal case of a complete set of reductions, all equivalent terms to the same form) and thus reducing the number of variants of equivalent terms appearing in clauses to be stored and facilitating subsumption. It is also useful for performing various programming-like tricks [87,88] such as maintaining lists of possible values of parameters in puzzles and removing individual possibilities by demodulation. Narrowing [73,42] is an extension of reduction that uses unification instead of pattern matching. A special case of paramodulation, narrowing is especially useful for constructing a unification algorithm in an equational theory specified by a complete set of reductions [21,32].
Let s and t be a pair of terms to be so unified and H be a symbol not occurring elsewhere. Then if H(s, t) can be transformed to H(s′, t′) by a sequence of narrowing operations and s′ and t′ are unifiable by the standard unification algorithm, then s and t are unifiable in the equational theory specified by the complete set of reductions, with a unifier that is the composition of the unifier of s′ and t′ and the unifiers used in the narrowing steps.
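The demodulation process described above can be sketched in a few lines of Python. The sketch below is an illustration, not a production rewriter; it also adds the rule 0 + x → x (an assumption beyond the text's single example) so that identities eliminated on either side of + reduce fully:

```python
# Terms are tuples ('functor', arg, ...); variables are strings.

def match(pat, term, s):
    """One-way pattern matching (not unification): bind pattern variables."""
    if isinstance(pat, str):
        if pat in s:
            return s if s[pat] == term else None
        return {**s, pat: term}
    if not isinstance(term, tuple) or term[0] != pat[0] or len(term) != len(pat):
        return None
    for p, t in zip(pat[1:], term[1:]):
        s = match(p, t, s)
        if s is None:
            return None
    return s

def subst(t, s):
    if isinstance(t, str):
        return s.get(t, t)
    return (t[0],) + tuple(subst(a, s) for a in t[1:])

def rewrite_once(term, rules):
    """Apply one reduction at the root or in some subterm, or return None."""
    for lhs, rhs in rules:
        s = match(lhs, term, {})
        if s is not None:
            return subst(rhs, s)
    if isinstance(term, tuple):
        for i, a in enumerate(term[1:], 1):
            r = rewrite_once(a, rules)
            if r is not None:
                return term[:i] + (r,) + term[i+1:]
    return None

def normalize(term, rules):
    """Reduce repeatedly until an irreducible form is reached."""
    while True:
        r = rewrite_once(term, rules)
        if r is None:
            return term
        term = r

rules = [
    (('+', 'x', ('0',)), 'x'),                                   # x + 0 -> x
    (('+', ('0',), 'x'), 'x'),                                   # 0 + x -> x
    (('+', ('+', 'x', 'y'), 'z'), ('+', 'x', ('+', 'y', 'z'))),  # associate right
]
```

With these rules, ((a + 0) + b) + c normalizes to a + (b + c); including the commutativity rule x + y → y + x instead would make `normalize` loop forever, as the text explains.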

4.3 Paramodulation

Paramodulation [89] is an equality inference rule that performs substitution directly. The paramodulant clause L(··· b ···) ∨ C ∨ D can be derived by paramodulation from the clause (a = b) ∨ C or (b = a) ∨ C into the clause L(··· a ···) ∨ D, where L(··· a ···) denotes the literal L and a particular occurrence of the term a in L, and where C and D are arbitrary clauses. That is, an equality atom can be used to replace one of its arguments by the other in any other literal, with the remaining literals of the two clauses included as part of the derived clause. In the general case, it may be necessary to find a unifying substitution for the term to be replaced and the equality-atom argument. Resolution plus paramodulation is complete provided the equality reflexivity axiom x = x is included [9]. Thus, the paramodulation rule eliminates the need for the equality symmetry, transitivity, and substitutivity axioms. This completeness result applies to unrestricted resolution plus paramodulation. If refinements such as set of support are employed, it may be necessary to include functional-reflexivity axioms to preserve completeness. The set of functional-reflexivity axioms consists of, for each n-ary function f, the unit clause f(x1, ..., xn) = f(x1, ..., xn). These are instances of the reflexivity axiom x = x. An illustration of the necessity of the functional-reflexivity axioms when the set of support refinement is used is the refutation of the set of clauses P(x, x), a = b, and ¬P(f(a), f(b)) with P(x, x) designated as the only clause in the set of support. To refute this set, it is necessary to paramodulate from the functional-reflexivity axiom f(x) = f(x) into P(x, x) to obtain

P(f(x), f(x)). Paramodulating from a = b into P(f(x), f(x)) yields P(f(a), f(b)), which can then be resolved with the input clause ¬P(f(a), f(b)).

4.4 Resolution by Unification and Equality

Resolution by unification and equality (RUE) [17,18] adopts a different approach to incorporating equality reasoning into an inference rule. Where paramodulation applies equality substitution, producing a new literal from a literal and an equality literal, resolution by unification and equality derives a set of negative equality literals from a pair of literals. For example, while L(··· b ···) ∨ C ∨ D can be derived by paramodulation from (a = b) ∨ C into L(··· a ···) ∨ D, resolution by unification and equality performs the complementary operation of deriving the clause ¬(a = b) ∨ E ∨ F from the clauses L(··· a ···) ∨ E and ¬L(··· b ···) ∨ F. The principle involved is that L(··· a ···) and ¬L(··· b ···) can both be true only if a is not equal to b. Thus, ¬(a = b), along with the other literals of the clauses containing L and ¬L, can be derived. Of course, performing resolution by unification and equality may result in the formation of resolvents with more than one inequality literal if there is more than a single disagreement in the two literals being matched. For example, ¬(a = c) ∨ ¬(b = d) can be derived from P(f(a, b)) and ¬P(f(c, d)). There are completeness and efficiency issues involved in the selection of what disagreements are used to construct a resolvent by unification and equality. Matching P(f(a, b)) and ¬P(f(c, d)) using the resolution by unification and equality rule must result in ¬(f(a, b) = f(c, d)) for a successful refutation of the set of clauses consisting of P(f(a, b)), ¬P(f(c, d)), and (f(a, b) = f(c, d)). Thus, creating an inequality literal from terms whose function symbols are the same but whose subterms disagree may be necessary for completeness. It is generally more efficient, and often successful in practice, to form inequality literals from the bottommost disagreement set, as in the earlier derivation of ¬(a = c) ∨ ¬(b = d). The negative reflexive function (NRF) rule is also necessary.
It creates from a clause ¬(s = t) ∨ C the clause ¬(s1 = t1) ∨ ... ∨ ¬(sn = tn) ∨ C, where (s1, t1), ..., (sn, tn) is a disagreement set between s and t. For example, if ¬(f(a, b) = f(c, d)) is derived from P(f(a, b)) and ¬P(f(c, d)) by resolution by unification and equality, the lower level disagreement ¬(a = c) ∨ ¬(b = d) can be obtained by applying the negative reflexive function rule to ¬(f(a, b) = f(c, d)). Appropriate use (i.e., suitable choice of disagreement sets) of the resolution by unification and equality and the negative reflexive function rules together yields a complete procedure for

equality reasoning.
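The two choices of disagreement set mentioned above are easy to make concrete. The following Python sketch (illustrative only; [17,18] define the disagreement sets more carefully) extracts the topmost and the bottommost disagreement sets of two terms:

```python
# Terms are tuples ('functor', arg, ...); constants are 1-tuples.

def topmost_disagreements(s, t):
    """The topmost disagreement set: the whole pair, unless the terms agree."""
    return [] if s == t else [(s, t)]

def bottommost_disagreements(s, t):
    """Descend through matching function symbols and collect the
    innermost disagreeing subterm pairs."""
    if s == t:
        return []
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        out = []
        for a, b in zip(s[1:], t[1:]):
            out += bottommost_disagreements(a, b)
        return out
    return [(s, t)]

fab = ('f', ('a',), ('b',))
fcd = ('f', ('c',), ('d',))
```

For the arguments of P(f(a, b)) and ¬P(f(c, d)), the topmost set yields the single literal ¬(f(a, b) = f(c, d)), while the bottommost set yields the pairs underlying ¬(a = c) ∨ ¬(b = d); the NRF rule takes a clause built from the former to one built from the latter.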

4.5 E-Resolution

E-resolution [59] is similar to (but predates) resolution by unification and equality, but is a more complex rule of inference that includes the use of paramodulation. It is a higher level rule than resolution by unification and equality in much the same way that hyperresolution is a higher level rule than resolution. If the literals L and L′ can be made complementary by a sequence of paramodulation operations by clauses (s1 = t1) ∨ C1, ..., (sn = tn) ∨ Cn, then D ∨ E ∨ C1 ∨ ... ∨ Cn can be derived from L ∨ D and L′ ∨ E by E-resolution, where D, E, C1, ..., Cn are arbitrary clauses. This is also both a generalization and a specialization of unification in equational theories. It specializes unification in equational theories by stipulating that paramodulation is used to match the expressions. It generalizes unification in equational theories because nonunit clauses containing equalities can be used and the theory is therefore not equational.

4.6 Knuth-Bendix Method

Let R be the set of reductions λ1 → ρ1, ..., λn → ρn and E be the corresponding equational theory λ1 = ρ1, ..., λn = ρn. Then R is a complete set of reductions for E if and only if it is terminating and confluent. The term t1 can be reduced by R to the term t2 (written t1 → t2) if some subterm u of t1 is an instance of λ (with substitution σ) for some λ → ρ in R and t2 is the result t1(u ← ρσ) of replacing u by the corresponding instance of ρ. R is terminating if and only if there is no infinite sequence of reductions t1 → t2 → ···. R is confluent if and only if for every term t, if t →* t1 and t →* t2 (i.e., t can be rewritten by R to t1 and t2 in zero or more steps), there is some term t′ such that t1 →* t′ and t2 →* t′. If R is a complete set of reductions for E, then t1↓ = t2↓ for every pair of terms t1 and t2 such that t1 =E t2, where t↓ denotes the result of reducing t by R to an irreducible form. Thus, a complete set of reductions for E can be used to solve the word problem for E.

The Knuth-Bendix method [36,29,30,10,41,42] provides a test for a set of reductions being locally confluent. A set of reductions R is locally confluent if and only if for every term t, if t → t1 and t → t2 (i.e., t can be rewritten by R to each of t1 and t2 in one step), there is some term t′ such that t1 →* t′ and t2 →* t′. Terminating sets of reductions are confluent if and only if they are locally confluent.

Instead of considering all possible terms t that can be reduced by R to terms t1 and t2, the Knuth-Bendix method performs superposition operations that capture the general case of two reductions being simultaneously applicable to a term. Let λi → ρi and λj → ρj be two not necessarily distinct reductions in R with variables renamed so that they have no variables in common. Let u be a nonvariable subterm of λi that is unifiable with λj with most general unifier σ. Then the terms t1 = ρiσ and t2 = λi(u ← ρj)σ (the instantiation by σ of λi with u replaced by ρj) form a critical pair that represents one of the cases of λi → ρi and λj → ρj rewriting some term t (in this case, λiσ) to terms t1 and t2. If for every critical pair (t1, t2), t1↓ = t2↓, R is locally confluent. For example, the set of reductions

(1) f(e, x) → x
(2) f(g(x), x) → e
(3) f(f(x, y), z) → f(x, f(y, z))
(4) f(x, e) → x
(5) f(x, g(x)) → e
(6) g(e) → e
(7) g(g(x)) → x
(8) f(g(x), f(x, y)) → y
(9) f(x, f(g(x), y)) → y
(10) g(f(x, y)) → f(g(y), g(x))

is a terminating and locally confluent (and, hence, complete) set of reductions for free groups, where f is the group multiplication operator, g the group inverse operator, and e the group identity element. Two terms are equal in the theory of free groups if and only if they can be simplified to the same term by this set of reductions. But the Knuth-Bendix method is more than a test for local confluence of sets of reductions. If the set of reductions is not locally confluent, it will generate a critical pair that leads to a counterexample, i.e., a pair of terms equal in the equational theory, but distinct and irreducible. If one of the terms is simpler than the other in a manner consistent with the complexity ordering of the other reductions, then the counterexample can be made into a reduction and added to the

current set of reductions being tested for local confluence. Thus, the Knuth-Bendix method can (1) terminate with no additional counterexamples, resulting in a complete set of reductions, (2) terminate with a counterexample that cannot be oriented into a reduction because neither term is simpler than the other, or (3) continue generating reductions forever (thereby constructing the infinite complete set of reductions). An example of Case (1) is that the previously mentioned complete set of reductions for free groups is generable from reductions 1-3 by the Knuth-Bendix method.
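One superposition step of this completion can be sketched concretely. The Python sketch below (an illustration; variable and term representations are ad hoc, and the occurs check is omitted) superposes reduction (2), f(g(x), x) → e, on the subterm f(x, y) of reduction (3) and computes the resulting critical pair:

```python
# Terms are tuples ('functor', arg, ...); variables are strings.

def walk(t, s):
    while isinstance(t, str) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    """Standard unification (occurs check omitted for brevity)."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if isinstance(a, str):
        return {**s, a: b}
    if isinstance(b, str):
        return {**s, b: a}
    if a[0] != b[0] or len(a) != len(b):
        return None
    for x, y in zip(a[1:], b[1:]):
        s = unify(x, y, s)
        if s is None:
            return None
    return s

def subst(t, s):
    t = walk(t, s)
    if isinstance(t, str):
        return t
    return (t[0],) + tuple(subst(a, s) for a in t[1:])

# Reduction (3), variables x, y, z, and reduction (2) renamed apart (variable w):
rhs3 = ('f', 'x', ('f', 'y', 'z'))
lhs2 = ('f', ('g', 'w'), 'w')
rhs2 = ('e',)

u = ('f', 'x', 'y')                  # nonvariable subterm of lhs3 superposed on
sigma = unify(u, lhs2, {})           # x <- g(w), y <- w
t1 = subst(rhs3, sigma)              # rho3 . sigma
t2 = subst(('f', rhs2, 'z'), sigma)  # lhs3 with u replaced by rho2, then sigma
```

The critical pair is (f(g(w), f(w, z)), f(e, z)); f(e, z) reduces to z by rule (1), so joining the pair requires f(g(w), f(w, z)) → z, which is precisely rule (8) of the completed system.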

An example of Case (2) is the generation of the unorientable equality f(x, y) = f(y, x) from reductions 1-3 plus f(x, x) = e. An example of Case (3) is the generation of an infinite set of reductions

f(x, f(y, f(x, y))) → f(x, y)
f(x, f(y, f(x, f(y, w)))) → f(x, f(y, w))
f(x, f(y, f(z, f(x, f(y, z))))) → f(x, f(y, z))
f(x, f(y, f(z, f(x, f(y, f(z, w)))))) → f(x, f(y, f(z, w)))

from the reductions

f(x, x) → x
f(f(x, y), z) → f(x, f(y, z))

The Knuth-Bendix method, when it applies, is extraordinarily powerful. The complete set of 10 reductions for free groups can be derived by computer from the original 3 with little wasted effort in just a few seconds. The chapter by Huet contains some further discussion of the standard

Knuth-Bendix method.

One of the most obvious limitations of the standard Knuth-Bendix method is its inability to handle theories with commutativity. Commutativity cannot be handled because the equation f(x, y) = f(y, x) cannot be treated as a reduction without losing the required termination property. Despite examples such as the theory of free groups above that include associativity, the standard Knuth-Bendix method is also somewhat deficient in its handling of associativity. For example, the set of reductions f(x, x) → x and f(f(x, y), z) → f(x, f(y, z)) can be extended only to an infinite complete set of reductions, although the single reduction f(x, x) → x constitutes a complete set of reductions if f is assumed to be associative and associative pattern matching is used in its application. Such problems have provided motivation for extending the Knuth-Bendix method to handle equational theories that are divided into a set of reductions plus a set of additional equalities [63,28,35,43,44,45]. Functions that are associative and commutative are particularly important. With special handling for such functions, it is possible to derive complete sets of reductions for abelian groups and rings and many other interesting theories [31]. An especially interesting example is that of Boolean algebra, when the associative and commutative exclusive-or (⊕) and conjunction (∧) connectives are used as the basic set of logical connectives in terms of which formulas are rewritten [25]. The set of reductions

x ≡ y → x ⊕ y ⊕ true
x ⊃ y → (x ∧ y) ⊕ x ⊕ true
x ∨ y → (x ∧ y) ⊕ x ⊕ y
¬x → x ⊕ true
x ⊕ false → x
x ⊕ x → false
x ∧ true → x
x ∧ false → false

can be used to decide the equivalence of two formulas in the propositional calculus by associative-commutative identity checking of the results of reducing the two expressions to their irreducible forms using associative-commutative pattern matching. A formula is valid or unsatisfiable if and only if it reduces to true or false, respectively. Extensions of the technique can be used for theorem proving in the first-order predicate calculus. An approach to handling functions that are associative and commutative in the Knuth-Bendix method is to employ associative-commutative identity checking, pattern matching, and unification in place of standard identity checking, pattern matching, and unification [63]. The immediate difficulty with carrying out this modification is that the reduction f(g(x), x) → e is not directly applicable to the term f(g(a), b, a, c), where f is an associative and commutative function (again treated as an n-ary function for arbitrary n), because f(g(a), a) is not a subterm of f(g(a), b, a, c). The superposition process is likewise complicated. A solution is to enlarge the set of reductions. In particular, for every reduction λ → ρ where λ is headed by f which is associative and commutative, the reduction f(λ, v) → f(ρ, v) is added, where v is a new variable not occurring in λ → ρ. The embedding f(g(x), x, v) → f(e, v) (f(g(x), x, v) → v after rewriting the right-hand side) can be used to reduce f(g(a), b, a, c) to f(b, c) using the substitution {x ← a, v ← f(b, c)} obtained by associative-commutative pattern matching. The use of associative-commutative identity checking, pattern matching, and unification operations plus the addition of embeddings of reductions permits extension of the Knuth-Bendix method to handle functions that are associative and commutative.
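The decision procedure underlying the ⊕/∧ reduction system can be imitated directly by computing exclusive-or normal forms, which is what AC rewriting with those rules effectively produces. The Python sketch below (an illustration; a real implementation would use AC pattern matching as described) represents a formula as a set of monomials, so that m ⊕ m → false becomes set cancellation and x ∧ x → x becomes union within a monomial:

```python
# A formula in exclusive-or normal form is a frozenset of monomials; each
# monomial is a frozenset of variable names (the empty monomial is "true").

TRUE = frozenset([frozenset()])      # the exclusive-or of just the empty monomial
FALSE = frozenset()                  # the empty exclusive-or

def var(name):
    return frozenset([frozenset([name])])

def xor(p, q):                       # m (+) m -> false: equal monomials cancel
    return frozenset(set(p) ^ set(q))

def conj(p, q):                      # distribute /\ over (+)
    out = set()
    for m1 in p:
        for m2 in q:
            out ^= {m1 | m2}         # x /\ x -> x inside a monomial; pairs cancel
    return frozenset(out)

def neg(p):                          # ~x -> x (+) true
    return xor(p, TRUE)

def disj(p, q):                      # x \/ y -> (x /\ y) (+) x (+) y
    return xor(xor(conj(p, q), p), q)

def impl(p, q):                      # x > y -> (x /\ y) (+) x (+) true
    return xor(xor(conj(p, q), p), TRUE)
```

A formula is valid exactly when its normal form is `TRUE` and unsatisfiable exactly when it is `FALSE`; two formulas are equivalent exactly when their normal forms are identical.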

References

[1] Andrews, P.B. Theorem proving via general matings. Journal of the ACM 28, 2 (April 1981), 193-214.

[2] Benanav, D., D. Kapur, and P. Narendran. Complexity of matching problems. Proceedings of the First International Conference on Rewriting Techniques and Applications, Dijon, France,

May 1985.

[3] Bläsius, K., N. Eisinger, J. Siekmann, G. Smolka, A. Herold, and C. Walther. The Markgraf Karl Refutation Procedure (Fall 1981). Proceedings of the Seventh International Joint Conference on Artificial Intelligence, Vancouver, B.C., Canada, August 1981, 511-518.

[4] Bibel, W. On matrices with connections. Journal of the ACM 28, 4 (October 1981), 633-645.

[5] Bibel, W. Automated Theorem Proving. Friedr. Vieweg & Sohn, Braunschwelg, West Ger- many, 1982. [6] Boyer, R.S: and J S. Moore. The sharing of structure in theorem-proving programs. In B. Meltzer and D. Michie (eds.). Machine Intelligence 7. Edinburgh University Press, Edin- burgh, Scotland, 1972, pp. 101-116. [7] Brachman, R.J., R.E. Fikes, and H.J. Levesque. Krypton: a functional approach to knowl- edge representation. IEEE Computer 16, 10 (October 1983), 67-73. [8] Brachman, R.J., V. Pigman Gilbert, and H.J. Levesque. An essential hybrid reasoning sys- tem: knowledge and symbol level accounts of Krypton. Proceedings of the Ninth International Joint Conference on Artificial Intelligence, Los Angeles, California, August 1985, 532-539. [9] Brand, D. Proving theorems with the modification method. SIAM Journal of Computing (December 1975), 412-430. [10] Buchberger, B. Basic features and development of the critical-pair/completion procedure. Proceedings of the First International Conference on Rewriting Techniques and Applications, Dijon, France, May 1985, 1-45.

[11] Chang, C.-L. The unit proof and the input proof in theorem proving. Journal of the ACM 17, 4 (October 1970), 698-707.

[12] Chang, C.-L. and R.C.-T. Lee. Symbolic Logic and Mechanical Theorem Proving. Academic Press, New York~ New York, 1973.

[13] Clocksin~ W.F. and C.S. Meltish. Programming in Prolog. Springer-Verlag, Berlin, West Germany~ 1981.

[14] Cohn, A.G. Mechanizing a Particularly Expressive Many Sorted Logic. Ph.D. dissertation, University of Essex, Essex, England, January 1983.

[15] Cohn, A.G. On the solution of Schubert's steamroller in many sorted logic. Proceedings of the Ninth International Joint Conference on Artificial Intelligence, Los Angeles, California, August 1985, 1169-1174.

[16] Dahl, V. Translating Spanish into logic through logic. American Journal of Computational Linguistics 7, 3 (September 1981), 149-164.

[17] Digricoli, V.J. Resolution by unification and equality. Proceedings of the Fourth Workshop on Automated Deduction, Austin, Texas, February 1979.

[18] Digricoli, V.J. The efficacy of RUE resolution: experimental results and heuristic theory. Pro- ceedings of the Seventh International Joint Conference on Artificial Intelligence, Vancouver, B.C., Canada, August 1981, 539-547. 128

[19] Eder, E. Properties of substitutions and unifications. Journal of Symbolic Computation 1 (1985). [20] Fages, F. Associative-commutative unification. Proceedings of the 7th International Con- ference on Automated Deduction, Napa, California, May 1984. Lecture Notes in Computer Science 170, Springer-Verlag, Berlin, West Germany, pp. 194-208. [21] Fay, M.J. First-order unification in an equational theory. Proceedings of the 4th International Conference on Automated Deduction, Austin, Texas, February 1979, 161-167. [22] Gallier, J. Logic for Computer Science. Harper & Row, New York, New York, 1.986. [23] Henschen, L.J. and S.A. Naqvi. An improved filter for literal indexing in resolution sys- tems. Proceedings of the Seventh International Joint Conference on Artificial Intelligence, Vancouver, B.C., Canada, August 1981, 528-529. [24] Hewitt, C. Description and theoretical analysis (using schemata) of PLANNER: a language for proving theorems and manipulating models in a robot. Technical Report, Artificial Intel- ligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, April 1972. [25] Hsiang, J. Refutational theorem proving using term-rewriting systems. Artificial Intelligence Journal Z5, 3 (1985), 255-300. [26] Huet, G. Rdsolution d'6quations dana les langages d'ordre i,2,.. ,w. Th~se d'6tat, Sp~cialit~ Mathmatiques, Universit~ Paris VII, 1976. [27] Huet, G. An algorithm to generate the basis of solutions to homogeneous diophantine equa- tions. Information Processing Letters 7, 3 (April 1978), 144-147. [28] Huet, G. Confluent reductions: abstract properties and applications to term rewriting sys- tems. Journal of the ACM 27, 4 (October 1980), 797-821. [29] Huet, G. A complete proof of correctness of the Knuth-Bendix completion algorithm. Journal of Computer and System Sicences P3 (1981), 11-21. [30] Huet, G. and D.C. Oppen. Equations and rewrite rules: a survey. 
Technical Report CSL-111, Computer Science Laboratory, SRI International, Menlo Park, California, January 1980. [31] Hullot, J.-M. A catalogue of canonical term rewriting systems. Technical Report CSL-113, Computer Science Laboratory, SRI International, Menlo Park, California, April 1980. [32] Hullot, J.-M. Canonical forms and unification. Proceedings of the 5th International Con- ference on Automated Deduction, Les Arcs, France, July 1980. Lecture Notes in Computer Science 87, Springer-Verlag, Berlin, West Germany, pp. 318-334. [33] Jaffar, J., J.-L. Lassez, and J. Lloyd. Completeness of the negation as failure rule. Proceed- ings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, West Germany, August 1983, 500-506. [34] Jaffax, J. Minimal and complete word unification. Technical Report 51, Department of Com- puter Science, Monash University, Clayton, Victoria, Australia, March 1985. [35] Jouannaud, J.-P. and H. Kirchner. Completion of a set of rules modulo a set of equations. Technical Note, Computer Science Laboratory, SRI International, Menlo Park, California, April 1984. t29

[36] Knuth, D.E. and P.B. Bendix. SimpIe word problems in universal algebras. In Leech, J. (ed.), Computational Problems in Abstract AIgcbras, Pergamon Press, 1970, pp. 263-297.

[37] Kowalski, R. A proof procedure using connection graphs. Journal of the A CM 22, 4 (October 1975), 572-595. [38] Kowalski, R.A. Algorithm = logic + control. Communications of the A CM 22, 7 (July 1979), 424-436. [39] Kowalski, R. Logic for Problem Solving. Elsevier North-Holland, New York, New York, 1979.

[40] Kowalski, R. and D. Kuehner. Linear resolution with selection function. Artificial Intelligence 2 (1971), 227-260. [41] Lankford, D.S. Canonical algebraic simplification in computational logic. Technical Report, Department of Mathematics, UniverMty of Texas, Austin, Texas, May 1975. [42] Lankford, D.S. Canonical inference. Report ATP-32, Department of Mathematics and Com- puter Sciences, University of Texas at Austin, Austin, Texas, December 1975. [43] Lankford, D.S. and A.M. Ballantyne. Decision procedures for simple equational theories with commutative axioms: Complete sets of commutative reductions. Report ATP-35, Depart- ment of Mathematics, University of Texas, Austin, Texas, March 1977. [44] Lankford, D.S. and A.M. Ballantyne. Decision procedures for simple equational theories with permutative axioms: Complete sets of permutative reductions. Report ATP-37, Department of Mathematics, University of Texas, Austin, Texas, April 1977. [45] Lankford, D.S. and A.M. Ballantyne. Decision procedures for simple equational theories with commutative-associative axioms: Complete sets of commutative-assoclative reductions. Report ATP-39, Department of Mathematics, University of Texas, Austin, Texas, August 1977. [46] Lankford, D., G. Butler, and B. Brady. Abelian group theory unification algorithms for elementary terms. Technical Report, Mathematics Department, Louisiana Tech University, Ruston, Louisiana, 1983. [47] Livesey, M. and J. Siekmann. Termination and decidability results for string unification. Memo CSM-12, Essex University, Essex, England, 1975. [48] Livesey, M. and J. Siekmann. Unification of A+C-terms (bags) and A÷C+I-terms (sets). Interner Bericht Nr. 5/76, Institut ffir Informatik I, Universit£t Karlsruhe, Karlsruhe, West Germany, 1976.

[49] Lloyd, J.W. Foundation~ of Logic Programming. Springer-Verlag, New York, New York, 1984.

[50] Loveland, D.W. A linear format for resolution. Proceedings of the IRIA Symposium on Auto- matic Demonstration, Versailles, France, 1968. Lecture Notes in Mathematics 125, Springer- Verlag~ Berlin, West Germany, 1970, pp. 147-162. [51] Loveland, D.W. A simplified format for the model elimination procedure. Journal of the ACM 16, 3 (July 1969), 349-363.

[52] Loveland, D.W. Automated Theorem Proving: A Logical Basis. North-Holland, Amsterdam, the Netherlands, 1978. 130

[53] Luckham, D. Refinement theorems in resolution theory. Proceedings of the IRIA Symposium on Automatic Demonstration, Verailles, France, 1968. Lecture Notes in Mathematics 125, Springer-Verlag, Berlin, 1970, pp. 163-190. [54] Makanin, G.S. The problem of solvability of equations in a free semigroup. Soviet Akad. Nauk SSSR 233, 2 (1977).

[55] Manna, Z. and R. Waldinger. A deductive approach to program synthesis. ACM Transactions on Programming Languages and Systems 2, 1 (January 1980), 90-121.

[56] Manna, Z. and R. Waldinger. The Logical Basis for Computer Programming. Addison-Wesley, Reading, Massachusetts, 1985. [57] Martelli, A. and U. Montanari. An efficient unification algorithm. ACM Transactions on Programming Languages 4, 2 (April 1982), 258-282. [58] McCharen, J., R. Overbeek, and L. Wos. Complexity and related enhancements for auto- mated theorem-proving programs. Computers and Mathematics with Applications 2 (1976), 1-16. [59] Morris, J.B. E-resolution: extension of resolution to include the equality relation. Proceedings of the International Joint Conference on Artificial Intelligence, Washington, D.C., May 1969, 287-294. [60] Murray, N.V. Completely non-clausal theorem proving. Artificial Intelligence 18, 1 (January 1982), 67-85. [61] Overbeek, R. An implementation of hyperresoIution. Computers and Mathematics with Ap- plications 1 (1975), 201-214. [62] Paterson, M.S. and M.N. Wegman. Linear unification. Journal of Computer and Systems Science 16, 2 (April 1978), 158-167. [63] Peterson, G.E. and M.E. Stickel. Complete sets of reductions for some equational theories. Journal of the Association for Computing Machinery 28, 2 (April 1981), 233-264.

[64] Plaisted, D.A. The occur-check problem in Prolog. New Generation Computing 2, 4 (1984), 309-322. [65] Plotkin, G.D. Building-in equational theories. In Meltzer, B. and D. Michie (eds.). Edinburgh University Press, Edinburgh, Scotland, 1972, pp. 73-90. [66] Robinson, J.A. A machine-oriented logic based on the resolution principle. Journal of the ACMIP, 1 (January 1965), 23-41. [67] Robinson, J.A. Logic: Form and Function. Elsevier North-Holland, New York, New York, 1979. [68] Rulifson, J.F., J.A. Derksen, and R.J. Waldinger. QA4: a procedural calculus for intuitive reasoning. Technical Note 73, Artificial Intelligence Center, SRI International, Menlo Park, California, November 1972. [69] Schmidt-Schauss, M. A many-sorted calculus with polymorphic functions based on resolution and paramodulation, Proceedings of the Ninth International Joint Conference on Artificial Intelligence, Los Angeles, California, August 1985, 1162-1168. 131

[70] Siekmann, J.H. Unification of commutative terms. Interner Bericht Nr. 2/76, Institut ffir Informatik I, Universit£t Karlsruhe, Karlsruhe, West Germany, 1976.

[71] Siekmann, J.H. Universal unification. Proceedings of the 7th International Conference on Automated Deduction, Napa, California, May 1984. Lecture Notes in Computer Science 170, Springer-Verlag, Berlin, West Germany, pp. 1--42. [72] Siekmann, J. and W. Stephan. Completeness and soundness of the connection graph proof procedure. Interner Bericht 7/76, Institut ffir Informatik I, Universit~t Karlsruhe, Karlsruhe, West Germany, 1976. [73] Slagle, J.R. Automated theorem-proving for theories with simplifiers, commutativity, and associativity. Journal of the ACM 21, 4 (October 1974), 622-642. [74] Smolka, G. Completeness and confluence properties of Kowalski's clause graph calculus. Interner Bericht 31/82, Institut ffir Informatik I, Universit£t Karlsruhe, Karlsruhe, West Germany, December 1982.

[75] Stickel, M.E. A complete unification algorithm for associative-commutative functions. Pro- ceedings of the Fifth International Joint Conference on Artificial Intelligence, Tbilisi, Geor- gia, U.S.S.R., September 1975, 71-76.

[76] Stickel, M.E. Mechanical Theorem Proving and Artificial Intelligence Languages. Ph.D. dis- sertation, Computer Science Department, Carnegie-Metlon University, Pittsburgh, PennsyI- vania, December 1977.

[77} Stickel, M.E. A unification algorithm for associative-commutative functions. Journal of the ACM 28, 3 (July 1981), 423-434.

[78] Stickel, M.E. A nonclausal connection-graph resolution theorem-proving program. Proceed- ings of the AAAI-S2 National Conference on Artificial Intelligence, Pittsburgh, Pennsylva- nia, August 1982, 229-233.

[79] Stickel, M.E. Theory resolution: building in nonequational theories. Proceedings o/the AAAI- 83 National Conference on Artificial Intelligence, Washington, D,C., August 1983, 391-397. [80] Stickel, M.E. A Prolog technology theorem prover. New Generation Computing P, 4 (1984), 371-383.

[81] Stickel, M.E. Automated deduction by theory resolution. Journal of Automated Reasoning t, 4 (1985), 333-355. [82] Stickel, M.E. and W.M. Tyson. An analysis of consecutively bounded depth-first search with applications in automated deduction. Proceedings of the Ninth International Joint Conference on Artificial Intelligence, Los Angeles, California, August 1985, 1073-1075. [831 van VaMen, J. An extension of unification to substitutions with an application to auto- matic theorem proving. Proceedings of the Fifth International Joint Conference on Artificial Intelligence, TbiIisi, Georgia, U.S.S.R., September 1975, 77-82. [84] Walther, C. A many-sorted calculus based on resolution and paramodulation. Proceedings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, West Ger- many, August 1983, 882-891. 132

[85] Walther, C. A mechanical solution of Schubert's steamroller by many-sorted resolntion. Proceedings of the AAAI-8~ National Conference on Artificial Intelligence, Austin, Texas, August 1984, 330-334. Revised version appeared in Artificial Intelligence g6, 2 (May 1985), 217-224. [86] Walther, C. Unification in many-sorted theories. Proceedings of the 6th European Conference on Artificial Intelligence, Pisa, Italy, September 1984. [87] Winker, S.K. and L. Wos. Procedure implementation through demodulation and related tricks. Proceedings of the 6th International Conference on Automated Deduction, New York, New York, June 1982. Lecture Notes in Computer Science 138, Springer-Verlag, Berlin~ West Germany, pp. 109-131. [88] Wos~ L, R. Overbeek, E. Lusk, and J. Boyle. Automated Reasoning. Prentlce-Hall, Engle- wood Cliffs, New Jersey, 1984. [89] Wos, L. and G.A. Robinson ParamoduIation and set of support. Proceedings of the IRIA Symposium on Automatic Demonstration, Verailles~ France~ 1968. Lecture Notes in Mathe- matics 125, Springer-Verlag, Berlin, 1970, pp. 276-310. [90] Wos, L., G.A. Robinson, and D.F. Carson. Efficiency and completeness of the set of support strategy in theorem proving. Journal of the ACM 12, 4 (October 1965), 536-541. [91] Wos, L., G.A. Robinson, D.F. Carson, and L. Shalla. The concept of demodulation in theorem proving. Journal of the ACM I~, 4 (October 1967), 698-709. Fundamental Mechanisms in Machine Learning and Inductive Inference

Alan W. Biermann Durham, NC 27706

Supported in part by the U.S. Army Research Office under grant DAAG-29-84-K-0072.

I. INTRODUCTION

While learning and inductive inference are two distinctively different phenomena, they often appear together, and it is therefore appropriate to study them simultaneously. Learning, for the purposes of this article, will be said to occur when a system modifies itself to improve its own behavior. The scenario is thus that the system operates at a given performance level at one time, experiences events of one kind or another, and purposefully modifies itself to achieve a higher level of performance at a later time.

Inductive inference occurs when a system observes examples (and possibly nonexamples) of a set and constructs a general rule to characterize the set. Thus, as an illustration, such a system might be shown several examples of arches and several objects that are not arches and asked to inductively infer a general rule that will distinguish all arches from other objects. The induced rule is only a guess based upon incomplete information, the known examples and nonexamples. However, if the input information is representative, the guessed rule will be correct or nearly correct. If the rule has shortcomings, additional examples will often result in convergence to a correct form. The phenomenon of inductive inference has been studied under many different names in the literature, including generalization, induction, concept formation, learning, categorization, and theory formation.

Most systems that learn use inductive inference as the mechanism for improving behavior. That is, in the process of performing a task, the system infers rules about the domain and uses those rules in later actions to achieve a higher level of performance. This is the kind of learning system that will be studied here. Examples of learning systems that do not use inductive inference are those that improve behavior by simply memorizing facts or by discovering new behaviors using introspective mechanisms.

The learning mechanisms to be studied here fall into five different categories: systems which learn

(1) finite functions, (2) grammars, (3) programs from traces, (4) LISP programs from input-output pairs, and (5) PROLOG programs from oracle queries.

The first type of system learns functions which receive inputs and in a single computational action compute the associated output. The second type can learn a grammar for a language from example strings (or sentences) in the language (and possibly some nonsentences). The third type of system requires that the user lead the machine through a trace of a sample computation; it then infers a program for doing the computation. The fourth and fifth approaches involve discovering LISP and PROLOG programs that can achieve certain target input-output behaviors.

As each of these studies is undertaken, it is important to keep in mind the various measures of a learning machine. One should first notice the nature of the required training information. Are only positive examples of target behavior given, or are both positive and negative examples given? Is the information provided at random from the external world, or can the learning machine ask for any fact it needs? Is the target behavior presented strictly in terms of input-output requirements, or does the training information show how the output is to be obtained from the input? One should also notice whether it is possible to specify, for the given learning machine, exactly the set of behaviors that it can learn. Finally, what are the levels of error and rates of learning for the machine?

II. LEARNING FINITE FUNCTIONS

A finite function will be defined for the purpose of this study to be any function which sequentially inputs a bounded amount of information and then computes an answer. Later sections in this chapter will study the acquisition of functions or programs which may process an input of unbounded length. While there are many finite function learning machines in the literature, five will be discussed here. Methods will be described for learning

(1) linear evaluation functions, (2) signature tables, (3) Boolean conjunctive and disjunctive normal forms, (4) Michalski expressions, and (5) semantic nets.

Learning Linear Evaluation Functions

A linear evaluation function has the form y = Σ_{i=1}^{n} c_i x_i, where y is the computed value, x1, x2, ..., xn are inputs, and c1, c2, ..., cn are variable coefficients. Learning is done by adjusting the coefficients for improved behavior. In many systems, such a linear function is built into a larger system which utilizes the computed value y for evaluating alternative decisions. Thus in a pattern recognition problem (Nilsson [65], Minsky and Papert [69]), the xi's may represent measurements or feature values of an unknown pattern, and the pattern will be recognized as belonging to a given class if y is positive. In a game playing situation (Samuel [59]), the xi's represent feature values of the specific position on the board and y is assumed to give a measure of the desirability of that position.

Linear evaluation systems have been important in the learning literature because there are learning algorithms with guaranteed convergence to a solution if one exists and because much is known about the class of behaviors that these systems can compute.

An example of a learning algorithm for such systems is the following (taken from Minsky and Papert [69]).

START: Choose the constants c_i randomly.

TEST: Select an object from the set to be learned (positive information) or from outside the set (negative information), obtain its feature values x1, x2, ..., xn, and compute y = Σ_{i=1}^{n} c_i x_i.

If positive information was selected:
    If y > 0 then go to TEST.
    If y < 0 then go to ADD.
If negative information was selected:
    If y < 0 then go to TEST.
    If y > 0 then go to SUB.

ADD: For each i, c_i = c_i + x_i. Go to TEST.

SUB: For each i, c_i = c_i - x_i. Go to TEST.

This algorithm loops without termination, continuously selecting objects and testing its classification rule, which asserts that the object is in the class if y is positive and out otherwise. If a particular selected object is correctly classified, the algorithm does nothing except choose another object to test. If the object is not correctly classified, the coefficients are altered in the direction to increase y for positive information and decrease it for negative information.
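The loop above can be sketched directly. This is a hedged transcription, not Minsky and Papert's own code: the bounded step count (the original loops forever), the treatment of y = 0 as a misclassification (the original leaves it unspecified), and the example data are all additions.

```python
import random

def perceptron(examples, n, steps=1000):
    """Each example is (features, is_positive).  The loop picks an
    example at random, tests the rule 'in the class iff y > 0', and
    applies the ADD / SUB corrections from the text."""
    c = [random.uniform(-1, 1) for _ in range(n)]        # START
    for _ in range(steps):                               # TEST
        x, positive = random.choice(examples)
        y = sum(ci * xi for ci, xi in zip(c, x))
        if positive and y <= 0:                          # ADD
            c = [ci + xi for ci, xi in zip(c, x)]
        elif not positive and y >= 0:                    # SUB
            c = [ci - xi for ci, xi in zip(c, x)]
    return c

# Learn 'x1 AND x2', with a constant third feature acting as a threshold:
data = [((1, 1, 1), True), ((1, 0, 1), False),
        ((0, 1, 1), False), ((0, 0, 1), False)]
c = perceptron(data, 3, steps=5000)
print([sum(ci * xi for ci, xi in zip(c, x)) > 0 for x, _ in data])
```

Since AND is linearly separable, the convergence guarantee mentioned below applies: the corrections stop after finitely many mistakes, so with enough steps the printed classifications match the labels.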

If linear evaluation methods are used in a pattern recognition environment, then the learnable classes are those which are linearly separable in their feature spaces as defined, for example, by Nilsson [65]. Such classes are reasonably well understood and applicable in many domains (Fu [75]). However, many important features in a pattern cannot be recognized by these systems, as has been described by Minsky and Papert [69]. For example, they showed that the well known "perceptron" recognizer, which employs linear decision making, is not capable of distinguishing geometric properties such as "connectedness" and "parity".

Learning Signature Tables

Because of the limitations of linear methods, Samuel [67] developed a decision making scheme based on sequential table lookup as shown in Figure 1. The input values x1, ..., xn are used to obtain output values from the lowest level table; these output values become inputs to the next level, and so forth, until a final function value is returned at the top level. Signature tables are capable of computing nonlinear functions and they are very fast in execution. The class of learnable functions has been characterized by Biermann, Fairfield, and Beres [82] and an optimal though expensive learning algorithm is known.

The key insight needed for understanding signature tables comes from constructing a matrix of all the function values for each table in the system. The matrix for a table should have a row for each set of input values that feed that table and a column for each set of input values that do not feed that table. As an illustration, consider the table labeled A in Figure 1. Its associated matrix, shown in Figure 2, has a row for each assignment of values to (x1,x2). These are inputs that "feed" table A. It has a column for each possible vector (x3,x4,x5,x6). We note that this matrix has only two distinct rows; the first and last rows are identical, as are the second and third rows. This means that table A needs only two output values, that the first and last entries must be identical, and that the second and third entries must be identical. Thus this matrix shows that entries (0,1,1,0) must be made into the output column of table A.

(Actually, (1,0,0,1) would also be satisfactory.) Similarly, all other output columns of all other tables can be derived from their associated matrices, so one has a synthesis methodology for such systems.

[Figures 1 and 2 appeared here: Figure 1 shows a hierarchy of signature tables with inputs x1, ..., x6; Figure 2 shows the matrix associated with table A.]

The synthesis methodology begins with a signature table system like the one shown in Figure 1 but with the output values for the tables unknown. A matrix is constructed for each table in the system using the function to be realized, as described above, and the associated table outputs are derived. The resulting signature table system will correctly compute the target function.
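The lookup cascade itself is simple to sketch. The tables, their contents, and the input groupings below are invented for illustration (a two-level hierarchy in the shape of Figure 1, not Samuel's actual tables); the XOR component shows that the scheme computes functions no linear evaluation can.

```python
def eval_signature(tables, groups, xs):
    """Look each group of current values up in its table, feed the
    resulting intermediate values to the next level, and so on until
    the single top-level value remains."""
    values = list(xs)
    for level_tables, level_groups in zip(tables, groups):
        values = [t[tuple(values[i] for i in g)]
                  for t, g in zip(level_tables, level_groups)]
    return values[0]

A = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}    # XOR of x1, x2
B = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}    # AND of x3, x4
TOP = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}  # OR of the two
tables = [[A, B], [TOP]]
groups = [[(0, 1), (2, 3)], [(0, 1)]]
print(eval_signature(tables, groups, (1, 0, 1, 1)))  # 1
```

Learning amounts to filling in the dictionary values, by the matrix construction just described or by Samuel's approximation below.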

Samuel, however, was not able to use this learning scheme in the checker playing application because the size of the matrices would be too large and not all entries were known. His method amounted roughly to counting the number D of 0's in a row and the number A of 1's in that row and computing a coefficient C = (A - D) / (A + D). Then rows which had similar C's were given the same output values in the signature tables. Thus, as explained in Biermann et al. [82], Samuel identified rows with similar weights whereas the ideal solution identifies rows with similar or identical profiles. His system thus made errors proportional to the degree of variation of his method from the ideal. An analysis of his methodology and a suggestion for its improvement appear in Biermann et al. [82].

Signature tables have been used successfully in many applications in addition to game playing (see also Truscott [79] and Smith [73]), such as medical decision making (Page [77]) and operating systems (Mamrak and Amer [78]).

Learning Conjunctive and Disjunctive Normal Form Boolean Expressions

Valiant [84] has developed a series of algorithms for constructing normal form Boolean expressions from examples of target behavior. One class that was solved is the set of k-conjunctive normal form expressions, which are made up of products of unions of not more than k input variables (some of which may be negated). Thus

y = x1 (x2 + x4) (x2 + x̄3)

is a 2-conjunctive normal form since no more than two variables appear in any single conjunct. The output y can be computed from the inputs using the usual Boolean conventions so that, for example, (x1,x2,x3,x4) = (1,1,1,1) yields y = 1 and (x1,x2,x3,x4) = (1,0,1,1) yields y = 0.

Valiant has given a strategy for learning such expressions from positive examples only (where y = 1). One begins with the k-conjunctive normal form which includes all possible k-conjuncts, and then, as each positive example behavior is encountered, those conjuncts which do not cover that example are deleted. This process will be illustrated in the learning of a 2-conjunctive normal form when there are three possible inputs x1, x2, and x3. The initial expression contains all possible 2-conjuncts. The over-bar notation is used to indicate negation.

y = x1 x̄1 x2 x̄2 x3 x̄3 (x1 + x2) (x1 + x̄2) (x̄1 + x2) (x̄1 + x̄2) (x1 + x3) ...... (x̄2 + x̄3)

Suppose a function is to be learned and the following positive example has been received: y = 1 when (x1,x2,x3) = (1,1,0). Then all conjuncts which yield 0 on this input are removed from the initial expression for y. That is, x̄1, x̄2, x3, (x̄1 + x̄2), etc. are removed, leaving the following expression.

y = x1 x2 x̄3 (x1 + x2) (x1 + x̄2) (x̄1 + x2) (x1 + x3) ...... (x̄2 + x̄3)

If a second positive example is presented, say y = 1 if (x1,x2,x3) = (0,0,0), then the expression would be simplified further.

y = x̄3 (x1 + x̄2) (x̄1 + x2) ...... (x̄2 + x̄3).

Clearly a sequence of such positive examples will quickly lead to a final expression, if one exists, capable of computing the target function.
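The elimination process is short enough to sketch in full for the k = 2 case. The clause representation and the function name below are invented for illustration; the deletion rule is exactly the one described above.

```python
from itertools import combinations

def learn_2cnf(n, positives):
    """Start with every clause of one literal or two literals over
    distinct variables x1..xn, then delete each clause falsified by
    some positive example.  A literal is (i, True) for x(i+1) and
    (i, False) for its negation; an example is a tuple of 0/1 values."""
    literals = [(i, s) for i in range(n) for s in (True, False)]
    clauses = [(l,) for l in literals] + \
              [c for c in combinations(literals, 2) if c[0][0] != c[1][0]]
    sat = lambda l, x: bool(x[l[0]]) == l[1]
    for x in positives:
        clauses = [c for c in clauses if any(sat(l, x) for l in c)]
    return clauses

# The two examples from the text: y = 1 on (1,1,0) and on (0,0,0).
remaining = learn_2cnf(3, [(1, 1, 0), (0, 0, 0)])
```

Running this leaves exactly the conjuncts of the final expression above: the single unit clause x̄3 together with the six surviving two-literal clauses.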

Valiant [84] uses a probabilistic model for selection of examples and defines a function to be learned when the probability of error on positive examples is less than 1/h where h is an arbitrary value. He has shown that, using his model, the k-conjunctive normal form expressions are learnable with a polynomial number of positive examples and in polynomial time on the parameters h and k. He has also developed similar results on the monotone disjunctive normal form expressions (where negation is not allowed) and on other classes of Boolean expressions.

Learning Michalski Expressions

Michalski has developed a methodology for inducing generalizations from instances of scientific data and thus producing theories from observations. This methodology has been widely applied to medical and agricultural problems with considerable success (Michalski [80]).

The methodology begins by coding specific observational data into symbolic form and then performing generalizations on the basic data until one or more theories can be induced. For example, in a particular application a biological cell of a known type was described as follows:

CELL1, B1, B2, ..., B6

[contains (CELL1, B1, B2, ..., B6)]
[circ (CELL1) = 8]
[pplasm (CELL1) = A]
[shape (B1) = ellipse] [texture (B1) = strips]
[shape (B2) = circle]

etc.

This statement asserts that there is a cell containing objects B1, B2, ..., B6, and it enumerates various properties of the cell and these six objects. Many other cells of the known type were similarly coded, and many cells outside of this class were coded. The task of the system in such problems is to find the properties or combination of properties needed to distinguish members of the known type from other members.

It is often easy to find a way to distinguish one class from another. One way is to store all the source data and compare each unknown object with the set of known objects. The primary objection to this strategy is that it does not lead to understanding. It is much more desirable to know in simple terms what differentiates one class from another and then use these defining properties. The goal of the Michalski system is to discover such simple defining properties.

Michalski has thus introduced the concept of preference criteria, which enable the user of his system to limit the complexity of the generated theory. The user can set a series of weights which cause the system to bias its generated theories as desired along various complexity measures. The user can thus request that the number of operators, the cost of measuring the features, and other significant complexity factors be minimized.

The knowledge base for the system comes both from the given data samples and from rules related to the domain. For example, in a domain dealing with shapes, the system might be told that n-sided figures for any n are called polygons. Such rules are important because they give the system the opportunity to simplify theories and to achieve the preference criteria given by the user.

There are many generalization rules used by the system to build theories. Two will be described here to give the flavor of the approach. One is called the dropping condition rule. It states that if both A and B are observed in members of a type, then perhaps A alone is enough to characterize the type. That is, suppose it is known that all observed basketball players had the characteristics of being both tall and handsome; perhaps only tallness is needed to differentiate basketball players from others. A second example generalization rule is the adding alternative rule, which states that if A is observed in all cases of a type, possibly the type is characterized by the condition (A or B). Thus, again supposing all observed basketball players have been tall, one could propose the hypothesis that all basketball players are either tall or strong.
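A toy sketch of these two rules, with a conjunctive description represented as a set of atomic conditions and a disjunction as a list of such sets. The representation and the condition names are invented for illustration; Michalski's actual language is far richer.

```python
def covers(description, instance):
    """A conjunctive description covers an instance that exhibits all
    of its conditions."""
    return description <= instance

def dropping_condition(description, condition):
    """Dropping condition rule: {tall, handsome} generalizes to {tall}."""
    return description - {condition}

def adding_alternative(alternatives, new_description):
    """Adding alternative rule: (A) generalizes to (A or B); the
    disjunction covers an instance if any member does."""
    return alternatives + [new_description]

players = [frozenset({'tall', 'handsome'}), frozenset({'tall', 'strong'})]
rule = dropping_condition(frozenset({'tall', 'handsome'}), 'handsome')
print(all(covers(rule, p) for p in players))  # True: 'tall' covers both
```

Both operations only ever enlarge the set of covered instances, which is why the system pairs them with preference criteria to keep the resulting theories simple.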

Generalization rules have the properties of increasing the number of cases covered and, when used in combination, decreasing the total complexity of the describing expressions. The task of the Michalski system is to find combinations of such operators which will reduce the original data (as described in the second paragraph of this section) to simple expressions which successfully separate the specified type from all other cases. The program does this with a complex combination of extensive searching and "hill climbing" on the preference criteria.

For example, in the cell classification problem given above, the system found five different ways of separating the given type from the other cells. Each defining rule is clearly a tremendous simplification of the original given characteristics of the cells and provides rather helpful observations about the type being considered. The five theories are as follows:

1. ∃(1) B [texture(B) = shaded] [weight(B) ≥ 3]

2. [circ = even]

3. ∃(>1) B [shape(B) = boat] [orient(B) = N ∨ NE]

4. ∃(≥1) B [#tails-boat(B) = 1]

5. ∃(1) S [shape(S) = circle] [#contains(S) = 1]

The first rule states that the cells of the given type differed from other cells in that they all contained an object of shaded texture and weight greater than or equal to 3. The second rule states that the observed cells of that type all had even circumference and this differentiated them from the other cells. The other three theories give equally interesting and concise information.

The Michalski system thus provides a method for scientific investigators to reduce symbolic data and to search for generalizations which may help to understand it. The investigator first must find a way to encode the problem data into a descriptive form satisfactory for input to the program. Then the background knowledge and observational statements must be coded. Finally, it is necessary to specify the type of description desired and preference criteria to guide the program toward acceptable solutions. The induced generalization may help the scientist to understand his data better and may suggest further avenues for research.

Learning Semantic Network Descriptions

The learned data structure might also be a semantic network instead of symbolic calculus expressions of the type described above. Although many variations on the idea exist (Findler [79]), the usual semantic network represents objects as nodes on a graph and relationships between those objects with directed arcs between the nodes. Minsky [68] and his students extensively explored the concept of the semantic network during the 1960's, and Winston showed how such structures can be synthesized from examples.

An illustration of this type of learning is shown in Figure 3, where a representation of an arch is constructed at the top on the basis of one example. An arch is assumed to exist whenever two bricks support a third brick. However, a second example is given to the system showing that the supported object does not need to be a brick; it may be a wedge or perhaps some other object. The third example presents negative information that can be used to derive a necessary condition for an arch: the supporting bricks may not touch.

The Winston system works by merging the semantic nets from all positive examples and applying information from negative examples to determine minimum conditions for the concept. This work is important because it is one of the few learning mechanisms ever aimed at the construction of semantic nets. One of the significant results was Winston's discovery of the importance of having negative information in available examples to prevent overgeneralization in the learning process.

Figure 3. Building a semantic net to represent the concept of an arch. (Panels: Example 1, ARCH; Example 2, ARCH with a polyhedron as the supported object; Example 3, NOT AN ARCH.)

III LEARNING GRAMMARS

Introduction

In contrast to the above learning systems, a learning machine may be required to classify data of unbounded length. Thus a system may receive strings of symbols of arbitrary length and have the task of classifying those strings as being in or not in a specified type. Since grammars are commonly used for classifying strings, it is reasonable to study the problem of inferring or constructing grammars from examples.

As an illustration, suppose the set of strings

A

BAA

ABABA

AAA

BBA

ABAA

is known to be selected from a specific class. The question arises as to what general rule may characterize the strings in the class. One could make many hypotheses on the basis of these few examples, but a reasonable guess might be that every string ends in an A. A grammatical inference system would specify its guessed rule for the class by giving a grammar. The grammar for the set of strings ending in A is as follows, where v is a nonterminal symbol and upper case letters are terminals.

v → Av

v → Bv

v → A

If one has a successful grammatical inference system, it can find a grammar that represents the set to be classified and ever after use that grammar to correctly classify strings even if they have not previously been observed. Thus if a system has correctly discovered the above grammar, it could accurately classify such strings as ABAAAABA and ABAAAB as, respectively, in and not in the target set. The learned grammar can be thought of as either a recognizer of strings or a theory of the given data.
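The inferred grammar can be used directly as such a recognizer. The following sketch (an illustrative simulation, not part of the original treatment) checks membership by recursing on the nonterminal v of the right-linear grammar above.

```python
# Sketch: using the inferred right-linear grammar {v -> Av, v -> Bv, v -> A}
# as a recognizer by simulating derivations from the nonterminal v.

RULES = {"v": ["Av", "Bv", "A"]}  # right-hand sides for nonterminal v

def derives(nonterminal, s):
    """True if `nonterminal` can derive the terminal string `s`."""
    for rhs in RULES[nonterminal]:
        if rhs[-1].islower():                 # form  x v : consume x, recurse on v
            if s.startswith(rhs[:-1]) and derives(rhs[-1], s[len(rhs) - 1:]):
                return True
        elif s == rhs:                        # form  x   : purely terminal rule
            return True
    return False

print(derives("v", "ABAAAABA"))  # True  -- the string ends in A
print(derives("v", "ABAAAB"))    # False -- the string ends in B
```

Each recursive call consumes one terminal, so the search terminates, and trying all rules in turn supplies the needed backtracking.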

The grammatical inference model to be studied here assumes the existence of an information source and an inference machine. The information source selects a language L from a known class C of languages and presents examples which may be in L or not in L to the inference machine. At each time t = 1, 2, 3, ..., the information source presents a string which is marked "+" if the string is in L and "-" otherwise. Information sources may be of two kinds: positive information sources, which are organized so that every string in L appears at least once in the sequence, and complete information sources, which produce every possible positive or negative example at least once in the sequence. At each time t = 1, 2, 3, ..., the inference machine uses all information gathered to that time to make a guess at a grammar for the language L. The inference machine knows which class C the unknown language belongs to and must select a grammar for a member of this class.

Using this model, there are many possible definitions of learnability, and three will be examined here: finite identification, identification in the limit, and strong approachability. It turns out that the complexity of the learnable grammar varies greatly depending on the definition of learnability used and on whether or not a complete information source with positive and negative examples is available.

The next section will show what class of grammars can be learned under the various definitions of learnability, and the last section will give an example of a grammatical inference algorithm.

Finite Identification

The first definition of learnability to be examined here is finite identification. With this definition, it is required that after only a finite number of samples from the information source, the inference machine identifies the unknown language correctly and announces that it has done so.

Suppose the class C is made up of the three languages L1, L2, and L3, which are enumerated as follows:

L1 = {A}

L2 = {AB, AAB}

L3 = {AB, AAB, AAAB}

This is one of the easiest learning problems imaginable since the inference machine needs only to see a few examples of the given languages to distinguish which is being presented. Thus if the information source presents the example A+, the machine will know that L1 is correct and print the grammar {v → A}. If the information source presents the example AB+, then either L2 or L3 will be correct but additional information is needed to make the selection. If AAAB+ comes from the information source, it will be possible to select L3.

However, suppose the information source presents positive information only and presents the following sequence:

AB+, AB+, AB+, AAB+, AB+, AAB+, ...

One might suspect that L2 is being presented but one cannot be sure. It may be that AAAB will appear as the billionth string in the sequence and that L3 is the correct answer. There is no way to prove that L2 is the answer because one can never be sure that AAAB will not appear later. The conclusion is that even this simple class is not learnable using positive information only. There exists a member L2 of the class which cannot be distinguished from other members from positive information only. In this case, the inference machine cannot at any time select L2 and announce that it has correctly identified the unknown.

From the example, one can conclude the following: A finite class of finite sets is not in general finitely identifiable from a positive information source.

On the other hand, if this class C is to be approached using complete information, both positive and negative, then any member can be finitely identified. Consider the following sequence from such an information source for L2.

A-, AB+, B-, AA-, AAB+,..., AAAB-,...

Since a complete information source will include every possible string somewhere, the key string AAAB- will occur, and when it does, the inference machine will be able to announce (with a proof) that L2 is the correct choice and print the grammar {v → AB, v → AAB}. It is not predictable when the key string will occur, but it is known that it will appear somewhere. A generalization of this argument leads to the result that a finite class of finite sets is finitely identifiable from a complete information source.

If C is the class of all finite sets, the problem of learning is much more difficult. Even with a complete information source, it is not possible to discover which finite set is to be selected at any given point in time because later samples may always produce unpredicted behaviors. Thus one can conclude that the class of finite sets is not finitely identifiable from a complete (or positive) information source.

These results are summarized in the chart given below. An X appears in the entry where learnability was achieved. It is somewhat surprising that despite the simplicity of the problems being examined, only one positive result was obtained. Evidently the definition of learnability is so strict that only the most trivial learning problems can be solved. Another notable observation is that a complete information source is substantially more powerful than a positive only source. This effect will be seen more dramatically in later sections.

                             Complete Information   Positive Information

The Finite Sets              Not learnable          Not learnable

Finite Class of Finite Sets  X                      Not learnable

Figure 4. Learnability summary for finite identification.

Identification in the Limit

There are numerous examples of learning in nature, such as the learning of natural language by children. Yet our discussion above showed that only the most trivial things can be learned if finite identification is required. In this section, the requirement will be removed that the system announce its final answer after a finite amount of time. Learning by identification in the limit is achieved if the system correctly guesses the right answer at each time after some T0, but T0 is not known. In other words, the learning system may guess the same answer for millions of consecutive times without being sure it has the correct answer. If unexpected data appears at any time, the system can modify its guess and hold that theory for an arbitrary length of time. The fact that the system is never required to announce a final answer greatly increases the number of things that can be learned. The system is required to make a guess at some point that will never be changed again as new information arrives, but it will never be sure it has achieved the final answer.

Many types of language can be identified in the limit. Consider, for example, the class C of all finite sets and assume that a positive information source is available. Assume the inference machine uses the strategy of guessing that the unknown language is made up of exactly the strings seen so far. Thus if the strings A+ and B+ have been seen, the guessed grammar would be {v → A, v → B}. It is easy to see that this system will identify each language L in C in the limit because every string in the unknown L must appear at some time. So L will be guessed after all have been observed and the system will never change its guess. However, the system will not know that it has seen every member, so it will not be able to announce that it has a final answer. We conclude that the finite sets can be identified in the limit from positive information only, a result that is much stronger than was possible in the previous section.
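The strategy just described is easy to sketch (hypothetical code, not from the original text): the guess at each time is simply the set of strings seen so far, which stabilizes once every member of the finite language has appeared.

```python
# Sketch of the limit-identification strategy for the class of finite sets:
# at each time t, guess that the language is exactly the strings seen so far.

def guesses(positive_stream):
    """Yield the guessed language after each positive example."""
    seen = set()
    for s in positive_stream:
        seen.add(s)
        yield frozenset(seen)

stream = ["A", "B", "A", "B", "B"]     # a positive presentation of L = {A, B}
final = list(guesses(stream))[-1]
print(sorted(final))                   # ['A', 'B'] -- the guess has stabilized
```

After the point where all of L has appeared, the guess never changes again, but the learner itself cannot detect that this point has been reached.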

However, if the above problem is made slightly more difficult, it is no longer possible to identify in the limit. Let C be the class of all finite languages plus the infinite language L0 that contains all possible strings. There exists an information sequence for L0 which has the property that the inference system will never select L0 and remain with it permanently. The inference system will change its guess repeatedly and without end. The pathological information sequence is designed as follows: a finite set L1 of strings is presented repeatedly until the system selects L1; then additional strings are presented until it selects L0; then those finite strings are repeated until it selects that set as its finite guess, call it L2; then additional strings are presented until it selects L0; and so forth. Such an information sequence forces the inference system to change its guess an infinite number of times and thus violate the definition of identifiability in the limit. A class of languages containing the finite sets and one infinite language is called super finite. The current conclusion is that the super finite languages are not identifiable in the limit from positive information only.

If complete information is available, many classes of languages are identifiable in the limit. Consider any class of decidable rewriting systems. Such a class C must have these properties:

(1) C must be enumerable. That is, there must be an effective way to list the grammars G1, G2, G3, ... for all the languages in C.

(2) C must be decidable. That is, if Gi is a grammar for a language in C, there must be a way to decide whether Gi generates a given string.

Examples of classes which are decidable rewriting systems are the regular, context-free, and context-sensitive languages.

One can show that classes of decidable rewriting systems can be identified in the limit from complete information sources. The inference system simply chooses the first grammar in the enumeration which can generate all the known strings marked "+" and none of the known strings marked "-". Let Gi be the first grammar in the enumeration for the target language to be learned. Such a Gi must appear in the enumeration by the definition of decidable rewriting systems. Since every predecessor of Gi in the enumeration will differ from Gi on some string, that predecessor will be eliminated from consideration when that string appears in the information source. So Gi will be selected after all its predecessors are shown to be inadequate and the inference system will never again change its guess.
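This identification-by-enumeration idea can be sketched directly. The representation below is an illustrative choice of mine (grammars modeled as membership predicates), not Gold's formulation: scan the enumeration and pick the first grammar consistent with all facts seen so far.

```python
# Sketch of identification by enumeration: given labeled facts, return the
# index of the first "grammar" in the enumeration consistent with all of
# them. Grammars are modeled here as membership predicates (an illustrative
# simplification of a decidable rewriting system).

GRAMMARS = [
    lambda s: s == "A",                    # G1: {A}
    lambda s: s in {"AB", "AAB"},          # G2: {AB, AAB}
    lambda s: s.endswith("A"),             # G3: strings ending in A
]

def first_consistent(facts):
    """facts: list of (string, bool). Index of the first fitting grammar."""
    for i, g in enumerate(GRAMMARS):
        if all(g(s) == label for s, label in facts):
            return i
    return None

print(first_consistent([("AB", True), ("A", False), ("AAAB", False)]))  # 1
```

Each predecessor of the target grammar is eliminated as soon as a string on which it disagrees arrives, which is exactly the convergence argument in the text.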

Finally, one can prove that the recursively enumerable sets are not identifiable in the limit from complete information. All of these results are proven by Gold [67] and are summarized in Figure 5.

                             Complete Information   Positive Information

Recursively Enumerable Sets  Not learnable          Not learnable

Decidable Rewriting Systems  X                      Not learnable

Super Finite Sets            X                      Not learnable

The Finite Sets              X                      X

Finite Class of Finite Sets  X                      X

Figure 5. Learnability summary for identification in the limit.

Strong Approachability

Feldman [72] has given a weaker definition of learnability called strong approachability. This definition requires that

(1) for every string y in the target language there is a time after which every guessed language includes y,

(2) for each grammar which does not generate the target language there is a time after which it will not be selected, and

(3) there is a correct grammar which will be guessed an infinite number of (possibly nonconsecutive) times.

This definition is sufficiently weak that the system could select the wrong grammar most of the time and still be said to have learned. It does, however, include the essential elements of convergence. Feldman showed that the recursively enumerable sets are strongly approachable from positive information.

A summary of all three levels of learnability and the associated results appears in Figure 6.

                             Strong            Identifiability   Finite
                             Approachability   in Limit          Identifiability
                             Compl.   Pos.     Compl.   Pos.     Compl.   Pos.

Recursively Enumerable Sets    X       X

Decidable Rewriting Systems    X       X         X

Super Finite Sets              X       X         X

Finite Sets                    X       X         X       X

Finite Class of Finite Sets    X       X         X       X         X

Figure 6. Results summary for three levels of learnability.

An Algorithm for Grammatical Inference

Few practical algorithms for grammatical inference have appeared over the past decades. Even though from a theoretical point of view many classes can be identified in the limit, most algorithms are so combinatorial that they cannot ordinarily be used. Two methodologies were developed with some capabilities to deal with finite state languages (Biermann and Feldman [72]) and context free languages (Wharton [77]). The first of these methods will be described here.

Suppose the following set of strings is known to be samples from a finite state language and the task is to find its grammar:

A, AA, BA, BB, AAA, ABA, ABB, BAA, BBA.

Then one can construct the behavior tree shown in Figure 7. Each string is indicated on the tree by a node found by tracing the tree down from the top taking a left branch for each A and a right branch for each B.

Next a finite automaton is built from this tree. The methodology involves selection of an integer k and constructing all subtrees found in Figure 7 of depth k. Choosing k = 1 in this example yields five types of subtrees as indicated in the figure, one corresponding to each of these sets of strings, where λ stands for the string of length 0.

1. {A}  2. {λ, A}  3. {A, B}  4. {λ}  5. {}

These five subtrees then become states for a finite state machine as shown in Figure 8. A transition labeled x is placed in the finite state machine from state si to state sj whenever the subtree in Figure 7 corresponding to si has a transition labeled x to the subtree corresponding to sj. The set of all such transitions is shown in Figure 8. The initial state of the automaton corresponds to the top subtree in Figure 7. All states with λ in their corresponding subtrees are labeled final states.

The grammar can be constructed from the automaton by adding, for each transition from si to sj on input x, a rule vi → x vj to the grammar (plus a rule vi → x if sj is final). The resulting grammar in this example is given below.

Figure 7. The behavior graph for the unknown finite state language.

Figure 8. A finite state acceptor for the unknown language.

v1 → A v2
v1 → A
v1 → B v3
v2 → A v2
v2 → A
v2 → B v3
v2 → A v4
v2 → B v5
v3 → A v2
v3 → A
v3 → B v2
v3 → B
v3 → A v4
v3 → B v4

There are clearly redundant rules in this construction, but the current concern is how to build the grammar.

The resulting grammar depends on the size of k. If k is large, the inferred language will be small and may even be finite. If k is small, the inferred language will be large. Biermann and Feldman [72] give a method of converging on the correct value of k. It adjusts k to obtain perfect behavior on all "short" strings.
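The subtree construction can be expressed compactly. The following is my own reimplementation of the idea (function and variable names are invented): each state is the set of accepted suffixes of length ≤ k below a prefix of the samples, with the empty string '' playing the role of λ.

```python
# Sketch of the subtree (depth-k) construction for inferring a finite
# automaton from positive samples. Each state is the set of accepted
# continuations of length <= k of some prefix; '' stands for lambda.

def tails(samples, prefix, k=1):
    """Accepted continuations of `prefix` having length <= k."""
    return frozenset(s[len(prefix):] for s in samples
                     if s.startswith(prefix) and len(s) - len(prefix) <= k)

def infer(samples, alphabet="AB", k=1):
    prefixes = {s[:i] for s in samples for i in range(len(s) + 1)}
    delta = {}
    for p in prefixes:                     # one transition per observed step
        for a in alphabet:
            delta.setdefault((tails(samples, p, k), a), set()).add(
                tails(samples, p + a, k))
    states = {tails(samples, p, k) for p in prefixes} | \
             {q for targets in delta.values() for q in targets}
    start = tails(samples, "", k)
    finals = {q for q in states if "" in q}    # subtrees containing lambda
    return states, delta, start, finals

samples = ["A", "AA", "BA", "BB", "AAA", "ABA", "ABB", "BAA", "BBA"]
states, delta, start, finals = infer(samples)
print(len(states))   # 5 subtree types, as in the text
```

On the sample set of the text this produces exactly the five state types listed above, with the two λ-containing states as final states.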

Conclusion

Research in grammatical inference has provided a mathematical model of learning and inference behaviors, with definitions of learnability, convergence theorems, and many results concerning the learnability or lack thereof of various classes of behavior (see Angluin and Smith [83]). Thus the field has tremendous theoretical importance in that it provides models and tools that can be applied in a variety of situations.

On the other hand, the field has led to few practical methods because of the astronomical computations involved in most of the algorithms. The reason for this high cost is that example strings from a language provide no information about how the recognition computation is done. Thus the inference algorithms are reduced to enumerative methods for finding grammars. It was eventually realized that additional information about how to do computations would be needed if computational mechanisms such as grammars were to be synthesized. Later research thus tended to focus on the synthesis of programs, where the inference environment often provides trace information that substantially aids in the synthesis. The following three sections describe approaches to the inference of programs where trace information can be used in the synthesis.

IV INFERRING PROGRAMS FROM COMPUTATION TRACES

Introduction: The Trainable Turing Machine

In many environments, trace information is available showing how a computation is done. It is not necessary to learn the grammar or program for doing a computation from only input-output behaviors.

The trainable Turing machine described by Biermann [72] provides an example of this. This machine has a training mode in which the user can push the read-write head up and down the tape and indicate the desired computation by doing examples by hand. Then it has a computing mode in which the system acts like a normal Turing machine and uses a finite-state controller which was automatically synthesized on the basis of the hand examples. One can show that any Turing machine can be synthesized on the basis of such examples and that relatively few examples are needed in many practical situations. However, the computation cost for automatically synthesizing the finite-state controller can be high.

The Flowchart Generation Methodology

One can see how to automatically generate a Turing machine from example computations by studying an example. Suppose it is desired to sort a sequence of A's and B's on a Turing machine tape so that all the A's precede the B's. The method will be to move the head of the Turing machine right until the first A is found. Then the head will move to the beginning of the tape and place the newly found A. Next it moves right looking for a second A, which is moved left to be adjacent to the first A, and so forth. An example of this computation appears below, with the Turing machine head position being indicated by an underline.

B̲AA

BA̲A

B̲BA

_BBA

B̲BA

AB̲A

ABA̲

AB̲B

A̲BB

AB̲B

AAB̲

AAB_

The task is to automatically create a Turing machine that will do this calculation and, hopefully, all other "similar" calculations.

A notation is needed to represent a single head operation. Triples will be used that give, respectively, the symbol read from the tape, the symbol printed, and the subsequent head movement, left or right. Thus the triple ABR means that an A was read, a B was printed, and the head then moved right.

The above twelve steps thus correspond to the following sequence of head movements:

BBR

ABL

BBL

(blank) (blank) R

BAR

BBR

ABL

BBL

AAR

BAR

BBR (halt)

A finite state controller for the Turing machine is needed which will direct these head movements.

The construction of the finite-state controller is shown in Figure 9. Initially the Turing machine has only one state and no transitions, as in Figure 9(a). But the first head movement in the computation is BBR, so a transition is added to account for it in (b). This means that if the machine is in state 1 and reads a B, it will print a B, move right, and go to state 1. The second desired movement is ABL, which could also involve a transition from state 1 to state 1. Unfortunately the third step is BBL, which contradicts the first step. It is not possible to be in state 1 and expect a B input to yield a move left because a transition already exists that directs a B input to yield a move right. Therefore the ABL transition must go from state 1 to state 2. (See (c).) The BBL transition can then proceed from state 2 to anywhere. (It is directed to state 1 unless that fails, state 2 unless that fails, etc.) The BBL transition is directed to state 1 as shown in (d). The fourth head movement is (blank)(blank)R, which cannot go to states 1 or 2 because it is followed by a BAR step. Both states 1 and 2 have contradictory actions on a B input, BBR and BBL. So the fourth movement is indicated on a transition from state 1 to state 3. (See (e).) Continuing this series of arguments, the final controller is completed in the last panel of Figure 9.
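The panel-by-panel argument amounts to a small search. The following is my own backtracking sketch of that idea, not the exact published algorithm: each (read, print, move) triple must leave the current state, a state may not carry two different actions for the same read symbol, and target states are tried lowest-numbered-first, adding a state only when no assignment succeeds.

```python
# Sketch: synthesize a finite-state controller consistent with a trace of
# (read, print, move) triples. Backtracking search over target states; the
# state count is increased only when no controller of the current size works.

def synthesize(trace, n_states=1):
    def extend(actions, step, state):
        if step == len(trace):
            return actions
        read, write, move = trace[step]
        if (state, read) in actions:                 # transition already fixed
            w, m, nxt = actions[(state, read)]
            if (w, m) != (write, move):
                return None                          # contradiction: backtrack
            return extend(actions, step + 1, nxt)
        for target in range(n_states):               # try lowest states first
            trial = dict(actions)
            trial[(state, read)] = (write, move, target)
            result = extend(trial, step + 1, target)
            if result is not None:
                return result
        return None

    result = extend({}, 0, 0)
    if result is None:                               # too few states: add one
        return synthesize(trace, n_states + 1)
    return result, n_states

trace = [("B", "B", "R"), ("A", "B", "L"), ("B", "B", "L"), (" ", " ", "R"),
         ("B", "A", "R"), ("B", "B", "R"), ("A", "B", "L"), ("B", "B", "L"),
         ("A", "A", "R"), ("B", "A", "R"), ("B", "B", "R")]
controller, n = synthesize(trace)
print(n)   # 3 -- the trace needs three distinct actions on input B
```

With full backtracking the search returns a smallest controller that replays the trace; at least three states are forced here because the trace contains three distinct actions (BBR, BBL, BAR) on the same read symbol B.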

This construction can be automated and is guaranteed to produce a Turing machine capable of executing the given example. The interesting point is that this Turing machine will sort any tape of A's and B's, no matter their order or the length of the tape. The basic algorithm is given in Biermann [72] and a greatly refined version appears in Biermann et al. [75].

Once this methodology was discovered, it was applied to numerous problems. Biermann and Krishnaswamy [76] built a trainable desk calculator that was driven by a light pen at a display terminal. Waterman et al. [84] used the idea in the construction of an adaptive programmer's helper for computing systems. Fink and Biermann [86] used the technique to automatically construct dialogue models from human-machine conversations. The procedure appears to be a fundamental mechanism for procedure acquisition which will have continuing importance in the coming years.

Figure 9. Constructing a Turing machine controller.

V CONSTRUCTING LISP PROGRAMS FROM EXAMPLE INPUT-OUTPUT BEHAVIORS

Introduction

During the 1970's, a number of researchers examined the problem of synthesizing LISP code from examples. See, for example, Biermann [78], Biermann and Smith [79], Hardy [75], Jouannaud and Kodratoff [79], Smith [84], and Summers [77]. Synthesis of LISP from examples is marginally feasible because the structures of the input and output lists yield substantial trace information. One of the synthesis methodologies will be described here: the synthesis of LISP programs from recurrence relations as developed by Summers [77]. Other methodologies are surveyed in Biermann et al. [84] and in Smith [84].

LISP Synthesis from Recurrence Relations

Suppose it is desired to create automatically a LISP program that will convert input

((A B) (C D) (E F)) to (B D F).

That is, the target program is to collect the second elements of a series of lists. The Summers methodology requires the user to display the loop pattern in the target program in a series of input-output examples.

NIL → NIL

((A B)) → (B)

((A B) (C D)) → (B D)

((A B) (C D) (E F)) → (B D F)

The synthesis of this program will be explained following the treatment of Smith [84].

The first step involves writing the outputs in terms of their respective inputs using the LISP car, cdr, and cons functions.

f1(x) = NIL

f2(x) = cons (cadar (x) NIL)

f3(x) = cons (cadar (x) cons (cadadr (x) NIL))

f4(x) = cons (cadar (x) cons (cadadr (x) cons (cadaddr (x) NIL)))

In fact, a program for achieving the observed example is

F(x) = (cond (p1(x) f1(x))
             (p2(x) f2(x))
             (p3(x) f3(x))
             (p4(x) f4(x)))

where the pi's are predicates which select the correct fi to execute in each case. In fact, Summers gives a simple predicate generating algorithm which finds the pi's.

p1(x) = atom (x)

p2(x) = atom (cdr (x))

p3(x) = atom (cddr (x))

p4(x) = atom (cdddr (x))

Program synthesis then involves finding a way to roll the straight line code for F into a loop.

So the methodology tries to find a recurrence relation which relates each fi to previous fj's where j < i.

f1(x) = NIL

f2(x) = cons (cadar (x) f1(cdr (x)))

f3(x) = cons (cadar (x) f2(cdr (x)))

f4(x) = cons (cadar (x) f3(cdr (x)))

So the recurrence relation is easily seen to be

fi(x) = cons (cadar (x) fi-1(cdr (x))) for i = 2, 3, 4. Similarly a recurrence can be found for the pi's.

pi(x) = pi-1(cdr (x)) for i = 2, 3, 4. The induction step then assumes that these recurrence relations hold for all i > 1 and applies the Summers Basic Synthesis Theorem: If

p1(x), ..., pk(x), pk+n(x) = pn(b(x)) for n ≥ 1

f1(x), ..., fk(x), fk+n(x) = C (fn(b(x)), x) for n ≥ 1

where b is a function of car's and cdr's and C is a cons structure that includes fn(b(x)) exactly once, then the function

F(x) = (cond (p1(x) f1(x))
             (p2(x) f2(x))
             ... )

can be computed by the following recursive program.

F(x) = (cond (p1(x) f1(x))
             (p2(x) f2(x))
             ...
             (pk(x) fk(x))
             (T C (F(b(x)), x)))

In this example, b = cdr, k = 1, and

C (F(b(x)), x) = cons (cadar (x) F (cdr (x))).

So the synthesized program is

F(x) = (cond (atom (x) NIL)
             (T cons (cadar (x) F (cdr (x)))))

In summary, the Summers synthesis methodology begins with a carefully constructed set of examples which illustrate the desired recursive computation. The pi's and fi's are constructed for the given examples, and then each pi and fi is written in terms of pi-k and fi-k. Recurrence relations are then derived and the synthesis theorem is applied to give the final program. This system is able to efficiently generate many useful single loop programs.
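The synthesized function translates line for line into a modern language. A Python sketch (my rendering, not from the original text):

```python
# Python rendering of the synthesized LISP program: NIL when the input is
# empty, otherwise cons the second element of the first sublist onto the
# recursive result for the rest of the list.

def F(x):
    if not x:                        # (atom (x)) -> NIL
        return []
    return [x[0][1]] + F(x[1:])      # cons (cadar (x)  F (cdr (x)))

print(F([["A", "B"], ["C", "D"], ["E", "F"]]))   # ['B', 'D', 'F']
```

The single recursive call on the cdr of the input is exactly the loop that the recurrence relations rolled up.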

Biermann [78] applied the flowchart synthesis methodology described in the previous section to the LISP synthesis problem. This leads to an algorithm for the synthesis of regular LISP programs from examples. The regular LISP programs are analogous to finite state automata and allow arbitrarily complicated flow of control. This system does not require the user to carefully construct examples as was done by Summers, but it also requires more execution time for a synthesis.

VI SYNTHESIZING PROLOG PROGRAMS FROM EXAMPLES

Shapiro [83] has developed a methodology for synthesizing PROLOG programs from examples. A flowchart for the system appears in Figure 10. Its operation will be illustrated by showing how it constructs the program for the function member (X,Y), which yields true if X is a member of list Y and false otherwise.

The system functions as follows. The user furnishes example facts illustrated at the left side of Figure 10. These facts include positive information showing desired behavior for the target program and negative information showing undesired behavior. Thus in the member example, a user might supply the facts "member (a,[a,b]) is true" and "member (c,[a,b]) is false". The system at all times maintains a PROLOG program as shown at the right, and it continuously compares the current version of the PROLOG program with the known collection of user-supplied facts. If the current program is not satisfactory because of lack of correctness with respect to some fact, the program is modified either by adding new clauses from the generator at the top or by throwing away existing clauses.

Normal operation of the system thus involves continuously debugging the existing PROLOG program with respect to the known facts. Three kinds of errors may occur: (1) The program may compute a result which is undesired, an incorrect answer. (2) The program may be unable to compute a desired answer. (3) The program may not terminate.

In the first kind of error, the system simulates the incorrect computation and continuously queries the user and the data base to check that each step is correct. When a PROLOG clause is found that computes an incorrect result from correct premises, that clause is removed from the program as indicated at the bottom of Figure 10. In the second type of error, the system again simulates the computation but this time it will fail because some needed result was not computed. This indicates an additional clause is required, and the enumerator at the top is run until the needed clause is found. In the third type of error, the simulator halts after a prespecified limit on computation size has been exceeded and then the system searches for causes of the suspected loop. The system searches for places where the same computation state may be reentered more than once, and it may also query the user concerning violations of a well-founded ordering needed for termination. The nontermination will be caused by some clause which computes an undesired result, and the debugging procedure will discover that clause and remove it.
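The overall cycle can be sketched schematically. The code below is my own highly simplified model, not Shapiro's actual system: clauses for member/2 are modeled as (head-test, body) pairs, a clause is dropped when the program proves a fact marked false (a type (1) error), and the next clause is taken from the enumeration when a fact marked true is not provable (a type (2) error).

```python
# Schematic sketch of the debug cycle for member/2. Clauses are modeled as
# (head-test, body) pairs; a body of "recurse" means member(X, Z) for [Y|Z].

CANDIDATES = [
    (lambda x, y: True, None),                   # member(X,Y)     <- true
    (lambda x, y: bool(y) and y[0] == x, None),  # member(X,[X|Z]) <- true
    (lambda x, y: bool(y), "recurse"),           # member(X,[Y|Z]) <- member(X,Z)
]

def covers(program, x, y, depth=8):
    """Tiny 'interpreter': can the program prove member(x, y)?"""
    if depth == 0:
        return False
    return any(head(x, y) and (body is None or covers(program, x, y[1:], depth - 1))
               for head, body in program)

def debug(facts):
    program, gen = [], iter(CANDIDATES)
    while True:
        if any(covers(program, x, y) for (x, y), truth in facts if not truth):
            program.pop()        # crude stand-in for locating the offending clause
        elif any(not covers(program, x, y) for (x, y), truth in facts if truth):
            program.append(next(gen))
        else:
            return program

facts = [(("a", ["a"]), True), (("a", ["b"]), False)]
final = debug(facts)
print(len(final))   # 1 -- the single clause member(X,[X|Z]) <- true suffices
```

With these two facts the loop reproduces the narrative below: the always-true clause is added, refuted by the negative fact, discarded, and replaced by the head-match clause, which satisfies both facts.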

Shapiro experimented with many different types of clause generators, and associated with each was a class of synthesizable programs. He also developed a scheme for improving efficiency by avoiding the generation of many clauses which are covered by other clauses previously shown to be inadequate. At the time of system invocation, the user is asked to furnish the names of predicates appropriate to the current problem and to indicate which predicates may appear on the right sides of PROLOG clauses.

Proceeding with the synthesis of the member program, the user first indicates that "member (_,_)" is an appropriate predicate and that it can appear on the right hand side of rules. The system begins with the empty program and debugs it with respect to given facts. If the user supplies the fact "member (a,[a]) is true", the system will discover that its current program is incomplete, an error of type (2) listed above.

It will be assumed for the purposes of this treatment that the generator will create the following clauses in the order given.

member (X,Y) ← true

member (X,[X|Z]) ← true

member (X,[Y|Z]) ← member (X,Z)

member (X,Y) ← member (Y,X)

etc.

[Figure 10 (schematic): a generator for all possible clauses feeds candidate clauses into the maintained PROLOG program. A monitor compares the program against the stored facts via the PROLOG interpreter: it gets another clause when the program is incomplete, drops the offending clause when an incorrect answer is computed, and invokes nontermination debugging when a computation exceeds its bound; dropped clauses are thrown away.]

Figure 10. The Shapiro synthesis algorithm

The notation [X|Y] stands for the list with head X and tail Y. The first call to the generator would then yield the current program

{ member (X,Y) ← true }, which covers the given example.

Next suppose the user provides the fact that "member (a,[b]) is false". Here the system would discover a type (1) error and discard the single clause in the current program. This means that member (a,[a]) is no longer handled, causing a type (2) error and another call to the generator.

{ member (X,[X|Z]) ← true }

This program satisfies both known facts.

Again the user may supply a fact: "member (a,[b,a]) is true". Another type (2) error results in an additional clause generation and the final program.

{ member (X,[X|Z]) ← true,

member (X,[Y|Z]) ← member (X,Z) }
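The synthesize-and-debug loop traced above can be sketched in a few lines of Python. This is a toy model, not Shapiro's implementation: the three candidate clauses, their names, the enumeration order, and the depth-bounded interpreter (standing in for the PROLOG interpreter of Figure 10) are all illustrative assumptions.

```python
# Toy model of Shapiro's synthesize-by-debugging loop for member/2.
# Clauses are named stand-ins for the PROLOG clauses in the text:
#   "any"  : member(X,Y)     <- true
#   "head" : member(X,[X|Z]) <- true
#   "tail" : member(X,[Y|Z]) <- member(X,Z)

def proves(prog, x, lst, depth=20):
    """Depth-bounded interpreter: does member(x, lst) follow from prog?"""
    if depth == 0:
        return False
    for clause in prog:
        if clause == "any":
            return True
        if clause == "head" and lst and lst[0] == x:
            return True
        if clause == "tail" and lst and proves(prog, x, lst[1:], depth - 1):
            return True
    return False

def synthesize(facts, order=("any", "head", "tail")):
    """facts: list of ((x, lst), truth) pairs supplied by the user."""
    gen = iter(order)                        # clause generator (assumed order)
    prog = []
    while True:
        bad = next((f for f in facts if proves(prog, *f[0]) != f[1]), None)
        if bad is None:
            return prog                      # program agrees with all facts
        (x, lst), truth = bad
        if truth:                            # type (2) error: add a clause
            prog.append(next(gen))
        else:                                # type (1) error: drop a clause
            for c in prog:
                trial = [d for d in prog if d != c]
                if not proves(trial, x, lst):
                    prog = trial
                    break

facts = [(("a", ("a",)), True),
         (("a", ("b",)), False),
         (("a", ("b", "a")), True)]
print(synthesize(facts))                     # -> ['head', 'tail']
```

Running the sketch on the three facts from the worked example reproduces the trace: the over-general "any" clause is added, then dropped when it proves a negative fact, and the two clauses of the final program are generated in turn.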

Shapiro showed his system to be capable of generating a variety of programs and compared it with various other systems. For example, his system solved the following problem posed by Biermann [6]:

Construct a program to find the first elements of lists in a list of atoms and lists. Thus the program should be able to input [a,[b],c,[d],[e],f] and compute the result [b,d,e]. Shapiro's system needed 25 facts to solve this problem and constructed the following program after 38 seconds of computing:

{ heads ([ ],[ ]) ← true,

heads ([[X|Y]|Z],[X|W]) ← heads (Z,W),

heads ([X|Y],Z) ← atom (X), heads (Y,Z) }

Biermann's regular LISP synthesis system was able to create a solution for this problem using only the single example given above. However, its execution time was approximately one half hour.

V. CONCLUSION

Computer science has historically required programmers of systems to anticipate every possible behavior that could be desired and to program in advance all the knowledge and mechanisms needed to achieve it. Unfortunately, it has been found that such extensive and explicit programming is expensive and it still, in many cases, does not achieve the range of behaviors that might be needed. The only alternative is to have the machines program themselves to acquire the knowledge they need to function satisfactorily. This chapter has described many mechanisms for machine learning and provides an introduction to the field. Additional information can be found in the references and in the textbook on learning edited by Michalski et al. [19].

REFERENCES

[1] D. Angluin and C. Smith [1983], "Inductive inference: theory and methods", ACM Computing Surveys, Vol. 15.

[2] A. Biermann and J. Feldman [1972], "On the synthesis of finite-state machines from samples of their behavior", IEEE Trans. on Computers, Vol. C-21.

[3] A. Biermann [1972], "On the inference of Turing machines from sample computations", Artificial Intelligence, Vol. 3.

[4] A. Biermann, R. Baum, and F. Petry [1975], "Speeding up the synthesis of programs from traces", IEEE Trans. on Computers, Vol. C-24.

[5] A. Biermann and R. Krishnaswamy [1976], "Constructing programs from example computations", IEEE Trans. on Software Engineering, Vol. SE-2.

[6] A. Biermann [1978], "The inference of regular LISP programs from examples", IEEE Trans. on Systems, Man, and Cybernetics, Vol. SMC-8.

[7] A. Biermann and D. Smith [1979], "A production rule mechanism for generating LISP code", IEEE Trans. on Systems, Man, and Cybernetics, Vol. SMC-9.

[8] A. Biermann, J. Fairfield, and T. Bares [1982], "Signature table systems and learning", IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-12, No. 5.

[9] A. Biermann, G. Guiho, and Y. Kodratoff (Eds.) [1984], Automatic Program Construction Techniques, Macmillan Publishing Co., N.Y.

[10] J. Feldman [1972], "Some decidability results in grammatical inference", Information and Control, Vol. 20.

[11] N. Findler, Ed. [1979], Associative Networks, Academic Press, N.Y.

[12] P. Fink and A. Biermann [1986], "The correction of ill-formed input using history-based expectation with applications to speech understanding", to appear.

[13] K.S. Fu [1975], Syntactic Methods in Pattern Recognition, Academic Press, N.Y.

[14] M. Gold [1967], "Language identification in the limit", Information and Control, Vol. 10.

[15] S. Hardy [1975], "Synthesis of LISP programs from examples", Proc. Fourth International Joint Conf. on Artificial Intelligence.

[16] J.P. Jouannaud and Y. Kodratoff [1979], "Characterization of a class of functions synthesized from examples by a Summers-like method", Proc. Sixth International Joint Conference on Artificial Intelligence.

[17] S. Mamrak and P. Amer [1978], "Estimation of run times using signature table analysis", NBS Special Publication 500-14, Fourteenth Computing Performance Evaluation User's Group, Boston, Mass., Oct. 1978.

[18] R.S. Michalski [1980], "Pattern recognition as rule-guided inductive inference", IEEE Trans. on Pattern Analysis and Machine Intelligence.

[19] R.S. Michalski, J.G. Carbonell, and T.M. Mitchell (Eds.) [1983], Machine Learning, Tioga Publishing Company.

[20] M. Minsky, Ed. [1968], Semantic Information Processing, M.I.T. Press, Cambridge, Mass.

[21] M. Minsky and S. Papert [1969], Perceptrons, M.I.T. Press, Cambridge, Mass.

[22] N. Nilsson [1965], Learning Machines, McGraw-Hill.

[23] C. Page [1977], "Heuristics for signature table analysis as a pattern recognition technique", IEEE Trans. on Systems, Man, and Cybernetics, Vol. SMC-7.

[24] A. Samuel [1959], "Some studies in machine learning using the game of checkers", IBM Journal of Research and Development.

[25] A. Samuel [1967], "Some studies in machine learning using the game of checkers, II", IBM Journal of Research and Development.

[26] E.Y. Shapiro [1983], Algorithmic Program Debugging, M.I.T. Press, Cambridge, Mass.

[27] D. Smith [1984], "The synthesis of LISP programs from examples: a survey", in A. Biermann, G. Guiho, and Y. Kodratoff (Eds.), Automatic Program Construction Techniques, Macmillan Publishing Co., N.Y., 1984.

[28] M. Smith [1973], "A learning program which plays partnership dominoes", Communications of the ACM, Vol. 16.

[29] P. Summers [1977], "A methodology for LISP program construction from examples", Journal of the ACM, Vol. 24.

[30] T. Truscott [1979], "The Duke checker program", Journal of Recreational Mathematics, Vol. 12.

[31] L. Valiant [1984], "A theory of the learnable", Communications of the ACM, Vol. 27.

[32] D. Waterman, W. Faught, P. Klahr, S. Rosenschein, and R. Wesson [1984], "Design issues for exemplary programming", in A. Biermann, G. Guiho, and Y. Kodratoff (Eds.), Automatic Program Construction Techniques, Macmillan Publishing Co., N.Y., 1984.

[33] R. Wharton [1977], "Grammar enumeration and inference", Information and Control, Vol. 33.

METHODS OF AUTOMATED REASONING

A tutorial

Wolfgang Bibel

Technische Universität München

ABSTRACT

This chapter introduces various aspects and methods of the formalization and automation of the processes involved in performing inferences. It views automated inferencing as a machine-oriented simulation of human reasoning. In this sense classical deductive methods for first-order logic, like resolution and the connection method, are introduced as a derived form of natural deduction. The wide range of phenomena known as non-monotonic reasoning is represented by a spectrum of technical approaches, ranging from the closed-world assumption for data bases to the various forms of circumscription. Meta-reasoning is treated as a particularly important technique for modeling many significant features of reasoning, including self-reference. Various techniques of reasoning about uncertainty are presented that have become particularly important in knowledge-based systems applications. Many other methods and techniques (like reasoning involving time) could only briefly - if at all - be mentioned.

CONTENTS

INTRODUCTION

1. NATURAL AND AUTOMATED DEDUCTION

2. NON-MONOTONIC REASONING

2.1 A formalism for data bases
2.2 Negation as failure
2.3 Circumscription
2.4 Inferential minimization with circumscription
2.5 Other approaches to inferential minimization

3. META-REASONING

3.1 Language and meta-language
3.2 Application to default reasoning
3.3 Self-reference
3.4 Reasoning about knowledge and belief
3.5 Expressing control on the meta-level

4. REASONING ABOUT UNCERTAINTY

4.1 Bayesian inference
4.2 The Dempster-Shafer theory of evidence
4.3 Fuzzy logic
4.4 Performance approaches
4.5 Engineering approaches

5. SUMMARY AND CONCLUSIONS

REFERENCES

INTRODUCTION

No widely accepted definition of intelligence has ever been given that both accounts for our everyday use of this notion and at the same time yields a precise and formal notion. This is probably because intelligence is a fuzzy notion in everyday use, comprising various more precise notions at the same time which until now have not been elaborated. In this situation we have no choice other than to continue talking about intelligence in an informal and intuitive way.

Taking such an intuitive view, it seems that we associate with intelligence at least the following capabilities. A person without any knowledge would never be called intelligent; hence the capability to dispose of a certain amount of knowledge is one fundamental aspect of intelligence. However, a person with all the entries in Webster's at immediate disposal, but with this as the only capability, would still not be regarded as truly intelligent, because we also expect from an intelligent being the capability for solving problems in changing environments. For reasons that will be discussed in section 5 of this paper, problem solving is considered a special form of reasoning. With this understanding we can thus say that the capability of reasoning is another fundamental aspect of intelligence.

Intelligence has more such aspects. For instance, the speed with which the aforementioned capabilities are performed certainly plays an important although complex role in our intuitive understanding. In view of the computer systems that we actually have in mind, we integrate this aspect into the previous two as a technical issue. An impressive capability for learning is of course part of our understanding of intelligence as well. But learning can be understood as a sort of problem solving and thus is already taken into account by the previous two fundamental aspects. Further, an intelligent being must have the capability for communication with others; but we might prefer to combine this capability with our understanding of intelligence only insofar as the inherent problem solving and reasoning processes are concerned.

We will not continue this analysis any further, but rather draw already at this point the conclusion that, under the views taken in our analysis so far, there are essentially two fundamental aspects of intelligence: one is knowledge and the other reasoning. Let us take this as a working thesis, claiming that any further aspects can somehow be understood in terms of these two, as we have just discussed.

The focus of this tutorial is on techniques that lend themselves to an automatic treatment. In this context we prefer to substitute the notion of knowledge by that of a knowledge base and the notion of reasoning by that of inference, as a matter of notational convention. In this restricted context our thesis translates into the architectural concept that any intelligent (computer) system may be viewed as basically consisting of two fundamental components, the knowledge base and the inference component. The two are not independent, of course, since inferencing has to take the representational structure of the knowledge base into account, and this structure in turn heavily influences the performance of inference.

The topic of this paper, then, is a treatment of various important techniques in use for one of these two components, viz. the inference component. Because of the interdependence just mentioned, this will occasionally require some discussion of issues of knowledge representation as well, which will be restricted to a minimum, however, since they are extensively discussed in the contribution by J.P. Delgrande and J. Mylopoulos within this volume. Hence our topic is a rather restricted one; nonetheless it is one of central importance for Artificial Intelligence (AI), as the previous discussion should have demonstrated.

To some extent the distinction between knowledge and inference may be questioned, since inference may be regarded as a sort of knowledge as well. Specifically, the capacity for inferencing derives from knowledge about how one can infer new knowledge from previous knowledge. Under this view it might be classified as meta-knowledge that provides information about the relation of different pieces of object-level knowledge. Nevertheless, the special way in which this particular kind of knowledge is used in knowledge-based systems justifies the distinction we made.

A confusingly rich variety of methods and techniques for inferencing is known today. It ranges from exact mathematical theorem proving to the speculative conclusions of a stockbroker in one dimension, and from the human forms to sophisticated machine versions in another. An exhaustive treatment is therefore beyond the limits available within this volume. Yet we make an attempt to provide the reader with a feel for most of the aspects of inferencing. At the same time we try to present the different approaches as far as possible in a uniform way.

The paper begins in section 1 with classical first-order reasoning. In particular, we present this form of reasoning as a model of human mathematical reasoning, following Gentzen with his calculus NK. On this basis the question of how a deduction may be determined for a given formula is pursued in some detail. This leads us to a more technical version of Gentzen's calculus, from which we then derive the idea for the connection method in Automated Theorem Proving (ATP). This way of treatment and selection of topics is meant to complement the chapter by Stickel in the same volume.

Section 2 is devoted to non-monotonic reasoning, which is characterized by the following two features. On the one hand, a problem description is always to be seen in the context of additional common-sense knowledge that is assumed by default. On the other hand, the resulting complete description is to be understood in some sense in a minimal way. Our emphasis is on this latter aspect. There are several techniques for achieving this kind of minimality. We discuss the techniques of predicate completion in relational data bases and of negation-by-failure in PROLOG in some detail. Then we provide an introduction to the circumscription approach along with an illustration of its use for common-sense reasoning. Finally, we briefly review a number of other approaches, such as default reasoning and reasoning that tolerates inconsistencies in the knowledge base.

Some of these approaches to non-monotonic reasoning demonstrate the need to distinguish reasoning on the object level from that on the meta-level (or meta-meta-level, etc.). This important topic is treated in section 3. In particular, we explain the distinction between these various levels of languages and explain how one can technically amalgamate more than one such level into a single one. This may then directly be applied to non-monotonic reasoning, to expressing control knowledge explicitly in a system (such as PROLOG), and to other important topics. We also present a recent solution that allows self-reference to be expressed within first-order logic, which has an important application in reasoning about knowledge and belief. Other approaches to this latter area are briefly discussed as well.

The next major topic is reasoning about uncertainty, in section 4. This is particularly topical since many knowledge-based systems necessarily have to cope with the uncertainty of the available information. The most widely used approach to these phenomena is based on Bayes' theorem; but we also point out a number of problems that have been experienced with this method. One of these problems has initiated the development of a related approach based on the Dempster-Shafer theory of evidence. We conclude the presentation of such probabilistic approaches with a discussion of fuzzy logic, which has been developed from fuzzy set theory. As a contrast, the section closes with a review of non-probabilistic techniques for dealing with uncertainty, which might be regarded as more in line with AI methodologies. For instance, we mention the plausible reasoning technique used in the system Ponderosa, the technique based on the model of endorsement, engineering techniques, and others.

In the final section we fill the remaining gaps by briefly addressing some other forms of reasoning. In this way we summarize the whole presentation. Its importance for many applications is pointed out. And finally we give a view of how a complex reasoning system comprising all these features might eventually look.

1. NATURAL DEDUCTION

Human beings draw inferences from what they know, that is, they explicate new pieces of knowledge from previous ones. For instance, if the original knowledge consists of the two sentences

K1: Socrates is a man

K2: All men are mortal

then by way of inference anyone would conclude that

K0: Socrates is mortal

although obviously this latter fact is not explicitly given with K1 and K2. This aspect of human thinking has been observed and studied for more than 2000 years.

In illustrating this phenomenon the way we did, we have already taken into account a basic assumption. Namely, we might originally think of inferences being drawn from pieces of knowledge that are not necessarily represented in language form. But our assumption is that an appropriate representation in some language can be used without affecting the nature of this phenomenon.

Formally we may think of human inferencing as a relation between pieces of knowledge, like that between K1 and K2 on one side and K0 on the other in the present example. Logicians usually denote this relation by the symbol ⊨. Since our discussion at this point is still concerned with the cognitive aspect of human inferencing, let us emphasize this aspect by adding the subscript h to this relation, i.e. ⊨h. Thus in our example the relation between the pieces of knowledge may be expressed by

K1, K2 ⊨h K0

On the basis of this analysis of the phenomenon of human inferencing we face the following fundamental problems in view of the topic of this paper.

1. How can we adequately represent knowledge in language form?

2. How can we define ⊨h so that it coincides with our experience?

3. How can we determine whether K ⊨h K' holds for any K and K' ?

The first question addresses the relation of language and its meaning (or its semantics). It is a question that has kept many philosophers busy for at least the last hundred years. Today these issues are treated in the areas of natural language semantics and model theory. So this question is far from trivial. We will meet it again on several occasions later in the paper. In order to avoid its complications for now, we rely on the solution offered by the language of first-order logic with its well-defined semantics, until we discuss some of the complications in later parts of the paper.

The second question is of course intimately related to the first one and with it is again decidedly non-trivial. As long as we accept the traditional form of first-order logic, however, logic provides us with a well-defined solution, which we denoted by ⊨ before. But we will later have to account for a number of complications, as mentioned for the first question.

[Figure 1 shows the twelve inference figure schemes of NK, an introduction (I) and an elimination (E) rule for each logical symbol. In linear notation:

∧-I: from A and B infer A ∧ B
∧-E: from A ∧ B infer A; from A ∧ B infer B
∨-I: from A infer A ∨ B; from B infer A ∨ B
∨-E: from A ∨ B, a derivation of C from assumption [A], and a derivation of C from assumption [B], infer C
∀-I: from F infer ∀c F
∀-E: from ∀x F infer F{x\t}
∃-I: from F{x\t} infer ∃x F
∃-E: from ∃x F and a derivation of C from assumption [F], infer C
→-I: from a derivation of B from assumption [A], infer A → B
→-E: from A and A → B infer B
¬-I: from a derivation of F (falsity) from assumption [A], infer ¬A
¬-E: from A and ¬A infer F; from F infer any D

Inferences according to ∀-I and ∃-E are subject to a condition on the variable c.]

Figure 1. The inference figure schemes of the calculus NK

In the present section we will now discuss the third question in some detail on the basis of first-order logic. In other words, we now assume that ⊨h is modeled by the classical entailment relation ⊨ of first-order logic. There knowledge is represented by first-order formulas. ⊨ is defined in terms of truth values in most logic texts. For the purpose of automation we would prefer a more syntactic characterization. In standard logic texts we find many such syntactic characterizations. One is due to G. Gentzen, which we are going to describe in the sequel.

With his calculus NK Gentzen tried to simulate the natural way of a mathematician's reasoning. In particular, he observed that such reasoning starts from assumptions (or from previously established results) and proceeds by a number of well-defined syntactic rules. He preferred to present these rules in a tree-like form as inference figure schemes. Figure 1 shows all the schemes that establish the entire calculus NK. These schemes are to be understood in the following way.

For each logical symbol there is a rule that introduces (I) it and one that eliminates (E) it. For instance, consider the first rule ∧-I, introducing a conjunction symbol ∧. It says: if there is a derivation for the formula A and one for B in NK, then the derivation may be extended to establish the formula A ∧ B. As we may see, there are two different versions of the rule ∧-E, which eliminates a conjunction symbol. The same holds for the rule ∨-I. F{x\t} denotes the substitution of x by t in F.

2: ∀y Pay
       Pab                    (∀-E)
       ∃x Pxb                 (∃-I)
       ∀b ∃x Pxb              (∀-I)
1: ∃a ∀y Pay
       ∀b ∃x Pxb              (∃-E₂)
   ∃a ∀y Pay → ∀b ∃x Pxb      (→-I₁)

Figure 2. A derivation in the calculus NK

As we mentioned, NK operates with assumptions that are stated at the beginning of a chain of reasoning. Some of the rules allow the transition from the premises to the conclusion only under certain assumptions. In the rules such assumptions are represented as formulas in brackets, like the formulas A and B in the scheme ∨-E. In detail this scheme says: if there is a derivation in NK of the formula A ∨ B, further a derivation of some formula C that is subject to an assumption A made initially in this derivation, and finally a derivation of the same formula C but now subject to an assumption B, then we may infer C, which then is no longer subject to either of the assumptions A and B.

Figure 2 shows a derivation in NK that starts with two assumptions. If we interpret Pab - which is short for P(a,b) - as "person b earns more than a deutschmarks", then assumption 1 states that there is a lower bound on the salaries in question, while assumption 2 states that a is such a lower bound. Under assumption 2 the formula ∀b ∃x Pxb (along with its predecessors in the derivation) is derived in an initial branch of the derivation. In the next step of the derivation, however, the dependency on this assumption is eliminated by way of an instance of rule ∃-E. The reference to the particular assumption 2 is established in the figure by way of the index 2 added to the name of the rule. The same happens with assumption 1 in the final step of the derivation in an analogous way, so that the final formula of the derivation is not subject to any assumption.

In order to demonstrate that such a formal derivation in NK actually mirrors a natural chain of reasoning, we translate the derivation from figure 2 into English text as follows: "Suppose there is an a such that, for all y, Pay holds (assumption 1). Let a be such an a (assumption 2). Then, for all y, Pay holds; therefore, if b denotes an arbitrary object, then Pab holds (the step ∀-E). Thus there is an x, viz. a is such an object, such that Pxb holds. Since b was arbitrary, our result therefore holds for all objects, i.e. for all b there is an x such that Pxb holds (∀-I and ∃-E₂). This yields our assertion (→-I₁)."

Without going into any further details, we just note that there is a condition on the variable in two of the rules. In our example derivation, for instance, this condition requires that the all-quantified b in the formula resulting from the ∀-I inference must not occur in any assumption on which this formula depends, i.e. it must not occur in assumption 2 in this case; this provision guarantees that b is indeed arbitrary, as the text says. Similarly, the a from assumption 1 must not occur in the formula resulting from the ∃-E inference. Since these variables apparently play a different role than ones like x and y in the derivation, we distinguish them with our notation, which uses a, b, ... for the former and x, y, ... for the latter. As a final explanatory remark we mention that F denotes the logical constant 'false'.

The set of these rules defines a derivability relation among formulas that is usually denoted by ⊢. For instance, the end formula of the derivation from figure 2 is derivable in this sense, i.e. ⊢ ∃a ∀y Pay → ∀b ∃x Pxb. In general, ⊢ is a binary relation between a set of assumptions and a formula. In the present example the set of assumptions under which this formula is derivable is empty and thus not written explicitly. The formula ∀b ∃x Pxb is derivable only in the context of assumption 2; thus we have ∀y Pay ⊢ ∀b ∃x Pxb, and so forth.

One can prove that ⊨ and ⊢ both define the same relation, a result which is known as the soundness and completeness theorem for the calculus NK. The →-rules in NK show that A ⊢ B holds iff (i.e. if and only if) ⊢ A → B holds (known as the deduction theorem), for which reason we may restrict our attention to the latter case with no assumptions, which simplifies the discussion of question 3 above.

As we already pointed out, it was Gentzen's main intention with NK to introduce a calculus that closely simulates the natural way of human reasoning. In particular, NK contrasts with the so-called Hilbert-type calculi, which define the notion of derivability in a different way.

These specify a set of formulas as axioms and allow only one rule of inference, viz. the rule →-E, well known as modus ponens. For instance, any formulas of the form A→(B→A) or of the form (A→(B→C))→((A→B)→(A→C)) would be among the axioms, although their validity does not seem to be obvious in all cases (such as the second one). This indicates why NK appears to be much more natural than calculi of the Hilbert type.
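To see how unnatural Hilbert-type derivations can be, consider the standard textbook exercise (not taken from the text) of deriving the trivially valid formula A → A from just the two axiom schemes above and modus ponens; it already takes five steps:

```latex
\begin{align*}
1.\;& (A \to ((A \to A) \to A)) \to ((A \to (A \to A)) \to (A \to A))
    && \text{axiom 2 with } B := A \to A,\ C := A\\
2.\;& A \to ((A \to A) \to A) && \text{axiom 1 with } B := A \to A\\
3.\;& (A \to (A \to A)) \to (A \to A) && \text{modus ponens on 1, 2}\\
4.\;& A \to (A \to A) && \text{axiom 1 with } B := A\\
5.\;& A \to A && \text{modus ponens on 3, 4}
\end{align*}
```

In NK, by contrast, the same formula follows by a single →-I step discharging the assumption [A].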

As an aside we mention that Gentzen also provided an intuitionistic version NJ of NK by deleting the rule ¬-E. This shows that technically intuitionistic reasoning is not much different from classical reasoning. The reader may find more on this topic in section 4 of Huet's contribution in this volume.

An additional advantage of NK in comparison with Hilbert-type calculi is the fact that it lends itself more easily to an automation of the computation of ⊢. This becomes more obvious if we proceed to a technical variant of NK that was also developed by Gentzen, for the purpose of simplifying his consistency proofs for number theory. This variant is denoted by LK ("logistic calculi") and is known as Gentzen's sequent calculus. It is very easy to transform a derivation in NK into one in LK. For that purpose any formula B in the given derivation that depends on a number of assumptions A1,...,An is replaced by the sequent A1,...,An → B.

¬Pab ∨ ∃y ¬Pay ∨ Pab ∨ ∃x Pxb     (ax)
∃y ¬Pay ∨ Pab ∨ ∃x Pxb            (∃)
∃y ¬Pay ∨ ∃x Pxb                  (∃)
∀a ∃y ¬Pay ∨ ∃x Pxb               (∀)
∀a ∃y ¬Pay ∨ ∀b ∃x Pxb            (∀)

Figure 3. A derivation in the calculus GS

Strictly speaking, the use of the implication sign → takes place on a level different from that of possible implication signs in the formula B or in the A's, namely on the meta-level. However, since the logical meaning of implication remains the same on any level, no additional rules are required for it. Following Schütte [Sch] we also interpret the comma in such a sequent as a conjunction, drop the redundant elimination rules from NK, restrict the tertium non datur (∨-I) to literals, and transform any formula to its negation normal form (no negation signs except in literals and no implication signs). With all these modifications applied to NK we obtain a calculus GS (for Gentzen-Schütte) for first-order logic, defined below via its derivability relation ⊢. It would be tedious to explain this transition in all technical details (see e.g. [Bi3]). Rather we will illustrate it with the example from figure 2 after stating the formal definition.

Definition. Inductive definition of the derivability relation ⊢ in GS for formulas in negation normal form.

(ax) ⊢ G1 ∨ Pt1...tn ∨ G2 ∨ ¬Pt1...tn ∨ G3 ; that is, all formulas of this kind, which are called axioms, are derivable. Here and in the following rules the occurrence of the formulas Gi , i=1,2,3, is optional.

(∧) ⊢ G1 ∨ F1 ∨ G2 and ⊢ G1 ∨ F2 ∨ G2 implies ⊢ G1 ∨ F1∧F2 ∨ G2 ; thereby ∧ is assumed to bind more strongly than ∨ .

(∀) ⊢ G1 ∨ F ∨ G2 implies ⊢ G1 ∨ ∀cF ∨ G2 provided that c does not occur in G1 and G2.

(∃) ⊢ G1 ∨ F{x\t} ∨ ∃xF ∨ G2 implies ⊢ G1 ∨ ∃xF ∨ G2

The negation normal form of the last formula in the derivation of figure 2 is obtained by the following substitutions. The formula of the form A→B is replaced by ¬A ∨ B ; then the negation sign is moved inward by replacing ¬∃ with ∀¬ , and then ¬∀ with ∃¬ , according to well-known laws of first-order logic. The GS derivation of the resulting formula is shown in figure 3.
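The transformation just described is purely mechanical and easily programmed. The following Python sketch (the tuple representation and function name are our own, chosen for illustration only) computes the negation normal form of a formula:

```python
# Formulas as nested tuples: an atom like Pay is ('P', 'a', 'y');
# connectives are ('not', F), ('and', F, G), ('or', F, G),
# ('imp', F, G), ('all', v, F), ('ex', v, F).

def nnf(f):
    """Negation normal form: no implication signs, negation only on atoms."""
    op = f[0]
    if op == 'imp':                     # A -> B  becomes  ~A v B
        return nnf(('or', ('not', f[1]), f[2]))
    if op == 'not':
        g = f[1]
        if g[0] == 'not':               # ~~A  becomes  A
            return nnf(g[1])
        if g[0] == 'and':               # De Morgan
            return ('or', nnf(('not', g[1])), nnf(('not', g[2])))
        if g[0] == 'or':
            return ('and', nnf(('not', g[1])), nnf(('not', g[2])))
        if g[0] == 'imp':               # ~(A -> B)  becomes  A ^ ~B
            return ('and', nnf(g[1]), nnf(('not', g[2])))
        if g[0] == 'all':               # ~forall  becomes  exists~
            return ('ex', g[1], nnf(('not', g[2])))
        if g[0] == 'ex':                # ~exists  becomes  forall~
            return ('all', g[1], nnf(('not', g[2])))
        return f                        # negated atom: already a literal
    if op in ('and', 'or'):
        return (op, nnf(f[1]), nnf(f[2]))
    if op in ('all', 'ex'):
        return (op, f[1], nnf(f[2]))
    return f                            # plain atom

# The end formula of figure 2:  exists a forall y Pay -> forall b exists x Pxb
F = ('imp',
     ('ex', 'a', ('all', 'y', ('P', 'a', 'y'))),
     ('all', 'b', ('ex', 'x', ('P', 'x', 'b'))))
print(nnf(F))
# -> ('or', ('all', 'a', ('ex', 'y', ('not', ('P', 'a', 'y')))),
#           ('all', 'b', ('ex', 'x', ('P', 'x', 'b'))))
```

Applied to the end formula of figure 2, the sketch performs exactly the substitutions named above and yields the formula whose GS derivation figure 3 shows.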

Its first formula is an axiom according to (ax); viz. in this instance G1 does not occur, Pt1t2 is ¬Pab , G2 is ∃y ¬Pay , ¬Pt1t2 is Pab , and G3 is the rest. Essentially it expresses the tertium non datur Pab ∨ ¬Pab , which may also be read Pab → Pab . The next two steps introduce an existential assertion in place of the given term. The need for the occurrence of the existential formula both before and after such a step arises from the possibility of combining more facts of the sort Pab (e.g. Pbb, Pcb, ...) within a single existential statement ∃xPxb . The final two steps introduce a universal assertion. Note that in both cases the quantified variable does not occur in other parts of the formula, as required by the condition on c in (∀). The example does not illustrate an application of the simple rule (∧) , which has two premises that are identical up to the two parts F1 and F2 .

Although we have skipped many details that are usually discussed in a logic text in such a context, the reader should now be able to carry out derivations in either NK or GS. Of course this is also not the place to formally prove the fact that any formula derivable in GS is derivable in NK, and vice versa. Thus both calculi provide the same derivational power, while they differ in their naturalness and conciseness. NK is more natural, while GS is more concise and thus technically more transparent.

In order to determine whether |- F holds (question 3 above in its simplified form) for any formula F , one would think of starting with F and trying out each of the four rules of GS in a backward way, in order to see which one of them might be the last in a derivation. This would yield premises for which we carry out the same process again, and so forth. Under this view we are interested in the backward direction of these rules only, and one might prefer to state the rules in this backward direction from the very beginning. The calculus known as semantic tableaux by Beth [Bet] does exactly this, i.e. its rules include exactly those of GS read in a backward direction with all formulas negated (since proofs are established by contradiction). So the inclusion of the semantic tableaux into this discussion would not add any new aspects.

There is a further well-known result in logic which helps us to simplify our problem even further. It says that a formula is derivable iff its skolemized form is derivable. For any formula we obtain the (positively) skolemized form in the following way. Any part in the formula (in negation normal form) of the form Va F[a] is replaced by F[f(x1,...,xn)] . Thereby it is assumed that f is a function symbol not occurring elsewhere in the formula and that there are exactly n quantifiers 3xi , i=1,...,n, that precede Va in the formula. For instance, the skolemized form of the end formula in the derivation of figure 3 is 3y ~Pay v 3x Pxb , because neither all-quantifier is preceded by any existential quantifier, so that the Skolem function has zero arguments, i.e. it is a constant in each case (here denoted by the same letters a and b , which do not occur anywhere else in the resulting formula). Similarly, the skolemized form of the formula 3x Va 3y F[x,a,y] is 3x 3y F[x,f(x),y] ; and so forth. As a warning we mention that skolemization is often introduced (negatively) in the context of a refutation rather than a proof system, in which case the roles of the all- and existential quantifiers are exchanged.
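The positive skolemization step just described can be sketched programmatically. The following is a minimal illustration in a representation of our own choosing (tuple-coded formulas in negation normal form, fresh symbols f0, f1, ... serving as Skolem functions); it is not a prescription from the text:

```python
# Hypothetical sketch of positive skolemization (representation is ours).
# Formulas are nested tuples: ('exists', x, F), ('forall', x, F),
# ('and', F, G), ('or', F, G), ('lit', name, args); a term is a
# variable (string) or a (functor, args) pair.
from itertools import count

def skolemize(formula, ex_vars=(), fresh=count()):
    """Replace each all-quantified variable by a Skolem term built from
    the existentially quantified variables that precede it."""
    op = formula[0]
    if op == 'exists':
        _, v, body = formula
        return ('exists', v, skolemize(body, ex_vars + (v,), fresh))
    if op == 'forall':
        _, v, body = formula
        f_sym = f'f{next(fresh)}'          # fresh Skolem function symbol
        term = (f_sym, ex_vars)            # applied to the preceding 3-variables
        return skolemize(subst(body, v, term), ex_vars, fresh)
    if op in ('and', 'or'):
        return (op, skolemize(formula[1], ex_vars, fresh),
                    skolemize(formula[2], ex_vars, fresh))
    return formula                         # a literal stays unchanged

def subst(formula, v, term):
    """Replace variable v by term everywhere in formula."""
    op = formula[0]
    if op in ('exists', 'forall'):
        return (op, formula[1], subst(formula[2], v, term))
    if op in ('and', 'or'):
        return (op, subst(formula[1], v, term), subst(formula[2], v, term))
    name, args = formula[1], formula[2]
    return ('lit', name, tuple(term if a == v else a for a in args))

# 3x Va 3y F[x,a,y] becomes 3x 3y F[x,f(x),y] :
f = ('exists', 'x', ('forall', 'a', ('exists', 'y',
        ('lit', 'F', ('x', 'a', 'y')))))
g = skolemize(f)
print(g)
```

Running the sketch on the formula from the text replaces a by the one-place Skolem term f0(x), since exactly one existential quantifier precedes Va .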

If we restrict our attention to formulas in skolemized form, then all-quantifiers never occur. Consequently, the rule (V) in GS becomes obsolete and can be ignored. In this case we can furthermore apply another well-known result and transform any given formula into its prenex form, again without affecting derivability. For instance, the prenex form of the end formula from figure 2, after skolemization, is 3y 3x (~Pay v Pxb) . Any such formula consists of a sequence of existential quantifiers, the prefix, followed by the matrix, a formula part that is purely propositional by nature and has no quantifiers. One may easily see that for the derivation of such a formula it can be assumed that the rules according to (^) precede all those according to (3) (in more general form known as Gentzen's Hauptsatz). How could one determine a derivation of that kind for any given formula in this special form?

Since we know that the final part of the derivation must consist of a finite number of applications of rule (3) for each existential quantifier, we only have to determine these finite numbers along with the term t on the left side of this rule for each of its instances. Assume we knew all this; then we could easily determine the first formula in this final part of the derivation, which must then be derivable from axioms in a first part of the entire derivation by applications of the rule (^) only. Apparently this first part achieves nothing other than establishing that this formula is a propositional tautology. In summary, our task consists in determining the number of applications of rule (3) and the respective terms, and in testing for tautologies.

The simplest solution for this task would be an exhaustive enumeration of the numbers and the possible terms, together with an application of the tautology test for each resulting configuration. In the beginnings of ATP such an approach was in fact pursued; it became known as the British Museum Method. Obviously it is hopelessly inefficient. As the crucial idea of improvement, Prawitz suggested in 1960 to exchange the sequence in the solution for this task, namely to test for tautologies first and to determine the numbers and terms by need only. All theorem proving methods today work along this basic idea; they differ only in the particular choice of the tautology testing method.

Let us illustrate this idea with our previous example 3y 3x (~Pay v Pxb) . Deletion of the quantifiers yields the matrix ~Pay v Pxb . In order for this formula to become a tautology, x must be replaced by a and y by b . So this provides the information about the final part of the derivation for this example, which consists of exactly one application of rule (3) for each existential quantifier, one with the term a , the other with b . The replacement of variables by terms as just illustrated is determined by a well-known and fast process called unification, for which more details may be found in the chapter by Stickel in this volume. If no appropriate replacement had been found in our example, then we would have taken into account a second copy, corresponding to two applications of rule (3) for each quantifier, and thus would have considered the formula (~Pay v Pxb) v (~Pay' v Px'b) to be tested for tautology; and so forth.
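The substitution {x\a, y\b} in this example can be computed by a standard unification algorithm. The following is a rough Robinson-style sketch in a notation of our own (variables are strings marked with a leading '?'); it is an illustration, not the algorithm from Stickel's chapter:

```python
# A minimal first-order unification sketch (representation is ours).
def unify(s, t, theta=None):
    """Return a most general unifier of terms s and t, or None.
    Variables are strings beginning with '?'; anything else is a
    constant or a (functor, args) compound term."""
    if theta is None:
        theta = {}
    s, t = walk(s, theta), walk(t, theta)
    if s == t:
        return theta
    if isinstance(s, str) and s.startswith('?'):
        return {**theta, s: t} if not occurs(s, t, theta) else None
    if isinstance(t, str) and t.startswith('?'):
        return unify(t, s, theta)
    if isinstance(s, tuple) and isinstance(t, tuple) and s[0] == t[0] \
            and len(s[1]) == len(t[1]):
        for a, b in zip(s[1], t[1]):
            theta = unify(a, b, theta)
            if theta is None:
                return None
        return theta
    return None

def walk(term, theta):
    """Follow variable bindings in theta as far as possible."""
    while isinstance(term, str) and term in theta:
        term = theta[term]
    return term

def occurs(v, term, theta):
    """Occurs check: does variable v appear inside term under theta?"""
    term = walk(term, theta)
    if term == v:
        return True
    return isinstance(term, tuple) and any(occurs(v, a, theta) for a in term[1])

# ~Pay and Pxb become complementary under {x\a, y\b}:
theta = unify(('P', ('a', '?y')), ('P', ('?x', 'b')))
print(theta)   # {'?x': 'a', '?y': 'b'}
```

The occurs check makes the sketch sound; production provers replace it or amortize it for speed.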

To summarize once more: because rule (3) can be applied more than once and, viewed in the backward direction, each time produces a new copy of the matrix, the tautology test possibly has to account for more than one such copy. Apparently one would first try one copy, taking others into consideration if this fails to yield a tautology. We are left with the question of an appropriate tautology test that includes unification. The one suggested by GS consists in a straightforward application of the inverse of rule (^) along with a simple test for (ax) on the resulting formulas, and doing this over and over again. This is rather redundant, since the inverse of rule (^) generates two formulas out of one which share a great deal of information.

One way to avoid this redundancy consists in an extension of the axiom property to any tautology, which renders rule (^) completely redundant as well (hence (3) is the only rule that remains after these modifications of GS). Let us illustrate this with the matrix (~Pay ^ Qx) v Pxb v ~Qa , which is slightly more complicated than our previous one. With one application of the inverse of rule (^) we would obtain two axioms according to (ax) after application of the same substitution as before. But now we define (ax') such that this matrix becomes an axiom itself. In order to give an intuitive idea of the property characterizing (ax') we use a different representation of this matrix, viz. in real matrix form in a two-dimensional space. For that purpose we represent conjunctive parts top-down and disjunctive parts left-right. So our matrix now reads

~Pay   Pxb   ~Qa
 Qx

The columns in such a matrix are called clauses. A path through the matrix is a set of literals that is obtained by selecting one literal from each clause. Intuitively one might think of traveling through the matrix along such a path. Our matrix has exactly two paths. Note that they correspond exactly to the two axioms obtained after application of the inverse of rule (^) as described before. Two literals in a matrix are called a connection if they are contained in a path and share the same predicate symbol, one negated, the other unnegated. Our matrix has exactly two connections, as depicted in the following copy.

~Pay   Pxb   ~Qa
 Qx

(the two connections {~Pay, Pxb} and {Qx, ~Qa} are drawn as arcs)

A set of connections is called spanning for the matrix if each path through the matrix contains one of them, as is the case in our present example. With our previous substitution {x\a, y\b} the literals in each connection become identical up to the negation sign, in which case the connections (or the literals) are called complementary. With these notions we can now define (ax').

(ax')  |- F for any formula F for which there is a spanning set of complementary connections.

There are powerful algorithms which test for this property and which, along with unification, provide a convenient and comparatively efficient solution for our task. One takes one (or more) copies of the matrix of the given formula and tests for (ax'), whereby substitutions are generated by need via unification. This whole approach is known as the Connection Method. Except for this brief outline we will not describe it in any further detail, since there is a more detailed expository overview in [Bi4] for readers who want to taste it in more, but still limited, detail, and a comprehensive treatment in [Bi3] for the truly committed reader.
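For illustration, a naive ground version of the (ax') test can be sketched as follows. Real implementations of the connection method interleave the path check with unification; here a substitution is assumed to be given, and the representation and names are ours:

```python
# Naive sketch of the (ax') test: after applying a substitution, check
# that every path through the matrix contains a complementary pair of
# literals (our own ground simplification of the real procedure).
from itertools import product

def ground(lit, theta):
    """Apply the substitution theta to a literal (sign, predicate, args)."""
    sign, pred, args = lit
    return (sign, pred, tuple(theta.get(a, a) for a in args))

def is_ax(matrix, theta):
    """True iff every path (one literal per clause) contains two literals
    identical up to the negation sign, i.e. a complementary connection."""
    for path in product(*matrix):
        lits = [ground(l, theta) for l in path]
        if not any((not s, p, a) in lits for (s, p, a) in lits):
            return False                    # an open path: not a tautology
    return True

# Matrix (~Pay ^ Qx) v Pxb v ~Qa : the clauses are the columns.
matrix = [
    [(False, 'P', ('a', 'y')), (True, 'Q', ('x',))],   # clause {~Pay, Qx}
    [(True, 'P', ('x', 'b'))],                         # clause {Pxb}
    [(False, 'Q', ('a',))],                            # clause {~Qa}
]
ok = is_ax(matrix, {'x': 'a', 'y': 'b'})
print(ok)   # True: both paths contain a complementary connection
```

Without the substitution the two paths stay open, which mirrors the need to generate substitutions by unification.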

With the development used in the outline above we wanted to emphasize the close relationship of the connection method with the calculi of natural deduction. This is a very important feature for an interactive theorem proving environment: we might think of a powerful machine-oriented prover based on the connection method inside the machine, with the proofs (completed or partial) represented on the screen of a workstation in a human-oriented and natural way.

We would like to make the reader aware of the fact that for the purpose of explanation we have made a number of simplifications of our task that are justified in view of a correct solution. However, these simplifications (like skolemization, prenex form, etc.) do not necessarily contribute to a more efficient solution. That is to say, we have to omit the simplifications if we head towards a really smart solution [Bi3]. Unfortunately, our task then becomes so complex by its very nature that long experience is needed in order to be able to advance to these more challenging topics. On the other hand there is no other way to advance this field any further.

In a way the restriction to first-order logic might already be regarded as a very serious one, since it seems to exclude any higher-order features. For this reason we mention that the connection method can be generalized to higher-order logic, which has been carried out in section V.6. Because of the computational problems that arise in such a general logical framework, a restriction might nevertheless be desirable. The results outlined in section 3.3 of the present paper might be of great interest in this context as well.

As we said above, the test for tautologies distinguishes the various theorem proving methods in use today. So far we have discussed those based on the connection method. The most popular ones, however, are those based on resolution. We will not consider them at all here, since they are extensively covered in Stickel's chapter in this volume. Resolution works on the basis of the same simplifications that we used above. So the advances that we just talked about are important issues for resolution as well. Most likely they will be pursued further in the context of the connection method, because it is more transparent than resolution for such a purpose, which is an important point in view of coping with the complexity of the task.

Let us, finally, mention that all of the special topics discussed in the context of resolution in Stickel's chapter (such as theory resolution) similarly apply to the connection method. Most of them have been treated in [Bi3] under this viewpoint.

2. NON-MONOTONIC REASONING

At the beginning of section 1 we considered human inference as a relation |~ between pieces of knowledge. We then took |~ to be the relation |= as defined for first-order logic and studied several syntactically defined versions |- of it. It has been pointed out in this context that this special choice will have to be reconsidered in a more detailed discussion of what we formulated as question 2. In the present section we enter this discussion.

Consider the following two pieces of knowledge:

- IBM produces computers, or P(ibm,cps)
- Daimler-Benz produces cars, or P(d-b,crs)

With no additional knowledge at hand, what would you answer when asked whether IBM produces cars, i.e. P(ibm,crs) ? No, of course! In other words, it seems that

P(ibm,cps), P(d-b,crs) |~ ~P(ibm,crs)

holds, although nothing was stated in the premises about IBM with respect to cars. In fact, in first-order logic this is an invalid inference.

As another example (borrowed from McCarthy), suppose someone is hired to build a bird cage and doesn't put a top on it. Since anyone knows that birds can fly, no judge in the world would accept his excuse that it was not mentioned that the bird could fly. On the other hand, if the bird for some reason could indeed not fly, so that no money should be wasted on putting a top on the cage, this should have been said explicitly. In other words, it seems that

BIRD(x) |~ FLY(x)

holds, which again clearly is not valid in first-order logic.

There are many more such cases where it seems that the human inference relation |~ does not coincide with |= , the first-order one; but these two might do for a first discussion of this phenomenon. There are two different ways of approach. One is to acknowledge the discrepancy and look out for an appropriate logic different from the first-order one. There is little doubt that we would have a hard time bringing examples as different as the two above under a common logical framework that includes first-order logic as discussed in the previous section.

The other approach would be to assume hidden pieces of knowledge as additional premises in examples of this sort. In the first example this piece might be "and that's all which holds", in the sense that everything that is not explicitly stated to hold is assumed not to hold, a principle known as the closed-world-assumption. P(ibm,crs) was not stated explicitly, hence it is assumed not to hold, i.e. ~P(ibm,crs) is one among the pieces of this hidden knowledge. If we make it explicit by adding it to the inference above, we obtain a classical first-order inference.

P(ibm,cps), P(d-b,crs), ~P(ibm,crs), ... |= ~P(ibm,crs)

Similarly, if we add the hidden knowledge "birds can fly" to the premises in the second example, again the result is a classical first-order inference, viz. modus ponens.

BIRD(x), BIRD(y) -> FLY(y) |= FLY(x)

But note the difference: in the first case we have assumed that nothing except the stated facts holds (closed-world assumption), while in the second we added a fact (as common-sense knowledge). In combination we might be inclined to say that on the one hand there is a body of common-sense knowledge that is tacitly assumed in any appropriate context, like the flying birds, while on the other hand no facts are taken into account except those stated explicitly or assumed as common-sense knowledge.

At least for these examples, then, the second approach appears to be much more convincing. So we learn that in certain cases humans draw inferences which involve tacit assumptions; they become first-order logic inferences once these assumptions are made explicit. Note that these assumptions are context dependent. For instance, in the first example the assumptions would of course not include ~P(ibm,crs) if P(ibm,crs) had been among the explicitly stated pieces of knowledge. As a consequence, the conclusions drawn from a set of pieces of knowledge may change as we add additional information, a feature which is called non-monotonicity. First-order logic is monotonic in this sense: if a piece of knowledge K0 follows from some knowledge K1 , then K0 also follows from K1 enriched by additional knowledge K2 . In symbols,

K1 |= K0 implies K1, K2 |= K0

Common sense reasoning, in contrast, seems to be non-monotonic, as we have just noticed. But our examples also show us that this non-monotonicity occurs on the surface only. If the tacit assumptions are all made explicit, monotonicity is retained (since then the addition of a new fact like P(ibm,crs) also changes K1 , so that the monotonicity rule does not apply at all). We will see, however, that it is not quite a trivial problem to handle the distinction between stated and assumed knowledge appropriately so that the formalism simulates usual common sense reasoning. Before entering the technical details let us summarize the different types of use of non-monotonic reasoning, following [Mc3].

1. Use as a communication convention by which a body of knowledge is tacitly assumed unless explicitly stated otherwise (like in the bird example above).

2. Use as a database or information storage convention by which only knowledge is taken into account (whatever this means in detail) that is explicitly stated or assumed by other conventions (like in the IBM example above).

3. Use as a rule of conjecture for solving problems in the absence of complete information. For instance, if you want to catch a bird you had better assume it can fly, in spite of the many exceptions to the rule that birds normally fly.

4. Use as a representation of a policy. For instance, if a committee meeting has always taken place on Wednesday, the next meeting will again be on Wednesday unless another decision is explicitly made.

5. Use as a very streamlined expression of probabilistic information when numerical probabilities are unobtainable. For instance, if you see a bird, what might be the probability that it can fly? In order to calculate it one would need a sample space in the first place, which usually is not available in such situations. Moreover, what purpose would it serve in the particular situation to know that this probability is exactly 97.4%? Or think of statements like "she is a young and pretty woman" as another example where a probabilistic treatment appears to be out of place.

6. Use in the form of auto-epistemic reasoning where we reason about our own state of knowledge. For instance, "I am sure I have no elder brother because if I had one I would know it" belongs to this type of reasoning.

7. Use in common-sense physics and psychology. For instance, we expect an object to continue in a straight line if nothing interferes with it.

This shows us that we are dealing here with a widespread phenomenon with a number of different aspects. We will begin with the technical treatment of a very restricted case of usage 2 in the list above.

2.1. A formalism for data bases

Our first example above has shown us that the phenomenon of non-monotonic reasoning occurs already in the case of a simple data base. Logically a data base has a very simple structure. So it might be helpful for the more complicated applications to study the issue first for this simple case.

Data bases are described in a relational language, which is a first-order language with a finite number (at least one) of constants and predicate symbols, without function symbols, with equality, and with a set of simple types, that is, a distinguished subset of the unary predicate symbols.

From a logical point of view the notion of a relational data base is defined in a model theoretic way as a triple (R,I,IC) where

1. R is a relational language,

2. I is an interpretation for R such that the constants in R are interpreted as mutually different elements in the domain.

3. IC is a set of formulas of R , called integrity constraints, such that for each n-ary predicate symbol P distinct from = and from the simple types, IC must contain a formula of the form

Vx1...Vxn ( Px1...xn -> P1x1 ^ ... ^ Pnxn )

where the Pi are types, i=1,...,n, called the domains of P .

As early as 1969 [Gre], members of the ATP (Automated Theorem Proving) community have preferred to think of data bases in a proof theoretic (rather than the previous model theoretic) way. Under this view, "answering a query means proving a statement ... Thus theorem proving is fundamental for solving data base problems, a fact which is well known (but not very popular at present)", as I stated in 1976 [Bi2].

More recently the need for a more flexible data base management has been recognized. Attempts in such a direction faced a number of problems, such as the treatment of disjunctive information, the semantics of null values, the incorporation of more world knowledge, and last not least the non-monotonicity, which all seem to be due to limitations of the model theoretic view of data bases. For this reason the proof theoretic view has finally received the attention it deserves. It will now be briefly presented following [Re3].

A relational data base is defined in the proof theoretic way as a triple (R,T,IC) , where R and IC are defined as before, while T is a relational theory defined as follows.

1. T is a first-order theory, i.e. a set of first-order formulas.

2. T contains the domain closure axiom Vx ( x=c1 v ... v x=cn ) and the unique name axioms ~ ci=ck , i,k = 1,...,n, i ≠ k, where c1,...,cn are the constants of R .

3. T contains the equality axioms

Vx x=x   (reflexivity)
Vxy ( x=y -> y=x )   (commutativity)
Vxyz ( x=y ^ y=z -> x=z )   (transitivity)
Vx1...xn y1...yn ( Px1...xn ^ x1=y1 ^ ... ^ xn=yn -> Py1...yn )   (Leibniz' principle of substitution)

4. T contains a set D of ground atomic formulas without equality, which might be considered as the actual data base, along with the following completion axiom for any predicate P different from equality.

Vx1...xn ( Px1...xn -> x1=c11^...^xn=c1n v ... v x1=cr1^...^xn=crn ) ,

whereby (c11,...,c1n), ..., (cr1,...,crn) are all of the tuples such that P(ci1,...,cin) is in D , i=1,...,r .

For instance, let D be the set {P(ibm,cps), P(d-b,crs)} from our previous example. Then there would be only a single completion axiom in this particular case, namely

Vx1x2 ( Px1x2 -> x1=ibm^x2=cps v x1=d-b^x2=crs ) .

It minimizes the extension of the predicate P , that is, it restricts the tuples for which P holds to those stated explicitly in the data base D as described above, for which reason this approach is also known as predicate completion. It is easy to see that from the relational theory obtained for this particular example ~P(ibm,crs) can be derived while P(ibm,crs) cannot. However, if we add P(ibm,crs) to D , then P(ibm,crs) trivially can be derived from this new theory while ~P(ibm,crs) no longer can, because with this update the completion axiom changes to become

Vx1x2 ( Px1x2 -> x1=ibm^x2=cps v x1=d-b^x2=crs v x1=ibm^x2=crs ) .

The completion axioms are context-dependent like the assumptions discussed further above. This way we achieve the non-monotonic behavior of human reasoning within first-order logic.
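The effect of the completion axioms on this example can be mimicked in a few lines of code. The representation below is our own illustration (facts as tuples, the permitted-tuple sets standing in for the completion axioms), not part of [Re3]:

```python
# Sketch: predicate completion over ground facts, and query answering
# under the closed-world assumption (example facts from the text).
def completion(facts):
    """Group the stated tuples by predicate: these are exactly the
    tuples the completion axiom permits for each predicate."""
    allowed = {}
    for pred, *args in facts:
        allowed.setdefault(pred, set()).add(tuple(args))
    return allowed

def holds(facts, pred, args):
    """With the completion axioms every ground query is decided:
    True if stated, otherwise the negation is derivable."""
    return tuple(args) in completion(facts).get(pred, set())

D = [('P', 'ibm', 'cps'), ('P', 'd-b', 'crs')]
before = holds(D, 'P', ('ibm', 'crs'))   # ~P(ibm,crs) is derivable
D.append(('P', 'ibm', 'crs'))            # the update changes the completion axiom
after = holds(D, 'P', ('ibm', 'crs'))    # now P(ibm,crs) is derivable
print(before, after)   # False True
```

The flip of the answer after the update is exactly the non-monotonic behavior achieved within first-order logic by the context-dependent completion axioms.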

As we see, a classical theorem prover would now give the expected answer to any query to the data base D . This remains true if disjunctive information is present in D , if null values occur, or if more complex world knowledge is added. So this approach settles the kind of problems that are now under discussion in the data base community. Of course, a standard theorem prover would not meet the efficiency requirements that are standard in data base technology. Both techniques may be integrated, however, by compiling the prover's steps into data base techniques, without changing the semantics, wherever such techniques are applicable. Currently a solution is preferred that interfaces an existing data base system built with conventional technology with a theorem prover, for instance a PROLOG system.

2.2. Negation as failure

A conventional data base has a very poor logical structure, so poor even that this structure could be ignored by the data base community for decades. So the question naturally arises how the solution achieved by the completion axioms can be extended to more complex knowledge bases, say to PROLOG programs [GeG] to begin with. There, in addition to the relational facts as in data bases, we have to account for general PROLOG clauses that take the form of rules. As we noted in section 1, such rules allow the derivation of facts that were not stated explicitly at the outset. This suggests that a generalized closed-world-assumption take into account derivable facts rather than stated ones. So we would say that any fact is assumed not to hold unless it is derivable from the explicitly stated knowledge. For the case of PROLOG this principle is known as negation as failure, which we briefly review now.

Recall from Stickel's paper in this volume that PROLOG clauses are rules of the form H <- G1^...^Gn , where n >= 0 , and the head H and the subgoals Gi are atomic formulas. Further, the goal clause is of the same form but has no head (and at least one goal). This means that in pure PROLOG negation cannot be processed directly. Instead it is handled according to the principle just explained. That is, if a goal or subgoal has the form ~G (G atomic), then the PROLOG interpreter first attempts to prove G ; if this fails then ~G is established, otherwise it fails. This may be expressed as a PROLOG program in the following way.

~G <- G , / , fail
~G <- true
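The behavior of these two clauses can be imitated by a small interpreter sketch. The propositional representation below is our own simplification (ground atoms only, no variables, no loop detection), not a full PROLOG engine:

```python
# Sketch of negation as failure: a goal ~G succeeds exactly when the
# attempt to prove G fails (propositional, our own representation).
def prove(goal, rules):
    """rules maps each head atom to a list of alternative bodies
    (lists of goals); a goal ('not', G) is established iff G cannot
    be proved from the rules."""
    if isinstance(goal, tuple) and goal[0] == 'not':
        return not prove(goal[1], rules)   # negation as failure
    for body in rules.get(goal, []):
        if all(prove(g, rules) for g in body):
            return True
    return False

rules = {
    'P(ibm,cps)': [[]],   # facts are rules with empty bodies
    'P(d-b,crs)': [[]],
}
print(prove(('not', 'P(ibm,crs)'), rules))   # True: the proof of P(ibm,crs) fails
```

As in real PROLOG, the negated goal is decided by observing the failure of the positive proof attempt, i.e. on the meta-level of the proof process.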

We may view this treatment in a different way. The clauses of a PROLOG program define the predicates occurring in the heads; but they do so only with the if-halves of the full definition, which would include the only-if-halves as well. In [Cla] it is shown in detail that negation-as-failure amounts exactly to the effect that would be achieved if these only-if-halves of the clauses were added to the program and a theorem prover for full first-order logic then did the interpretation. A theoretically more comprehensive treatment is contained in [JLL]. Instead of presenting these results here in any detail, we simply illustrate that the same view can already be taken in our previous data base example. As a set of PROLOG clauses it reads

P(ibm,cps) <-
P(d-b,crs) <-

Obviously, the same can be expressed equivalently in the following way.

P(x1,x2) <- x1=ibm , x2=cps
P(x1,x2) <- x1=d-b , x2=crs

which in turn is equivalent to the logical formula

P(x1,x2) <- x1=ibm^x2=cps v x1=d-b^x2=crs

As always in PROLOG, the variables are to be interpreted as all-quantified ones. With this in mind, a comparison of this formula with the completion axiom (from the definition of a relational data base above) for this particular case shows that this axiom is in fact the only-if-half of this formula. In other words, the completion axioms achieve exactly the same effect for the simple case of a relational theory that is achieved by negation-as-failure for the more complicated case of Horn clause logic [She]. With this remark it is now also obvious that negation-as-failure is non-monotonic, since our previous example applies here too.

We note a distinction, however, in the way of treatment. In relational theories we have added the completion axioms and then carried out a classical proof process. In PROLOG there is an evaluation being extracted from the behavior of the classical proof process. This evaluation logically takes place one level higher than the level of the proof process itself, that is, on the meta-level. We will come back to such a combination of object-level and meta-level proofs in sections 2.5 and 3 of this paper.

There is yet another way of viewing the negation-as-failure approach, viz. the semantic one. It may be shown that a set of Horn clauses always has a minimal model [Llo], a fact which is not true in general for first-order logic. The proof process in PROLOG with negation-as-failure in fact determines a minimal Herbrand model, such as the one in the example above. Thereby minimality means that the domain is minimal - {ibm, d-b, cps, crs} in the example above - and that the relations have their minimal extensions - {P(ibm,cps), P(d-b,crs)} in the example. So from the semantic point of view the underlying closed-world-assumption principle may be regarded as aiming at minimal models of the given set of formulas that describes the world under consideration. We will come back to this point in the following section.
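The minimal Herbrand model of a Horn clause set can be computed by iterating the immediate-consequence operator until a fixed point is reached. The following ground sketch is our own illustration; the rule for the atom 'Produces(ibm)' is an invented addition, not part of the text's example:

```python
# Sketch: the minimal Herbrand model of a set of ground Horn clauses,
# computed as the least fixed point of the immediate-consequence operator.
def minimal_model(clauses):
    """clauses: list of (head, body) pairs of ground atoms; facts have
    an empty body. Iterate until no new atom is derivable."""
    model = set()
    while True:
        derived = {head for head, body in clauses if set(body) <= model}
        if derived <= model:
            return model        # fixed point reached: the minimal model
        model |= derived

program = [('P(ibm,cps)', []), ('P(d-b,crs)', []),
           ('Produces(ibm)', ['P(ibm,cps)'])]   # invented extra rule
print(sorted(minimal_model(program)))
```

The computed set contains exactly the stated and derivable facts, i.e. the minimal extensions of the relations, mirroring the closed-world reading of the program.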

2.3. Circumscription

The way we handled the phenomenon of non-monotonic reasoning in the case of data bases and PROLOG programs seems to be completely satisfactory, at least for these restricted cases. Unfortunately, the world is too complex to be modeled adequately in PROLOG. At least we have to extend our language to include the features from first-order logic that are not included in PROLOG, if not even more. Thus the question naturally arises whether the way of handling used so far can be generalized to arbitrary formulas in first-order logic. This turns out to be more complicated than one would normally expect.

McCarthy, who has worked on this problem for many years if not decades, has proposed a technique called circumscription. As he beautifully describes in [Mc2], this technique tries to cope with the problem of common sense reasoning of the most general sort. For instance, think of the well-known missionaries-and-cannibals problem, where three missionaries and three cannibals are to cross a river with a boat that carries no more than two persons, and to do so in a way that at no time the cannibals outnumber the missionaries on either side of the river. The point is that without common sense a description of that sort could never be understood. This is because there are myriads of ways to misunderstand the story, due to its lack of precision and completeness (why not use the boat as a bridge, which might work for a narrow river; or why should there be a solution anyway, since the oars might be broken; etc.).

Usually humans do not even think of such unlikely aspects and easily capture the essence of the problem, for the same reasons that have been identified further above. Namely, we immediately associate a package of additional knowledge with such a description, like "rivers normally are much broader than a boat" or our "birds normally fly" further above. However, this extension is performed in a minimal way, i.e. no objects or properties are assumed that are not normally associated with a scenario like the one under consideration. Circumscription offers a technique to simulate such behavior in a mechanical way. One element in this technique is the use of a sort of completion axiom like the one in section 2.1. We begin by formally defining this circumscription formula.

This definition requires the use of second-order logic, which we have not mentioned so far. The reader should think of first-order logic as before, except that function and predicate symbols are no longer considered as constants, but may be regarded as variables and thus may also be quantified in the same way as the usual object variables in first-order logic. Let A(P,Z) be such a formula of second-order logic in which P occurs as such a predicate variable but is not quantified. In fact, here and in the following we always allow any variable to represent a sequence of variables, i.e. P1,...,Pn in the present case; but for the sake of readability we never write down the sequences explicitly. Further, let E(P,x) be a formula in which the predicate variable P and an object variable x (both possibly tuples by our assumption just made) are both not quantified. Then the circumscription of E(P,x) relative to A(P,Z) is the formula Circum(A;E;P;Z) defined by

A(P,Z) ^ Vpz { A(p,z) ^ Vx[E(p,x) -> E(P,x)] -> Vx[E(p,x) <-> E(P,x)] } .

For a better understanding of this formula let us instantiate it for the case of a simple example such as the one from section 2.1. There A(P,Z) would be the formula describing the data base, i.e. P(ibm,cps)^P(d-b,crs) , and we would have to circumscribe the predicate P in it, i.e. E(P,x) would simply be P(x1,x2) . So the circumscription in this case would be

P(ibm,cps) ^ P(d-b,crs) ^ Vp { p(ibm,cps) ^ p(d-b,crs) ^ Vx1x2[p(x1,x2) -> P(x1,x2)] -> Vx1x2[p(x1,x2) <-> P(x1,x2)] }

Since p is all-quantified, we may think of any predicate. For instance, consider

p(x1,x2) ≡ x1=ibm^x2=cps v x1=d-b^x2=crs .

The premise Vx1x2[p(x1,x2) -> P(x1,x2)] in the circumscription formula is obviously true, given the assumption A(P) in this case. Therefore, according to the circumscription formula, it is also required that

∀x1,x2[p(x1,x2) ↔ P(x1,x2)] holds as well, which spelled out is the formula

∀x1,x2[P(x1,x2) ↔ x1=ibm ∧ x2=cps ∨ x1=d-b ∧ x2=crs] , i.e. the completion axiom from section 2.1. In other words, we have shown that for the simple case of our data base example the completion axiom is a logical consequence of the circumscription formula, a result that holds in general for data base as well as for Horn clause problems [Re2,She,Mc3].
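This derivation can be checked mechanically on a finite universe. The following sketch (in Python; the extra constants sun and os are invented, added only to give P room to grow beyond the facts) enumerates all extensions of P that satisfy the data base and keeps the minimal ones; the unique minimal extension consists of exactly the listed tuples, which is precisely what the completion axiom asserts.

```python
from itertools import combinations

# The two facts of the data base from section 2.1.
FACTS = {("ibm", "cps"), ("d-b", "crs")}
# A small finite universe of pairs over which P may range
# (sun and os are invented constants for illustration only).
UNIVERSE = {(c, p) for c in ("ibm", "d-b", "sun") for p in ("cps", "crs", "os")}

def models_of_A():
    """All extensions of P (subsets of the universe) satisfying A(P),
    i.e. containing the two listed facts."""
    pool = sorted(UNIVERSE)
    for r in range(len(pool) + 1):
        for ext in combinations(pool, r):
            ext = set(ext)
            if FACTS <= ext:
                yield ext

# Circumscribing P keeps only the minimal extensions.
all_models = list(models_of_A())
minimal = [m for m in all_models if not any(n < m for n in all_models)]

# The unique minimal extension is exactly the set of facts: the content
# of the completion axiom, P(x1,x2) holds only for the listed tuples.
print(minimal == [FACTS])   # True
```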

This might have given us a feel for the circumscription formula, at least for this special case. It is meant to replace a given set of axioms A(P,Z) by a modified set that minimizes the extension of P while Z is allowed to vary in this process of minimization. It applies to world descriptions A(P,Z) of arbitrary form, in fact even to ones in second-order logic, which is to say that circumscription is far more general than predicate completion and negation-as-failure as discussed in the previous two sections. Perhaps it is even too general for most practical applications, for which reason we now present it in the slightly more restricted form of predicate circumscription, where E(P,x) is P(x) .

For this purpose we also abbreviate any formula of the form ∀x(Px → Qx) by P ≤ Q ; in the case of tuples (always keep in mind our assumption), P ≤ Q abbreviates P1 ≤ Q1 ∧ ... ∧ Pn ≤ Qn , and P < Q abbreviates P ≤ Q ∧ ¬(Q ≤ P) .

Currently there is no working system of knowledge representation based on circumscription. The major difficulty in designing such a system lies in the fact that the circumscription formula involves a second-order quantifier. Fortunately, it is possible in many cases to reduce the circumscription formula to one in first-order logic, as is shown in [Li1]. At least for these cases such a system may now easily be realized on the basis of any existing theorem prover.

At the end of the previous section we discussed the model-theoretic meaning of negation-as-failure, and we will now provide the same for circumscription. For that purpose let us assume that A(P,Z) is given. Then for any two models M1,M2 of A(P,Z) we write M1 ≤P;Z M2 if

(i) the universes of both models are the same,

(ii) for every (object, function or predicate) constant not in P, Z both models coincide,

(iii) for every predicate in P its extension in M1 is contained in that of M2 .

Then the following result holds [Mc2,MiP,Li1].

Theorem. M is a model of Circum(A;P;Z) iff M is minimal in the class of models of A with respect to ≤P;Z .

The relation ≤P;Z is, in general, not a partial ordering; therefore an M as in the theorem need not exist. In fact, there are consistent formulas A such that their circumscription is even inconsistent [EMR]. For important classes of formulas it has been shown, however, that consistency is always preserved [Li2,MiP,EMR]. These complications indicate why we regarded this topic as a difficult one at the beginning of the present section.

Before we now turn our attention to the application of circumscription to non-monotonic reasoning, we finally mention as an aside that circumscription has a close relationship with the concept of implicit definability, which has been explored for many years in Mathematical Logic [Do2].

2.4. Inferential minimization with circumscription

As we said in the previous section, circumscription provides a tool for treating examples like the one with flying birds further above. We will now illustrate how this works in detail [Mc3]. For that purpose let us use the following predicates.

Bx for x is a bird
Ox for x is an ostrich
Fx for x can fly
ABx for x is abnormal

Instead of stating that all birds can fly, as we did at the beginning of the present chapter, we rather express that birds normally can fly, to account for the kind of scenarios described at the beginning of the previous section. So we consider the following formula A(AB;F) :

∀x(Bx ∧ ¬ABx → Fx) ∧ ∀x(Ox → Bx) ∧ ∀x(Ox → ¬Fx)

Intuitively we would like to have the ostriches as the only birds which are abnormal w.r.t. flying in the world captured by A , that is ∀x(ABx ↔ Ox) . Indeed the circumscription formula Circum(A;AB;F) essentially amounts to this equivalence and thus produces the desired result.
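This minimization can be checked by brute force on a tiny universe. In the Python sketch below (the object names tweety and ozzy are invented), B and O are fixed, all interpretations of AB and F are enumerated, and Circum(A;AB;F) is simulated by selecting the models whose AB-extension is minimal while F is allowed to vary: the abnormal objects turn out to be exactly the ostriches.

```python
from itertools import product

objects = ["tweety", "ozzy"]
bird = {"tweety": True, "ozzy": True}
ostrich = {"tweety": False, "ozzy": True}

def satisfies_A(ab, fly):
    """A(AB;F): non-abnormal birds fly; ostriches are non-flying birds."""
    for x in objects:
        if bird[x] and not ab[x] and not fly[x]:
            return False
        if ostrich[x] and not bird[x]:
            return False
        if ostrich[x] and fly[x]:
            return False
    return True

# Collect all models of A as (AB-extension, F-extension) pairs.
models = []
for ab_bits in product([False, True], repeat=2):
    for fly_bits in product([False, True], repeat=2):
        ab = dict(zip(objects, ab_bits))
        fly = dict(zip(objects, fly_bits))
        if satisfies_A(ab, fly):
            models.append((frozenset(x for x in objects if ab[x]),
                           frozenset(x for x in objects if fly[x])))

# Circum(A;AB;F): minimize the extension of AB while F may vary.
ab_exts = {m[0] for m in models}
minimal_ab = min(ab_exts, key=len)
assert all(minimal_ab <= e for e in ab_exts)   # the minimum is unique here

# In every minimal model the abnormal objects are exactly the ostriches,
# hence tweety flies and ozzy does not.
print(minimal_ab)   # frozenset({'ozzy'})
```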

From this example we see that facts of the sort "normally such and such is the case" are represented as a first-order logic statement which in the premise includes a literal ¬ABx that accounts for possible abnormal cases. Minimizing this predicate by circumscribing it yields the kind of reasoning observed in humans, which might be called inferential minimization. In general there may be various ways (or aspects) of being abnormal. We may treat this by using a different predicate AB for each aspect, or we may provide the distinction in a functional way with a single predicate AB , such as AB(aspect1(x)) , AB(aspect2(x)) and so forth. In the following scenario, which includes airplanes (P) and dead things (D), we take the first alternative.

∀x(Ox → Bx)
∀x¬(Bx ∧ Px)
∀x(¬AB1x → ¬Fx)
∀x(Px ∧ ¬AB2x → Fx)
∀x(Bx ∧ ¬AB3x → Fx)
∀x(Ox ∧ ¬AB4x → ¬Fx)
∀x(Bx ∧ Dx ∧ ¬AB5x → ¬Fx)

The circumscription Circum(A;AB1,...,AB5;F) does not lead to the intuitively expected conclusions: there are no abnormal airplanes, ostriches, and dead birds; ostriches and dead birds are abnormal birds; airplanes and the birds that are alive and not ostriches are the only objects satisfying AB1 . The reason is that the goals of minimizing our five abnormality predicates conflict with each other. For instance, minimizing the extensions of AB2 and AB3 conflicts with the goal of minimizing AB1 .

Prioritized circumscription [Mc3] overcomes this problem. There one establishes priorities between the different kinds of abnormality; specifically, one assigns higher priorities to the abnormality predicates representing exceptions to "more specific" common sense facts, e.g. AB4 , AB5 > AB2 , AB3 > AB1 in the present example. The circumscription formula adapted to this generalization indeed provides a satisfactory solution that leads to the expected conclusions. Also for this case a first-order treatment may be achieved in many important cases [Li1].

In summary, we have seen that circumscription offers a rather general solution to this kind of common sense reasoning. The approach still seems a bit too complicated in comparison with the supposed human reasoning. Also, there are still open problems of detail. Therefore it seems worthwhile to have a look at other approaches taken to cope with this phenomenon, as we do in the next section.

2.5. Other approaches to inferential minimization

While circumscription generalizes the way taken with the completion axioms from section 2.1, all of the variants discussed briefly in the present section might be regarded as a generalization of the way taken with negation-as-failure from section 2.2. Namely, it has been pointed out in 2.2 that negation-as-failure is in fact a meta-level principle, and the same holds for all of the following approaches.

2.5.1 Explicit listing of exceptions. The simplest way to deal with rules of the sort "birds fly" is by including all exceptions explicitly in the form

BIRD x ∧ ¬OSTRICH x ∧ ¬PENGUIN x ∧ ... → FLY x

For a large number of exceptions this clearly is an awkward approach, although it reduces the problem to classical reasoning without any extra provision. To some extent the behavior can be simulated by taking advantage of the fixed control of a PROLOG system and listing the clauses appropriately, or by a set-of-support mechanism in a general resolution prover.

2.5.2 Default reasoning. One natural way to deal with problems of the flying-birds sort is to interpret a rule like "birds fly" more precisely as "if nothing is known to the contrary we may assume that birds fly". The question thus is how we could formalize the "if nothing is known to the contrary" in this phrase.

Reiter in [Re1] has proposed to adopt the interpretation "it is consistent to assume that ..." for it, which formally may be represented as a default rule in the following way.

BIRD x : M FLY x
----------------
FLY x

The general form of such a default rule is

A : MB1,...,MBm
---------------
C

Of course, it has to be made precise in a formalism what exactly this means and how such a rule can be applied within a theory, in particular how the consistency can be determined. Reiter has carried out this program in all details in the form of a default theory. In general, default theories have deficiencies that prevent their use in the intended way. These deficiencies do not occur if one restricts the theories to normal or semi-normal default rules. A default rule is called normal if m=1 and B1=C in the rule above. It is called semi-normal if it is of the form A : M(B1 ∧ ¬B) / B1 , where B is an atom.
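A toy fixed-point computation illustrates how normal defaults fire. In the Python sketch below (a drastic simplification of Reiter's definition: provability and consistency are reduced to membership tests on ground literals, and the bird names are invented), a normal default A : MB / B is applied whenever its prerequisite is believed and its consequent is consistent with the current beliefs.

```python
# A normal default (A : MB / B) fires when A is among the current
# beliefs and B is consistent with them.  Literals are plain strings;
# "¬p" denotes the negation of "p".
def neg(lit):
    return lit[1:] if lit.startswith("¬") else "¬" + lit

def consistent(beliefs, lit):
    return neg(lit) not in beliefs

def extension(facts, defaults):
    """Naive fixed-point computation of an extension of a normal
    default theory over ground literals (no full deduction, just
    membership: a toy stand-in for Reiter's definition)."""
    beliefs = set(facts)
    changed = True
    while changed:
        changed = False
        for pre, just in defaults:   # normal: justification = consequent
            if pre in beliefs and just not in beliefs and consistent(beliefs, just):
                beliefs.add(just)
                changed = True
    return beliefs

facts = {"BIRD(tweety)", "BIRD(sam)", "¬FLY(sam)"}   # sam is known not to fly
defaults = [("BIRD(tweety)", "FLY(tweety)"),
            ("BIRD(sam)", "FLY(sam)")]

ext = extension(facts, defaults)
print(ext)   # FLY(tweety) is added; FLY(sam) is blocked by ¬FLY(sam)
```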

Even in normal default theories there is a serious computational problem, since each rule application requires a deductive test for the derivability of the defaults, which incidentally demonstrates the meta-level aspect of this approach that will further be pursued in section 3.2. In [Gro] it has been shown that this kind of default reasoning can be reduced to circumscription, for which reason we do not discuss it here any further.

2.5.3 Truth maintenance. As we learned at the beginning of the whole section 2, the addition of facts to a non-monotonic reasoning system may change the conclusions which can be derived. In practice one would prefer to store a conclusion explicitly after its derivation in order to keep it available for later purposes. This, however, raises the problem of truth maintenance, since after new updates of the knowledge base earlier derived conclusions might not be true any longer. One of the first systems that deals with this particular problem is described in [Do1].

2.5.4 Modal logic. The M in the default rules may well be interpreted as the modal operator "is possible" from modal logic. This is no surprise, since modal logic was invented to deal with exactly this kind of meta-level reasoning about what is derivable on the object-level or not. In a modal logic approach to such a kind of problem the main issue always is to find the right axioms capturing the exact meaning of the modal operators. In our case this attempt, which follows the first line of approach mentioned at the beginning of this section 2, has run into a number of problems. The latest state of the discussion of this particular approach may be found in [Moo].

2.5.5 Higher-order predicates. Higher-order logic provides another way of integrating meta-level expressions into object-level ones. An approach to default reasoning that takes advantage of this flexibility, but within first-order logic, is discussed in section 3.3.

2.5.6 Meta-reasoning. Instead of integrating the meta-level features into the object language as in the previous two approaches, one may separate them explicitly from the object language and provide a mechanism that links the two levels together. This approach has been taken in [BoK]. We will briefly demonstrate it when we discuss meta-level reasoning in section 3.

2.5.7 Tolerance of inconsistencies. If we have a rule with exceptions then an inconsistency would arise if no extra care were taken. For instance, if we have

BIRD x → FLY x
PENGUIN x → BIRD x ∧ ¬FLY x

then penguins would both fly and not fly. It seems that humans use exactly this kind of representation to deal with this sort of problem; in particular, no one thinks of any exceptions when asked about the characteristics of birds and then lists among other things that birds fly. Yet such inconsistencies apparently do not confuse our logical reasoning.

In a formalism the inconsistency could be tolerated by restricting the inference mechanism so that e.g. the two rules above never interfere with each other. If we think in terms of the connection method as the underlying inference mechanism, then this would simply mean that certain connections may never be used in any deduction. Which of the connections are taken out in this way may be determined in advance for a given knowledge base. We just mention that this amounts to determining tautology loops in the knowledge base, like the one from the literal BIRD x in the first clause to the same in the second clause, and from ¬FLY x there back to FLY x . In this example these two connections together are useless for any reasonable deduction and thus should never be taken into account.

The advantage of this simple proposal, which seems to have been ignored so far, is the resulting efficiency. Namely, locating an appropriate deduction would even be more efficient than in a usual first-order problem, since some connections may simply be ignored, thus reducing the search space. For this reason we feel that this approach might be the most attractive one of all. But no one has explored it in any detail up to now.

In summary, we have seen that there are several viable solutions at hand that may be used for realizing non-monotonic inference, a form of common-sense reasoning. It is now time to try the most promising ones in experiments in order to find out their relative merits in practice.

3. META-REASONING

The methods discussed in the previous two sections provided a way of drawing inferences among pieces of knowledge represented in some formal language, mostly first-order logic. These syntactic constructs were meant to model some real world scenarios. Apparently these constructs themselves might be regarded as part of some real world scenario. In fact, often the need arises to reason not only about the real world knowledge of the first sort but also about these formulas.

For instance, in the default reasoning approach we met such a situation where the reasoning process involved the question whether some formula was derivable which amounts to reasoning about certain relations among these syntactic constructs. There are many other applications than the one just mentioned. For instance, one might wish to provide the user of a reasoning system access to its control. This necessarily requires a language allowing for meta-level expressions. Another application arises in situations where we reason about the knowledge or beliefs of other agents. The present section deals with exactly these kinds of phenomena.

3.1. Language and meta-language

The first-order language used so far in this paper might, for emphasis, be called an object-level language in the present context. What then is a meta-language? It talks about the syntactic entities of the object-level language, just as this in turn talks about some other entities. A literal such as P(ibm,cps) is an example of such a syntactic entity. How could it be named in the meta-language?

It is our intention to let the meta-language be a first-order language as well. Objects in such a language are denoted by constants. Hence we would have to denote such a literal by some constant, say c , in the meta-language. On the object-level we sometimes preferred to use mnemonic notations such as ibm rather than c . The same will be even more useful in the present context. The most natural way to name a phrase is by quoting it. Hence we include constants of the sort "P(ibm,cps)" in the alphabet of our meta-language, keeping in mind that this is just for better readability; otherwise it is a constant just like c . This way we may name any first-order formula of the object-language.

Once we are in a position to name formulas, we may then represent relations among them by predicates. For instance, a predicate, say INFER , in the meta-language may denote the relation defined by ⊢ from section 1. For instance, we may consider the following literal.

INFER("Ms ∧ ∀x(Mx → MTx)","MTs")

It relates two constants that name the formulas considered in section 1, where we used different names for them, viz. K1∧K2 and K0. So naming and talking about formulas has been done before in this paper, except that this was done in a more informal way, while now we are

aiming at a formalism for the same purpose. In fact, a definition like the one for the system GS in section 1 is already very close to such a formalized language. May we therefore suggest as an instructive exercise that the reader rewrite this definition in a purely formalized way in first-order logic, or in PROLOG. If done correctly with DERIVE denoting the derivability relation, then along with the rule

INFER(x,y) if DERIVE(x."→".y)

we might successfully run the resulting PROLOG program with the literal

INFER("Ms ∧ ∀x(Mx → MTx)","MTs")

as a goal clause. In other words, what we do this way is just write an interpreter for GS in

PROLOG -- one application of the meta-language.

In this exercise a question arises that has been made explicit in the rule just given. Namely, we face the need to construct constants with variable components, such as x."→".y , where x and y range over arbitrary formulas. For this purpose we used the infix notation for concatenation; alternatively we might have written conc(x,conc("→",conc(y,nil))) in LISP notation, which for first-order logic is simply a term with two variables and thus causes no problems at all.
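The naming mechanics can be mimicked with strings. In the following Python sketch, names of formulas are string constants and concatenation builds the name of a compound formula, as with x."→".y above; DERIVE is faked by a finite table rather than by a real interpreter for GS, so its single entry is the only derivable formula here.

```python
# Names of object-level formulas are just strings; concatenation builds
# names of compound formulas, as with x."→".y in the text.
def implication_name(x, y):
    return x + "→" + y

# A toy DERIVE relation: here simply a finite table of derivable
# formulas (a real system would run an interpreter for GS instead).
DERIVABLE = {"Ms∧∀x(Mx→MTx)→MTs"}

def INFER(x, y):
    """Meta-level predicate: the formula named x derives the one named y."""
    return implication_name(x, y) in DERIVABLE

print(INFER("Ms∧∀x(Mx→MTx)", "MTs"))   # True
print(INFER("Ms", "MTs"))              # False
```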

Let us summarize this discussion, and do so in restriction to PROLOG in order to focus the attention on an executable system. There is the usual language in which we write PROLOG programs, the object language. The PROLOG system interprets such programs, thus establishing a relation between the program, say progr , and the goal, formally

progr ⊢ goal .

There is a second language, the meta-language, which allows one to name formulas (programs and goals) and relate them by predicates, e.g. we may say INFER("progr","goal") where "progr" and "goal" are the names for progr and goal on the meta-level. Moreover, we may write an interpreter for a formal system like GS in this meta-language in the form of a PROLOG program, say interpr , and run PROLOG to test the relation

interpr ⊢ INFER("progr","goal")

If all this is done correctly, then we clearly would expect that the one relation holds iff the same is true for the other, which in fact provides the important link between the two levels. Formally this link may be established by adding the transitions from one to the other as explicit rules to the system, which this way amalgamates the two languages/systems into a single one. These two rules are usually called reflection principles and have been used in a number of systems that were designed along these lines [Wey,BoK,BoW,Gen].

3.2. Application to default reasoning

In section 2.5.2 the meta-level aspect of default reasoning was already pointed out; it may now be formalized in the following way [BoK].

FLY(x) if BIRD(x), ¬INFER("progr","¬FLY(".x.")")

In other words, birds for which the knowledge base, represented by the current object-level program denoted by "progr" , does not specify anything to the contrary are assumed to fly. Although this is an elegant way of representation, experience with its implementation [BoW] has to show whether it is a feasible approach that may compete with the circumscription techniques under development.
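As a sketch of this formulation (in Python; the program contents are invented, and provability is again reduced to a simple membership test), negation-as-failure on the meta-level looks like:

```python
# Negation-as-failure on the meta-level: a bird is assumed to fly unless
# the object-level program proves the contrary.
progr = {"BIRD(robby)", "BIRD(ozzy)", "¬FLY(ozzy)"}

def infer(program, goal):
    """Toy stand-in for INFER: 'provable' just means 'listed'."""
    return goal in program

def fly(x):
    # FLY(x) if BIRD(x), ¬INFER("progr","¬FLY(".x.")")
    return infer(progr, f"BIRD({x})") and not infer(progr, f"¬FLY({x})")

print(fly("robby"), fly("ozzy"))   # True False
```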

3.3. Self-reference

We learned in the present section 3 that on the meta-level we may name syntactic items from the object-level. So if Pc is a literal on the object-level, we clearly may name P by "P" and introduce a predicate HAS on the meta-level, which we define by

HAS(c,"P") ↔ Pc

Informally, c has the property named "P" iff Pc holds, a form of what is called a comprehension axiom. That seems to be absolutely natural, and in fact G. Frege took this view about a hundred years ago in a slightly different notation. Unfortunately, B. Russell showed early in this century that a formal system that allows these kinds of definitions is inherently inconsistent. Essentially, he defined R(x) ↔ ¬HAS(x,x) and applied it to x="R" , i.e. used an example that involved self-reference.

This problem can be eliminated simply by avoiding self-reference altogether. But this amounts to a serious restriction, since this feature is one that we use quite often in our natural way of reasoning; just think of the sentence "what I am just saying is not correct". The solution proposed by Russell consisted in establishing a hierarchy of language levels in what today we call higher-order logic, which however leads to computational as well as representational problems [Per]. In section 3.1 we avoided this problem by allowing a less powerful comprehension axiom, viz. the reflection principle.

Recently, it has been shown [Fef,Per] that Frege's approach is possible in a consistent way if one takes into account a slight restriction on the comprehension axioms that can easily be tolerated in practice. It seems that this result, which we do not develop in detail here, might have far-reaching consequences, among which are the following. Object- and meta-level may be amalgamated in an even stronger form than shown in 3.1. The computationally relevant parts of modal and higher-order logic might be treated in a purely first-order way, which in turn would mean that first-order logic would finally turn out to be the formalism par excellence.

Encouraged by this result, let us reconsider our flying-birds problem from section 2. Remember that at the end of section 2.4 we already indicated doubts whether the circumscription (and other) approaches to non-monotonic reasoning are as efficient as the human one. In particular, we feel that a rule like "birds fly" itself is not affected by encountering, say, a penguin that does not fly. Rather, an additional rule is added, which we think is not "penguins are birds" but "penguins are non-flying birds". Let us illustrate this idea with the following example.

Px represents "x is a penguin"; Bx represents "x is a bird"; Ax represents "x is an animal"; Cx represents "x is a cardinal"; Fx represents "x can fly"; t denotes tweety and r denotes robby. Then "birds fly" may be represented by F("B") , "penguins are non-flying birds" by ¬F("B("P")") , "birds are animals" by A("B") , "cardinals are birds" by B("C") , "robby is a cardinal" by C(r) , and "tweety is a penguin" by P(t) . Altogether this scenario is given by the following formula.

A("B") ∧ F("B") ∧ B("C") ∧ ¬F("B("P")") ∧ C(r) ∧ P(t)

Properties that apply to a class also apply to each of its members, expressed by

Z("X") ∧ X(x) → Z(x)

otherwise it must be made explicit (e.g. in the functional way whole("B") ) that the class as a whole is addressed. This property inheritance is allowed in classes with additional restrictions only if the property is not identical with the complement of the restriction, expressed by the formula

Z("X") ∧ Y("X(".x.")") ∧ "Y" ≠ "¬"."Z" → Z(x)

From these three formulas we may easily infer that robby can fly but that tweety cannot, further that both are animals. For instance, the second rule applied with the substitution {Z\A, X\B, Y\¬F, x\"P"} yields A("P") , which by application of the first rule results in A(t) . This looks like a very unusual way of representing this kind of knowledge in first-order logic. But we remind the reader that logic provides the form only and is totally open w.r.t. how this form is used to represent concepts. If this representation has no other disadvantages (which we might have overlooked at this point), why not prefer it to more familiar ones. In natural

language this kind of representation seems to be quite familiar anyway.
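The two inference rules operate purely on quoted names and can be prototyped over strings. The following Python sketch (where the parsing of names of the form "X(".x.")" by string matching is our own simplification) closes the scenario formula under both rules: F(r) and A(t) are derived, while F(t) is not, since the restriction "Y" ≠ "¬"."Z" blocks the inheritance of F into the class "B("P")".

```python
def quote(s):
    return '"' + s + '"'

def unquote(s):
    return s[1:-1] if s.startswith('"') and s.endswith('"') else None

# The scenario formula as (property, argument) pairs.
FACTS = {("A", '"B"'), ("F", '"B"'), ("B", '"C"'),
         ("¬F", '"B("P")"'), ("C", "r"), ("P", "t")}

def close(facts):
    """Fixed-point closure under the two inheritance rules."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        new = set()
        for (z, a1) in facts:
            x_name = unquote(a1)
            if x_name is None:
                continue
            # Rule 1: Z("X") ∧ X(x) → Z(x)
            for (p, arg) in facts:
                if p == x_name and (z, arg) not in facts:
                    new.add((z, arg))
            # Rule 2: Z("X") ∧ Y("X(".x.")") ∧ "Y" ≠ "¬"."Z" → Z(x)
            for (y, a2) in facts:
                inner = unquote(a2)
                if inner and inner.startswith(x_name + "(") \
                        and inner.endswith(")") and y != "¬" + z:
                    x = inner[len(x_name) + 1:-1]
                    if (z, x) not in facts:
                        new.add((z, x))
        if new:
            facts |= new
            changed = True
    return facts

derived = close(FACTS)
print(("F", "r") in derived, ("A", "t") in derived, ("F", "t") in derived)
# robby flies, tweety is an animal, but tweety's flying is not derivable
```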

3.4. Reasoning about knowledge and belief

A sentence like "Dean doesn't know whether Nixon knows that Dean knows that Nixon knows about the Watergate break-in" demonstrates that reasoning about knowledge and belief is not only familiar in everyday circumstances, but may also be quite complicated. If we, to begin with, restrict it to the simple form "a knows that B" , then we may see that knowing is a meta-level concept. So for its representation we may use the two approaches discussed before in the present section 3. That is, we may treat KNOW as a predicate on the meta-level, or may allow iterated application of predicates as in KNOW(a,"GREEN(grass)") .

From a philosophical point of view all knowing is relative, that is pieces of knowledge are actually beliefs that are true relative to some higher beliefs. We therefore prefer to talk about belief rather than knowledge and adopt the rule

KNOW(a,F) → BEL(a,F) ∧ TRUE(F)

but not its inverse. So, as another example, "Hans believes that Richard von Weizsäcker is president of Germany" would read BEL(hans,"PRES(r-v-w)") . Or "There is someone whom Hans believes to be president" would either read ∃x BEL(hans,"PRES(".x.")") or ∃x,y [NAMES(hans,x,y) ∧ BEL(hans,"PRES(".x.")")] . This illustrates how one may represent arbitrarily complex statements about someone's beliefs with the first-order language envisaged in the previous section 3.3. Obviously, one may then also use the reasoning mechanism available in first-order logic for drawing correct inferences.

As with any other predicate, we have to specify rules that capture our intention with the predicate BEL . For instance, one might think of the rule [Moo]

BEL(a,"A→B") ∧ BEL(a,"A") → BEL(a,"B")

This kind of rule may be questioned, since a might never think of actually performing the inference. A similar case is known as the problem of logical omniscience, occurring in the presence of the rule

KNOW(a,A) ∧ (A→B) → KNOW(a,B)

which even less can be accepted without some restriction. A possible solution is discussed in [Lev].

The problems occurring in reasoning about knowledge and belief have been extensively discussed in [Hin]. There a semantics is taken into account that envisages possible worlds. In particular, a knows F iff F is true in all the worlds a thinks are possible. This kind of semantics has been formalized with Kripke structures [Kri], but it is not clear how to model the state of knowledge with them. There are serious doubts w.r.t. the computational feasibility of these possible-worlds approaches.

In [Mc1] a functional approach to modelling knowledge and belief has been drafted. There concepts are treated as special functions that are denoted by strings starting with a capital letter. For instance, Know, Pat, Phonenumber, Mike all denote such concepts. "Mike knows Pat's phonenumber" would read

TRUE Know(Mike,Phonenumber(Pat))

Similarly, "Joe knows whether Mike knows ..." would read

TRUE Know[Joe, Know(Mike, Phonenumber(Pat))]

while "Joe knows that ..." would be distinguished by reading

TRUE K[Joe, Know(Mike, Phonenumber(Pat))]

with a different knowing concept denoted by K . For each concept X there is an object x of which X is the concept, formally x = denote(X) . Although this approach seems to work for the examples given in [Mc1], we share the opinion expressed in [Per] that the hope of a satisfactory functional treatment of concepts and modalities that is associated with this proposal is unfounded, not to mention the unintuitiveness of this model. Finally, we mention the approach described in [Bi6]. It takes the view that different believing agents make their inferences in completely separate worlds, which formally means a representation in alphabets with empty intersections. In this sense the phonenumber of Pat in Mike's world of beliefs would be represented as

phonenumber_mike(pat_mike)

indicating that both the function symbol and the constant are taken from Mike's world and are to be regarded as different from those, say, of Mary's world, similarly indexed by mary. This basic idea has been generalized in order to correctly reason in a purely first-order setting about what different agents know. That such reasoning is not quite trivial may be seen from examples like "Mike knows Pat's phonenumber which incidentally is the same as Mary's; does Mike therefore also know Mary's number?" He does not know it; that is, unlike usual first-order reasoning, reasoning about knowledge is opaque and not transparent, in the sense that we may not simply substitute equals for equals in such sentences unless Mike knows of the equality

(and in fact uses it).

3.5. Expressing control on the meta-level

Usually, a theorem prover for first-order logic or a subset thereof is built with a fixed, more or less complex control in it. In particular, in the context of using logic as a programming language the need naturally arises to adapt the control to the special problem under consideration, either automatically or by interaction with the programmer. From what we have learned in the present section 3, it should be obvious that this is typically a meta-level task. It requires a control language on the meta-level talking about which goals are to be processed next and unified with which heads of clauses, if we think in terms of PROLOG. It is straightforward to realize this idea in a practical control language.

In [Bi1] it was first discussed how such a control added to a logic program would, if appropriate, after compilation result in a program as efficient as any corresponding, say, ALGOL program. A language in use that realizes this kind of approach is IC-PROLOG [ClM]. For other proposals in the same direction see [GaL,Gal].

4. REASONING ABOUT UNCERTAINTY

Recall from section 1 that the basic issue of this paper is the inferential relation K ⊢ K' . In section 2 we have already considered examples where the knowledge K is not quite certain in some sense; the knowledge that birds fly is of such a kind. In many cases of that sort we have some feeling about how certain we actually are. In scientific disciplines such as economics, medicine, geology, etc. this feeling often has even a very solid base, provided by extensive statistical material. If such additional information is available then obviously it should be taken into consideration in our way of reasoning, perhaps in a more explicit way than the one discussed in section 2, where among the various uses of non-monotonic reasoning we already mentioned this aspect under point 5.

In the present section we are going to discuss some of the possibilities that have been explored for that purpose. We do so in a section separate from section 2 not so much because the quality of the problem is really different, but rather because here there is an emphasis on the extra aspect of taking into account the uncertainty in a more quantitative way. Also note the relation with the previous section, since this extra aspect clearly is some sort of knowledge on the meta-level, although the approaches discussed below mostly do not take this into explicit account.

The phenomena associated with this kind of reasoning about uncertainty have been studied under many different points of view, which gave rise to a variety of different names for more or less the same topic. Some of these names are fuzzy, approximate, plausible, or vague reasoning, reasoning with, under, or about uncertainty, theory of evidence, or of possibility, and to some extent also inconsistency reasoning. Among all these approaches we may distinguish

those based on some sort of probability theory ("normative approaches") from those taking a non-probabilistic knowledge-based point of view which aim at modeling human performance ("performance or positive approaches").

4.1. Bayesian inference

Often humans associate some kind of measure with statements. For instance, an expert investment counselor might associate a degree 0.6 with the rule that advanced age implies low risk tolerance. This might mean that there is statistical information showing that 60% of the elderly people have low risk tolerance. It may also mean that the expert summarizes his knowledge about the relation of age and risk tolerance with this figure, which might then be regarded as a degree of belief in the statement. Whatever is the case, let us assume that somehow we are provided with probabilities of this kind along with any kind of statements.

Let E denote any such statement, e.g. "John is of advanced age". So we would consider a probability value P(E) along with E ; similarly with any other statement H such as "John has low risk tolerance". The rule mentioned just before states that E implies H in this example, where E may be regarded as evidence for the hypothesis H to hold. As with any statement, we consider some probability P(E→H) for this rule as well; in probability theory one usually writes P(H|E) to express exactly the same and calls it the conditional probability of H relative to E . Finally, we may consider the probability that H and E both hold, i.e. P(H∧E) , sometimes written shortly P(HE) .

A simple probabilistic argument shows that P(H∧E) is the probability P(E) times P(H|E) ; this is known as the theorem of compound probabilities.

P(H∧E) = P(E)·P(H|E)

For instance, if John may be considered to be of advanced age only to a degree of 50%, then P(H∧E) would be 0.5·0.6 = 0.3 in our example. Of course, we may as well consider the inverse rule "low risk tolerance implies advanced age", that is H→E , and its probability P(E|H) . As before we obtain

P(E∧H) = P(H)·P(E|H)

Since ∧ is commutative, the left-hand sides of these two equations must be equal, hence also their right-hand sides, that is P(E)·P(H|E) = P(H)·P(E|H) , or

P(H|E) = P(H)·P(E|H)/P(E)

This equation is called Bayes' theorem. See any book on probability theory (such as [deF]) for more details. It provides the basis for reasoning about uncertainty in many expert systems such as MYCIN, PROSPECTOR, etc. For this kind of application we have to consider a number of hypotheses H1,...,Hn , each of which is conditional on E = E1∧...∧En . In practice the hypotheses are selected such that they may be assumed to be mutually exclusive and exhaustive, i.e. in any scenario exactly one of them is assumed to hold. Further, conditional independence is assumed, which means that the pieces of evidence support each hypothesis in an independent way, expressed formally as P(E1∧...∧En | Hi) = P(E1|Hi)·...·P(En|Hi) . Under these assumptions one may derive a form of this theorem that allows one to update the conditional probabilities whenever information becomes available (e.g. by experiments) that some of the pieces of evidence in fact hold; in other words, we may carry out the kind of reasoning discussed in section 1 for the simple case of rules (or Horn clauses), but at the same time calculate the probabilities for the derived statements. For more details see [DHN].
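To make this updating concrete, the following is a minimal sketch of Bayesian updating under the two stated assumptions. It is not the MYCIN or PROSPECTOR implementation; the hypothesis and evidence names and all numeric values are hypothetical illustrations in the spirit of the risk-tolerance example.

```python
from math import prod

def bayes_update(priors, likelihoods, observed):
    """Posterior P(Hi | E1 ∧ ... ∧ En), assuming the hypotheses are
    mutually exclusive and exhaustive and the pieces of evidence are
    conditionally independent given each hypothesis."""
    unnorm = {h: priors[h] * prod(likelihoods[h][e] for e in observed)
              for h in priors}
    z = sum(unnorm.values())  # P(E), by exclusivity and exhaustiveness
    return {h: p / z for h, p in unnorm.items()}

# Hypothetical figures for the risk-tolerance example:
priors = {"low_risk": 0.3, "high_risk": 0.7}
likelihoods = {
    "low_risk":  {"advanced_age": 0.6, "fixed_income": 0.8},
    "high_risk": {"advanced_age": 0.2, "fixed_income": 0.3},
}
post = bayes_update(priors, likelihoods, ["advanced_age", "fixed_income"])
# post["low_risk"] is now roughly 0.77: the evidence favors low risk tolerance.
```

Each new piece of evidence can be incorporated by calling the function again with the current posterior as the prior, which is exactly the incremental updating mentioned above.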

A useful way of viewing this formalism is an inference net in which the propositions describing pieces of evidence or hypotheses are represented as nodes and the relations among propositions become the links of the network. The probabilities are measures associated with the nodes. The updating of such a measure for one or more nodes upon arrival of new evidence causes a propagation of the change along the links until the net stabilizes again.

Notwithstanding the fact that this approach is quite popular and successful, it has been shown that these assumptions are quite problematic and may in fact even lead to inconsistencies [Gly]. One way of avoiding these problems is described in [Kad]. But there are other problems with the Bayesian approach as discussed so far. They include the difficulty of distinguishing uncertainty (about what we know) from ignorance (see the next section for an example), as well as the fundamental problem of how meaningful such probabilities (the "certainty factors") are in applications and where to obtain them with some reliability in the first place. Finally, this approach is also restricted to situations where the propositions can be arranged in a hierarchy with inference chains flowing smoothly from raw evidence through to conclusions, which often is not the case in practical applications [Qui].

4.2. The Dempster-Shafer theory of evidence

The Dempster-Shafer theory of evidence [Sha] is a close relative of the Bayesian approach. Both take into account degrees for measuring certainty. As a main difference we note that in the Dempster-Shafer approach the probability distribution is assumed over all subsets of hypotheses rather than over all individual hypotheses as in the Bayesian approach.

Suppose we are considering a world with four automobile makers, Nissan (N), Toyota (T), General Motors (G), and Chrysler (C), and want to determine the probability of who might dominate a new market [GoS]. Instead of considering a predicate DOMIN in order to express e.g. the dominance of Nissan and Toyota by DOMIN({N,T}), we briefly write {N,T} for the same purpose. The set D = {N,T,G,C} is called the frame of discernment in this approach. As in the Bayesian approach it is assumed that the singleton hypotheses are mutually exclusive and exhaustive. Now assume evidence is somehow obtained that the probability of Japanese dominance is .4. In order to update the probabilities so that this new information is incorporated, the following mechanism is applied.

A basic probability assignment function m is introduced that allows one to assign probabilities to subsets of D . Initially, m(D)=1.0 since no other information was available. After obtaining the evidence m({N,T})=.4 the decrease in our ignorance is captured by updating m(D) so that its value continues to express exactly the degree of ignorance, which is 1.0 - .4 = .6 in this case. For all other subsets of D the value of m is 0 in the present situation. The probability P expressing our degree of belief as in the Bayesian approach may be calculated for any set from the values of m by adding the m-values of all its subsets. For instance, P({N,T})=.4 and P(D)=1.0 while this value is 0 for any other subset in this case.

The question remains how one performs the updating, as with m(D) just before, in a more complicated case. This is achieved with Dempster's rule of combination as follows. Suppose a second piece of evidence is obtained for the present scenario suggesting a dominance by {T,G,C} with a probability of .8, i.e. m2({T,G,C})=.8 , which leaves an ignorance m2(D)=.2 . For distinction let us denote the m-function with the previous values by m1 . What are the new m-values on the basis of this additional evidence? The rule is to multiply the previous m-value (m1) for a subset S1 with the m-value (m2) obtained from the new evidence for a subset S2 in order to obtain the combined m-value for the intersection of S1 with S2 . In the present example this rule gives us the following values.

m({T}) = m1({N,T})·m2({T,G,C}) = .4·.8 = .32
m({N,T}) = m1({N,T})·m2(D) = .4·.2 = .08
m({T,G,C}) = m1(D)·m2({T,G,C}) = .6·.8 = .48
m(D) = m1(D)·m2(D) = .6·.2 = .12

For the remaining subsets the m-values continue to be zero. As before, the degrees of belief are calculated by adding the m-values of all subsets. So we obtain for example

P({N,T}) = m({N,T}) + m({T}) + m({N}) = .08 + .32 + 0.0 = .4
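Dempster's rule as just described can be written down directly. The sketch below reproduces the m-values of the example; focal sets are represented as frozensets, and normalization of the conflicting mass is omitted because in this example no mass falls on the empty set.

```python
def combine(m1, m2):
    """Dempster's rule of combination for two basic probability
    assignments: multiply the masses and accumulate the product on the
    intersection of the two focal sets. Normalization by the conflict
    mass is omitted here; in the example from the text it is zero."""
    m = {}
    for a, pa in m1.items():
        for b, pb in m2.items():
            m[a & b] = m.get(a & b, 0.0) + pa * pb
    return m

def belief(m, s):
    """Degree of belief P(s): the sum of m-values of all subsets of s."""
    return sum(p for a, p in m.items() if a <= s)

D = frozenset("NTGC")                   # frame of discernment
m1 = {frozenset("NT"): 0.4, D: 0.6}     # first evidence: Japanese dominance
m2 = {frozenset("TGC"): 0.8, D: 0.2}    # second evidence
m = combine(m1, m2)
# m now assigns .32 to {T}, .08 to {N,T}, .48 to {T,G,C} and .12 to D,
# and belief(m, frozenset("NT")) recovers the P({N,T}) = .4 of the text.
```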

Recently it has been shown in [Pea] that an appropriate view of the Bayesian theory in fact yields the same kind of flexibility that has just been demonstrated with the Dempster-Shafer theory. So it seems that both approaches are indeed pretty much the same. This includes the fact that both share the same disadvantages indicated at the end of the previous section.

4.3. Fuzzy logic

Fuzzy logic has emerged from an attempt to develop a logic that models the fuzziness of natural language. This fuzziness is present in many features of natural language, for instance in predicates such as "young", "intelligent", "blonde", or "elderly", "having low risk tolerance" mentioned already in section 4.1 above, but also in quantifiers such as "most", "some", "not very many", etc. While we have seen a way to cope with the fuzziness of predicates in the previous two approaches, there is nothing in them which suggests a way of dealing with such fuzzy quantifiers. Here fuzzy logic offers a single conceptual framework for dealing with these different types of uncertainty. In a sense it subsumes both predicate logic and probability theory. An attempt to access the huge amount of papers on this topic might start with [Za1,Za2].

Let us first consider the case of a fuzzy predicate, e.g. YOUNG(john). Fuzzy logic requires such a statement to be transformed into a canonical form which makes explicit the range of fuzziness. Here this range is the age of John within the interval, say, [0,100]. The canonical form is YOUNG(john) → age(john) = YOUNG, which is associated with a membership function f_YOUNG(u) = 1 - S(u;20,30,40) of a fuzzy set over the range [0,100]. S is a fixed continuous function that is 0 up to 20, then grows to .5 at 30, and saturates at 40, reaching the value 1 that is kept until 100. For instance, f_YOUNG(28) is approximately .7. By itself this is meant to express that .7 is the degree of compatibility of 28 with the concept labeled YOUNG. The statement YOUNG(john) converts this meaning to that of expressing the degree of the possibility that John is 28. In summary, for any such statement containing fuzzy predicates we have to specify the canonical form of the statement, provide the ranges, and specify the parameters for the S-function which determines the possibility function f .
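A sketch of this membership function follows. The text fixes only the three breakpoints 20, 30, 40; the piecewise-quadratic shape used below is the S-function commonly attributed to Zadeh and is an assumption here, chosen because it satisfies exactly the constraints stated in the text.

```python
def S(u, a, b, c):
    """A smooth S-shaped curve: 0 up to a, .5 at b, 1 from c on.
    The quadratic-spline shape is an assumption; the text only
    constrains the three breakpoints."""
    if u <= a:
        return 0.0
    if u <= b:
        return 2.0 * ((u - a) / (c - a)) ** 2
    if u <= c:
        return 1.0 - 2.0 * ((u - c) / (c - a)) ** 2
    return 1.0

def f_young(u):
    """Membership function for YOUNG from the text: 1 - S(u; 20, 30, 40)."""
    return 1.0 - S(u, 20.0, 30.0, 40.0)

# f_young(28) = 1 - 2*(8/20)**2 = 0.68, i.e. approximately the .7 of the text.
```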

Such possibility functions are then associated with fuzzy quantifiers such as "most" or "more than half". Further it is defined how two such functions are combined to yield one that characterizes a "quantifier" resulting from the application of a logical inference rule, such as the "quantifier" Q in the following inference.

most students are single
more than half of the students are male
----------------------------------------
Q students are single and male
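Zadeh's full treatment combines the quantifiers as fuzzy numbers; what can be sketched without that machinery is the crisp arithmetic at its core, the elementary lower bound on the intersection. The fractions 0.8 and 0.55 standing in for "most" and "more than half" below are hypothetical crisp readings, not values from the text.

```python
def intersection_bound(q1, q2):
    """Lower bound on the fraction of individuals satisfying both
    properties, given that fractions q1 and q2 satisfy each one
    separately: max(0, q1 + q2 - 1). Zadeh's rule applies this
    arithmetic to fuzzy numbers rather than to crisp fractions."""
    return max(0.0, q1 + q2 - 1.0)

# Hypothetical crisp readings: "most" >= 0.8, "more than half" >= 0.55.
q = intersection_bound(0.8, 0.55)
# At least 35% of the students are then single and male.
```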

Clearly, fuzzy logic indeed provides a coherent formalism for dealing with uncertainty. But the problems mentioned for the previous probabilistic approaches seem to be even more serious here, where the probability technique covers even more features. In this context we might question the membership functions and the rules of combination as being fixed in a rather arbitrary way that does not adequately model the human way of coping with uncertainty.

4.4. Performance approaches

At the end of section 4.1 we mentioned some of the difficulties that are encountered in the probabilistic approaches discussed so far. It is not surprising, then, that a number of attempts have been made to cope with the phenomenon of uncertainty in a non-probabilistic way. These may in fact be regarded as more typical AI approaches since they try more closely to model what seems to be the human way of coping with uncertainty, which in essence clearly is not a probabilistic one.

If we ignore quantifiers to begin with, then the situation encountered in the previous sections consists of a number of propositions such as facts, rules, or more complex statements. They are supposed to model some reality. For each particular application each of the propositions is either true or false, i.e. there is no such thing as being true to degree .6 . We simply do not know for certain which of the two alternatives in fact applies.

However, often these propositions are not the only information available. In addition there may be some meta-knowledge about the propositions themselves, such as the experience that rule 1 is more reliable than rule 2, possibly based on statistical information derived from earlier applications. More importantly, even large knowledge-based systems in use today comprise only a very small fraction of the knowledge that is usually available to a human expert carrying out the tasks posed for such a system. For the sake of speculation, let us assume that a system can be built comprising all this knowledge, represented in the form of propositional statements. Since even then we still would not know for many of these statements whether they are true or not, the only remaining way of deriving some conclusion would be to isolate consistent subsets of knowledge, to draw conclusions from each of them, to compare the results, and arbitrarily decide which ones to prefer. One might consider many of the performance approaches as approximations to this general model of calculation. Especially the system Ponderosa [Qui] is based on this paradigm. It generates from the given set of statements maximally consistent sets of statements and separates them from the remaining statements. Although it does take into account measures of belief, there is no automatic selection of the "best" maximally consistent set; rather it provides those measures to the user as a filter and heuristic guide. An obvious objection to this approach would be the exponential growth of the number of possible subsets to be considered in the search for the consistent ones. This problem is overcome in Ponderosa again by recourse to the measures involved, which are used here to restrict the search so as to yield the (currently) 10 "best" sets without the need to generate the remaining ones. Note that Ponderosa realizes one way of reasoning that tolerates inconsistencies, a topic that we already mentioned in section 2.5.6.
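The general model of calculation just described can be illustrated with a toy sketch. Statements are reduced to propositional literals and consistency to the absence of a complementary pair; Ponderosa's actual machinery, in particular the belief measures that prune the exponential search, is not modeled here.

```python
from itertools import combinations

def consistent(stmts):
    """Toy consistency test: a set of literals ('p' or '~p') is
    consistent iff no literal occurs together with its negation."""
    return not any(('~' + s) in stmts for s in stmts if not s.startswith('~'))

def maximal_consistent_subsets(stmts):
    """Brute-force enumeration of the maximally consistent subsets,
    largest first. Exponential in the number of statements, which is
    exactly the objection raised in the text."""
    stmts, found = list(stmts), []
    for k in range(len(stmts), 0, -1):
        for sub in combinations(stmts, k):
            s = set(sub)
            if consistent(s) and not any(s <= f for f in found):
                found.append(s)
    return found

# The inconsistency p vs ~p is tolerated by splitting into two
# maximally consistent sets, {'p', 'q'} and {'~p', 'q'}.
mcs = maximal_consistent_subsets(['p', '~p', 'q'])
```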

While Ponderosa still involves measures as in the probabilistic approaches, though with a drastically restricted function, the system SOLOMON [Coh] is based on the model of endorsement and uses no such measures anymore. As in a bureaucracy, potential conclusions have to pass a number of tests by meta-rules that qualify them as positive ("pro"), negative ("con"), or irrelevant. The rules encode judgement, qualify the source and preciseness of the information, and other meta-knowledge of a similar kind. When passing the structured net of rules the test results are collected in a "ledger book". The summing of these results is carried out in a deductive rather than a probabilistic way. Only sufficiently endorsed conclusions are eventually allowed to pass the test.

None of these systems involves a technique for dealing with fuzzy quantifiers as in fuzzy logic. That does not mean that the probabilistic treatment of such quantifiers provides the only solution to their formalization. For instance, in [BiS] a first-order solution for representing fuzzy quantifiers is outlined. There, the fuzzy quantifier "most" in "most students" is expressed as, informally, "all elements in a subset of the set of students which in terms of cardinality is not very different from the whole set".

4.5. Engineering approaches

Often the meta-knowledge is not represented explicitly in current systems dealing with uncertainty but rather is encoded implicitly into the control strategy of the system. As we discussed in section 3.5, control is knowledge that is naturally interpreted as meta-knowledge; hence such systems in this sense take an approach like SOLOMON except that they do not make the control knowledge explicit. Pattern-recognition systems dealing with huge amounts of noisy information are often built in such a way. As an example we mention the system HEARSAY II (see [BaF]). Often such systems use a special system architecture known as the blackboard model, developed during the HEARSAY project.

5. SUMMARY AND CONCLUSIONS

In this paper we have given an introduction to essentially four types of reasoning, viz. classical, non-monotonic, meta-, and uncertainty reasoning. In retrospect we might now wish to raise a number of questions about this selection. First of all one might ask why we have chosen this particular sequence of topics.

We admit that there is no absolutely convincing argument which separates this structure of presentation from others. The problem is the close interrelation among all these topics. For instance, meta-reasoning in first-order logic clearly is classical reasoning and after amalgamation is even totally identical with it on the system level. Similarly, many aspects of uncertainty reasoning can be interpreted in the way discussed under the topic of non-monotonic reasoning, which in turn may be formalized in terms of classical logic, as we have seen in this paper.

Yet the focus of interest is sufficiently different in each of these four types of reasoning to justify their separate treatment. Anyway, this separation is in line with common practice, except perhaps for meta-reasoning. The latter often does not enjoy the special attention that we have spent on it. But we feel that its potential may have been underestimated in the past. Its treatment after non-monotonic reasoning rather than along with classical reasoning was chosen for didactic reasons, in order to demonstrate its applicability to the phenomena described in section 2.

Next we might ask whether these four comprise all the main types of reasoning. This is certainly not the case, since there are many more kinds of reasoning that are sufficiently different from those to justify their presentation. One of them is inductive reasoning, which however is covered in the chapter by A. Biermann in this volume. Another one is reasoning by rewriting, as in equality theories. These may be regarded as encoded forms of classical reasoning, as we pointed out in section V.4 of [Bi3]. In the present volume they are treated in the chapter by G. Huet and to some extent also in that of M. Stickel. For lack of space, some other kinds will be mentioned only briefly in the following.

As we have seen throughout the paper, for all the types discussed so far there was always a technique within the framework of first-order logic that handled them appropriately. This is the case also for those not mentioned so far. Hence their treatment is implicit in what we have presented before. That is to say, first-order logic provides a formalism that is flexible enough to allow the conceptual expression of many more kinds of inference. Of course, the formalism does not by itself reveal how this is to be achieved in detail.

One further type of reasoning might be distinguished that occurs in problem solving and planning. Obviously, a problem may be formalized within first-order logic. But it is not obvious how our natural reasoning in such cases could be modeled in this framework. Yet a number of such techniques have indeed been suggested. Among them, [Bi8] proposes a direct application of a classical theorem prover (as described here in section 1) that is subject to a certain restriction in its control (in other words, some meta-knowledge is built into it - cf. section 3.5).

A lot of reasoning is involved in programming and in reasoning about programs. PROLOG has demonstrated that classical theorem provers as described in section 1 are indeed extremely useful tools in this context. But there are many more kinds of application in this wide field, such as program synthesis (which is closely related to inductive reasoning), program verification, and program analysis. Other logics have been proposed for these purposes, such as temporal logic, a kind of modal logic (see section 2.5.3). For good reasons we prefer the classical approach in this context as well, but for lack of space cannot go into further details of this large topic. In both of the previous applications, planning and programming, time played a certain role. So we might ask how time can in fact be dealt with in a purely descriptive framework such as first-order logic. Again we can only mention that convincing proposals do exist and have been used successfully in running systems - see [Sho] for a survey; as an example we mention [KoS], where time is captured by events and the periods marked by their occurrence.

Speaking of time might bring us next to space, viz. the physical space. Or, more generally, to qualitative physical laws and the reasoning about them [BdK], which again opens a whole new range of aspects; just think of reasoning about the behavior of liquids [Hay].

Before we interrupt this seemingly endless list we finally mention analogical reasoning [Win] which is a kind of meta-reasoning that allows for certain abstractions. As in all previous cases we see the first-order formalism as the appropriate framework for its treatment.

In summary, we admit a strong bias towards the attempt to uniformly conceptualize all these different phenomena within the common framework of first-order logic. In order to appreciate this bias it is important to be aware of the fact that the deductive relation ⊢ as in K ⊢ K' is really a relation that can be explored in various ways, not only in the axiomatic one where K is assumed to be given and K' is derived or tested. We may also use it for given K' and unknown K , or partially known K and K' , and so forth. In addition there are various ways of structuring, like meta-inference as discussed in this paper. Because the variety of kinds of reasoning is so confusingly rich, it appears that any other approach could simply not be carried out for lack of the uniformity and simplicity available here.

With this latter remark we actually carry the discussion to the point of envisaging the realization of all these variants in a hopefully single uniform system. There should be no doubt that we are far away from such an artifact. In fact the task seems to be so complex that it is hard to imagine how it might be put together by human minds. We believe that this is possible only if the system is of such a kind that it allows one to assemble the pieces of knowledge in arbitrary order, one after the other: seven pieces of factual and rule knowledge, one meta-level piece talking about them, then a piece of control knowledge, followed by another 42 pieces of domain knowledge, a rule of judgement, and so forth, just to illustrate what we mean.

This requirement singles out a form of knowledge representation that is extremely modular on the one hand, but on the other also reflects the tightly woven net of relationships among all these pieces of knowledge. First-order logic clearly enjoys the modularity needed, but does it also support the connections? On the surface of the representation it indeed does not. But once the representation is transformed into an internal form, for instance as a dag (directed acyclic graph), these connections become visible, at least to the system, as we showed in section 1.6 of [Bi7]. Since this part may be implemented completely independently of the particular knowledge to be represented, we see that the first-order formalism does indeed support both requirements in an ideal way.

With respect to the architecture of the knowledge base of such a system, it seems that a hierarchical structure for the various parts would be best suited, as we argued in [Bi5]. At the bottom we would have the clusters of domain knowledge. On top of these would be what we call deductive knowledge, which stores preprocessed deductive information so that costly search does not have to be repeated many times. On the next level we would have meta-level knowledge of judgement. And so forth, until at the top level all is brought together by a central control. There are still so many open problems in important issues of detail for most of the features of reasoning discussed in this paper that it might seem premature to speculate in such a way about a uniform system comprising all these forms of reasoning. But speculation here is meant to play the role of a heuristic that guides our judgement about which of the many problems to attack with higher priority than others. If this paper has contributed to seeing all these problems in a common context, it has fulfilled its purpose.

Acknowledgment. The typescript is due to A. Bentrup and W. Fischer.

REFERENCES

[BaF] Barr, A.B., Feigenbaum, E.A. (eds.), The Handbook of Artificial Intelligence, 1, W. Kaufmann, Los Altos (1981).
[Bet] Beth, E.W., The foundations of mathematics, North-Holland, Amsterdam (1965).
[Bi1] Bibel, W., Programmieren in der Sprache der Prädikatenlogik, Habilitationsarbeit (abgelehnt), Technische Universität München (1975); shortened version: Prädikatives Programmieren, LNCS 33, Springer, Berlin, 274-283 (1975).
[Bi2] Bibel, W., A uniform approach to programming, Report No. 7633, Technische Universität München, Abtlg. Mathematik (1976).
[Bi3] Bibel, W., Automated theorem proving, Vieweg, Braunschweig (1982).
[Bi4] Bibel, W., Matings in matrices, CACM 26, 844-852 (1983).
[Bi5] Bibel, W., Knowledge representation from a deductive point of view, Proc. IFAC Symposium Artificial Intelligence (V.M. Ponomaryov, ed.), Pergamon Press, Oxford, 37-48 (1984).
[Bi6] Bibel, W., First-order reasoning about knowledge and belief, Proc. Int. Conf. Artificial Intelligence and Robotic Control Systems (I. Plander, ed.), North-Holland, Amsterdam, 9-16 (1984).
[Bi7] Bibel, W., Automated inferencing, J. Symbolic Computation 1, 245-260 (1985).
[Bi8] Bibel, W., A deductive solution for plan generation, New Generation Computing 4 (1986).
[BoK] Bowen, K.A., Kowalski, R., Amalgamating language and meta-language in logic programming, Logic Programming (K.L. Clark, S.-A. Tärnlund, eds.), Academic Press, London, 153-172 (1982).
[BoW] Bowen, K.A., Weinberg, T., A meta-level extension of PROLOG, Technical Report CIS-85-1, Syracuse University (1985).

[BdK] Brown, J.S., de Kleer, J., The origin, form, and logic of qualitative physical laws, IJCAI-83 (A. Bundy, ed.), Kaufmann, Los Altos, 1158-1169 (1984).
[Bun] Bundy, A., The computer modelling of mathematical reasoning, Academic Press (1983).
[Cla] Clark, K.L., Negation as failure, Logic and Data Bases (H. Gallaire et al., eds.), Plenum Press, New York, 293-322 (1978).
[ClM] Clark, K.L., McCabe, F.G., The control facilities of IC-PROLOG, Expert Systems in the Micro-electronic Age (D. Michie, ed.), Edinburgh University Press (1979).
[Coh] Cohen, P.R., Heuristic reasoning about uncertainty: an Artificial Intelligence approach, Pitman, Boston (1985).
[deF] de Finetti, B., Theory of probability, vol. 1, Wiley, London (1974).
[Do1] Doyle, J., A truth maintenance system, Artificial Intelligence 12, 231-272 (1979).
[Do2] Doyle, J., Circumscription and implicit definability, Proc. Non-monotonic Reasoning Workshop, AAAI, 57-67 (1984).
[DHN] Duda, R.O., Hart, P.E., Nilsson, N.J., Subjective Bayesian methods for rule-based inference systems, Techn. Note 124, SRI International, AI Center, Menlo Park; also: Proc. NCC, AFIPS Press (1976).
[EMR] Etherington, D.W., Mercer, R.E., Reiter, R., On the adequacy of predicate circumscription for closed-world reasoning, Proc. Non-monotonic Reasoning Workshop, AAAI, 70-81 (1984).
[Fef] Feferman, S., Toward useful type-free theories I, JSL 49, 75-111 (1984).
[Gal] Gallagher, J., Transforming logic programs by specialising interpreters, Report, Dept. Computer Science, University of Dublin (1984).
[GaL] Gallaire, H., Lasserre, C., Meta-level control for logic programming, Logic Programming (K.L. Clark, S.-A. Tärnlund, eds.), Academic Press, London (1982).
[GeG] Genesereth, M.R., Ginsberg, M.L., Logic Programming, CACM 28, 933-941 (1985).
[Gen] Gentzen, G., Untersuchungen über das logische Schliessen, Mathem. Zeitschr. 39, 176-210, 405-431 (1935).
[Gly] Glymour, C., Independence assumptions and Bayesian updating, Artificial Intelligence 25, 95-99 (1985).
[GoS] Gordon, J., Shortliffe, E.H., The Dempster-Shafer theory of evidence and its relevance to expert systems, Rule-based Expert Systems (B.G. Buchanan, E.H. Shortliffe, eds.), Addison-Wesley, Reading, ch. 13 (1984).
[Gre] Green, C.C., Theorem proving by resolution as a basis for question-answering systems, Machine Intelligence 4, Elsevier, New York, 183-205 (1969).

[Gro] Grosof, B., Default reasoning as circumscription, Proc. Non-monotonic Reasoning Workshop, AAAI, 115-124 (1984).
[Haa] Haas, A.R., A syntactic theory of belief and action, Artificial Intelligence 28 (1986).
[Hay] Hayes, P.J., Naive physics I - Ontology for liquids, Formal Theories of the Commonsense World (Hobbs, J.R., Moore, R.C., eds.), Ablex (1984).
[Hin] Hintikka, J., Knowledge and belief: An introduction to the logic of the two notions, Cornell University Press (1962).
[JLL] Jaffar, J., Lassez, J.-L., Lloyd, J., Completeness of the negation as failure rule, IJCAI-83 (A. Bundy, ed.), Kaufmann, Los Altos, 500-506 (1983).
[Kad] Kadesch, R.R., Subjective inference with multiple evidence, Artificial Intelligence 28 (1986).
[KoS] Kowalski, R.A., Sergot, M., A logic-based calculus of events, New Generation Computing 4, 67-95 (1986).
[Kri] Kripke, S., Semantical analysis of modal logic, Zeitschrift f. Mathem. Logik u. Grundlagen der Mathem. 9, 67-96 (1962).
[Lev] Levesque, H., A logic of knowledge and active belief, Proc. AAAI-84 (1984).
[Li1] Lifschitz, V., Computing circumscription, Proc. IJCAI-85, Kaufmann, Los Altos, 121-127 (1985).
[Li2] Lifschitz, V., On the satisfiability of circumscription, Artificial Intelligence 28, 17-27 (1986).
[Llo] Lloyd, J.W., Foundations of logic programming, Springer, Berlin (1984).
[Mc1] McCarthy, J., First-order theories of individual concepts and propositions, Expert Systems in the Micro-electronic Age (D. Michie, ed.), Edinburgh University Press, 271-287 (1979).
[Mc2] McCarthy, J., Circumscription - a form of non-monotonic reasoning, Artificial Intelligence 13, 27-39 (1980).
[Mc3] McCarthy, J., Applications of circumscription to formalizing common sense knowledge, Proc. Non-monotonic Reasoning Workshop, AAAI, 295-324 (1984).
[MiP] Minker, J., Perlis, D., Completeness results for circumscription, Artificial Intelligence 28, 29-42 (1986).
[Moo] Moore, R.C., Semantical considerations on non-monotonic logic, IJCAI-83 (A. Bundy, ed.), Kaufmann, Los Altos, 272-279 (1983).
[Pea] Pearl, J., On evidential reasoning in a hierarchy of hypotheses, Artificial Intelligence 28, 9-16 (1986).

[Per] Perlis, D., Languages with self-reference, Artificial Intelligence 25, 301-322 (1985).
[Qui] Quinlan, J.R., Internal consistency in plausible reasoning systems, New Generation Computing 3, 157-180 (1985).
[Re1] Reiter, R., A logic for default reasoning, Artificial Intelligence 13, 81-132 (1980).
[Re2] Reiter, R., Circumscription implies predicate completion (sometimes), Proc. AAAI-82, 418-420 (1982).
[Re3] Reiter, R., Towards a logical reconstruction of relational database theory, On Conceptual Modelling: Perspectives from Artificial Intelligence, Databases, and Programming Languages (M.L. Brodie et al., eds.), Springer, Berlin, 191-238 (1983).
[Sch] Schütte, K., Proof theory, Springer, Berlin (1977).
[Sha] Shafer, G., A mathematical theory of evidence, Princeton University Press, Princeton (1976).
[She] Shepherdson, J.C., Negation as failure: A comparison of Clark's completed data base and Reiter's closed-world assumption, Report PM-84-01, School of Mathematics, University of Bristol (1984).
[Sho] Shoham, Y., Ten requirements for a theory of change, New Generation Computing 5, 467-477 (1985).
[Tur] Turner, R., Logics for Artificial Intelligence, E. Horwood, Chichester (1984).
[Wey] Weyrauch, R., Prolegomena to a theory of mechanized formal reasoning, Artificial Intelligence 13, 133-197 (1980).
[Win] Winston, P.H., Learning and reasoning by analogy, CACM 23, 689-703 (1979).
[Za1] Zadeh, L.A., A computational approach to fuzzy quantifiers in natural languages, Comp. & Maths. with Appls. 9, 149-184 (1983).
[Za2] Zadeh, L.A., The role of fuzzy logic in the management of uncertainty in expert systems, Fuzzy Sets and Systems 11, 199-227 (1983).

PART THREE

Knowledge Programming

Term Rewriting as a Basis for the Design of a Functional and Parallel Programming Language

A case study : the Language FP2

Philippe Jorrand

LIFIA Institut National Polytechnique de Grenoble

FOREWORD

The semantic elegance and the mathematical properties of applicative and functional programming languages are now widely recognized as relevant and useful qualities for implementing the large and complex algorithms of the kind encountered in artificial intelligence.

On the other hand, it is also being realized that many of the problems solved by these algorithms, like automated reasoning, and some major application areas related to artificial intelligence, like computer vision, are in fact considered in a distorted and limited way because of the implicit and ever-present hypothesis that they have to be solved in a sequential way by a single processing engine. This is one reason why parallelism has become a highly active topic for research in programming languages and methodology. Another reason is that the design of machine architectures with massive parallelism is also becoming a feasible task because of the progress in VLSI technology.

However, the history of languages has put parallelism, communication and synchronization on a separate path from nice and clean applicative and functional programming. One difficulty is then to reconcile these seemingly antagonistic styles.

Such a unified framework for both functional and parallel programming is presented here. It takes the form of a language, called FP2, which is entirely based on the notion of terms for representing the objects of the language, and on the mechanics of term rewriting for representing its operational semantics.

Part of the work on FP2 is carried out in the context of ESPRIT Project 415, where FP2 is the basic tool for designing and implementing a parallel inference machine. This presentation is partly drawn from "FP2: the language and its formal definition", a working document written for ESPRIT Project 415. This presentation has the format of an informal language description and it does not contain references inserted in the text. It is followed by a bibliography on topics related to the essential questions raised by the design of such a language.

1 - OVERVIEW.

The main language styles under active study for new generation programming can be visualised on a triangle. Vertices represent "pure" programming styles (i.e. functional, parallel and logic). The edges represent "mixed" programming styles, where two "pure" styles are explicitly present in a single language.

[Figure: a triangle whose vertices are functional programming, parallel programming and logic programming; FP2 lies on the edge joining functional and parallel programming.]

FP2 is on the edge joining functional programming and parallel programming. It must be distinguished from other functional languages where data flow is used for taking advantage of possible parallelism during evaluation.

In FP2, on the contrary, both functional programming and parallel programming are explicitly present and can be independently expressed using specific constructs in the language. Furthermore, FP2 is a typed language allowing polymorphic algebraic type definitions, polymorphic functions and polymorphic communicating processes. Finally, the "declarative" style of FP2 and its semantics give that language the qualities of both a programming language and a specification language.

The semantics of FP2 rely on term algebras and on rewrite systems. This establishes a sound basis for designing and implementing formal verification tools, like full static type checking, static analysis of dynamic behavior (deadlocks, livelocks, ...) and proof of implementation correctness (comparison of a specification in FP2 with an implementation, also in FP2).

The main characteristics of FP2 can be summarized as follows:

FP2 is a functional programming language. Values in FP2 are represented by terms and basic function definitions have the form of rewrite rules. Function applications are terms containing defined function names: rewrite rules reduce function applications to terms containing no function application. Functional forms using second order functional operators and function names can be written and named: such a higher-level form for constructing and defining functions has its semantics defined in terms of basic function definitions (i.e. rewrite rules).

FP2 is a parallel programming language. Independent communicating processes can be defined and networks of them can be constructed. A process is able to send and to receive messages to and from its environment. These messages flow through ports owned by the process. Messages are values: they are represented by terms, they are built and reduced according to functional programming in FP2. Describing a process requires both describing the possible orders in which its ports may be used (sequentiality, non determinism, simultaneity) and describing sent messages by applying functions to received messages: basic process definitions accomplish all of this within a single formalism, namely rewrite rules. "Process forms" using "process operators" and process names can be written and named: such a higher-level form for constructing and defining processes denotes, in general, a network of processes (e.g. systolic arrays) and has its semantics defined in terms of basic process definitions (i.e. rewrite rules).

FP2 is a typed programming language. Every term representing a value in FP2 is typed and every function has a domain and a range defined by types. All terms of a given type are thus results of applying functions having their range in that type: on that basis, types in FP2 are defined as term algebras. A type definition introduces the constructor operations for objects of that type: terms containing only constructors are normal forms for terms containing function applications. Elaborate type structures built by means of "type forms" using "type operators" and type names can be written and named.

FP2 is a polymorphic programming language. Type definitions, function definitions and process definitions may be parameterized by types: such definitions are called "polymorphic". In order to guarantee the type correctness of polymorphic definitions and for establishing the proper bindings, a notion of "property" is introduced: a property characterizes a class of types by defining a minimal algebraic structure that all the types of the class must have. Arbitrary properties may be defined. Once a property is defined, it may be used for specifying the class of actual types a formal type parameter of a polymorphic definition may be bound to.

FP2 is a modular programming language. Function, process, type and property definitions can be grouped inside "modules" which can be assembled in a hierarchical manner. Modules may export definitions to ascendent modules and may hide definitions from descendent modules. Modules form the basis for a strict control of visibility within FP2 programs.

2 - TYPES.

Values are represented by terms and FP2 operates on values by applying functions to terms. There are two kinds of functions: constructors and operations. Terms containing operation applications should always be reducible to terms containing only constructors. The reductions corresponding to a given operation are defined by rewrite rules.

Terms are typed and every function has a domain and a range, both of which are types. Thus, terms of a given type are all results of applying functions ranging in that type. Formally, types are term algebras and, with the rules defining operations, every term is congruent to a term containing only constructors.

The basic form of type definition provides a signature for constructors and, possibly, for operations involving objects of the type. It also provides the rules defining the reductions for these operations.

In addition to this reasonably classical basis for algebraic type definition, FP2 also provides ways of constructing new types from existing types, using cartesian product, sequence and union type building "operators".

2.1 - Basic type definitions.

A basic type definition presents a term algebra. It provides :

- The name t of the type ;

- The names, domains and ranges (necessarily t) of the constructors of t ;

- The names, domains and ranges of operations involving t ;

- The names and types of variables used in the rules for operations ;

- Rewrite rules defining the reduction of terms containing operation applications.

The left and right members of rules are separated by "==>" signs. Rules should be written in such a way that terms containing operation applications can be reduced to terms containing only constructors: this is an important question which has been studied in a number of places, especially in connection with algebraic data types and with term rewriting systems. It will not be discussed here.

An example is the type "Nat" of natural integers (assuming that the type Bool of booleans is defined with constructors "true" and "false", and with operations "or", "and" and "not"):

type Nat
cons  0        :           -> Nat
      succ     : Nat       -> Nat
opns  add, mul : Nat × Nat -> Nat
      eq, leq  : Nat × Nat -> Bool
      max      : Nat × Nat -> Nat
      1        :           -> Nat
vars  m, n : Nat
rules add(0,n) ==> n
      add(succ(m),n) ==> succ(add(m,n))
      mul(0,n) ==> 0
      mul(succ(m),n) ==> add(mul(m,n),n)
      leq(0,m) ==> true
      leq(succ(m),0) ==> false
      leq(succ(m),succ(n)) ==> leq(m,n)
      eq(m,n) ==> and(leq(m,n),leq(n,m))
      max(m,n) ==> if leq(m,n) then n else m endif
      1 ==> succ(0)
endtype

This example should not imply that integers have to be represented as succ(succ(...)): the usual decimal notation and infix arithmetic operators can also be used. This is also the case for <, ≤, ... and the boolean operators.
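To make the rewriting concrete, here is a small sketch in Python, an illustration and not part of FP2: constructor terms are nested tuples, and each rewrite rule of the definition above becomes one case of a recursive reduction function.

```python
# Sketch (assumed encoding, not FP2 itself): Nat terms are nested tuples
# built from the constructors 0 and succ; each FP2 rule is one case below.

def zero():  return ("0",)
def succ(n): return ("succ", n)

def add(m, n):
    # add(0,n) ==> n ; add(succ(m),n) ==> succ(add(m,n))
    return n if m[0] == "0" else succ(add(m[1], n))

def mul(m, n):
    # mul(0,n) ==> 0 ; mul(succ(m),n) ==> add(mul(m,n),n)
    return zero() if m[0] == "0" else add(mul(m[1], n), n)

def leq(m, n):
    # the three leq rules, reduced here to the Python booleans
    if m[0] == "0":
        return True
    if n[0] == "0":
        return False
    return leq(m[1], n[1])

def to_int(t):
    # helper: read a constructor-only Nat term back as a Python integer
    k = 0
    while t[0] == "succ":
        k, t = k + 1, t[1]
    return k
```

Reduction always terminates in a term built from the constructors alone, which is the normal-form property the text asks of well-written rule sets.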

As another example, binary trees with natural integers at their leaves can be described by:

type Btree
cons  tip  : Nat           -> Btree
      fork : Btree × Btree -> Btree
opns  maxt : Btree -> Nat
vars  m    : Nat
      u, v : Btree
rules maxt(tip(m)) ==> m
      maxt(fork(u,v)) ==> max(maxt(u),maxt(v))
endtype

The tree pictured as:

[Figure: a binary tree whose left subtree is the leaf 3 and whose right subtree forks into the pair of leaves (1, 4) and the leaf 2.]

would be constructed by:

fork(tip(3), fork(fork(tip(1),tip(4)), tip(2)))

A general method for writing rules is that the left members apply each defined operation to disjoint cases of constructors, whereas the right members may have any format including conditional expressions.
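The Btree rules follow the same pattern; a hedged Python sketch (same assumed tuple encoding as above, not FP2 itself) makes the case analysis on constructors visible:

```python
# Sketch: the Btree constructors and the maxt rules as a recursive function.

def tip(m):     return ("tip", m)
def fork(u, v): return ("fork", u, v)

def maxt(t):
    # maxt(tip(m)) ==> m ; maxt(fork(u,v)) ==> max(maxt(u),maxt(v))
    if t[0] == "tip":
        return t[1]
    return max(maxt(t[1]), maxt(t[2]))
```

On the tree of the text, `maxt(fork(tip(3), fork(fork(tip(1), tip(4)), tip(2))))` reduces to 4.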

2.2 - Type forms.

In addition to elementary types defined by means of basic type definitions, it is possible to define constructed types by means of type forms where the operands are type names and the operators are type operators.

There are three such operators :

1 - Cartesian product. If t1, t2, ..., tn are types then t1 × t2 × ... × tn is also a type, the cartesian product of t1, t2, ..., tn. If x1, x2, ..., xn denote objects of types t1, t2, ..., tn respectively, then (x1, x2, ..., xn) denotes an object of type t1 × t2 × ... × tn. It must be noted that t1 × t2 × t3, (t1 × t2) × t3 and t1 × (t2 × t3) are distinct types.

2 - Sequence. If t is a type, then t* is the type of sequences with elements of type t. If x denotes an object of type t and if s denotes a sequence of type t*, then x.s denotes a sequence of type t*. The notation nil : t* denotes the empty sequence of type t*. It is simply written nil when the type t* is known from the context. If x1, x2, ..., xn denote objects of type t, then [x1, x2, ..., xn] denotes the sequence of type t* constructed by x1.(x2.(...(xn.nil)...)).

3 - Union. If t1, t2, ..., tn are types then t1|t2|...|tn is the union of types t1, t2, ..., tn. There is no special way of denoting an object of a union type: when in a context where an object of type t1|t2|...|tn is required, then an object of type t1, or an object of type t2, or ..., or an object of type tn may be provided. The types t1|t2|t3, (t1|t2)|t3 and t1|(t2|t3) are all equivalent, and t1|t1 is the same as t1.

Type expressions can be used in any context where a type may be written ; examples of this have already appeared above with functions having cartesian products as their domains.

It is also possible to define names standing for type expressions, by means of type declarations, like in:

type Snat is Nat*
type Ssnat is (Nat*)*

For example, a way of describing trees of variable arity with natural numbers or booleans at every node could be:

type Vtree is (NatlBool) × Vtree*

Given a type t = t1|t2|...|tm, m ≥ 1, and a type t' = t'1|t'2|...|t'n, n ≥ 1, where t1, t2, ..., tm, t'1, t'2, ..., t'n are not union types, then t is compatible with t' if, for all i, there is a j such that ti = t'j, where "=" is syntactic equality, modulo type declarations. The notation t ⊆ t' stands for "t is compatible with t'". Thus, given the type declaration:

type t is u

both relations t ⊆ u and u ⊆ t hold.
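The compatibility relation can be sketched in Python under an assumed encoding (not FP2's): a union type is a frozenset of alternative type names, and a hypothetical `decls` dictionary maps declared names to the types they stand for.

```python
# Sketch: t ⊆ t' iff every alternative of t equals, modulo declarations,
# some alternative of t'.

def alts(t):
    # the set of non-union alternatives of a type
    return t if isinstance(t, frozenset) else frozenset([t])

def compatible(t, t2, decls):
    expand = lambda name: decls.get(name, name)   # modulo type declarations
    return all(any(expand(a) == expand(b) for b in alts(t2))
               for a in alts(t))
```

Note that the relation is not symmetric on proper unions, but a declaration `type t is u` makes t and u compatible in both directions.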

3 - FUNCTIONS.

Operations may be defined within basic type definitions. But once a type is defined, additional operations on it may be defined separately. The elementary form of operation definition, called "basic operation definition", follows the same general approach as operations defined within types, namely rewrite rules.

In addition to basic operation definitions, FP2 also allows the construction of other operations, by means of second order functional forms which apply functional operators to function names.

3.1 - Basic operation definitions.

A basic operation definition provides : the name, domain and range of the new operation ; the names and types of variables used in the rules for that operation ; the rules defining the reductions of terms containing applications of that operation.

For example, a new operation on Nat's can be introduced by:

op    min : Nat × Nat -> Nat
vars  m, n : Nat
rules min(m,n) ==> if leq(m,n) then m else n endif
endop

An operation replacing all natural integers at the leaves of a Btree t by maxt(t) would be:

op    repmax : Btree -> Btree
vars  t : Btree
rules repmax(t) ==> rep(t,maxt(t))
endop

op    rep : Btree × Nat -> Btree
vars  m, n : Nat
      u, v : Btree
rules rep(tip(m),n) ==> tip(n)
      rep(fork(u,v),n) ==> fork(rep(u,n),rep(v,n))
endop
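A Python sketch of these two operations, under an assumed tuple encoding of Btree terms (an illustration, not FP2):

```python
# Sketch: rep and repmax over tuple-encoded Btree terms.

def tip(m):     return ("tip", m)
def fork(u, v): return ("fork", u, v)

def maxt(t):
    return t[1] if t[0] == "tip" else max(maxt(t[1]), maxt(t[2]))

def rep(t, n):
    # rep(tip(m),n) ==> tip(n) ; rep(fork(u,v),n) ==> fork(rep(u,n),rep(v,n))
    if t[0] == "tip":
        return tip(n)
    return fork(rep(t[1], n), rep(t[2], n))

def repmax(t):
    # repmax(t) ==> rep(t, maxt(t)): the maximum is computed once, then
    # pushed unchanged to every leaf
    return rep(t, maxt(t))
```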

Another, more elaborate example shows a way of programming unification of terms in FP2. Terms are structured objects like t and u below:

[Figure: two terms t and u drawn as trees, built from binary function names fi, constant names ai and variable names vi.]

The labels fi represent binary function names, the ai represent constant names and the vi represent variable names. For simplifying the example, the algorithm assumes that each variable name appears at most once in (t,u). The result of unifying t and u is computed by unify(t,u). In the case of t and u above it succeeds and results in a sequence of assignments to the variables. There are other cases, where unification results in a failure. The types for terms can be defined as follows:

type Term is Const | Var | Applic

type Const cons a : Nat -> Const endtype

type Var cons v : Nat -> Var endtype

type Applic is Funct × Term × Term

type Funct cons f : Nat -> Funct endtype

The possible results of unification have the type :

type Result is Assign* | Failure

type Assign is Var × Term

type Failure cons fail : -> Failure endtype

While unifying two terms, it will be necessary to combine the results of unifying subterms. This is accomplished by an operation "+" :

op    + : Result × Result -> Result
vars  r : Result
      u, v : Assign*
rules r + fail ==> fail
      fail + u ==> fail
      u + v ==> append(u,v)
endop

This basic operation definition shows an example of operation overloading : the operation +, which is already defined on Nat's, gets here another definition attached to it. When + is applied, the choice among these definitions is determined by the types of the operands. Definitions leading to possible ambiguities are not permitted.

This definition also shows a use of union types: when + is applied on Results, each of its operands may be of type Assign* or of type Failure: the case analysis on operand types is made by the type of variables used in the left members of the rules.

Finally, the unification operation is:

op    unify : Term × Term -> Result
vars  t, u, v, w : Term
      i, j : Nat
      c : Const
      x : Var
      h : Applic
rules unify(a(i),a(j)) ==> if i=j then nil else fail endif
      unify(c,x) ==> [(x,c)]
      unify(c,h) ==> fail
      unify(x,t) ==> [(x,t)]
      unify(h,c) ==> fail
      unify(h,x) ==> [(x,h)]
      unify((f(i),t,u),(f(j),v,w)) ==>
            if i=j then unify(t,v) + unify(u,w) else fail endif
endop
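The same algorithm can be sketched directly in Python under an assumed encoding (not FP2's): constants are `("a", i)`, variables `("v", i)`, binary applications `("f", i, t, u)`. As in the text, each variable occurs at most once, so no substitution composition is needed.

```python
# Sketch of the unify rules above; "fail" plays the role of Failure.
FAIL = "fail"

def plus(r1, r2):
    # the "+" operation on Results: fail absorbs, assignment lists append
    return FAIL if FAIL in (r1, r2) else r1 + r2

def unify(t, u):
    if t[0] == "a" and u[0] == "a":        # two constants
        return [] if t[1] == u[1] else FAIL
    if t[0] == "v":                        # variable on the left
        return [(t, u)]
    if u[0] == "v":                        # variable on the right
        return [(u, t)]
    if t[0] == "a" or u[0] == "a":         # constant against application
        return FAIL
    if t[1] != u[1]:                       # different function names
        return FAIL
    return plus(unify(t[2], u[2]), unify(t[3], u[3]))
```

The rule order of the FP2 definition reappears here as the order of the `if` cases: the variable cases fire before the failure cases, exactly as the disjoint constructor patterns dictate.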

3.2 - Functional forms.

FP2 provides functional operators for combining defined functions into functional forms.

Let r, s, t, t1, t2, ..., tn be types. There are eight such operators:

1 - Composition. If f and g are functions with f : t -> r and g : r -> s, then (g o f) is a functional form denoting an operation in t -> s. If x is a term of type t, then the reduction of (g o f)(x) is the reduction of g(f(x)).

2 - Condition. If p, f and g are functions with p : t -> Bool, f : t -> r and g : t -> r, then (p => f ; g) is a functional form denoting an operation in t -> r. If x is a term of type t, then the reduction of (p => f ; g)(x) is the reduction of if p(x) then f(x) else g(x) endif.

3 - Cartesian product construction. If f1, f2, ..., fn are functions with fi : t -> ti, then (f1, f2, ..., fn) is a functional form denoting an operation in t -> t1 × t2 × ... × tn. If x is a term of type t, then the reduction of (f1,f2,...,fn)(x) is the reduction of (f1(x),f2(x),...,fn(x)).

4 - Sequence construction. If f1, f2, ..., fn are functions with fi : t -> r, then [f1, f2, ..., fn] is a functional form denoting an operation in t -> r*. If x is a term of type t, then the reduction of [f1,f2,...,fn](x) is the reduction of [f1(x),f2(x),...,fn(x)]. If n=0, then the reduction of [](x) is the reduction of |nil|(x).

5 - Constant. If x is a term of type t, then |x| is a functional form denoting an operation in r -> t, for any type r. If y is a term of any type r, then the reduction of |x|(y) is the reduction of x.

6 - Map. If f is a function with f : t -> r, then α(f) is a functional form denoting an operation in t* -> r*. If x is of type t*, then there are two cases: (i) if x reduces to nil then α(f)(x) reduces to nil, and (ii) if x reduces to u.y then the reduction of α(f)(x) is the reduction of f(u).α(f)(y).

7 - Insert. If f is a function with f : t × t -> t, then /(f) is a functional form denoting an operation in t* -> t. If x is a non empty sequence of type t*, then there are two cases: (i) if x reduces to u.nil then /(f)(x) reduces to u, and (ii) if x reduces to u.(v.y) then the reduction of /(f)(x) is the reduction of f(u, /(f)(v.y)). If x is the empty sequence of type t*, then /(f)(x) raises the exception "insert_error".

8 - Partial application. If f is a function with f : t1 × t2 × ... × tn -> t, then :f(x1, x2, ..., xn), where each xk is either a term of type tk or a ".", is a functional form denoting an operation in ti × tj × ... × tl -> t, where ti × tj × ... × tl is obtained by keeping in t1 × t2 × ... × tn the tk such that xk is a ".". If yk is a term of type tk such that xk is a ".", then the reduction of :f(x1,x2,...,xn)(ym,...,yp) is the reduction of f(z1,z2,...,zn) where zi = xi if xi is a term and zi = yi otherwise.
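Five of these operators have direct counterparts as Python higher-order functions. The sketch below is only illustrative: FP2 does not evaluate forms as closures but compiles them into basic operation definitions, as the next paragraph explains.

```python
# Sketch: composition, condition, constant, map and insert as closures.

def compose(g, f):          # (g o f)(x) = g(f(x))
    return lambda x: g(f(x))

def cond(p, f, g):          # (p => f ; g)(x) = if p(x) then f(x) else g(x)
    return lambda x: f(x) if p(x) else g(x)

def constant(x):            # |x|(y) = x, for y of any type
    return lambda _y: x

def alpha(f):               # map: alpha(f)([x1,...,xn]) = [f(x1),...,f(xn)]
    return lambda s: [f(u) for u in s]

def insert(f):              # /(f)(u.(v.y)) = f(u, /(f)(v.y)), a right fold
    def ins(s):
        if not s:
            raise ValueError("insert_error")   # /(f) on the empty sequence
        return s[0] if len(s) == 1 else f(s[0], ins(s[1:]))
    return ins
```

For instance, the form (not o null => /(add) ; |0|) used later in the text corresponds to `cond(lambda s: bool(s), insert(lambda a, b: a + b), constant(0))`.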

The semantics of functional operators are defined by considering that functional forms are second order expressions which can be "evaluated". This is possible in FP2, where basic operation definitions constitute a more elementary form of operation description: evaluating functional forms means producing basic operation definitions. Let F(x) be a term where F is a functional form. The evaluation of F is guided by its syntax: dummy operation names f0, f1, ... are generated, one for each syntactical sub-form in F, where f0 is the operation for F itself. Then, basic operation definitions for f0, f1, ..., with their respective names, domains, ranges, variables and equations can be mechanically produced. In fact, defining one of these functions fi is necessary only when fi corresponds to a map or insert functional operator and in the case of recursive functional forms. Finally, F(x) is replaced by f0(x) which has its reduction defined by the generated basic operation definition.

For example, given the following basic operation definition:

op    null : Nat* -> Bool
vars  m : Nat
      s : Nat*
rules null(nil) ==> true
      null(m.s) ==> false
endop

The evaluation of ((not o null) => /(add) ; |0|) produces:

op    f0 : Nat* -> Nat
vars  v0 : Nat*
rules f0(v0) ==> if not(null(v0)) then f1(v0) else 0 endif
endop

op    f1 : Nat* -> Nat
vars  v0, v1 : Nat
      v2 : Nat*
rules f1(v0.nil) ==> v0
      f1(v0.(v1.v2)) ==> add(v0, f1(v1.v2))
      f1(nil) ==> ! insert_error
endop

If x is a sequence of type Nat*, then ((not o null) => /(add) ; |0|)(x) reduces to the sum of the elements of x, or to 0 if x is nil.

Functional forms can be used in any context where function names may be written. It is also possible to define names standing for functions built by functional forms:

op sigma is ((not o null) => /(add) ; |0|)
op pi    is ((not o null) => /(mul) ; |1|)
op sigpi is (sigma o α(pi))

Recursive operation definitions with functional forms fit quite naturally in that framework. For example, given:

op ll is (leq o (id,|1|))

where id is the identity function on natural numbers, and:

op    pred : Nat -> Nat
vars  m : Nat
rules pred(0) ==> 0
      pred(succ(m)) ==> m
endop

op p1 is pred

op p2 is (pred o pred)

Fibonacci numbers can be computed by:

op fib is (ll => id ; (add o ((fib o p1),(fib o p2))))

This definition produces:

op    fib : Nat -> Nat
vars  v0 : Nat
rules fib(v0) ==> if ll(v0) then id(v0) else add(fib(p1(v0)),fib(p2(v0))) endif
endop
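The recursive form can be mirrored in Python (an illustrative sketch, not the FP2 evaluation mechanism) with simple stand-ins for ll, p1 and p2:

```python
# Sketch: fib is (ll => id ; (add o ((fib o p1),(fib o p2)))).

def compose(g, f): return lambda x: g(f(x))
def cond(p, f, g): return lambda x: f(x) if p(x) else g(x)

pred = lambda n: max(n - 1, 0)   # pred(0) ==> 0 ; pred(succ(m)) ==> m
p1, p2 = pred, compose(pred, pred)
ll = lambda n: n <= 1            # (leq o (id, |1|))

def fib(n):
    return cond(ll,
                lambda v: v,                          # id on 0 and 1
                lambda v: fib(p1(v)) + fib(p2(v)))(n)  # add of the two calls
```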

4 - PROCESSES.

The elementary component for organizing parallel computations in FP2 is the process. A process has ports through which messages may flow in and out. Messages are values, they are represented by functional terms, they have types and they are built and reduced according to functional programming in FP2. Messages arrive at ports or leave ports along directed connectors having their destinations or their origins attached to these ports. Each connector allows messages of a certain type, which may be a union type. The transportation of one message along one connector is a communication. There is no notion of the duration of a communication.

Describing a process is describing its ability to perform communications along the connectors attached to its ports. In addition to applying functions to received messages for computing sent messages, this also involves sequencing, non determinism and parallelism in the ordering among communications. Formally, a process is a state transition system which can be viewed as a graph: nodes represent states, multiple branching represents non determinism and arcs are labelled by events, where an event is a set of communications occurring in parallel. This graph is in general infinite and every path represents a possible sequence of events: one set after the other of communications occurring in parallel.

The basic form of process definition provides a description of the connectors of the process and it makes use of rewrite rules for describing the non deterministic state transition systems : the rules, labelled by (possibly empty) events, rewrite states. In addition to basic process definitions, the language allows definitions of processes built by combining other processes into process forms by means of process operators.

4.1 - Basic process definitions.

A basic process definition describes a transition system, with transition rules, where the events are sets of communications along typed connectors. It provides: the name N of the process; the names and message types of the input, output and internal connectors of N; the names and domains of state constructors used in the rules of N; the names and types of variables used in the rules of N; rules defining the transitions of N.

As an example, let STACK be a process. It can be pictured as :

[Figure: a box labelled STACK, with an incoming connector I and an outgoing connector O.]

It has an input connector I and an output connector O. The communication of a message v, where v is a functional term of type t, along a connector k of message type t is denoted by k(v). For example, if both I and O may communicate Nat's, then I(0) and O(succ(0)) denote communications. An event is composed of a set of communications k1(v1)...kn(vn), where k1, ..., kn are n distinct connectors.

A term of the form Q(u1,...,um), where Q is a state constructor and where the ui's are functional terms of the correct types for the domain of Q, is called a predicate. A predicate without variables in the ui's is a state. State constructors cannot appear in the ui's.

Rules are composed of three parts: a predicate R(u1,...,um) called the pre-condition, an event k1(v1)...kn(vn) and a predicate S(w1,...,wp) called the post-condition. They have the general format:

R(u1,...,um) : k1(v1)...kn(vn) ==> S(w1,...,wp)

If ki is an internal or output connector, all variables appearing in vi must appear in R(u1,...,um) or in some vj such that kj is an input connector. The same must be true for the variables in S(w1,...,wp). Since an event is a set it may be empty. In that case, the rule is an internal rule, of the form:

R(u1,...,um) ==> S(w1,...,wp)

Furthermore, among the rules of a process, there must be at least one initial rule, without pre-condition and without event, and where the post-condition is a state :

==> S(w1,...,wp)

For example, let the process STACK be an unbounded stack of Nat's. It is initially empty and when a Nat arrives along I, it may be written into STACK. When STACK is not empty, the last arrived Nat may be read from it along O. Writing and reading are mutually exclusive. A basic process definition for this "Last In First Out" STACK may then be :

proc STACK
in     I : Nat
out    O : Nat
states S : Nat*
vars   e : Nat
       v : Nat*
rules  ==> S(nil)
       S(v)   : I(e) ==> S(e.v)
       S(e.v) : O(e) ==> S(v)
endproc
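The behaviour of this transition system can be simulated by hand in Python (a sketch with an assumed encoding, not FP2): the sequence carried by the S(...) state constructor is held in an attribute, and the environment chooses which rule to fire by calling I or O.

```python
class Stack:
    """Direct simulation of the STACK rules."""
    def __init__(self):
        self.s = []                        # ==> S(nil)
    def I(self, e):                        # S(v) : I(e) ==> S(e.v)
        self.s = [e] + self.s
    def O(self):                           # S(e.v) : O(e) ==> S(v)
        if not self.s:
            raise RuntimeError("O is not applicable in state S(nil)")
        e, self.s = self.s[0], self.s[1:]
        return e
```

The pattern S(e.v) in the pre-condition of the O rule is what makes O inapplicable in the empty state; the simulation turns that failed match into an exception.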

Rules in a process N describe a transition system in the following way:

0 - Initially, one of the initial rules in N is chosen. This choice is non deterministic. The post-condition of the chosen rule becomes the current state of N. Then repeat steps 1, 2 and 3.

1 - The current state q of N is matched against the pre-conditions of the rules: a rule with pre-condition r is said to be pre-applicable if there exists a substitution h for the variables of r such that h(r) = q. If there is no pre-applicable rule, the process is terminated.

2 - Let e be the event in a pre-applicable rule and let mj be a message about to be sent across kj, for all kj(vj) in e where kj is an external connector. That rule is said to be applicable if there exists a substitution g for the variables of all the vj's such that g(h(vj)) = mj.

3 - One of the applicable rules is chosen to be the applied rule. This choice is non deterministic. Let s be the post-condition of this rule. The event g(h(e)) occurs and the term g(h(s)) becomes the current state of N.

This operational view shows how rules express sequencing, non determinism and parallelism among communications: rules are applied one at a time (sequencing), the applied rule is chosen among several applicable rules (non determinism) and several communications occur within a single event (parallelism). It must be noted that internal rules can be used for describing computations. This can be seen in the following example: MAXNAT sends through C the maximum of two previously entered Nat's, the first one entered through A and the second one through B ("-" denotes the null arity for state constructors):

proc MAXNAT
in     A, B : Nat
out    C : Nat
states X : -
       Y : Nat
       Z : Nat × Nat × Nat × Nat
vars   m, n, p, q : Nat
rules  ==> X
       X    : A(m) ==> Y(m)
       Y(m) : B(n) ==> Z(m,n,m,n)
       Z(m,n,succ(p),succ(q)) ==> Z(m,n,p,q)
       Z(m,n,p,0) : C(m) ==> X
       Z(m,n,0,q) : C(n) ==> X
endproc
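The matching machinery of steps 1 to 3 can be sketched in Python for the special case of internal rules (empty events). This is a toy illustration under an assumed encoding, not FP2's implementation: states are tuples headed by a constructor name, and lowercase strings inside a pattern are variables.

```python
# Toy sketch of rule application for internal rules only.

def match(pattern, term, subst):
    # first-order matching: extend subst so that subst(pattern) = term
    if isinstance(pattern, str) and pattern.islower():     # a variable
        if pattern in subst:
            return subst if subst[pattern] == term else None
        return {**subst, pattern: term}
    if (isinstance(pattern, tuple) and isinstance(term, tuple)
            and len(pattern) == len(term) and pattern[0] == term[0]):
        for p, t in zip(pattern[1:], term[1:]):
            subst = match(p, t, subst)
            if subst is None:
                return None
        return subst
    return subst if pattern == term else None

def apply_subst(term, subst):
    if isinstance(term, str):
        return subst.get(term, term)
    return tuple([term[0]] + [apply_subst(t, subst) for t in term[1:]])

def run(rules, state, steps):
    # fire the first pre-applicable rule, at most `steps` times;
    # stop early when no rule is pre-applicable (the process terminates)
    for _ in range(steps):
        for pre, post in rules:
            h = match(pre, state, {})
            if h is not None:
                state = apply_subst(post, h)
                break
        else:
            break
    return state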

In fact, this form of process definition could very well do without the operation definitions of the functional part of FP2. Assuming that the available functions are only constructors, basic process definitions are sufficiently powerful to define any function that can be computed on a Turing machine. However, defined operations make basic process definitions much easier to write and to read. For example, a process sending out the maximum of its two input messages could also be described as follows:

proc MAX
in     A, B : Nat
out    C : Nat
states X : -
       Y, Z : Nat
vars   m, n : Nat
rules  ==> X
       X    : A(m) ==> Y(m)
       Y(m) : B(n) ==> Z(max(m,n))
       Z(m) : C(m) ==> X
endproc

Process definitions may also be parameterized. For example, bounded queues of natural numbers of capacity k may be defined as follows:

proc BQUEUE [k : Nat]
in     W : Nat
out    R : Nat
states Q : Nat* × Nat* × Nat
vars   e : Nat
       t, u, v : Nat*
       n : Nat
rules  ==> Q(nil,nil,k)
       Q(u,v,succ(n)) : W(e) ==> Q(e.u,v,n)
       Q(u,e.v,n)     : R(e) ==> Q(u,v,n+1)
       Q(e.u,nil,n) ==> Q(nil,reverse(e.u),n)
endproc

where reverse(s) returns a sequence with the elements of s in the opposite order.
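BQUEUE uses the classic two-sequence queue: writes prepend to one sequence, reads take from the other, and an internal rule reverses the write side when the read side runs dry. A Python simulation (a sketch with an assumed encoding, not FP2) of the Q(u,v,n) state:

```python
class BQueue:
    """Simulation of BQUEUE[k]; u holds written elements, v readable ones."""
    def __init__(self, k):
        self.u, self.v, self.n = [], [], k     # ==> Q(nil,nil,k)
    def W(self, e):
        # Q(u,v,succ(n)) : W(e) ==> Q(e.u,v,n)
        if self.n == 0:
            raise RuntimeError("W is not applicable: the queue is full")
        self.u, self.n = [e] + self.u, self.n - 1
    def R(self):
        # internal rule Q(e.u,nil,n) ==> Q(nil,reverse(e.u),n), then
        # Q(u,e.v,n) : R(e) ==> Q(u,v,n+1)
        if not self.v:
            if not self.u:
                raise RuntimeError("R is not applicable: the queue is empty")
            self.u, self.v = [], list(reversed(self.u))
        e, self.v = self.v[0], self.v[1:]
        self.n += 1
        return e
```

Each element is moved and reversed at most once, so the occasional reversal keeps the amortized cost of a communication constant.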

Once such a parameterized process has been defined, it can be instantiated with actual parameters. It is also possible to define names standing for processes:

proc BQUEUE4 is BQUEUE[4]

Every process definition, with or without parameters, can also be considered as the definition of an indexed family of processes, where the indexes are natural numbers. For example, let processes V be variables alternating write and read communications :

proc V
in     W : Nat
out    R : Nat
states E : -
       F : Nat
vars   v : Nat
rules  ==> E
       E    : W(v) ==> F(v)
       F(v) : R(v) ==> E
endproc

This definition also defines processes V_1, V_2, etc., with connectors W_1 and R_1, W_2 and R_2, etc. These indexes may also appear as parameters, like in:

proc VNAT [i : Nat] is V_i

Then:

proc V3 is V_3

and:

proc V3 is VNAT[3]

are identical definitions and produce a process with connectors W_3 and R_3. That process may in turn be considered as defining an indexed family V3_1, V3_2, etc.

A similar indexing facility can also be used within basic process definitions, when it is necessary to describe processes with indexed families of connectors, states, variables, rules or events. For example, a process ONE receiving a Nat into I and sending it out from one of its n output connectors O_i is described by:

proc ONE [n : Nat]
in     I : Nat
out    {O_i | i=1..n} : Nat
states E : -
       F : Nat
vars   v : Nat
rules  ==> E
       E : I(v) ==> F(v)
       { F(v) : O_i(v) ==> E | i=1..n }
endproc

[Figure: the process ONE, with input connector I at the top and output connectors O_1, O_2, ..., O_n at the bottom.]

Given an instantiation ONE[3], the repetition facility {O_i | i=1..n} : Nat stands for O_1, O_2, O_3 : Nat.

Similarly, three rules are produced, one for output through each of O_1, O_2, O_3.

Another example shows the use of this facility for describing a process ALL which receives a Nat into I and sends it out from all of its n output connectors within the same event:

proc ALL [n : Nat]
in     I : Nat
out    {O_i | i=1..n} : Nat
states E : -
       F : Nat
vars   v : Nat
rules  ==> E
       E    : I(v) ==> F(v)
       F(v) : {O_i(v) | i=1..n} ==> E
endproc

A process which receives n Nat's sequentially into its input connectors I_i taken in any order and then sends out their maximum through O is defined by:

proc MAXALL [n : Nat]
in     {I_i | i=1..n} : Nat
out    O : Nat
states Q : Nat × Nat*
vars   v, m : Nat
       s : Nat*
rules  ==> Q(n,nil)
       { Q(succ(m),s) : I_i(v) ==> Q(m,v.s) | i=1..n }
       Q(0,v.s) : O(/(max)(v.s)) ==> Q(n,nil)
       Q(0,nil) : O(0) ==> Q(n,nil)
endproc

[Figure: the process MAXALL, with input connectors I_1, I_2, ..., I_n and output connector O.]

Finally, process definitions may also be parameterized by port names, as in the following definition of a "CELL". A CELL performs four communications within a single event. In that event, it inputs natural integer values, while sending out the result of a simple computation performed on previously received values:

proc CELL [c : Nat] [X0,Y0,X1,Y1 : Port]
in     X0, Y0 : Nat
out    X1, Y1 : Nat
states Q : Nat × Nat
vars   x, y, u, v : Nat
rules  ==> Q(0,0)
       Q(x,y) : X0(u) Y0(v) X1(x) Y1(y+c*x) ==> Q(u,v)
endproc

Given natural integers a, i and j, and identifiers U, V, X and Y, CELL could be instantiated as follows:

proc CELL1 is CELL[a][U, Y_i_j, X_i_j, V]

[Figure: the instantiated cell CELL1, with its four connectors U, Y_i_j, X_i_j and V.]

4.2 - Process forms.

FP2 provides process operators for combining defined processes into process forms. The number and the nature of these operators are arbitrary and a given implementation of the language could take any collection of them. The important facts are : (a) all operators are built up on top of a common primitive basis ; (b) process forms can all be evaluated into basic process definitions with connectors, state constructors, variables and rules.

4.2.1 - Primitive basis for process operators.

A non parameterized basic process definition is a syntactic object of the form:

proc N
connectors k1 : t1 ... kl : tl
states     p1 : l1 ... pm : lm
vars       v1 : u1 ... vn : un
rules      ==> q1 ... ==> qp
           r1 e1 ==> s1 ... rq eq ==> sq
endproc

where the input, output and internal connectors have been grouped within a single list. Given a connector ki, its sort is given by sort(ki) ∈ {in, out, internal}.

Thus, a basic process definition can be viewed as associating a process name N with a tuple composed of the five following sets :

- K = { <ki, ti> | i=1..l }, represents connector definitions, where the ki's are l distinct connector names and the ti's are types.

- P = { <pi, li> | i=1..m }, represents state constructor definitions, where the pi's are m distinct state constructor names and the li's are (possibly empty) lists of types.

- V = { <vi, ui> | i=1..n }, represents variable definitions, where the vi's are n distinct variable names and the ui's are types.

- Q = { qi | i=1..p }, represents initial rules, where the qi's are states.

- R = { <ri, ei, si> | i=1..q }, represents transition rules, where the ri's and si's are predicates and the ei's are (possibly empty) events.

Let N = and N' = be two basic process definitions -

Primitive sum operators. Iff K and K' share no port name, the sum K+K' is the set union of K and K'. It is undefined otherwise. Analogously, P+P' is the union of P and P' iff they share no state constructor name, V+V' is the union of V and V' iff they share no variable name and e+e', where e and e' are events, is the union of e and e' iff they share no connector name. The sums Q+Q' and R+R' are the corresponding set unions and are always defined.
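The disjointness requirement on these sums can be illustrated with a small sketch (in Python, since FP2 itself is not executable here; modelling each definition set as a dict keyed by name is a convention of this sketch, not part of FP2) :

```python
def disjoint_sum(d1, d2):
    """Union of two name-keyed definition sets ; defined only when the
    name sets are disjoint, as required for K+K', P+P' and V+V'."""
    shared = set(d1) & set(d2)
    if shared:
        raise ValueError("sum undefined: shared names %s" % shared)
    merged = dict(d1)
    merged.update(d2)
    return merged

# Connector sets of two processes with disjoint port names:
K1 = {"A": "Nat"}
K2 = {"B": "Nat"}
print(disjoint_sum(K1, K2))   # {'A': 'Nat', 'B': 'Nat'}
```

The sums Q+Q' and R+R' would simply be plain set unions, with no disjointness check.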

Product of predicates. Given state constructor definitions d = <p, l> and d' = <p', l'>, the product d*d' is a state constructor definition <p_p', M (p, p', l, l')>, where p_p' is a state constructor name and where M (p, p', l, l') is a list obtained by merging the lists l and l' in a way depending solely on p and p' and such that M (p, p', l, l') = M (p', p, l', l). Given P = { di | i=1..m } and P' = { d'j | j=1..m' }, the product P*P' is { di*d'j | i=1..m, j=1..m' }. Iff r = p(f) and r' = p'(f') are predicates such that the lists of functional terms f and f' share no variable name, the product r*r' is defined and is p_p' ( M (p, p', f, f') ).

Product of initial rules. Given Q = { qi | i=1..p } and Q' = { q'j | j=1..p' }, the product Q*Q' is { qi*q'j | i=1..p, j=1..p' }.

Product of transition rules. Iff R = { <ri, ei, si> | i=1..q } and R' = { <r'j, e'j, s'j> | j=1..q' } share no variable name and no port name, the product R*R' is defined and is { <ri*r'j, ei+e'j, si*s'j> | i=1..q, j=1..q' }.

Idle rules. Given a state constructor definition <p, l> with l = (t1, .., tk), j(p) is a predicate of the form p (x1, .., xk) where the xi's are k distinct variable names and xj is of type tj. Given P = { <pi, li> | i=1..m }, the set I(P) = { <r, Ø, r> | r ∈ J(P) }, where J(P) = { j(pi) | i=1..m }, is the set of "idle rules" on P and W(P) is a set of variable definitions necessary for I(P).
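The construction of idle rules can be sketched as follows (a Python illustration; representing a rule triple as a Python tuple and the empty event as () is a convention of this sketch) :

```python
def idle_rules(P):
    """For each state constructor p with argument types (t1,..,tk), build
    the idle rule <p(x1,..,xk), empty event, p(x1,..,xk)> : an empty event
    that leaves the state unchanged."""
    rules = []
    for p, types in P.items():
        args = tuple("x%d" % i for i in range(1, len(types) + 1))
        pred = (p, args)
        rules.append((pred, (), pred))  # () models the empty event
    return rules

print(idle_rules({"Q": ("Nat", "Nat")}))
# [(('Q', ('x1', 'x2')), (), ('Q', ('x1', 'x2')))]
```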

4.2.2 _ Process operators.

Let N1 = <K1, P1, V1, Q1, R1> and N2 = <K2, P2, V2, Q2, R2> be basic process definitions where K1 and K2 do not share any port name. In each definition D of a process operator, N1 and N2 are such that the primitive operators applied in D are defined on their operands : this can be assumed without any loss of generality, since state constructors and variables may always be renamed.

1 _ Interleaved composition : N1 | N2. When an event occurs in the process N1 | N2, it is either an event in N1 while N2 is idle or an event in N2 while N1 is idle :

N1 | N2 = <K, P, V, Q, R>

where : K = K1+K2
        P = P1*P2
        V = V1+V2+W(P1)+W(P2)
        Q = Q1*Q2
        R = R1*I(P2) + I(P1)*R2

2 _ Synchronous composition : N1 ||| N2. When an event occurs in N1 ||| N2, it is an event in N1 together with an event in N2 :

N1 ||| N2 = <K, P, V, Q, R>

where : K = K1+K2
        P = P1*P2
        V = V1+V2
        Q = Q1*Q2
        R = R1*R2

3 _ Parallel composition : N1 || N2. When an event occurs in N1 || N2, it is an event in N1 while N2 is idle, or an event in N2 while N1 is idle, or an event in N1 together with an event in N2 :

N1 || N2 = <K, P, V, Q, R>

where : K = K1+K2
        P = P1*P2
        V = V1+V2+W(P1)+W(P2)
        Q = Q1*Q2
        R = R1*I(P2) + I(P1)*R2 + R1*R2
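How the three compositions assemble their transition rules from R1, R2 and the idle rules can be mimicked in Python (rule triples and events as tuples are conventions of this sketch, not FP2 syntax) :

```python
def product(R1, R2):
    """Product of transition rule sets : every rule of R1 is paired with
    every rule of R2 ; predicates and target states are paired, events
    are merged."""
    return [((r1, r2), e1 + e2, (s1, s2))
            for (r1, e1, s1) in R1 for (r2, e2, s2) in R2]

# One rule per process, plus the idle rules I1 and I2 (empty event).
R1 = [("p", ("a",), "p2")]
R2 = [("q", ("b",), "q2")]
I1 = [("p", (), "p")]
I2 = [("q", (), "q")]

interleaved = product(R1, I2) + product(I1, R2)   # R1*I(P2) + I(P1)*R2
synchronous = product(R1, R2)                     # R1*R2
parallel = interleaved + synchronous              # all three possibilities

print(len(interleaved), len(synchronous), len(parallel))   # 2 1 3
```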

4 _ Uncontrollable choice : N1 ? N2. At initialization, the process N1 ? N2 chooses non-deterministically to behave always like N1 while leaving N2 idle, or to behave always like N2 while leaving N1 idle :

N1 ? N2 = <K, P, V, Q, R>

where : K = K1+K2
        P = P1+P2
        V = V1+V2
        Q = Q1+Q2
        R = R1+R2

5 _ Controllable choice : N1 ! N2. After initialization, the first event to occur in N1 ! N2 may be either an event in N1 while N2 is idle or an event in N2 while N1 is idle. If that event contains input or output communications, the choice may be controlled by the environment of N1 ! N2. After that event has occurred, N1 ! N2 continues to behave like N1 while leaving N2 idle if it was an event in N1, or continues to behave like N2 while leaving N1 idle if it was an event in N2 :

N1 ! N2 = <K, P, V, Q, R>

where : K = K1+K2
        P = (P1*P2)*P'
        V = V1+V2
        Q = (Q1*Q2)*Q'
        R = (R1*I(P2))*R'1 + (I(P1)*R2)*R'2

given : P' = { <T, ()>, <T1, ()>, <T2, ()> }
        Q' = { T }
        R'1 = { <T, Ø, T1>, <T1, Ø, T1> }
        R'2 = { <T, Ø, T2>, <T2, Ø, T2> }

6 _ Connection : N1 + A.B. Let A be an output connector of type t1 and B be an input connector of type t2, both in N1. If there exists a type t with t ≤ t1 and t ≤ t2 such that there is no t' ≠ t satisfying these same conditions and t ≤ t', then A.B is an internal connector of type t in N1 + A.B. The process N1 + A.B behaves like N1 and, in addition, when an event involving both A and B may occur in N1, a new event involving A.B may occur in N1 + A.B, where the message sent from A arrives at B :

N1 + A.B = <K, P, V, Q, R>

where : K = K1+K'
        P = P1
        V = V1
        Q = Q1
        R = R1+R'

given : K' = { <A.B, t> }
        R' = { g(<r, e', s>) | <r, e, s> ∈ R1 and A(u) ∈ e and B(v) ∈ e and g = mgu(u, v) }

with : mgu (u, v) = most general unifier of u and v, and e' = e in which the communications A(u) and B(v) are replaced by A.B(u).

7 _ Hiding : N1 - k. If k is a connector of type t in N1, the connectors of N1 - k are all the connectors of N1, except k which is "hidden". If k is an external connector (i.e. input or output connector), no event involving k may occur in N1 - k, since the environment cannot "see" k any longer. If k is an internal connector, events involving k in N1 may still occur but, in N1 - k, they do not mention k any longer :

N1 - k = <K, P, V, Q, R>

where : K = K1-K'
        P = P1
        V = V1
        Q = Q1
        R = R1 - R' + R''

given : K' = { <k, t> }
        R' = { <r, e, s> ∈ R1 | e involves k }
        R'' = { <r, e', s> | <r, e, s> ∈ R1 and k(u) ∈ e and k is internal }

with : e' = e in which the communication k(u) is removed.
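The connection operator relies on mgu(u, v), the most general unifier of the sent and received terms. A minimal first-order unification sketch (Python; representing variables as plain strings and constructor applications as tuples is a convention of this sketch, and no occurs-check is performed) :

```python
def is_var(t):
    # Sketch convention : variables are plain strings, constructor
    # applications are tuples ("functor", arg1, ...).
    return isinstance(t, str)

def walk(t, subst):
    # Follow variable bindings already recorded in the substitution.
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def unify(u, v, subst=None):
    """Most general unifier of two terms, as a variable -> term mapping,
    or None when the terms are not unifiable."""
    if subst is None:
        subst = {}
    u, v = walk(u, subst), walk(v, subst)
    if u == v:
        return subst
    if is_var(u):
        subst[u] = v
        return subst
    if is_var(v):
        subst[v] = u
        return subst
    if len(u) == len(v) and u[0] == v[0]:   # same functor, same arity
        for a, b in zip(u[1:], v[1:]):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None                              # not unifiable

print(unify(("succ", "x"), ("succ", ("succ", "y"))))   # {'x': ('succ', 'y')}
```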

8 _ Trigger : e -> N1. If e is an event, the first event to occur in e -> N1 may occur only together with e. After that, e -> N1 behaves like N1. Thus, e -> N1 is N1 triggered by e :

e -> N1 = <K, P, V, Q, R>

where : K = K1+K'
        P = P1*P'
        V = V1
        Q = Q1*Q'
        R = R1*R'

given : K' = { a set of connector definitions necessary for e }
        P' = { <T, ()>, <F, ()> }
        Q' = { T }
        R' = { <T, e, F>, <F, Ø, F> }

9 _ Control : e => N1. If e is an event, e => N1 behaves like N1, but every event occurs together with e. Thus, e => N1 is N1 controlled by e :

e => N1 = <K, P, V, Q, R>

where : K = K1+K'
        P = P1*P'
        V = V1
        Q = Q1*Q'
        R = R1*R'

given : K' = { a set of connector definitions necessary for e }
        P' = { <T, ()> }
        Q' = { T }
        R' = { <T, e, T> }

10 _ Time-out : N1.n:N2. If n is a natural integer, then N1.n:N2 behaves like N1 for at most n successive events. If N1 is terminated before n events have occurred in it, then N1.n:N2 is also terminated. If N1 is not terminated at that time, then N1.n:N2 stops behaving like N1 and starts behaving like N2 :

N1.n:N2 = <K, P, V, Q, R>

where : K = K1+K2
        P = P1*P'+P2
        V = V1+V'+V''+V2
        Q = Q1*Q'
        R = R1*R'+R''+R2

given : P' = { <T, (Nat)> }
        V' = { <x, Nat> }
        V'' = W(P1)
        Q' = { T(n) }
        R' = { <T(succ(x)), Ø, T(x)> }
        R'' = { <p*T(0), Ø, q> | p ∈ J(P1) and q ∈ Q2 }

Process forms appear in the context of process definitions :

proc <process name> is <process form>

A process form is an expression in which the operators are process operators and the operands are processes, connectors, events or natural integers. Evaluating a process form results in basic process definitions.

In principle, the evaluation of a process form N is guided by its syntax : dummy names n0, n1, ... are generated, one for each syntactical sub-form in N, where n0 is the name corresponding to N itself. Then, basic process definitions for n0, n1, ..., with their respective names, connectors, predicates, variables and rules can be mechanically produced by applying the definitions of the operators.

In practice, this evaluation can be greatly optimized and most intermediate basic process definitions (especially those resulting from compositions) can be avoided.

4.2.3 _ Examples of process forms.

For writing the examples in this section and showing the results of some process form evaluations, the following conventions are used :

- Process forms written "N ++ A.B", where N is a process form and A.B is an internal connector, are expanded to "N + A.B - A - B - A.B".

- The list M (p, p', l, l'), where p and p' are state constructor names and l and l' are lists, is written as if it were append (l, l').

1 _ Maxima of sequences of Nat's. In this first example, values of type Nat are read in sequentially and they are considered as forming a series of sequences separated by 0's. At the end of each sequence, the maximum of that sequence is sent out. This is achieved by a process SMAX constructed as a network of more elementary processes. The process MAX and the following definitions are used in that construction :

proc REG in W : Nat out R : Nat
  states V : Nat
  vars r, s : Nat
  rules ==> V (0)
        V (r) : W (s) ==> V (s)
        V (r) : R (r) ==> V (0)
endproc

type Signal cons buzz, ring : -> Signal endtype

proc BZZZ in K : Nat out L : Nat  S : Signal
  states P : -
  vars p : Nat
  rules ==> P
        P : K (0) L (0) S (buzz) ==> P
        P : K (succ (p)) L (succ (p)) ==> P
endproc

proc GATE in M : Nat  T : Signal out N : Nat
  states Q : -
  vars q : Nat
  rules ==> Q
        Q : M (q) N (q) T (buzz) ==> Q
endproc

Then the process SMAX can be constructed by the following process form :

proc SMAX is ( MAX || REG ++ C.W + R.B - B - R.B ) || ( BZZZ || GATE ++ S.T ) ++ L.A ++ R.M

This construction can be pictured as foUows •

[ MAX

SMAX

The resulting basic process definition is remarkably short :

proc SMAX in K : Nat out N : Nat
  states X_V_P_Q : Nat
         Y_V_P_Q : Nat x Nat
         Z_V_P_Q : Nat x Nat
  vars p, m, r : Nat
  rules ==> X_V_P_Q (0)
        X_V_P_Q (r) : K (succ (p)) ==> Y_V_P_Q (succ (p), r)
        Y_V_P_Q (m, r) : ==> Z_V_P_Q (max (m, r), 0)
        Z_V_P_Q (m, r) : ==> X_V_P_Q (m)
        X_V_P_Q (r) : K (0) N (r) ==> Y_V_P_Q (0, 0)
endproc
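The input/output behaviour that SMAX realizes — emitting the maximum of each 0-terminated sequence — can be mimicked directly (a Python sketch of the specification, not of the FP2 semantics; the function name is invented) :

```python
def smax(stream):
    """Read naturals ; each 0 terminates a sequence and emits its maximum."""
    out, current_max = [], 0
    for x in stream:
        if x == 0:
            out.append(current_max)   # end of sequence : emit its maximum
            current_max = 0
        else:
            current_max = max(current_max, x)
    return out

print(smax([3, 1, 4, 0, 2, 7, 0]))   # [4, 7]
```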

2 _ Construction of a queue.

In addition to process operators, process forms may also use conditionals. It is then possible to write recursive definitions, like the following construction of a bounded queue BQ built as a chain of processes of the indexed family V :

proc BQ [k : Nat] is
  if k=1 then V_1
  else BQ[k-1] || V_k ++ R_(k-1).W_k
  endif

The instantiation :

proc BQ4 is BQ[4]

can be pictured as follows :

(Figure : BQ4 as a chain of four V processes.)

It is also possible to have an "iterative" description of BQ, using the repetition facility inside a process form :

proc BQ [k : Nat] is || { V_i | i=1..k } ++ { R_i.W_(i+1) | i=1..k-1 }
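The end-to-end behaviour intended for BQ[k] is that of a FIFO queue of capacity k. A direct Python model of that specification (the class and method names are invented for this sketch; it models the observable behaviour, not the chain of FP2 processes) :

```python
from collections import deque

class BoundedQueue:
    """A FIFO of capacity k : the behaviour BQ[k] is intended to realize."""
    def __init__(self, k):
        self.k, self.items = k, deque()
    def write(self, x):
        # Corresponds to offering a value on the W port of the chain.
        if len(self.items) == self.k:
            raise OverflowError("queue full")
        self.items.append(x)
    def read(self):
        # Corresponds to taking a value from the R port of the chain.
        return self.items.popleft()

q = BoundedQueue(4)
for v in (1, 2, 3):
    q.write(v)
print(q.read(), q.read())   # 1 2
```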

3 _ Systolic arrays.

Let A and B be two n×n matrices. Given a series X0, ..., Xi, ... of vectors with n components, the problem is to compute a series Y1, ..., Yi, ... of vectors with n components such that Yi = A·Xi + B·Xi-1. This computation can be performed by an n×n systolic array of processes. Let SYSTOL be the name of that array. The complete system comprises SYSTOL and four interface processes which prepare the input vectors for SYSTOL and assemble the output vectors for the environment.

(Figure : the complete system, with the array SYSTOL surrounded by the interface processes INX, OUTX, ZERO and OUTY.)

The Xi vectors arrive into the right of the system and they get out unchanged from the left, while the computed results, the Yi vectors, leave from the bottom. The processes INX, OUTX, ZERO and OUTY are interface processes : INX inserts one vector of 0's after each Xi vector and delays the jth component of the ith vector of that new sequence so that it arrives into SYSTOL "at the same time" (i.e. within the same event) as the first component of vector (i+j-1) of that sequence. Symmetrically, OUTX and OUTY re-establish the synchrony among the components of each Xi vector and of each Yi vector respectively. ZERO repeatedly sends vectors of 0's into the top of SYSTOL. The FP2 description of these interface processes is left to the reader.

Surrounded by its interfaces, the process SYSTOL is an n×n array of orthogonally connected processes of the family MOD, each containing two CELL's. MOD[i,j] is positioned at row i, column j of SYSTOL and it is defined by :

proc MOD [i,j : Nat] is
  CELL [b(i,j)] [U, X1_i_j, Y0_i_j, V]
  || CELL [a(i,j)] [X0_i_j, Z, W, Y1_i_j]
  ++ V.W ++ Z.U

where a(i,j) and b(i,j) are elements of the matrices A and B respectively. Finally, SYSTOL is constructed as follows :

proc SYSTOL [n : Nat] is || { ROW[i,n] | i=1..n } ++ { Y1_i_j.Y0_(i+1)_j | i=1..n-1, j=1..n }

proc ROW [i,n : Nat] is || { MOD[i,j] | j=1..n } ++ { X1_i_(j+1).X0_i_j | j=1..n-1 }
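The specification Yi = A·Xi + B·Xi-1 that the array implements can be stated as a direct reference computation (a Python sketch; taking the predecessor of the first input vector to be the zero vector, as supplied by ZERO, is the assumption made here) :

```python
def systol_spec(A, B, xs):
    """Reference computation : y_i = A @ x_i + B @ x_{i-1}, with the
    predecessor of the first input taken as the zero vector."""
    n = len(A)
    prev = [0] * n
    ys = []
    for x in xs:
        y = [sum(A[i][j] * x[j] + B[i][j] * prev[j] for j in range(n))
             for i in range(n)]
        ys.append(y)
        prev = x
    return ys

A = [[1, 0], [0, 1]]   # identity
B = [[2, 0], [0, 2]]   # doubling
print(systol_spec(A, B, [[1, 1], [3, 4]]))   # [[1, 1], [5, 6]]
```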

5 _ POLYMORPHISM.

Definitions of types, operations and processes can specify that the defined entities are parameterized by types : such entities are called "polymorphic".

Polymorphic definitions use formal type parameters which are introduced by the definition, with their names and with an algebraic characterization of the family of possible corresponding actual types.

In FP2, such an algebraic characterization is called a property : properties are defined by means of equations on terms, formal type parameters of polymorphic definitions require properties on their corresponding actual types and the satisfaction of a property by an actual type can be asserted by means of a specialized satisfaction clause.

5.1 _ Polymorphic definitions without properties.

A polymorphic definition of a type, operation or process provides :

- the name of the polymorphic type, operation or process ;
- the description of the formal type parameters ;
- the body of the definition, which is a basic type, operation or process definition or a type, functional or process form. The body can use the formal type parameters, by referring to their names in any context where a type can be written.

For example, a polymorphic type for pairs of values of the same type is :

type Pair [ t : type ] | Ftype [ t ] is t x t

It reads as follows : "the type Pair [t], such that t is a type satisfying the property Ftype, is the cartesian product t x t".

The property Ftype is a predefined property : all types satisfy it, which means that any type can be used for instantiating the polymorphic type Pair :

type Pairnat is Pair [ Nat ]

type Twopairs is Pair [ Pairnat ]

Binary trees with nodes labelled by values of a given type and leaves labelled by values of a possibly different type have the type :

type Tree [ t, u : type ] | Ftype [ t ], Ftype [ u ]
  cons leaf : t -> Tree[t,u]
       node : u x Tree[t,u] x Tree[t,u] -> Tree[t,u]
endtype

Trees with pairs of Nat's on nodes and Nat's on leaves have the type :

type Treenat is Tree [ Nat, Pairnat ]

Such a tree can be constructed by : node ( (3, 4), leaf (1), leaf (2) )

But it is also possible to define :

type Treebool is Tree [ Bool, Bool ]

and to construct : node ( false, leaf (true), leaf (true) )

Thus, the polymorphic definition of Tree has also introduced operations "leaf" and "node", which are polymorphic. Instances of these operations have also been created : one instance of leaf takes a Nat and returns a Treenat, the other instance of leaf takes a Bool and returns a Treebool. Similarly, two instances of node have been created. The complete names of these functions are qualified by their signatures :

( leaf : Nat -> Treenat )
( leaf : Bool -> Treebool )
( node : Pairnat x Treenat x Treenat -> Treenat )
( node : Bool x Treebool x Treebool -> Treebool )

When constructing "node ( (3, 4), leaf (1), leaf (2) )", the choice among the various instances of node is governed by the types of the arguments. In fact, this term stands for the more explicit construction :

( node : Pairnat x Treenat x Treenat -> Treenat ) ( (3, 4), ( leaf : Nat -> Treenat ) (1), ( leaf : Nat -> Treenat ) (2) )

Polymorphic operations can also be defined separately :

op first [ t : type ] | Ftype [ t ] : Pair [ t ] -> t
  vars x, y : t
  rules first (x, y) ==> x
endop

An instance of first could be explicitly created and called, like in :

( first : Pair [ Nat ] -> Nat ) (3, 4)

But it is also possible, as above, to omit the signature and to simply write :

first (3, 4)

Finally, there are also polymorphic processes :

proc PSTACK [ t : type ] | Ftype [ t ]
  in I : t
  out O : t
  states S : t*
  vars e : t  v : t*
  rules ==> S (nil)
        S (v) : I (e) ==> S (e.v)
        S (e.v) : O (e) ==> S (v)
endproc

They can be instantiated :

proc STACKNAT is PSTACK [ Nat ]

5.2 _ Property definitions and satisfaction clauses.

In all the above examples, any actual type can be bound to the formal types, since the only requirement is that it satisfies the property Ftype. This is not always the case. For example, the definition of a generic equality operation on Pair's would require that there also exist an equality operation on the type t of the elements. The property of such types with an equality operation can be defined in FP2, by means of a property definition :

prop Equality [ t with eq ]
  opns eq : t x t -> Bool
  vars x, y, z : t
  eqns eq (x, y) == eq (y, x)
       eq (x, x) == true
       eq (x, y) ∧ eq (y, z) ∧ ¬ eq (x, z) == false
endprop

It can be read as follows : "the property Equality is satisfied by all types like t with an operation like eq : t x t -> Bool iff the terms built with that operation obey the specified equations". (Two terms v and w obey the equation "v == w" iff the reductions of v and w terminate with the same term.) Here, the equations state that eq is symmetric, reflexive and transitive.
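Such equations can at least be spot-checked on a finite sample of values. A Python sketch (the checking function, the candidate eq and the sample are inventions of this sketch) :

```python
from itertools import product as cartesian

def check_equality_property(eq, sample):
    """Spot-check the three Equality equations on a finite sample :
    symmetry, reflexivity, and transitivity (in its negative form)."""
    for x, y, z in cartesian(sample, repeat=3):
        assert eq(x, y) == eq(y, x)                          # eq(x,y) == eq(y,x)
        assert eq(x, x)                                      # eq(x,x) == true
        assert not (eq(x, y) and eq(y, z) and not eq(x, z))  # transitivity
    return True

print(check_equality_property(lambda a, b: a == b, [0, 1, 2]))   # True
```

Passing such a check on a sample is of course no proof that the equations hold for all terms.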

If "=" is the name of the equality operation on Nat's, the type Nat should now satisfy the property Equality with the operation "=". However, proving that this is indeed the case is, in general, not a feasible task. This is why FP2, for that purpose, relies on assertions in the form of satisfaction clauses :

sat Natequal is Equality [ Nat with = ]

which reads : "Natequal is the name of the satisfaction clause asserting that the property Equality is satisfied by the type Nat with its operation =".

Then it becomes possible to define an equality operation on Pair's :

op same [ t : type ] [ equal : op ] | Equality [ t with equal ] : Pair [ t ] x Pair [ t ] -> Bool
  vars a, b, c, d : t
  rules same ( (a, b), (c, d) ) ==> equal (a, c) ∧ equal (b, d)
endop

Thus, "same" is a polymorphic operation with signature Pair [ t ] x Pair [ t ] -> Bool, requiring that the formal type t satisfy Equality with the formal operation equal. With that definition, a term like :

same ( (3, 4), (5, 6) )

binds t to Nat, since it means :

( same : Pair [ Nat ] x Pair [ Nat ] -> Bool ) ( (3, 4), (5, 6) )

Given that : Equality [ t with equal ] is required, and that : Equality [ Nat with = ] is satisfied,

this term is correct and the formal operation equal is bound to "=". Finally, the rule :

same ( (3, 4), (5, 6) ) ==> (3=5) ∧ (4=6)

is applied and the term eventually reduces to false, as expected.

Given this equality operation "same" on Pair's, it even becomes possible to say that the type Pair [ t ] satisfies Equality with it, provided that t itself satisfies Equality. This is accomplished by a polymorphic satisfaction clause :

sat Pairequal [ t : type ] [ eq : op ] | Equality [ t with eq ] is Equality [ Pair [ t ] with same ]

That satisfaction clause enlarges the polymorphism of the operation same : it becomes applicable to Pair [ Nat ], Pair [ Pair [ Nat ]], Pair [ Pair [ Pair [ Nat ]]], etc.

The last example shows a polymorphic process BIGMAX [ t ], requiring that there be a semi-lattice structure among objects of type t : it inputs n objects of type t within one event and sends out their least upper bound.

prop Semilattice [ t with eq, leq, lub ]
  opns leq : t x t -> Bool
       lub : t x t -> t
  vars m, n, p : t
  eqns leq (m, m) == true
       leq (m, n) ∧ leq (n, m) ∧ ¬ eq (m, n) == false
       leq (m, n) ∧ leq (n, p) ∧ ¬ leq (m, p) == false
       lub (m, n) == lub (n, m)
       leq (m, lub (m, n)) == true
       leq (m, p) ∧ leq (n, p) ∧ ¬ leq (lub (m, n), p) == false
endprop

proc BIGMAX [ t : type ] [ eq, le, up : op ] | Semilattice [ t with eq, le, up ] [ n : Nat ]
  in { I_i | i=1..n } : t
  out O : t
  states E : -  F : t*
  vars { v_i | i=1..n } : t  s : t*
  rules ==> E
        E : { I_i (v_i) | i=1..n } ==> F ( [ v_i | i=1..n ] )
        F (s) : O ( /(up) (s) ) ==> E
endproc

Given the satisfaction clause :

sat Latnat is Semilattice [ Nat with =, ≤, max ]

BIGMAX can be instantiated to :

proc BMAXNAT [ n : Nat ] is BIGMAX [ Nat ] [ n ]

In that instantiation, the formal operations eq, le and up of BIGMAX get bound to =, ≤ and max respectively.
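The Semilattice equations for Nat with =, ≤ and max can likewise be spot-checked on a finite sample (a Python sketch; the checking function is an invention of this sketch, and a sample check is not a proof) :

```python
from itertools import product as cartesian

def check_semilattice(eq, leq, lub, sample):
    """Spot-check the six Semilattice equations on a finite sample."""
    for m, n, p in cartesian(sample, repeat=3):
        assert leq(m, m)                                              # reflexivity
        assert not (leq(m, n) and leq(n, m) and not eq(m, n))         # antisymmetry
        assert not (leq(m, n) and leq(n, p) and not leq(m, p))        # transitivity
        assert lub(m, n) == lub(n, m)                                 # commutativity
        assert leq(m, lub(m, n))                                      # upper bound
        assert not (leq(m, p) and leq(n, p) and not leq(lub(m, n), p))  # least
    return True

print(check_semilattice(lambda a, b: a == b,
                        lambda a, b: a <= b,
                        max, [0, 1, 2, 3]))   # True
```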

6 _ EXCEPTIONS.

In the definition of a function, it is often assumed that it applies to all possible values in its domain type. However, there are cases where the domain of definition should not cover the whole domain type.

In FP2, it is possible to take care of such situations, which correspond to the notion of partial functions :

- Preconditions on parameters can restrict the domain of definition and raise exceptions.

- Exception handlers provide means of defining the actions to be taken when an exception is raised.

6.1 _ Preconditions.

In addition to the "normal" rules which define the reductions of terms containing operation applications, an operation definition may also contain precondition rules.

Normal rules have the format :

left term ==> right term

where, in the left term, the outermost function name is the operation being defined and its subterms are either constructor applications or variables. Furthermore, no two normal rules in an operation definition have unifiable left terms.

Precondition rules have the format :

left term | condition ==> ! exception name

Here, the outermost function name of the left term may also be a constructor. The condition is a term reducing to a boolean value, where the variables also appear in the left term. No two precondition rules in an operation definition have unifiable left terms. The exception name is simply an identifier.

For example, accessing the ith element of a sequence, where i is a natural number, requires that i be not smaller than 1 and not greater than the length of the sequence.

The following polymorphic operation "elem" has its domain of definition restricted accordingly, by means of a precondition rule :

op elem [ t : type ] | Ftype [ t ] : t* x Nat -> t
  vars s : t*  e : t  i : Nat
  rules elem ( s, i ) | i < 1 ∨ i > length (s) ==> ! out_of_range
        elem ( e.s, 1 ) ==> e
        elem ( e.s, succ (succ (i)) ) ==> elem ( s, succ (i) )
endop

Given the definition of an operation f possibly containing precondition rules, an application f (arg) is interpreted as follows :

1. If f(arg) matches the left term of a precondition rule, the corresponding condition is evaluated. If the result is true, the named exception is raised, where "raising an exception" means returning that exception as value. If the result is false, the precondition rule is ignored.

2. If f(arg) does not match the left term of a precondition rule, or if f(arg) matches the left term of a precondition rule but the condition was false, a normal rule is looked for with f(arg) matching its left term.

3. If f(arg) matches the left term of a normal rule, that rule is applied.

4. If f(arg) does not match the left term of a normal rule, the predefined exception "! axiomatization" is raised. This means that the definition of f is not complete.

As a consequence of this general mechanism for interpreting function applications, every FP2 function f : Targ -> Tres may be viewed as a function f : Targ -> Tres | Exception, where "Exception" is a predefined type : all Exception "values" are built by the constructor ! which takes an identifier as its parameter.
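This "exception as value" reading, together with the strictness described in the next section, can be sketched in Python (the sentinel class, the decorator and the example functions are inventions of this sketch, not FP2 machinery) :

```python
class Exc:
    """An FP2-style exception value ! name."""
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return "!" + self.name

def strict(f):
    """Make f strict with respect to exception values : if any argument
    is an exception, return it instead of applying f."""
    def wrapper(*args):
        for a in args:
            if isinstance(a, Exc):
                return a
        return f(*args)
    return wrapper

@strict
def elem(s, i):
    # Precondition rule : i < 1 or i > length(s) raises ! out_of_range.
    if i < 1 or i > len(s):
        return Exc("out_of_range")
    return s[i - 1]

@strict
def double(x):
    return 2 * x

print(double(elem([10, 20], 1)))   # 20
print(double(elem([10, 20], 5)))   # !out_of_range
```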

Precondition rules can also be used to restrict the domain of constructors. For example, the type of rational numbers could be defined as follows :

type Rat
  cons // : Nat x Nat -> Rat
  vars m, n : Nat
  rules m // n | n = 0 ==> ! zero_divide
endtype

In that case, since there are no "normal rules" for rewriting constructor applications, an application p//q where p and q are in normal form either raises ! zero_divide or stays as it is.

6.2 _ Exception handlers.

Exceptions can only be raised by the reduction of terms. This occurs when a precondition rule is applicable and its condition is true. It is also possible to explicitly raise an exception in the right term of a normal rule, like in :

f (arg) ==> if p (arg) then g (arg) else ! e endif

Raising an exception means returning it as value. As a consequence, a subterm x of a term f ( ... x ... ) may turn out to produce an exception value ! e : all functions in FP2 are strict with respect to exceptions, which means that f ( ... ! e ... ) also has the value ! e. However, it is possible to catch an exception on its way out of a term, by means of an exception handler.

There are two situations where exception handlers may catch exceptions :

- When the evaluation of the right term of a normal operation rule produces an exception value : in that case, an exception handler can be attached to the corresponding rule in the definition of the operation.

- When a functional term inside the post-condition of a process rule produces an exception : in that case, an exception handler can be attached to the state constructor of the current state in the definition of the process.

6.2.1 _ Exception handling in operations.

The general format of a normal operation rule with exception handlers attached is :

left term ==> right term
  when ! e1 then f1
  when ! e2 then f2
  ...
  when ! en then fn
endwhen

where ! ei is an exception name and fi is a term written with the same conventions as for a right term.

When ! ei is obtained as the value of the right term, then fi is taken as a "replacement" right term and evaluated. Of course, the evaluation of fi may in turn raise an exception ! e'i which may be handled in its due place by the rule (in general another rule) getting it as its right term value, etc.

For example, let Seqnat be the type of infinite sequences of natural numbers where only a finite slice of elements indexed from 1 may have a non-zero value :

type Seqnat
  cons infseq : Nat* -> Seqnat
  opns access : Seqnat x Nat -> Nat
  vars s : Nat*  i : Nat
  rules access ( infseq (s), i ) ==> elem (s, i)
          when ! out_of_range then 0
        endwhen
endtype

6.2.2 _ Exception handling in processes.

In addition to the "normal" rules which define state transitions, a process definition may also contain exception recovery rules. The general format of an exception recovery rule is :

when ! e in Q ==> s

where ! e is an exception name, Q is a state constructor name and s is a state. When a normal rule of the form :

P(f) : event ==> Q(g)

is being applied, the evaluation of g may raise exception ! e. If ! e is not caught by an exception handler of an operation rule before reaching the outer layer of g, then the process is said to be in the exceptional state "! e in Q".

If there is no exception recovery rule corresponding to that exceptional state, the process is terminated. If there is one, the process "recovers" by going into state s. In that case, the application of the normal rule "recovered" by the recovery rule is considered as one transition.

For example, a process receiving two natural integers p and q and sending out p//q may have to deal with the exception ! zero_divide :

proc NATRAT in M, N : Nat out R : Rat
  states E : -  F : Rat
  vars p, q : Nat  r : Rat
  rules ==> E
        E : M (p) N (q) ==> F (p//q)
        F (r) : R (r) ==> E
        when ! zero_divide in F ==> E
endproc

It must also be noted that exceptions can be produced by the evaluation of sent messages : since no term may be unified with an exception value, the consequence of that situation is that the corresponding rule is not applicable.

7 _ MODULES.

FP2 allows the definition of a variety of entities :

- Types
- Operations
- Processes
- Properties
- Satisfactions

The purpose of a definition is to associate a name with an entity. The name-entity associations established by a set of definitions are in effect within a region of FP2 text called a module.

In addition to the above entities, it is also possible to define modules within modules : this is a means of structuring FP2 programs into a hierarchy of modules. This hierarchy of modules is used as a basis for controlling the extent of the region of FP2 text across which every definition is in effect.

The basic format of a module definition is :

module M is ... endmodule

where M is the name of the module and the module body is a set of definitions. With modules defined within modules, the basic visibility rules are the same as for classical block structure : all definitions of a module are visible from inner modules, except for redefined names. In addition to that "from inside-out" visibility, a module may export some of its definitions up to its directly enclosing module : such exported definitions are then considered as if they were made in the enclosing module. Thus, the exporting facility brings a controlled "from outside-in" visibility.

For example, the definition of the operation repmax uses an auxiliary function rep. For defining repmax in a module M while keeping rep hidden, it is possible to write :

module M is

  type Btree
    cons ...
    vars ...
    rules ...
  endtype

  module B export repmax is

    op repmax : Btree -> Btree
      vars t : Btree
      rules repmax (t) ==> rep (t, max (t))
    endop

    op rep : Btree x Nat -> Btree
      vars m, n : Nat  u, v : Btree
      rules rep (tip (m), n) ==> tip (n)
            rep (fork (u, v), n) ==> fork (rep (u, n), rep (v, n))
    endop

  endmodule

endmodule

But it is also possible to exercise a control over the basic "from inside-out" visibility by explicitly stating what names, which are visible in a module M, become hidden from some of its inner modules :

module M export E1 is

  without D1, ..., Dn
  module N export E1, ..., Ep is
    ...
  endmodule

  module P is
    ...
  endmodule

  without E1
  module Q is
    ...
  endmodule

endmodule

Here, the definition of module N is made at a point where the names D1, ..., Dn, which are known in M, become invisible from within N.

By combining the export and without facilities, FP2 allows a very flexible control over the visibility of definitions. For example, the name E1, which is exported by N, is visible in N, in M and also in the enclosing module of M. It is also visible in P, but it is not visible in Q.

ACKNOWLEDGEMENTS

The design, the formal definition and the implementation of FP2, both as a programming language and as a specification language, are carried out by a research group at LIFIA. The current (temporary ?) status of the language is the result of numerous discussions among the members of this group : Philippe Schnoebelen, Sylvie Roge, Juan-Manuel Pereira, Jean-Charles Marty, Annick Marty, Philippe Jorrand, Maria-Blanca Ibanez and Jean-Michel Hufflen. The principles for polymorphism in FP2 are drawn from the work accomplished in another research group at LIFIA, led by Didier Bert, on the design and implementation of LPG ("Langage de Programmation Générique").

The work on FP2 has also benefited from the support of the French Project C3 ("Concurrence, Communication, Cooperation") of CNRS and from the support of Nixdorf Computer A.G. in Paderborn, FRG, within ESPRIT Project 415 ("Parallel Languages and Architectures for Advanced Information Processing. A VLSI Approach").

BIBLIOGRAPHY

The work on FP2 has heavily relied upon the current state of the art in language design. Much inspiration has come from recently proposed functional languages and from a variety of models for parallelism and communicating processes. A collection of such important sources is listed in the following pages. Past experience of LIFIA in language design has also been of some help and the corresponding reports are inserted in the list.

ARKAXHIU, E. "Un environnement et un langage graphique pour la spécification de processus parallèles communicants." Thèse, LIFIA, Grenoble, 1984.

AUSTRY, D. "Aspects syntaxiques de MEIJE, un calcul pour le parallélisme. Applications." Thèse, LITP, Paris, 1984.

AUSTRY, D. and BOUDOL, G. "Algèbre de processus et synchronisation." Theoretical Computer Science, 1984.

BACKUS, J. W. "Can Programming Be Liberated From The Von Neumann Style ? A functional style and its algebra of programs." Communications of the ACM, Vol. 21, no. 8, 1978.

BACKUS, J. W. "The algebra of functional programs : function level reasoning, linear equations and extended definitions." Lecture Notes in Computer Science no. 107, 1981.

BACKUS, J. W. "Function Level Programs as Mathematical Objects." Conference on Functional Programming Languages & Computer Architecture, ACM, 1981.

BERT, D. "Spécification algébrique et axiomatique des exceptions." RR IMAG 183, LIFIA, Grenoble, 1980.

BERT, D. "Refinements of Generic Specifications with Algebraic Tools." IFIP Congress, North Holland, 1983.

BERT, D. "Generic Programming : a tool for designing universal operators." RR IMAG 336, LIFIA, Grenoble, 1982.

BERT, D. "Manuel de référence de LPG, Version 1.2." RR IMAG 408, LIFIA, Grenoble, 1983.

BERT, D. and BENSALEM, S. "Algèbre des opérateurs génériques et transformation de programmes en LPG." RR IMAG 488 (LIFIA 14), Grenoble, 1984.

BERT, D. and JACQUET, P. "Some validation problems with parameterized types and generic functions." 3rd International Symposium on Programming, Dunod, Paris, 1978.

BIDOIT, M. "Une méthode de présentation des types abstraits : applications." Thèse, LRI, Orsay, 1981.

BJORNER, D. and JONES, C. B. "The Vienna Development Method : The Meta-Language." Lecture Notes in Computer Science no. 6 I, 1978.

BJORNER, D. and JONES, C. B. "Formal specification & software development." Prentice Hall International, Englewood Cliffs, New Jersey 1982.

BOUDOL, G. "Computational semantics of terms rewriting systems." RR 192, INRIA, 1983.

BROOKES, S. D. "A model for communicating sequential processes." Thesis, Carnegie-Mellon University, ! 983.

BURSTALL, R. M., MACQUEEN, D.B. and SANNELLA, D.T. "HOPE: an experimental applicative language." CSR-62-80, University of Edinburgh, 198 I.

CISNEROS, M. "Programmation parallele et programmation fonctionnelle : propositions pour un langage." These, LIFIA, Grenoble, 1984.

DERSHOWITZ, N. "Computing with rewrite systems." ATR-83 (8478)-I, Aerospace Corporation, 1983.

GOGUEN, J. A., THATCHER,J. W. and WAGNER, E. G. "An initial algebra approach to the specification, correctness, and implementation of abstract data types." Current Trends in Programming Methodology, Vol. 4, Prentice Hall, Englewood Cliffs, New Jersey, 1978.

GUERREIRO, P. J. V. D. "Semantique relationnelle des programmes non-deterministes et des processus communicants." Th~se, IMAG, Grenoble, juillet 198 I.

GUTTAG, J. V. and HORNING, J.J. "The algebraic specification of abstract data types." Acta Informatica, 1978. 276

HOARE, C. A. R. "Communicating sequential processes." Communications of the ACM, Vol. 2 I, no. 8, 1978.

HOARE, C. A.R. "Notes on communicating processes." PRG-33, Oxford University, 1983.

HUFFLEN, J. M. "Notes sur FP et son implantation en LPG." RR IMAG 518 (LIFIA 20), Grenoble, 1985.

JORRAND, Ph. "Specification of communicating processes and process implementation correctness." Lecture Notes in Computer Science no. 137, 1982.

JORRAND, Ph. "FP2 :"Functional Parallel Programming based on term substitution." RR IMAG 482 (LIFIA 15), Grenoble, 1984.

MAY, D. "OCCAM." SIGPLAN Notices, Vol. 13, no. 4, 1983.

MILNER, R. "A calculus of communicating systems." Lecture Notes in Computer Science, no. 92, 1980.

PEREIRA, J. M. "Processus communicants : un langage formel et ses mod(~les. Probl~mes d'analyse." Th~se, LIFIA, Grenoble, 1984.

SOLER, R. "Une approche de la th(~orie de D. Scott et application ~ la semantique des types abstraits alg~briques." Th(~se, LIFIA, Grenoble, septembre 1982.

TURNER, D. A. "The semantic elegance of applicative languages." Conference on Functional Programming Languages & Computer Architecture, ACM, 198 I.

WILLIAMS J. H. "On the development of the algebra of functional programs." ACM Transactions on Programming Languages and Systems, Vol. 4, no. 4, 1982. Concurrent Pro]og: A Progress Report

Ehud Shapiro

Department of Computer Science, The Weizmann Institute of Science, Rehovot 76100, Israel

April 1986

Abstract

Concurrent Prolog is a logic programming language designed for concurrent programming and parallel execution. It is a process-oriented language, which embodies dataflow synchronization and guarded-command indeterminacy as its basic control mechanisms. The paper outlines the basic concepts and definition of the language, and surveys the major programming techniques that emerged from three years of its use. The history of the language's development, implementation, and applications to date is reviewed. Details of the performance of its compiler and the functionality of Logix, its programming environment and operating system, are provided.

1. Orientation

Logic programming is based on an abstract computation model, derived by Kowalski [28] from Robinson's resolution principle [40]. A logic program is a set of axioms defining relationships between objects. A computation of a logic program is a proof of a goal statement from the axioms. As the proof is constructive, it provides values for goal variables, which constitute the output of the computation. Figure 1.1 shows the relationships between the abstract computation model of logic programming and two concrete programming languages based on it: Prolog, designed by A. Colmerauer [41], and Concurrent Prolog. It shows that Prolog programs are logic programs augmented with a control mechanism based on sequential search with backtracking; Concurrent Prolog's control is based on guarded-command indeterminacy and dataflow synchronization. The execution model of Prolog is implemented using a stack of goals, which behave like procedure calls. Concurrent Prolog's computation model is implemented using a queue of goals,

Logic Programs
    Abstract model:   nondeterministic goal reduction; unification

                      Prolog                          Concurrent Prolog
    Control:          goal and clause order define    commit and read-only operators
                      sequential search and           define guarded-command
                      backtracking                    indeterminacy and dataflow
                                                      synchronization
    Implementation:   stack of goals                  queue of goals
                      + trail for backtracking        + suspension mechanism

Figure 1.1: Logic programs, Prolog, and Concurrent Prolog

which behave like processes. Figure 1.2 argues that there is a homomorphism between von Neumann and logic, sequential and concurrent languages. That is, it claims that the relationship between Occam and Concurrent Prolog is similar to the relationship between Pascal and Prolog, and that the relationship between Pascal and Occam is similar to the relationship between Prolog and Concurrent Prolog¹.

2. Logic Programs

A logic program is a set of axioms, or rules, defining relationships between objects. A computation of a logic program is a deduction of consequences of the axioms. The concepts of logic programming and the definition and implementation

¹ Some of the attributes in the figure are rather schematic, and shouldn't be taken literally; e.g. Pascal has recursion, but its basic repetitive construct, as in Occam, is iteration, whereas in Prolog and Concurrent Prolog it is recursion. Similarly, Occam has if-then-else, but its basic conditional statement, as in Concurrent Prolog, is the guarded-command.

Pascal / Prolog:              sequential, stack-based,
                              procedure call, parameter passing,
                              if-then-else / cut

Occam / Concurrent Prolog:    concurrent, queue-based,
                              process activation, message passing,
                              guarded-command / commit

    von Neumann model                logic programs model
    storage variables                logical variables
    (mutable)                        (single assignment)
    assignment, selectors,           unification
    constructors, parameter passing
    explicit/static allocation       implicit/dynamic allocation
    of data/processes                of data/processes,
                                     with garbage collection
    iteration                        recursion

Figure 1.2: A homomorphism between von Neumann and logic, sequential and concurrent languages

of the programming language Prolog date back to the early seventies. Earlier attempts were made to use Robinson's resolution principle and unification algorithm [40] as the engine of a logic-based computation model [16]. These attempts were frustrated by the inherent inefficiency of general resolution and by the lack of a natural control mechanism which could be applied to it. Kowalski [28] found that such a control mechanism can be applied to a restricted class of logical theories, namely Horn clause theories. His major insight was that universally quantified axioms of the form

    A ← B1, B2, ..., Bn    (n ≥ 0)

can be read both declaratively, saying that A is true if B1 and B2 and ... and Bn are

intersect(X,L1,L2) ← member(X,L1), member(X,L2).
member(X,list(X,Xs)).
member(X,list(Y,Ys)) ← member(X,Ys).

Program 2.1: A logic program for list intersection

true, and procedurally, saying that to prove the goal A (execute procedure A, solve problem A), one can prove the subgoals (execute subprocedures, solve subproblems) B1 and B2 and ... and Bn. Such axioms are called definite clauses. A logic program is a finite set of definite clauses. Program 2.1 is an example of a logic program defining list intersection. It assumes that lists such as [1,2,3] are represented by recursive terms such as list(1,list(2,list(3,nil))). Declaratively, its first axiom reads: X is in the intersection of lists L1 and L2 if X is a member of L1 and X is a member of L2. Procedurally, it reads: to find an X in the intersection of L1 and L2, find an X which is a member of L1 and is also a member of L2. The axioms defining member read declaratively: X is a member of the list whose first element is X. X is a member of the list list(Y,Ys) if X is a member of Ys. (Here and in the following we use the convention that names of logical variables begin with an upper-case letter.) The differences between the various logic programming languages, such as sequential Prolog [41], PARLOG [7], Guarded Horn Clauses [65], and Concurrent Prolog [49], lie in the way they deduce consequences from such axioms. However, the deduction mechanism used by all these languages is based on the abstract interpreter for logic programs, shown in Figure 2.1. The notions it uses are explained below. On the face of it, the abstract interpreter seems nothing but a simple nondeterministic reduction engine: it has a resolvent, which is a set of goals to reduce; it selects a goal from the resolvent and a unifiable clause from the program, and reduces the goal using the clause. What distinguishes this computation model from others is the logical variable, and the unification procedure associated with it. The basic computation step of the interpreter, as well as that of Prolog and Concurrent Prolog, is the unification of a goal with the head of a clause [40].
The unification of two terms involves finding a substitution of values for variables in the terms that makes the two terms identical. Thus unification is a simple and powerful form of pattern matching.

Input:  A logic program P and a goal G.
Output: Gθ, which is an instance of G proved from P, or failure.
Algorithm:
    Initialize the resolvent to be G, the input goal.
    While the resolvent is not empty, do:
        choose a goal A in the resolvent and a fresh copy of a clause
        A′ ← B1, B2, ..., Bk  (k ≥ 0) in P, such that A and A′ are
        unifiable with a substitution θ (exit if such a goal and
        clause do not exist);
        remove A from, and add B1, B2, ..., Bk to, the resolvent;
        apply θ to the resolvent and to G.
    If the resolvent is empty then output G, else output failure.

Figure 2.1: An abstract interpreter for logic programs

Unification is the basic, and only, data manipulation primitive in logic programming. Understanding logic programming is understanding the power of unification. As the example programs below show, unification subsumes the following data-manipulation primitives, used in conventional programming languages:

• Single-assignment (assigning a value to a single-assignment variable).
• Parameter passing (binding actual parameters to formal parameters in a procedure or function call).
• Simple testing (testing whether a variable equals some value, or if the values of two variables are the same).
• Data access (field selectors in Pascal, car and cdr in Lisp).
• Data construction (new in Pascal, cons in Lisp).
• Communication (as elaborated below).

The efficient implementation of a logic programming language involves the compilation of the known part of unification, as specified by the program's clause heads, to the above-mentioned set of more primitive operations [72]. A term is either a variable, e.g. X, a constant, e.g. a and 18, or a compound term f(T1,T2,...,Tn), whose main functor has name f and arity n, and whose arguments T1,T2,...,Tn are terms. A substitution element is a pair of the form Variable=Term. An (idempotent) substitution is a finite set of substitution elements {V1=T1, V2=T2, ..., Vn=Tn} such that Vi ≠ Vj if i ≠ j, and Vi does not occur in Tj for any i and j. The application of a substitution θ to a term S, denoted Sθ, is the term obtained by replacing every occurrence of a variable V by the term T, for every substitution element V=T in θ. Such a term is called an instance of S. For example, the application of the substitution {X=3, Xs=list(1,list(3,nil))} to the term member(X,list(X,Xs)) is the term member(3,list(3,list(1,list(3,nil)))). A substitution θ unifies terms T1 and T2 if T1θ=T2θ. Two terms are unifiable if they have a unifying substitution. If two terms T1 and T2 are unifiable then there exists a unique substitution θ (up to renaming of variables), called the most general unifier of T1 and T2, with the following property: for any other unifying substitution σ of T1 and T2, T1σ is an instance of T1θ. In the following we use 'unifier' as a shorthand for 'most general unifier'. For example, the unifier of X and a is {X=a}.
The unifier of X and Y is {X=Y} (or {Y=X}). The unifier of f(X,X) and f(A,b) is {X=b, A=b}, and the unifier of g(X,X) and g(a,b) does not exist. Considering the example logic program above, the unifier of member(A,list(1,list(3,nil))) and member(X,list(X,Xs)) is {X=1, A=1, Xs=list(3,nil)}.
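The substitution and unification machinery just defined can be sketched in Python. This is an illustrative sketch under our own assumptions, not code from the paper: variables are represented as strings beginning with an upper-case letter, compound terms as tuples `(functor, arg1, ...)`, and the occurs check required by the idempotence condition is omitted for brevity.

```python
# Sketch of most-general unification over first-order terms.
# Representation (our assumption): a variable is a string starting with an
# upper-case letter; a compound term is a tuple (functor, arg1, ..., argn);
# anything else (lower-case strings, numbers) is a constant.

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, s):
    """Dereference a variable through the substitution s."""
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(t1, t2, s=None):
    """Return a most general unifier of t1 and t2 extending s,
    or None if the terms do not unify. (Occurs check omitted.)"""
    s = dict(s or {})
    stack = [(t1, t2)]
    while stack:
        a, b = stack.pop()
        a, b = walk(a, s), walk(b, s)
        if a == b:
            continue
        if is_var(a):
            s[a] = b                      # bind variable to term
        elif is_var(b):
            s[b] = a
        elif isinstance(a, tuple) and isinstance(b, tuple) and \
                a[0] == b[0] and len(a) == len(b):
            stack.extend(zip(a[1:], b[1:]))   # unify arguments pairwise
        else:
            return None                   # clash of functors or constants
    return s
```

For instance, `unify(('f','X','X'), ('f','A','b'))` yields `{X=b, A=b}`, while `unify(('g','X','X'), ('g','a','b'))` returns `None`, matching the examples in the text.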

3. Concurrent Prolog

We first survey some common concepts of concurrent programming, tie them to logic programming, and then introduce Concurrent Prolog.

3.1 Concurrent programming: processes, communication, and synchronization

A concurrent programming language can express concurrent activities, or processes, and communication among them. Processes are abstract entities; they are the generalization of the execution thread of sequential programs. The actions a process can take include inter-process communication, change of state, creation of new processes, and termination. It might seem that a declarative language, based on the logic programming computation model, would be unsuitable for expressing the wide spectrum of actions of concurrent programs. This is not the case. Sequential Prolog shows that, in addition to its declarative reading, a logic program can be read procedurally.

a1) Goal = Process
a2) Conjunctive goal = Network of processes
a3) Shared logical variable = Communication channel = Shared-memory single-assignment variable
a4) Clauses of a logic program = Rules, or instructions, for process behavior

Figure 3.1: Concepts of logic programming and concurrency

Concurrent Prolog shows yet another possible reading of logic programs, namely the process behavior reading, or process reading for short. The insight we would like to convey is that the essential components of concurrent computations -- concurrent actions, indeterminate actions, communication, and process creation and termination -- are already embodied in the abstract computation model of logic programming, and that they can be uncovered using the process reading. Before introducing the computation model of Concurrent Prolog that embodies these notions, we would like to dwell on the intuitions and metaphors that link the formal, symbolic computational model with the familiar concepts of concurrent programming, via a sequence of analogies, shown in Figure 3.1. We exemplify them using the Concurrent Prolog program for quicksort, Program 3.1. In the meantime the read-only operator '?' can be ignored, and the commit operator '|' can be read as a conjunction ','. Following Edinburgh Prolog, the term [X|Xs] is a syntactic convention replacing list(X,Xs), and [ ] replaces nil. The list [1,2|Xs] is a shorthand for [1|[2|Xs]], that is list(1,list(2,Xs)), and [1,2,3] for list(1,list(2,list(3,nil))). The clauses for quicksort read: Sorting the list [X|Xs] gives Ys if partitioning Xs with respect to X gives Smaller and Larger, sorting Larger gives Ls, sorting Smaller gives Ss, and appending Ss and [X|Ls] gives Ys. Sorting the empty list gives the empty list. The first clause of partition reads: partitioning a list [Y|In] with respect to X gives [Y|Smaller] and Larger if X ≥ Y and partitioning In with respect to X gives Smaller and Larger.

a1) Goal = Process

A goal p(T1,T2,...,Tn) can be viewed as a process. The arguments of the goal (T1,T2,...,Tn) constitute the data state of the process. The predicate, p/n (name p, arity n), is the program state, which determines the procedure (set of

quicksort([X|Xs],Ys) ←
    partition(Xs?,X,Smaller,Larger),
    quicksort(Smaller?,Ss),
    quicksort(Larger?,Ls),
    append(Ss?,[X|Ls?],Ys).
quicksort([ ],[ ]).

partition([Y|In],X,[Y|Smaller],Larger) ←
    X ≥ Y | partition(In?,X,Smaller,Larger).
partition([Y|In],X,Smaller,[Y|Larger]) ←
    X < Y | partition(In?,X,Smaller,Larger).
partition([ ],X,[ ],[ ]).

append([X|Xs],Ys,[X|Zs]) ← append(Xs?,Ys,Zs).
append([ ],Xs,Xs).

Program 3.1: A Concurrent Prolog quicksort program

clauses with the same predicate name and arity) executed by the process. A typical state of a quicksort process might be quicksort([5,88,3,7,19|Xs],Ys).

a2) Conjunctive goal = Network of processes

A network of processes is defined by its constituent processes, and by the way they are interconnected. A conjunctive goal is a set of processes. For example, the body of the recursive clause of quicksort defines a network of four processes: one partition process, two quicksort processes, and one append process. The variables shared between the goals in the conjunction determine an interconnection scheme. This leads to a third analogy.

a3) Shared logical variable = Communication channel = Shared single-assignment variable

A communication channel provides a means by which two or more processes may communicate information. A shared variable is another means for several processes to share or communicate information. A logical variable, shared between two or more goals (processes), can serve both these functions. For example, the variables Smaller and Larger serve as communication channels between partition and the two recursive quicksort processes. Logical variables are single-assignment, since a logical variable can be assigned only once during a computation. Hence, a logical variable is analogous to a communication channel capable of transmitting only one message, or to a shared-memory variable that can receive only one value.

Note that under this single-assignment restriction the distinction between a communication channel and a shared-memory variable vanishes. It is convenient to view shared logical variables sometimes as analogous to communication channels and sometimes as analogous to shared-memory variables. The single-assignment restriction has been proposed as suitable for parallel programming languages independently of logic programming [1]. At first sight it would seem a hindrance to the expressiveness of Concurrent Prolog, but it is not. Multiple communications and cooperative construction of a complex data structure are possible by starting with a single shared logical variable, as explained below.

a4) Clauses of a logic program = Rules, or instructions, for process behavior

The actions of a process can be separated into control actions and data actions. Control actions include termination, iteration, branching, and creation of new processes. These are specified explicitly by logic program clauses. Data actions include communication and various operations on data structures, e.g. single-assignment, inspection, testing, and construction. As in sequential Prolog, data actions are specified implicitly by the arguments of the head and body goals of a clause, and are realized via unification.
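The analogy of a3) — a logical variable as a one-shot channel that is also a single-assignment shared cell — can be sketched in Python. The class name and its API below are our own invention for illustration; a `threading.Event` makes readers block until a writer binds the variable, and a second binding attempt is rejected.

```python
# Illustrative sketch (not Concurrent Prolog machinery): a write-once cell
# behaving like a channel that transmits exactly one message.
import threading

class LogicalVar:
    """Readers block until a writer binds the variable; rebinding fails."""

    def __init__(self):
        self._bound = threading.Event()
        self._value = None

    def bind(self, value):
        # Single-assignment: a second bind is an error.
        # (Simplified: concurrent writers would additionally need a lock.)
        if self._bound.is_set():
            raise RuntimeError("variable already bound")
        self._value = value
        self._bound.set()

    def read(self):
        # The 'read-only' side: wait for a binding, then return it.
        self._bound.wait()
        return self._value
```

A producer thread calling `v.bind(42)` wakes every consumer blocked in `v.read()`, and any later `bind` raises, mirroring the single-assignment restriction.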

3.2 The process reading of logic programs

We show how termination, iteration, branching, state change, and creation of new processes can be specified by clauses, using the process reading of logic programs.

1) Terminate

A unit clause, i.e. a definite clause with an empty body:

    p(T1,T2,...,Tn).

specifies that a process in a state unifiable with p(T1,T2,...,Tn) can reduce itself to the empty set of processes, and thus terminate. For example, the clause quicksort([ ],[ ]) says that any process which unifies with it, e.g. quicksort([ ],Ys), may terminate. While doing so, this process unifies Ys with [ ], effectively closing its output stream.

2) Change of data and program state

An iterative clause, i.e. a clause with one goal in the body:

    p(T1,T2,...,Tn) ← q(S1,S2,...,Sm).

specifies that a process in a state unifiable with p(T1,T2,...,Tn) can change its state to q(S1,S2,...,Sm). The program state is changed to q/m (i.e. branch),

and the data state to (S1,S2,...,Sm). For example, the recursive clause of append specifies that the process append([1,3,4,7,12|L1],[21,22,25|L2],L3) can change its state to append([3,4,7,12|L1],[21,22,25|L2],Zs). While doing so, it unifies L3 with [1|Zs], effectively sending an element down its output stream. Since append branches back to itself, it is actually an iterative process.

3) Create new processes

A general clause, of the form:

    p(T1,T2,...,Tn) ← Q1,Q2,...,Qm.

specifies that a process in a state unifiable with p(T1,T2,...,Tn) can replace itself with m new processes as specified by Q1,Q2,...,Qm. For example, the recursive clause of quicksort says that a quicksort process whose first argument is a list can replace itself with a network of four processes: one partition process, two quicksort processes, and one append process. It further specifies their interconnection, and initializes the first element in the list forming the second argument of append to be X, the partitioning element. Note that under this reading an iterative clause can be viewed as specifying that a process can be replaced by another process, rather than change its state. These two views are equivalent. Recall the abstract interpreter in Figure 2.1. Under the process reading the resolvent, i.e. the current set of goals of the interpreter, is viewed as a network of concurrent processes, where each goal is a process. The basic action a process can take is process reduction: the unification of the process with the head of a clause, and its reduction to (or replacement by) the processes specified by the body of the clause. The actions a process can take depend on its state -- on whether its arguments unify with the arguments of the head of a given clause. Concurrency can be achieved by reducing several processes in parallel. This form of parallelism is called And-parallelism.
Communication is achieved by the assignment of values to shared variables, caused by the unification that occurs during process reduction. Given a process to reduce, all clauses applicable for its reduction may be tried in parallel. This form of parallelism is called Or-parallelism, and is the source of a process's ability to take indeterminate actions.

3.3 Synchronization using the read-only and commit operators

In contrast to sequential Prolog, in Concurrent Prolog an action taken by a process cannot be undone: once a process has reduced itself using some clause, it is committed to it. The resulting computational behavior is called committed-choice nondeterminism, don't-care nondeterminism, and sometimes also indeterminacy, to distinguish it from the "don't-know" nondeterminism of the abstract interpreter. This design decision is common to other concurrent logic programming languages, including the original Relational Language [6], PARLOG [7], and GHC [54]. It implies that a process faced with a choice had better make a correct one, lest it doom the entire computation to failure. The basic strategy taken by Concurrent Prolog to ensure that processes make correct choices of actions is to provide the programmer with a mechanism to delay process reductions until enough information is available so that a correct choice can be made. The two synchronization and control constructs of Concurrent Prolog are the read-only and the commit operators. The read-only operator (indicated by a question-mark suffix '?') can be applied to logical variables, e.g. X?, thus designating them as read-only. The read-only operator is ignored in the declarative reading of a clause, and can be understood only operationally. Intuitively, a read-only variable cannot be written upon, i.e. be instantiated. It can receive a value only through the instantiation of its corresponding write-enabled variable. A unification that attempts to instantiate a read-only variable suspends until that variable becomes instantiated. For example, the unification of X? with a suspends; that of f(X,Y?) with f(a,Z) succeeds, with unifier {X=a, Z=Y?}. Considering Program 3.1, the unification of quicksort(In?,Out) with both quicksort([ ],[ ]) and quicksort([X|Xs],Ys) suspends, as does the unification of append(L1?,[3,4,5|L2],L3) with the heads of its two clauses. However, as soon as In?
gets instantiated to [8|In1], for example by another partition process which has a write-enabled occurrence of In, the unification of the quicksort goal with the head of the first clause fails, and with the second clause succeeds.

Definition: We assume two distinct sets of variables, write-enabled variables and read-only variables. The read-only operator, ?, is a one-to-one mapping from write-enabled to read-only variables. It is written in postfix notation. For every write-enabled variable X, the variable X? is the read-only variable corresponding to X. |

The extension of the read-only operator to terms which are not write-enabled variables is the identity function.

Definition: A substitution θ affects a variable X if it contains a substitution element X=T. A substitution θ is admissible if it does not affect any read-only variable. |

Definition: The read-only extension of a substitution θ, denoted θ?, is the result of adding to θ the substitution elements X?=T? for every X=T in θ such that T ≠ X?. |

Definition: The read-only unification of two terms T1 and T2 succeeds, with read-only mgu θ?, if T1 and T2 have an admissible mgu θ. It suspends if every mgu of T1 and T2 is not admissible. It fails if T1 and T2 do not unify. |
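The definitions above can be given an operational sketch in Python. The representation is our own assumption (upper-case strings are variables, a '?' suffix marks the read-only variable corresponding to a write-enabled one, tuples are compound terms), and the sketch only decides success/suspend/fail by checking admissibility of an mgu; computing the read-only extension θ? is omitted. To find an admissible mgu whenever one exists, it prefers binding a write-enabled variable when a write-enabled and a read-only variable meet.

```python
# Illustrative sketch of read-only unification: compute an mgu that binds
# write-enabled variables in preference to read-only ones, then report
# 'suspend' if the mgu still affects a read-only variable.

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def is_read_only(t):
    return is_var(t) and t.endswith('?')

def walk(t, s):
    while is_var(t) and t in s:
        t = s[t]
    return t

def ro_unify(t1, t2):
    """Return ('success', mgu), ('suspend', None) or ('fail', None)."""
    s, stack = {}, [(t1, t2)]
    while stack:
        a, b = stack.pop()
        a, b = walk(a, s), walk(b, s)
        if a == b:
            continue
        if is_var(a) and not is_read_only(a):
            s[a] = b                      # bind write-enabled variable first
        elif is_var(b) and not is_read_only(b):
            s[b] = a
        elif is_var(a):
            s[a] = b                      # forced binding of a read-only var
        elif is_var(b):
            s[b] = a
        elif isinstance(a, tuple) and isinstance(b, tuple) and \
                a[0] == b[0] and len(a) == len(b):
            stack.extend(zip(a[1:], b[1:]))
        else:
            return ('fail', None)
    if any(is_read_only(v) for v in s):   # mgu not admissible
        return ('suspend', None)
    return ('success', s)
```

With this sketch, `ro_unify('X?', 'a')` suspends, `ro_unify(('f','X','Y?'), ('f','a','Z'))` succeeds with `{X=a, Z=Y?}`, and `ro_unify(('f','X','X?'), ('f','a','a'))` suspends, matching the examples discussed in the text.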

Note that the definition of read-only unification prevents the unification attempt from instantiating read-only variables. However, once the unification is successful, the read-only unifier instantiates read-only variables in accordance with their corresponding write-enabled variables. This definition of read-only unification resolves several ill-defined points in the original description of Concurrent Prolog [49], discussed by Saraswat [42] and Ueda [65], such as order-dependency. It implicitly embodies the suggestion of Ramakrishnan and Silberschatz [39] that a single unification should not be able to "feed itself", that is, simultaneously write on a write-enabled variable and read from its corresponding read-only variable. In particular, it implies that the unification of f(X,X?) with f(a,a) suspends. The second synchronization and control construct of Concurrent Prolog is the commit operator. A guarded clause is a clause of the form:

    A ← G1,G2,...,Gm | B1,B2,...,Bn    (m,n ≥ 0).

The commit operator '|' separates the right-hand side of a rule into a guard and a body. Declaratively, the commit operator is read just like a conjunction: A is true if the G's and the B's are true. Procedurally, the reduction of a process A1 using such a clause suspends until A1 is unifiable with A, and the guard is determined to be true. Thus the guard is another mechanism for preventing or postponing erroneous process actions. As a syntactic convention, if the guard is empty, i.e. m=0, the commit operator is omitted. The read-only variables in the recursive invocations of quicksort, partition, and append cause them to suspend until it is known whether the input is a list or nil. The non-empty guard in the recursive clauses of partition allows the process to choose correctly on which output stream to place its next input element. It is placed on the first stream if it is smaller than or equal to the partitioning element, and on the second stream if it is larger than the partitioning element. Concurrent Prolog allows the G's, the goals in the guard, to be calls to general Concurrent Prolog programs. Hence guards can be nested recursively, and testing the applicability of a clause for reduction can be arbitrarily complex. In the following discussion we will restrict our attention to a subset of Concurrent Prolog

called Flat Concurrent Prolog [33]. In Flat Concurrent Prolog the goals in the guards can contain calls to a fixed set of simple test-predicates only. For example, Program 3.1 is a Flat Concurrent Prolog program. In Flat Concurrent Prolog, the reduction of a goal using a guarded clause succeeds if the goal unifies with the clause's head, and its guard test predicates succeed. Flat Concurrent Prolog is both the target language and the implementation language for the Logix system, to be discussed in Section 5. It is a rich enough subset of Concurrent Prolog to be sufficient for most practical purposes. It is simple enough to be amenable to an efficient implementation, resulting in a high-level concurrent programming language which is practical even on conventional uniprocessors.

3.4 An abstract interpreter for Flat Concurrent Prolog

Flat Concurrent Prolog is provided with a fixed set T of test predicates. Typical test predicates include string(X) (which suspends until X is a non-variable, then succeeds if it is a string and fails otherwise), and X < Y (which suspends until X and Y are non-variables, then succeeds if they are integers such that X < Y, and fails otherwise).

Definition: A flat guarded clause is a guarded clause of the form

    A ← G1,G2,...,Gm | B1,B2,...,Bn    (m,n ≥ 0)

such that the predicate of Gi is in T for all i, 1 ≤ i ≤ m.

A Flat Concurrent Prolog program is a finite set of flat guarded clauses. |

An abstract interpreter for Flat Concurrent Prolog is defined in Figure 3.2. The interpreter again leaves the nondeterministic choices for a goal and a clause unspecified: the scheduling policy, by which goals are added to and removed from the resolvent, and the clause selection policy, which indicates which clause to choose for reduction when several clauses are applicable. Fairness in the scheduling and clause selection policies is further discussed in [44]. For concreteness, we will explain the choices made in Logix. Logix implements bounded depth-first scheduling. In bounded depth-first scheduling the resolvent is maintained as a queue, and each dequeued goal is allocated a time-slice t. A dequeued goal can be reduced t times before it is returned to the back of the queue. If a goal is reduced using an iterative clause A ← B, then B inherits the remaining time-slice. If it is reduced using a general clause A ← B1,B2,...,Bn, then, by convention, B1 inherits the remaining time-slice, and B2 to Bn are enqueued at the back of the queue. Bounded depth-first scheduling reduces the overhead

Input:  A Flat Concurrent Prolog program P and a goal G.
Output: Gθ, if Gθ is an instance of G proved from P, or deadlock otherwise.
Algorithm:
    Initialize the resolvent to be G, the input goal.
    While the resolvent is not empty, do:
        choose a goal A in the resolvent and a fresh copy of a clause
        A′ ← G1, G2, ..., Gm | B1, B2, ..., Bn in P such that A and A′
        have a read-only unifier θ and the tests (G1,G2,...,Gm)θ succeed
        (exit if such a goal and clause do not exist);
        remove A from, and add B1, B2, ..., Bn to, the resolvent;
        apply θ to the resolvent and to G.
    If the resolvent is empty then output G, else output deadlock.

Figure 3.2: An abstract interpreter for Flat Concurrent Prolog

of process switching, and allows more effective caching of process arguments in registers. Logix also implements stable clause selection, which means that if a process has several applicable clauses for reduction, the first one (textually) will be chosen. Stability is a property that can be abused by programmers. It is hard to preserve in a distributed implementation [44], and it makes the life of optimizing compilers harder. It is not part of the language definition. In addition, Logix implements a non-busy waiting mechanism, in which a suspended process is associated with the set of read-only variables which caused the suspension of its clause reductions. If any of the variables in that suspension set gets instantiated, the process is activated and enqueued at the back of the queue. The abstract interpreter models concurrency by interleaving. A truly parallel implementation of the language requires that each process reduction be viewed as an atomic transaction, which reads from and writes to logical variables. A parallel interpreter must ensure that its resulting behavior is serializable, i.e. can be ordered to correspond to some possible behavior of the sequential interpreter. Such an algorithm has been designed [ref distributed] and is currently being implemented on Intel's iPSC at the Weizmann Institute.
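The bounded depth-first scheduling policy described above can be sketched schematically in Python. This is our own model, not Logix code: a "process" is a thunk that, when reduced once, returns the list of its successor thunks (an empty list means termination); the first body goal inherits the remaining time-slice, and siblings go to the back of the queue.

```python
# Schematic sketch of bounded depth-first scheduling with time-slice t.
from collections import deque

def run(initial, t=4):
    """Reduce goals from a queue; each dequeued goal gets a time-slice of
    t reductions. The first body goal inherits the remaining slice; the
    other body goals are enqueued at the back of the queue."""
    queue = deque(initial)
    while queue:
        goal = queue.popleft()
        slice_left = t
        while goal is not None and slice_left > 0:
            body = goal()                     # one reduction step
            slice_left -= 1
            if body:
                goal, rest = body[0], body[1:]
                queue.extend(rest)            # siblings to the back
            else:
                goal = None                   # process terminated
        if goal is not None:
            queue.append(goal)                # slice exhausted: re-enqueue
```

Running two iterative processes with t=4 shows the policy: the first process is reduced four times in a row before the second gets a turn, which is what reduces process-switching overhead.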

4. Concurrent Prolog Programming Techniques

In the past three years of its use, Concurrent Prolog has developed a wide range of programming techniques. Some are simply known concurrent programming techniques restated in the formalism of logic programming, e.g. divide-and-conquer, monitors, stream-processing, and bounded buffers. Others are novel techniques, which exploit the unique aspects of logic programs, notably the logical variable. Examples include difference-streams, incomplete-messages, and the short-circuit technique. Some techniques exploit properties of the read-only variable, e.g. blackboards, constraint-systems, and protected data-structures. Perhaps the most important in the long run are the meta-programming techniques. Using enhanced meta-interpreters, one can implement a wide spectrum of programming environment and operating system functions, such as inspecting and affecting the state of the computation, and detecting distributed termination and deadlock, in a simple and uniform way [45,20]. In the following account of these techniques breadth was preferred over depth. References to deeper treatments of the various subjects are provided.

4.1 Divide-and-conquer: recursion and communication

Divide and conquer is a method for solving a problem by dividing it into subproblems, solving them, possibly in parallel, and combining the results. If the subproblems are small enough they are solved directly; otherwise they are solved by applying the divide-and-conquer method recursively. Parallel divide-and-conquer algorithms can be specified easily in both functional and logic languages. Divide-and-conquer becomes more interesting when it involves cooperation, and hence direct communication, among the processes solving the subproblems. Program 4.1 solves a problem due to Leslie Lamport [30]. The problem is to number the leaves of a tree in ascending order from left to right, by the following recursive algorithm: spawn leaf processes, one per leaf, in such a way that each process has an input channel from the leaf process to its left, and an output channel to the leaf process to its right. The leftmost leaf process is initialized with a number. Each process receives a number from the left, numbers its leaf with it, increments it by one, and sends the result to the right. The problem is shown in order to explore the problematics of combining recursion with communication, and is not necessarily a useful parallel algorithm. The program assumes that binary trees are represented using the terms

number(leaf(N),N,N1) ← plus(N?,1,N1).
number(tree(L,R),N,N2) ← number(L?,N?,N1), number(R?,N1?,N2).

Program 4.1: Numbering the leaves of a tree: recursion with general communication

leaf(X) and tree(L,R). For example, tree(leaf(X1),tree(leaf(X2),leaf(X3))) is a tree with three leaves. Program 4.1 works in parallel on the two subtrees of a tree, until it reaches a leaf, where it spawns a plus process. A plus process suspends until its first two arguments are integers, then unifies the third with their sum. The plus processes, however, cannot operate in parallel. Rather, they are synchronized in such a way that they are activated one at a time, starting from the leftmost node. Program 4.1 passes the communication channels to the leaf processes in a simple and uniform way, via unification. It numbers a leaf by unifying its value with the left channel, even before that channel has transmitted a value.
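The channel-threading of Program 4.1 can be sketched sequentially in Python, with the recursion order standing in for the dataflow synchronization of the plus processes; the list-based tree representation is an assumption made for illustration.

```python
def number(tree, n):
    """Mirror of number/3: n plays the input channel from the left, and
    the return value plays the output channel passed on to the right."""
    if tree[0] == "leaf":
        tree[1] = n                 # number this leaf
        return n + 1                # plus(N?, 1, N1)
    _, left, right = tree
    n1 = number(left, n)            # number(L?, N?, N1)
    return number(right, n1)        # number(R?, N1?, N2)

t = ["tree", ["leaf", None], ["tree", ["leaf", None], ["leaf", None]]]
number(t, 0)
print(t)   # leaves numbered 0, 1, 2 from left to right
```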

4.2 Stream processing

Concurrent Prolog is a single-assignment programming language, in that a logical variable can be assigned a non-variable term only once during a computation. Hence it seems that, as a communication channel, a shared logical variable can transmit at most one message between two processes. This is not quite true. A variable can be assigned a term that contains a message and another variable. This new variable is shared by the processes that shared the original variable. Hence it can serve as a new communication channel, which can be assigned a term that contains an additional message and an additional variable, and so on ad infinitum. This idea is the basis of stream communication in Concurrent Prolog. In stream communication, the communicating processes, typically one sender and one receiver (also called the stream's producer and consumer), share a variable, say Xs. The sender, who wants to send a sequence of messages m1,m2,m3,... assigns Xs to [m1|Xs1] in order to send m1, then instantiates Xs1 to [m2|Xs2] to send m2, then assigns Xs2 to [m3|Xs3], and so on. The receiver inspects the read-only variable Xs?, attempting to unify it with

merge([X|Xs],Ys,[X|Zs]) ← merge(Xs?,Ys?,Zs).
merge(Xs,[Y|Ys],[Y|Zs]) ← merge(Xs?,Ys?,Zs).
merge([ ],[ ],[ ]).

Program 4.2: A binary stream merger

[M1|Xs1]. When successful, it can process the first message M1, and iterate with Xs1?, waiting for the next message. Exactly the same technique would work for one sender and multiple receivers, provided that all receivers have read-only access to the original shared variable. A receiver that spawns a new process can include it in the group of receivers by providing it with a read-only reference to the current stream variable. Program 3.1 for Quicksort demonstrates stream processing. Each partition process has one input stream and two output streams. On each iteration it consumes one element from its input stream, and places it on one of its output streams. When it reaches the end of its input stream it closes its two output streams and terminates. The append process from the same program is a simpler example of a stream processor. It copies its first input stream into its output stream, and when it reaches the end of the first input stream it binds the second input stream to its output stream, and terminates.
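The message-then-fresh-variable protocol can be sketched in Python with write-once cells. The Var class and the send/receive helpers are invented for illustration; unification is reduced to one-way assignment, and a real receiver would suspend rather than assert.

```python
class Var:
    """A write-once cell standing in for a logical variable."""
    def __init__(self):
        self.value = None
        self.bound = False

    def bind(self, value):
        assert not self.bound, "single assignment violated"
        self.value, self.bound = value, True

def send(xs, msg):
    """Bind the stream variable to (msg, NewTail); the fresh tail
    serves as the channel for the next message."""
    tail = Var()
    xs.bind((msg, tail))
    return tail

def receive(xs):
    """Read one message; return (message, rest_of_stream)."""
    assert xs.bound   # a real receiver would suspend here
    return xs.value

stream = Var()
tail = send(stream, "m1")
send(tail, "m2")
msg, rest = receive(stream)
print(msg)                 # -> m1
print(receive(rest)[0])    # -> m2
```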

4.3 Stream merging

Streams are the basic means of communication between processes in Concurrent Prolog. It is sometimes necessary, or convenient, to allow several processes to communicate with one other process. This is achieved in Concurrent Prolog using a stream merger. A stream merger is not a function, since its output -- the merged stream -- can be any one of the possible interleavings of its input streams. Hence stream-based functional programming languages incorporate stream mergers as a language primitive. In logic programming, however, a stream merger can be defined directly, as was shown by Clark and Gregory [6]; their definition, adapted to Concurrent Prolog, is shown in Program 4.2. As a logic program, Program 4.2 defines the relation containing all facts merge(Xs,Ys,Zs), in which the list Zs is an order-preserving interleaving of the elements of the lists Xs and Ys. As a process, merge(Xs?,Ys?,Zs) behaves as follows: If neither Xs nor Ys is instantiated, it suspends, since unification with all three clauses suspends. If Xs is a list then it can reduce using the first clause, which copies the list element to Zs, its output stream, and iterates with the updated streams. Similarly with Ys and the second clause. If it has reached the end of its input streams it closes its output stream and terminates, as specified by the third clause. In case both Xs and Ys have elements ready, either the first or the second clause can be used for reduction. The abstract interpreter of Flat Concurrent Prolog, defined in Figure 2.1, does not dictate which one to use. This may lead to an unfortunate situation, in which one clause (say the first) is always chosen, and elements from the second stream never appear in the output stream. A stream merger that allows this is called unfair.

There are several techniques to implement fair mergers in Concurrent Prolog. They are discussed in [51,52,67].
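Clause selection alone gives no fairness guarantee; one way to impose fairness is round-robin alternation, sketched here with Python generators. This is only one of several fair strategies, and the function is an illustration, not any of the schemes from the cited papers.

```python
def fair_merge(xs, ys):
    """Round-robin interleaving of two (possibly infinite) iterators,
    so that neither input can starve the other."""
    pending = [iter(xs), iter(ys)]
    while pending:
        for src in pending[:]:        # iterate over a copy
            try:
                yield next(src)
            except StopIteration:
                pending.remove(src)   # an exhausted stream drops out

print(list(fair_merge([1, 3, 5], [2, 4])))   # -> [1, 2, 3, 4, 5]
```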

4.4 Recursive process networks

The recursive structure of Concurrent Prolog, together with the logical variable, makes it a convenient language for specifying recursive process networks.

An example is the Quicksort program above. Although hard to visualize, the program forms two tree-like networks: a tree of partition processes, which partitions the input list into smaller lists, and a tree of append processes, which concatenates these lists together.

Process trees are useful for divide-and-conquer algorithms, and for searching, among other things. Here we show an application to stream merging. An n-ary stream merger can be obtained by composing n-1 binary stream mergers in a process tree. A program for creating a balanced tree of binary merge operators is shown as Program 4.3.

Program 4.3 creates a merge tree layer by layer, using an auxiliary procedure merge_layer. The merge trees defined are static, i.e. the number of streams to be merged must be defined in advance, and cannot be changed easily. In [44] it is shown how to implement multiway dynamic merge trees in Concurrent Prolog, using the concept of 2-3-trees. Ueda and Chikayama [67] and Shapiro and Safra [52] improve this scheme further. More complex process structures, including rectangular and hexagonal process arrays [50], quad-trees [11], and pyramids, can easily be constructed in Concurrent Prolog. These process structures have proved useful in programming systolic algorithms and in spawning virtual parallel machines [64].

merge_tree(Bottom,Top) ←
    Bottom ≠ [_] |
    merge_layer(Bottom,Bottom1), merge_tree(Bottom1?,Top).
merge_tree([Xs],Xs).

merge_layer([Xs,Ys|Bottom],[Zs|Bottom1?]) ←
    merge(Xs?,Ys?,Zs), merge_layer(Bottom?,Bottom1).
merge_layer([Xs],[Xs]).
merge_layer([ ],[ ]).

merge(Xs,Ys,Zs) ← See Program 4.2.

Program 4.3: A balanced binary merge tree
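The layer-by-layer construction of Program 4.3 can be transcribed into Python over finite lists. Here merge2 is a deterministic round-robin interleaving rather than the committed-choice merge of Program 4.2, and the function names merely echo the clause names; this is a sketch of the tree-building scheme, not of its concurrent behavior.

```python
def merge2(xs, ys):
    """A binary merge: a deterministic, order-preserving interleaving."""
    out, xs, ys = [], list(xs), list(ys)
    while xs or ys:
        if xs: out.append(xs.pop(0))
        if ys: out.append(ys.pop(0))
    return out

def merge_layer(bottom):
    """One layer: pair off adjacent streams through binary mergers."""
    top = [merge2(bottom[i], bottom[i + 1])
           for i in range(0, len(bottom) - 1, 2)]
    if len(bottom) % 2:
        top.append(bottom[-1])   # an odd stream passes through unchanged
    return top

def merge_tree(bottom):
    """Apply merge_layer repeatedly until one output stream remains."""
    while len(bottom) > 1:
        bottom = merge_layer(bottom)
    return bottom[0]

print(merge_tree([[1], [2], [3], [4]]))   # -> [1, 3, 2, 4]
```

Merging n streams this way takes n-1 binary mergers arranged in a balanced tree, so each element passes through about log n mergers.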

4.5 Systolic programming: parallelism with locality and pipelining

Systolic algorithms were designed originally by Kung and his colleagues [29] for implementation in special-purpose hardware. However, they are based on two rather general principles: 1. Localize communication. 2. Overlap and balance computation with communication. The advantages of implementing systolic algorithms on general-purpose parallel computers using a high-level language, compared to implementation in special-purpose hardware, are obvious. The systolic programming approach [50] was conceived in an attempt to apply the systolic approach to general-purpose parallel computers. The specification of systolic algorithms in Concurrent Prolog is rather straightforward. However, to ensure that performance is preserved in the implementation, two aspects of the execution of the program need explicit attention. One is the mapping of processes to processors, which should preserve the locality of the algorithm, using the locality of the architecture. Another is the communication pattern employed by the processes. In the systolic programming approach [50], the mapping is done using a special notation, Logo-like Turtle programs [36]. Each process, like a turtle in Logo, is associated with a position and a heading. A goal in the body of a clause may have a Turtle program associated with it. When activated, this Turtle program, applied to the position and heading of the parent process, determines the position and

mm([ ],_,[ ]).
mm([X|Xs],Ys,[Z|Zs]) ← vm(X,Ys?,Z)@right, mm(Xs?,Ys,Zs)@forward.

vm(_,[ ],[ ]).
vm(Xs,[Y|Ys],[Z|Zs]) ← ip(Xs?,Y?,Z), vm(Xs,Ys?,Zs)@forward.

ip([X|Xs],[Y|Ys],Z) ← Z := (X*Y)+Z1, ip(Xs?,Ys?,Z1).
ip([ ],[ ],0).

Program 4.4: Matrix multiplication

heading of the new process. Using this notation, complex process structures can be mapped in the desired way. Programming in Concurrent Prolog augmented with Turtle programs as a mapping notation is as easy as mastering a herd of turtles. Pipelining is the other aspect that requires explicit attention. The performance of many systolic algorithms depends on routing communication in specific patterns. The abstract specification of a systolic algorithm in Concurrent Prolog often does not enforce a communication pattern. However, the tools to do so are in the language. By appropriate transformations, broadcasting can be replaced by pipelining, and specific communication patterns can be enforced [63]. For example, Program 4.4 is a Turtle-annotated Concurrent Prolog program for multiplying two matrices, based on the classic systolic algorithm which pipelines two matrices orthogonally along the rows and columns of a processor array [ref Kung]. It assumes that the two input matrices are represented by a stream of streams of their rows and columns, respectively. It produces a stream of streams of the rows of the output matrix. The program operates by spawning a rectangular grid of ip processes for computing the inner products of each row and column. Unlike the original systolic algorithm, this program does not pipeline the streams between ip processes but rather broadcasts them. However, pipelining can easily be achieved by adding two additional streams to each process [50].
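Stripped of Turtle annotations, streams, and dataflow suspension, the mm/vm/ip grid of Program 4.4 amounts to the following Python sketch (the function names echo the clause names; the list-of-rows and list-of-columns representation follows the text):

```python
def ip(xs, ys):
    """Inner product of a row and a column (cf. ip/3)."""
    return sum(x * y for x, y in zip(xs, ys))

def vm(row, cols):
    """One vm process: pair a row of X with every column of Y (cf. vm/3)."""
    return [ip(row, col) for col in cols]

def mm(rows, cols):
    """Spawn the grid of inner-product computations (cf. mm/3)."""
    return [vm(row, cols) for row in rows]

x_rows = [[1, 2], [3, 4]]
y_cols = [[5, 7], [6, 8]]   # i.e. Y = [[5, 6], [7, 8]] given by columns
print(mm(x_rows, y_cols))   # -> [[19, 22], [43, 50]]
```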

4.6 The logical variable

All the programming techniques shown before can be realized in other computation models, with various degrees of success. For example, stream processing can be specified with functional notation [27]. By adding a non-deterministic constructor to functional languages, they can even specify stream mergers [12]. Using simultaneous recursion equations one can specify recursive process networks. In this section we show Concurrent Prolog programming techniques which are unique to logic programming, as they rely on properties of the logical variable. Of course, one can take a functional programming language, extend it with stream constructors, non-deterministic constructors, simultaneous recursion equations, and logical variables, and perhaps achieve these techniques as well. But why approximate logic programming from below, instead of just using it?

4.6.1. Incomplete messages

An incomplete message is a message that contains one or more uninstantiated variables. An incomplete message can be viewed in various ways, including:
• A message that is being sent incrementally.
• A message containing a communication channel as an argument.
• A message containing implicitly the identity of the sender.
• A data structure that is being constructed cooperatively.
The first and second views are taken by stream processing programs. A stream is just a message being sent incrementally, and each list-cell in the stream is a message containing the stream variable to be used in the subsequent communication. Similarly, the processes for constructing the merge trees communicated via incomplete messages, each containing a stream of streams. However, it is not necessary that the sender of an incomplete message be the one to complete it. It could also be the receiver. Two Concurrent Prolog programming techniques -- monitors and bounded-buffers [59] -- operate this way. Monitors also take the third view, that an incomplete message holds implicitly the identity of its sender. This view enables rich communication patterns to be specified without the need for an extra layer of naming conventions and communication protocols, by providing a simple mechanism for replying to a message.

4.6.2. Monitors

Monitors were introduced into conventional concurrent programming languages by Hoare [21], as a technique for structuring the management of shared data. A monitor has some local data, which it maintains, and some procedures, or entries, defined for manipulating and examining the data. A user process that wants to update or inspect the data performs the relevant monitor call.

stack([push(X)|In],S) ← stack(In?,[X|S]).
stack([pop(X)|In],[X|S]) ← stack(In?,S).
stack([ ],[ ]).

Program 4.5: A stack monitor

The monitor has built-in synchronization mechanisms, which prevent different callers from updating the data simultaneously and allow the inspection of the data only when it is in an integral state. One of the convenient aspects of monitors is that the process performing a monitor call does not need to identify itself explicitly. Rather, some of the arguments of the monitor call (which syntactically looks similar to a procedure call) serve as the return address for the information provided by the monitor. When the monitor call completes, the caller can inspect these arguments and find there the answer to its query. Stream-based languages can mimic the concept of a monitor as follows [2]. A designated process, the "monitor" process, maintains the data to be shared. Users of the data have streams connected to the monitor via a merger. "Monitor calls" are simply messages to the monitor, which updates the data and responds to queries according to the message received. The elegance of this scheme is that no special language constructs need be added in order to achieve this behavior: the concepts already available -- processes, streams, and mergers -- are sufficient. The awkward aspect of this scheme is routing the response back to the sender. Fortunately, in Concurrent Prolog incomplete messages allow responses to queries to be routed back to the sender directly, without the need for an explicit naming and routing mechanism. Both the underlying mechanism required to implement incomplete messages and the resulting effect from the user's point of view are similar to conventional monitors, where a process that performs a monitor call finds the answer by inspecting the appropriate argument of the call, after the call is "served". Hence Concurrent Prolog provides the convenience of monitors, while maintaining the elegance of stream-based communication.
In contrast to conventional monitors, Concurrent Prolog monitors are not a special language construct, but simply a programming technique for organizing processes and data. Program 4.5 implements a simple stack monitor. It understands two messages: push(X), on which it changes the stack contents S to [X|S], and pop(X), to which it responds by unifying the top element of the stack with X, and changing the stack contents to the remaining stack. pop(X) is an example of an incomplete message.

Monitors in Concurrent Prolog are discussed further in [48,49].
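The reply-via-incomplete-message pattern of Program 4.5 can be sketched in Python with a write-once reply slot playing the role of the variable X in pop(X). The Var class and stack_monitor function are invented for illustration, and the message stream is a plain list rather than an incrementally produced stream.

```python
class Var:
    """A write-once reply slot, standing in for the uninstantiated
    variable inside an incomplete message such as pop(X)."""
    def __init__(self):
        self.value, self.bound = None, False

    def bind(self, value):
        assert not self.bound
        self.value, self.bound = value, True

def stack_monitor(messages):
    """Consume a message stream while holding the stack as local state
    (cf. Program 4.5); a pop message carries the caller's reply slot."""
    stack = []
    for msg in messages:
        if msg[0] == "push":
            stack.insert(0, msg[1])
        elif msg[0] == "pop":
            msg[1].bind(stack.pop(0))   # reply by binding the variable

reply = Var()
stack_monitor([("push", "a"), ("push", "b"), ("pop", reply)])
print(reply.value)   # -> b
```

The caller never names itself: the reply slot it embedded in the message is the return address, which is the point of the third view of incomplete messages above.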

4.6.3. Detecting distributed termination: the short-circuit technique

Concurrent Prolog does not contain a sequential-AND construct. Suggestions to include one were resisted for two reasons. First, a desire to keep the number of language constructs down to a minimum. Second, the belief that even if eventually such a construct would be needed, introducing it at an early stage would encourage awkward and lazy thinking: instead of using Concurrent Prolog's dataflow synchronization mechanism, programmers would resort to the familiar sequential construct². In retrospect, this decision proved to be very important, both from an educational and an implementation point of view. Concurrent Prolog still does not have sequential-AND, and Logix does not have the underlying machinery necessary to implement it, even if it were desired. The reason is that implementing sequential-AND in Concurrent Prolog on a parallel machine requires solving the problem of distributed termination detection. To run P&Q (assuming that & is the sequential-AND construct) one has to detect that P has terminated in order to proceed to Q. If P spawned many parallel processes that run on different processors, this requires detecting when all of them have terminated, which is a rather difficult problem for an implementation to solve. On the other hand, there is sometimes a need to detect when a computation terminates. First of all, as a service to the programmer or user who wishes to know whether his program worked properly and terminated, or whether it has some useful or useless processes still running in the background. Second, when interfacing with the external environment there is a need to know whether a certain set of operations, e.g. a transaction, has completed in order to proceed. This problem can be solved using a very elegant Concurrent Prolog programming technique, called the short-circuit technique, which is due to Takeuchi [58].
The idea is simple: chain the processes in a computation using a circuit, where each active process is an open switch on the circuit. When a process terminates, it closes its switch, shorting its segment of the circuit. When the entire circuit is shorted, global termination is detected. The technique is implemented using logical variables, as follows: each process is invoked with two variables, Left and Right, where the Left of one process is unified with the Right of another. The leftmost and rightmost processes each have

one end of the chain connected to the manager. The manager instantiates one end of the chain to some constant and waits till the variable at the other end is instantiated to that constant as well. Each process that terminates unifies its Left and Right variables. When all terminate, the entire chain becomes one variable, and the manager sees the constant it sent on one end appearing at the other. An example of using the short-circuit technique is shown below, in Program 4.7.

² Early Prolog-in-Lisp implementations, which provided an easy cop-out to Lisp, had a similar fate. Users of these systems -- typically experienced Lisp hackers -- would resort to Lisp whenever they were confronted with a difficult programming problem, instead of thinking it through in Prolog. This led some to conclude that Prolog "wasn't for real".
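The chained-variable protocol of the short-circuit technique can be approximated in Python with write-once cells and callbacks. The Cell class and its methods are invented for illustration, and unification of Left and Right is reduced to one-way forwarding; the essential property is preserved: the token reaches the far end only after every worker has closed its switch.

```python
import random
import threading
import time

class Cell:
    """A write-once cell; binding it fires registered callbacks.
    A crude stand-in for instantiating a shared logical variable."""
    def __init__(self):
        self._value = None
        self._event = threading.Event()
        self._callbacks = []
        self._lock = threading.Lock()

    def bind(self, value):
        with self._lock:
            self._value = value
            self._event.set()
            callbacks = self._callbacks[:]
        for cb in callbacks:
            cb(value)

    def on_bind(self, cb):
        with self._lock:
            if not self._event.is_set():
                self._callbacks.append(cb)
                return
            value = self._value
        cb(value)

    def wait(self):
        self._event.wait()
        return self._value

def worker(left, right):
    time.sleep(random.random() / 20)   # the process's real work
    left.on_bind(right.bind)           # terminate: short Left to Right

# Manager: chain three workers through four cells, send a token down
# one end, and wait for it to emerge at the other.
cells = [Cell() for _ in range(4)]
threads = [threading.Thread(target=worker, args=(cells[i], cells[i + 1]))
           for i in range(3)]
for t in threads:
    t.start()
cells[0].bind("done")
print(cells[-1].wait())   # -> done
```

Note that the manager never needs to know how many processes exist or where they run; it only holds the two ends of the chain.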

4.7 Meta-programming and partial evaluation

Meta-programs are programs that treat other programs as data. Examples of meta-programs include compilers, assemblers, and debuggers. One of the most important and useful types of meta-program is the meta-interpreter, sometimes called a meta-circular interpreter, which is an interpreter for a language written in that language. A meta-interpreter is important from a theoretical point of view, as a measure of the quality of the language design. Designing a language with a simple meta-interpreter is like solving a fixpoint equation: if the language is too complex, its meta-interpreter will be large. If it is too weak, it won't have the necessary data-structures to represent its programs and the control structures to simulate them. A language may have several meta-interpreters of different granularities. In logic programs, the most useful meta-interpreter is the one that simulates goal reduction, but relies on the underlying implementation to perform unification. An example of a Flat Concurrent Prolog meta-interpreter at this granularity is shown as Program 4.6. The meta-interpreter assumes that a guardless clause A ← B in the interpreted program is represented using the unit clause clause(A,B). If the body of the clause is empty, then B=true. A guarded clause A ← G|B is represented by clause(A,B) ← G|true. A similar interpreter for full Concurrent Prolog is shown in [48]. The plain meta-interpreter is interesting mostly for a theoretical reason, as it does nothing except simulate the program being executed. However, slight variations on it result in meta-interpreters with very useful functionalities. For example, by extending it with a short circuit, as in Program 4.7, a termination-detecting meta-interpreter is obtained. Many other important functions can be implemented via enhanced meta-interpreters [45]. In Prolog, they have been used to implement explanation facilities for expert systems [56].
In compiler-based Prolog systems, as well as in Logix, the debugger is based on an enhanced meta-interpreter, and layers of protection

reduce(true).                  % halt
reduce((A,B)) ←                % fork
    reduce(A?), reduce(B?).
reduce(A) ←                    % reduce
    A≠true, A≠(_,_) |
    clause(A?,B), reduce(B?).

Program 4.6: A plain meta-interpreter for Flat Concurrent Prolog
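The three clauses of Program 4.6 map onto the three cases of a toy Python interpreter: halt on true, fork on a conjunction, and otherwise reduce via clause lookup. Looking clauses up by goal name replaces unification, and the clause table below is invented for illustration; the point is only the shape of the meta-interpreter.

```python
# Interpreted program: main <- setup, run.   setup.   run.
clauses = {
    "main": ("and", "setup", "run"),
    "setup": "true",
    "run": "true",
}

trace = []   # record the order in which goals are reduced

def reduce(goal):
    if goal == "true":                                  # halt
        return
    if isinstance(goal, tuple) and goal[0] == "and":    # fork
        reduce(goal[1])
        reduce(goal[2])
        return
    trace.append(goal)                                  # reduce
    reduce(clauses[goal])

reduce("main")
print(trace)   # -> ['main', 'setup', 'run']
```

Enhancing this skeleton -- threading a circuit through the calls as in Program 4.7, or counting reductions -- is exactly how the enhanced meta-interpreters mentioned in the text are obtained.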

reduce(A,Done) ← reduce1(A,done-Done).

reduce1(true,Done-Done).       % halt
reduce1((A,B),Left-Right) ←    % fork
    reduce1(A?,Left-Middle), reduce1(B?,Middle-Right).
reduce1(A,Left-Right) ←        % reduce
    A≠true, A≠(_,_) |
    clause(A?,B), reduce1(B?,Left-Right).

Program 4.7: A termination detecting meta-interpreter

and control are defined via meta-interpreters [20]. Such meta-interpreters, including abortable, interruptible, failsafe, and deadlock-detecting meta-interpreters, are shown and explained in [ref Hirsch]. One problem with using such meta-interpreters directly is the execution overhead of the added layer of interpretation, which is unacceptable in many applications. In [45,60] it is shown how partial evaluation, a program-transformation technique, can eliminate the overhead of meta-interpreters. In effect, partial evaluation can turn enhanced meta-interpreters into compilers, which produce as output the input program enhanced with the functionality of the meta-interpreter.

4.8 Modular programming and programming-in-the-large

The techniques shown above refer mostly to programming in the small. This does not mean that Concurrent Prolog is not suitable for programming in the large. On the contrary, we found that even using the simple module system developed for bootstrapping Logix, many people could cooperate in its development. We expect the situation to improve further using the hierarchical module system, currently under development. The key idea in these module systems, which are implemented entirely in Concurrent Prolog, is to use Concurrent Prolog message-passing to implement inter-module calls. This means that no additional communication mechanism is needed to support remote procedure calls between modules which reside on different processors.

5. The Development of Concurrent Prolog

Concurrent Prolog was conceived and first implemented in November 1982, in an attempt to extend Prolog to a concurrent programming language, and to clean up and generalize the Relational Language of Clark and Gregory [6]. Although one of the goals of the language was to be a superset of sequential Prolog, the proposed design did not seem, on the face of it, to achieve this goal, and hence was termed "A Subset of Concurrent Prolog" [49]. A major strength of that language, which later became known simply as Concurrent Prolog, was that it had a working, usable implementation: an interpreter written in Prolog [49]. Since the concepts of the language were quite radical at the time, it seemed fruitful to try and explore them experimentally, by writing programs in the language, rather than to get involved in premature arguments on language constructs, or to implement the language "for real" before its concepts were explored and understood, or to extend this "language subset" prematurely, before its true limitations were encountered. In this respect the development of Concurrent Prolog deviated from the common practice of research on a new programming language. Such research typically concentrates on theoretical aspects of the language definition (e.g. CCS [34]), or attempts to construct an efficient implementation of it (e.g. Pascal), but rarely focuses on actual usage of the language through a prototype implementation. This exploratory activity proved tremendously useful. Novel ways of using logic as a programming language were unveiled [49,55,58], and techniques for incorporating conventional concepts of concurrent programming in logic were developed [48,51]. Most importantly, a large body of working Concurrent Prolog programs that solve a wide range of problems and implement many types of algorithms was gathered.
This activity, which continued for a period of about two years, mostly at ICOT and at the Weizmann Institute, resulted in papers on "How to do X in Concurrent Prolog" for numerous X's [5,11,14,17,18,19,46,48,50,51,52,55,57].

A programming language cannot be general purpose if only a handful of experts can grasp it and use it effectively. To investigate how easy Concurrent Prolog is to learn, I have taught Concurrent Prolog programming courses at the Weizmann Institute and at the Hebrew University of Jerusalem. Altogether about 90 graduate and 100 undergraduate students in Computer Science have attended these courses. Based on performance in programming assignments and on the quality of the courses' final programming projects, it seems that more than three-quarters of the students became effective Concurrent Prolog programmers.

The accumulated experience suggested that Concurrent Prolog would be an expressive and productive general-purpose programming language, if implemented efficiently. The strength of the language was perceived mostly in systems programming [20,45,48,59] and in the implementation of parallel and distributed algorithms [17,18,46,50]; it also seemed suitable for the implementation of knowledge-programming tools for AI applications [14,19], and as a system-description and simulation language [5,57].

The next step was to try to develop an efficient implementation of the language on a uniprocessor, to serve as a building block for a parallel implementation and as a tool for exploring and testing the applicability of the language further. This proved to be surprisingly difficult. Interpreters for the language developed at the Weizmann Institute exhibited miserable performance [32]. A compiler of Concurrent Prolog on top of Prolog was developed at ICOT [68]. Although the latest version of the compiler reached a speed of more than 10K reductions per second, which is more than a quarter of the speed of the underlying Prolog system on that machine, it did not scale to large applications since it employed busy waiting. In addition to the implementation difficulties, subtle problems and opacities in the definition of the OR-parallel aspect of Concurrent Prolog were uncovered [42,66].
As a result of these difficulties we decided to switch research direction, and to concentrate our implementation effort on Flat Concurrent Prolog, the AND-parallel subset of Concurrent Prolog. Flat Concurrent Prolog was a "legitimate" subset of Concurrent Prolog for two reasons. First, it has a simple meta-interpreter, shown above as Program 4.6. Second, we discovered that almost all the applications that had been written in Concurrent Prolog previously were either in its Flat subset already, or could easily be hand-converted into it. This demonstrated the utility of having a large body of Concurrent Prolog code. Without it we would not have had the courage to make what seemed to be such a drastic cut in the language.

There was one Concurrent Prolog program that would not translate into Flat Concurrent Prolog easily: an Or-parallel Prolog interpreter. This four-clause program, written by Ken Kahn, and shown as Program 5.1, was simultaneously the final victory of Concurrent Prolog, and its death-blow. It was a victory for the pragmatic expressiveness of Concurrent Prolog, since it showed that without extending the original "Subset of Concurrent Prolog", the language was as expressive as Prolog: any pure Prolog program can run on a Concurrent Prolog machine (with Or-parallelism for free!), by adding to it the four clauses of Kahn's interpreter.

solve([ ]).
solve([A|As]) ← clauses(A,Cs), resolve(A?,Cs?,As?).

resolve(A,[(A ← Bs)|Cs],As) ← append(Bs?,As?,ABs), solve(ABs?) | true.
resolve(A,[C|Cs],As) ← resolve(A?,Cs?,As?) | true.

append(Xs,Ys,Zs) ← See Program 3.1.
clauses(A,Cs) ← Cs is the list of clauses in A's procedure.

Program 5.1: Kahn's Or-parallel Prolog interpreter

Thus the original design goal of Concurrent Prolog -- to have a concurrent programming language that includes Prolog -- was actually achieved, though it took more than a year to realize that. It was a death-blow to the implementability of Concurrent Prolog, at least for the time being, since it showed that implementing Concurrent Prolog efficiently is as hard as, and probably harder than, implementing Or-parallel Prolog. As we all know, no one knows how to implement Or-parallel Prolog efficiently, as yet.

Once the switch to Flat Concurrent Prolog was made, in June 1984, implementation work began to progress rapidly. A simple interpreter for the language was implemented in Pascal [33]. An abstract instruction set for Flat Concurrent Prolog, based on the Warren instruction set for unification [72] and the abstract machine embodied in the FCP interpreter, was designed [24], and an initial version of the compiler was written in Flat Concurrent Prolog. In July 1985, the bootstrapping of this compiler-based system was completed. The system, called Logix [54], is a single-user multi-tasking program development environment. It consists of: a five-pass compiler, comprising a tokenizer, parser, preprocessor, encoder, and assembler; an interactive shell, which includes a command-line editor, and supports management and inspection of multiple parallel computations; a source-level debugger, based on a meta-interpreter; a module system that supports separate compilation, runtime linking, and a free mixing of interpreted (debuggable) and compiled modules; a tty-controller, which allows multiple parallel processes, including the interactive shell, to interact with the user in a consistent way.
A simple file-server, which interfaces to the Unix file system, and some input, output, profiling, style-checking, and other utilities complete the system.

The system is written in Flat Concurrent Prolog. Its source is about 10,000 lines of code, divided among 45 modules; about half of it is the compiler. The system uses no side-effects or other extra-logical constructs, except in a few well-defined places. In the interface to the physical devices, low-level kernels make the keyboard and screen look like Concurrent Prolog input and output streams of bytes, and the Unix file system look like a Concurrent Prolog monitor that maintains an association table of (FileName, FileContents) pairs. In the multiway stream merger and distributor, which are used heavily by the rest of the system, destructive assignment is used to achieve constant delay [52], compared with the logarithmic delay that can be achieved in pure Concurrent Prolog [51].

The other part of the system, written in C, includes an emulator of the abstract machine, an implementation of the kernels, and a stop-and-copy garbage collector [24]. It is about 6,000 lines of code. When compiled on the VAX, the emulator occupies about 60K bytes, and Logix another 300K bytes³. When idle, Logix consists of about 750 Concurrent Prolog processes; Logix itself runs as one Unix process.

The compiler compiles about 100 source lines per cpu minute on a VAX 11/750. A run of the compiler on the encoder, which is about 400 lines long, creates about 31,000 temporary Concurrent Prolog processes and generates about 1.5M bytes of temporary data structures (garbage). During this computation about 90,000 process reductions and 10,000 process suspensions/activations occur. Overall, the system at present achieves about a fifth to a quarter of the speed of Quintus Prolog [38], the fastest commercially available Prolog on the VAX today.
This number is obtained by comparing Concurrent Prolog process reductions to Prolog procedure calls for the same logic programs. It indicates that the efficiency of Warren's abstract Prolog machine [72], which is at the basis of Quintus Prolog, and of our Flat Concurrent Prolog machine is about the same. The gap can be closed by rewriting our emulator in assembly language, as Quintus does. To explain this similarity in performance, recall that although Flat Concurrent Prolog needs to create and maintain processes, which is a bit more expensive than creating stack frames for Prolog procedure calls, it does not support deep backtracking, whereas Prolog does and pays dearly for it.
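The constant- versus logarithmic-delay comparison for stream merging ([51] vs. [52]) can be sketched numerically; the function names below are illustrative, not taken from Logix:

```python
import math

# Merging n input streams through a balanced binary tree of two-way
# merge processes (all that pure Concurrent Prolog supports) forwards
# each element through about log2(n) processes, while the destructive
# n-way merger of [52] forwards it in a single hop.

def hops_through_tree(n_streams):
    """Two-way mergers an element crosses in a balanced merge tree."""
    return math.ceil(math.log2(n_streams))

def hops_through_flat_merger(n_streams):
    """A single n-way merger: one process between producer and consumer."""
    return 1

for n in (2, 8, 64):
    print(n, hops_through_tree(n), hops_through_flat_merger(n))
```

For a system component merging dozens of streams, the difference between one hop and six per element is what justifies the carefully confined use of destructive assignment.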

³ At the moment we use word encoding, rather than byte encoding, for the abstract machine instructions.

6. Efforts at ICOT and Imperial College: GHC and PARLOG

In the meantime ICOT did not stand still. Given their decision to use Concurrent Prolog as the basis for Kernel Language 1 [13], the core programming language of their planned Parallel Inference Machine, they also attempted to implement its Or-parallel aspect. Prototype implementations of three different schemes were constructed, namely shallow-binding [35], deep-binding, and lazy-copying (the scheme we tried at Weizmann) [62]. Shallow binding proved to be the fastest, but did not seem to scale to multiprocessors. Lazy copying was the slowest, so the choice seemed to fall on deep-binding. Unfortunately that implementation scheme was rather complex, and the subtle problems with Concurrent Prolog's Or-parallelism were still unsolved. On the other hand, ICOT did not want to follow the Flat Concurrent Prolog path, since it seemed to take them even further away from Prolog and from the AI applications envisioned for the Parallel Inference Machine.

An elegant solution to these problems was found in Guarded Horn Clauses [65], a novel concurrent logic programming language. The main design choice of GHC was to eliminate multiple Or-parallel environments from Concurrent Prolog. Besides avoiding a major implementation problem, this decision also provided a synchronization rule: if you try to write on the parent environment, then suspend (in Concurrent Prolog a process would allocate a local copy of the variable and continue instead). This rule made the read-only annotation somewhat superfluous. The resulting language exhibits elegance and conciseness, and seems to capture most of Concurrent Prolog's applications and programming techniques, excluding, of course, Kahn's Or-parallel Prolog interpreter. GHC is the current choice of ICOT for Kernel Language 1. Besides solving some of the difficulties in the definition and implementation of Concurrent Prolog, GHC is "Made in Japan", which certainly is not a disadvantage from ICOT's point of view.
Recent implementation efforts at ICOT concentrate on Flat GHC, the GHC analogue of Flat Concurrent Prolog.

So why didn't we switch to GHC? Long discussions were held in our group about this option. Our general conclusion was that even though GHC is a simpler formalism, it is also more fragile, less expressive, and more difficult to extend. We felt it would either break or lose much of its elegance when faced with the problems of implementing a real operating system, which includes a secure kernel, error-handling for user programs, and distributed termination and deadlock detection. Furthermore, it would be less adequate for AI applications, since it has a weaker notion of unification.

Another related research effort is the development of the PARLOG programming language by Clark and Gregory at Imperial College [7]. PARLOG is compiler-oriented, even more than GHC, in a way that seems to render it unsuitable for meta-programming. Given our commitment to implement the entire programming environment and operating system around the concepts of meta-interpretation and partial evaluation, we cannot use PARLOG. On the performance side, PARLOG and GHC seem quite similar, except that GHC has to make a runtime check that guards do not write on the parent's environment, whereas PARLOG ensures this at compile-time, using what is called a safety check [8]. On the expressiveness side, there does not seem to be a great difference between PARLOG and GHC, except for meta-programming.

Alternative synchronization constructs to the read-only variable were proposed by Saraswat [43] and by Ramakrishnan and Silberschatz [39].
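GHC's synchronization rule -- suspend on an attempt to bind a variable of the parent environment -- can be sketched in a few lines of Python. Everything here (the `Var` class, the `local` flag, `guard_bind`) is an illustrative model, not GHC's actual implementation:

```python
# A toy model of GHC guard evaluation: a guard may bind only variables
# local to its own clause; an attempt to bind a caller's ("parent
# environment") variable suspends the process instead of binding it.

class Var:
    def __init__(self, name, local):
        self.name, self.local, self.value = name, local, None

class Suspend(Exception):
    """Raised when a guard would bind a non-local variable."""

def guard_bind(var, value):
    if var.value is not None:        # already bound: just check consistency
        return var.value == value
    if not var.local:                # caller's variable: suspend, don't bind
        raise Suspend(var.name)
    var.value = value                # local variable: binding is allowed
    return True

x = Var("X", local=False)            # belongs to the parent environment
t = Var("T", local=True)             # local to the guard

assert guard_bind(t, 42)             # local binding succeeds
try:
    guard_bind(x, 42)                # would write on the parent: suspends
except Suspend as s:
    print("suspended on", s)         # prints: suspended on X
```

In Concurrent Prolog the same situation would instead create a local copy of X in the guard's environment and continue, which is precisely the multiple-environment machinery GHC eliminates.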

7. Current Research Directions

The main focus of our current research at the Weizmann Institute is the implementation of a Concurrent Prolog based general-purpose parallel computer system. Our present implementation vehicle is Intel's iPSC d4/me, a memory-enhanced four-dimensional hypercube, which, incidentally, is isomorphic to a 4 x 4 mesh-connected torus. As a first step, a distributed FCP interpreter is being implemented in C, based on a distributed unification algorithm which guarantees the atomicity of goal reductions [44]. A technique for implementing Concurrent Prolog virtual machines that manage code and process mapping on top of the physical machine has also been developed [64].

Since Logix is self-contained, once the abstract FCP machine runs on a parallel computer, an entire program development environment and operating system will also become available on it. For example, the Logix source-level debugger, as well as other meta-interpreter based tools such as the profiler, would preserve the parallelism of the interpreted program while executing on a parallel computer. With this system a parallel computer could thus be used both as the development machine and as the target machine, which is clearly advantageous over the sequential front-end/parallel back-end machine approach. Since source text, parsed code, and compiled code are all first-class objects in Logix, routines that implement code-management algorithms on the parallel computer could be written in Concurrent Prolog itself [64].

A technique for compiling Concurrent Prolog into Flat Concurrent Prolog was developed [10]. It involves writing a Concurrent Prolog interpreter in Flat Concurrent Prolog, and then partially evaluating it [15] with respect to the program to be compiled. It avoids the dynamic multiple-environment problem by requiring static output annotations on variables to be written upon.
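The compilation technique of [10] -- write an interpreter, then partially evaluate it with respect to the source program -- is an instance of the first Futamura projection [15]. A hedged Python illustration, using a made-up mini-language of (op, constant) pairs rather than Concurrent Prolog:

```python
# A toy partial evaluator. interp() is the interpreter; specialize()
# partially evaluates it with respect to a fixed (static) program,
# producing a residual function in which the interpretive dispatch on
# `op` has been done once, at specialization time.

def interp(program, x):
    """A trivial interpreter: program is a list of (op, constant) pairs."""
    for op, k in program:
        if op == "add":
            x = x + k
        elif op == "mul":
            x = x * k
    return x

def specialize(program):
    """Fold the static program into a residual closure (compiled code)."""
    steps = [(lambda v, k=k: v + k) if op == "add" else (lambda v, k=k: v * k)
             for op, k in program]
    def residual(x):
        for step in steps:      # no dispatch on `op` left at runtime
            x = step(x)
        return x
    return residual

prog = [("add", 1), ("mul", 3)]          # computes (x + 1) * 3
compiled = specialize(prog)
assert compiled(4) == interp(prog, 4) == 15
```

In the Concurrent Prolog setting, the "interpreter" is the CP-in-FCP meta-interpreter and the residual program is Flat Concurrent Prolog code; the static output annotations mentioned above are what makes the environments specializable.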
An attempt to provide Concurrent Prolog with a precise semantics is also being made, following initial work by Levi and Palamidessi [31] and Saraswat [43].

Another research direction pursued is partial evaluation [45], a technique of program transformation and optimization which proves to be very versatile when combined with heavy usage of interpreters and meta-interpreters [20,54], as in Logix.

We believe that parallel execution is not a substitute for, but rather dependent upon, efficient uniprocessor implementation. To that end a high-performance FCP compiler is being developed. Hand timings indicate an expected performance of about 30K LIPS on a 10MHz 68010.

Lastly, Logix itself is still under development. Short-term extensions include a hierarchical module system and a window system. Longer-term research includes extending it to a multiprocessor/multiuser operating system.

8. Conclusion

Our research on Concurrent Prolog has demonstrated that a high-level logic programming language can conveniently express a wide range of parallel algorithms. The performance of the Logix system demonstrates that a side-effect-free language based on light-weight processes can be practical even on conventional uniprocessors; it thus "debunks the expensive process spawn myth". Its functionality and pace of development testify that Concurrent Prolog is a usable and productive systems programming language.

We have yet to demonstrate the practicality of Concurrent Prolog for programming parallel computers. Our prototyping engine is Intel's iPSC.

We find the ultimate and most important question to be: which of the currently proposed approaches will result in a scalable parallel computer system whose generality of applications, ease of use, and cost/performance ratio, in terms of both hardware and software, compete favorably with existing sequential computers? Until such a system is demonstrated, the question of parallel processing cannot be considered solved.

Acknowledgements

The research reported on in this survey has been conducted in cooperation with many people at ICOT, the Weizmann Institute, and other places; perhaps too many to recall by name. I am particularly indebted to the hospitality and stimulating research environment provided by ICOT and its people. The development of Logix was supported by IBM Poughkeepsie, Data Systems Division. Contributors to its development include Avshalom Houri, William Silverman, Jim Crammond, Michael Hirsch, Colin Mierowsky, Shmuel Safra, Steve Taylor, and Marc Rosen. I am grateful to Vijay Saraswat for discussions on read-only unification, and to Steve Taylor and William Silverman for comments on earlier drafts of the paper.

References

[1] W.B. Ackerman, "Data flow languages", IEEE Computer, Vol. 15, No. 2, 1982, pp. 15-25.
[2] Arvind and J.D. Brock, "Streams and managers", in M. Maekawa and L.A. Belady (eds.), Operating Systems Engineering, Springer-Verlag, 1982, pp. 452-465. Lecture Notes in Computer Science, No. 143.
[3] C. Bloch, "Source to source transformations of logic programs", Weizmann Institute Technical Report CS84-22, 1984.
[4] D.L. Bowen, L. Byrd, L.M. Pereira, F.C.N. Pereira and D.H.D. Warren, "PROLOG on the DECSystem-10 user's manual", Technical Report, University of Edinburgh, Department of Artificial Intelligence, October, 1981.
[5] K. Broda and S. Gregory, "PARLOG for discrete event simulation", Proceedings of the 2nd International Logic Programming Conference, Uppsala, 1984, pp. 77-312.
[6] K.L. Clark and S. Gregory, "A relational language for parallel programming", in Proceedings of the ACM Conference on Functional Programming Languages and Computer Architecture, October, 1981.
[7] K.L. Clark and S. Gregory, "PARLOG: Parallel programming in logic", Research Report DOC 84/4, April, 1984.
[8] K.L. Clark and S. Gregory, "Notes on the implementation of PARLOG", Research Report DOC 84/16, October, 1984.
[9] K.L. Clark and S.-A. Tarnlund, "A first-order theory of data and programs", in B. Gilchrist (ed.), Information Processing, Vol. 77, North-Holland, 1977, pp. 939-944.
[10] M. Codish and E. Shapiro, "Compiling Or-parallelism into And-parallelism", Proceedings of the Third International Conference on Logic Programming, Springer LNCS, July, 1986.
[11] S. Edelman and E. Shapiro, "Quadtrees in Concurrent Prolog", Proceedings of the International Conference on Parallel Processing, IEEE Computer Society, August, 1985, pp. 544-551.
[12] D.P. Friedman and D.S. Wise, "An approach to fair applicative multiprogramming", in G. Kahn (ed.), Semantics of Concurrent Computations, Springer-Verlag, 1979. Lecture Notes in Computer Science, No. 70.
[13] K. Furukawa, S. Kunifuji, A. Takeuchi and K. Ueda, "The conceptual specification of the Kernel Language version 1", ICOT Technical Report TR-054, 1985.
[14] K. Furukawa, A. Takeuchi, S. Kunifuji, H. Yasukawa, M. Ohki and K. Ueda, "Mandala: A logic based knowledge programming system", Proceedings of FGCS '84, Tokyo, Japan, 1984, pp. 613-622.
[15] Y. Futamura, "Partial evaluation of computation process - an approach to a compiler-compiler", Systems, Computers, Controls, Vol. 2, No. 5, 1971, pp. 721-728.
[16] C.C. Green, "Theorem proving by resolution as a basis for question answering", in B. Meltzer and D. Michie (eds.), Machine Intelligence, Vol. 4, Edinburgh University Press, Edinburgh, 1969, pp. 183-205.
[17] L. Hellerstein, "A Concurrent Prolog based region finding algorithm", Honors Thesis, Harvard University, Computer Science Department, May, 1984.
[18] L. Hellerstein and E. Shapiro, "Implementing parallel algorithms in Concurrent Prolog: The MAXFLOW experience", Proceedings of the International Symposium on Logic Programming, Atlantic City, New Jersey, February, 1984.
[19] H. Hirakawa, "Chart parsing in Concurrent Prolog", ICOT Technical Report TR-008, 1983.
[20] M. Hirsch, W. Silverman and E. Shapiro, "Layers of protection and control in the Logix system", Weizmann Institute Technical Report CS86-??, 1986.
[21] C.A.R. Hoare, "Monitors: an operating systems structuring concept", Communications of the ACM, Vol. 17, No. 10, 1974, pp. 549-557.
[22] C.A.R. Hoare, Communicating Sequential Processes, Prentice-Hall, 1985.
[23] J.E. Hopcroft and J.D. Ullman, Introduction to Automata Theory, Languages, and Computation, Addison-Wesley, Reading, MA, 1979.
[24] A. Houri, "An abstract machine for Flat Concurrent Prolog", M.Sc. Thesis, Weizmann Institute of Science, 1986.
[25] INMOS Ltd., IMS T424 Transputer Reference Manual, INMOS, 1984.
[26] S.D. Johnson, "Circuits and systems: Implementing communications with streams", Technical Report 116, Indiana University, Computer Science Department, October, 1981.
[27] G. Kahn and D.B. MacQueen, "Coroutines and networks of parallel processes", in B. Gilchrist (ed.), Information Processing, Vol. 77, North-Holland, 1977, pp. 993-998.
[28] R.A. Kowalski, Logic for Problem Solving, Elsevier North Holland Inc., 1979.
[29] H.T. Kung, "Why systolic architectures?", IEEE Computer, Vol. 15, No. 1, 1982, pp. 37-46.
[30] L. Lamport, "A recursive concurrent algorithm", Unpublished note, January, 1982.
[31] G. Levi and C. Palamidessi, "The semantics of the read-only variable", 1985 Symposium on Logic Programming, IEEE Computer Society, July, 1985, pp. 128-137.
[32] J. Levy, "A unification algorithm for Concurrent Prolog", Proceedings of the Second International Logic Programming Conference, Uppsala, 1984, pp. 333-341.
[33] C. Mierowsky, S. Taylor, E. Shapiro, J. Levy and M. Safra, "The design and implementation of Flat Concurrent Prolog", Weizmann Institute Technical Report CS85-09, 1985.
[34] R. Milner, A Calculus of Communicating Systems, Lecture Notes in Computer Science, Vol. 92, Springer-Verlag, 1980.
[35] T. Miyazaki, A. Takeuchi and T. Chikayama, "A sequential implementation of Concurrent Prolog based on the shallow binding scheme", 1985 Symposium on Logic Programming, IEEE Computer Society, 1985, pp. 110-118.
[36] S. Papert, Mindstorms: Children, Computers, and Powerful Ideas, Basic Books, New York, 1980.
[37] F. Pereira, "C-Prolog user's manual", EdCAAD, University of Edinburgh, 1983.
[38] Quintus Prolog Reference Manual, Quintus Computer Systems Inc., 1985.
[39] R. Ramakrishnan and A. Silberschatz, "Annotations for distributed programming in logic", in Conference Record of the Thirteenth Annual ACM Symposium on Principles of Programming Languages, January, 1986.
[40] J.A. Robinson, "A machine oriented logic based on the resolution principle", Journal of the ACM, Vol. 12, January, 1965, pp. 23-41.
[41] P. Roussel, "Prolog: Manuel de référence et d'utilisation", Technical Report, Groupe d'Intelligence Artificielle, Marseille-Luminy, September, 1975.
[42] V.A. Saraswat, "Problems with Concurrent Prolog", Carnegie-Mellon University CSD Technical Report CS-86-100, January, 1986.
[43] V.A. Saraswat, "Partial correctness semantics for CP[?,|,&]", Proceedings of the Fifth Conference on Foundations of Software Technology and Theoretical Computer Science, New Delhi, 1985, Springer LNCS 206.
[44] M. Safra, S. Taylor and E. Shapiro, "Distributed execution of Flat Concurrent Prolog", to appear as a Weizmann Institute technical report.
[45] S. Safra and E. Shapiro, "Meta-interpreters for real", to appear in Proceedings of IFIP-86.
[46] A. Shafrir and E. Shapiro, "Distributed programming in Concurrent Prolog", Weizmann Institute Technical Report CS83-12, August, 1983.
[47] E. Shapiro, Algorithmic Program Debugging, MIT Press, 1983.
[48] E. Shapiro, "Systems programming in Concurrent Prolog", in D.H.D. Warren and M. van Caneghem (eds.), Logic Programming and its Applications, Ablex, 1986.
[49] E. Shapiro, "A subset of Concurrent Prolog and its interpreter", ICOT Technical Report TR-003, February, 1983.
[50] E. Shapiro, "Systolic programming: A paradigm of parallel processing", Proceedings of FGCS '84, Ohmsha, Tokyo, 1984. Revised as Weizmann Institute Technical Report CS84-16, 1984.
[51] E. Shapiro and C. Mierowsky, "Fair, biased, and self-balancing merge operators: Their specification and implementation in Concurrent Prolog", Journal of New Generation Computing, Vol. 2, No. 3, 1984, pp. 221-240.
[52] E. Shapiro and S. Safra, "Fast multiway merge using destructive operations", Proceedings of the International Conference on Parallel Processing, IEEE Computer Society, August, 1985, pp. 118-122.
[53] S. Safra, S. Taylor and E. Shapiro, "Distributed execution of Flat Concurrent Prolog", to appear as a Weizmann Institute technical report, 1986.
[54] W. Silverman, A. Houri, M. Hirsch and E. Shapiro, "Logix user manual, release 1.1", Weizmann Institute of Science, 1985.
[55] E. Shapiro and A. Takeuchi, "Object-oriented programming in Concurrent Prolog", Journal of New Generation Computing, Vol. 1, No. 1, July, 1983.
[56] L. Sterling and E. Shapiro, The Art of Prolog, MIT Press, 1986.
[57] N. Suzuki, "Experience with specification and verification of complex computer hardware using Concurrent Prolog", in D.H.D. Warren and M. van Caneghem (eds.), Logic Programming and its Applications, Ablex, 1986.
[58] A. Takeuchi, "How to solve it in Concurrent Prolog", Unpublished note, 1983.
[59] A. Takeuchi and K. Furukawa, "Interprocess communication in Concurrent Prolog", Proceedings of the Logic Programming Workshop '82, Albufeira, Portugal, June, 1983, pp. 171-185.
[60] A. Takeuchi and K. Furukawa, "Partial evaluation of Prolog programs and its application to meta programming", ICOT Technical Report TR-126, 1985.
[61] H. Tamaki, "A distributed unification scheme for systolic logic programs", Proceedings of the 1985 International Conference on Parallel Processing, IEEE, 1985, pp. 552-559.
[62] J. Tanaka, T. Miyazaki and A. Takeuchi, "A sequential implementation of Concurrent Prolog based on the lazy copying scheme", The 1st National Conference of Japan Society for Software Science and Technology, 1984.
[63] S. Taylor, L. Hellerstein, S. Safra and E. Shapiro, "Notes on the complexity of systolic programs", Weizmann Institute Technical Report CS86-??, 1986.
[64] S. Taylor, E. Av-Ron and E.Y. Shapiro, "Virtual machines for process and code mapping", Weizmann Institute Technical Report CS86-??, 1986.
[65] K. Ueda, "Guarded Horn Clauses", ICOT Technical Report TR-103, 1985.
[66] K. Ueda, "Concurrent Prolog re-examined", to appear as ICOT Technical Report.
[67] K. Ueda and T. Chikayama, "Efficient stream/array processing in logic programming languages", Proceedings of the International Conference on Fifth Generation Computer Systems, ICOT, 1984, pp. 317-326.
[68] K. Ueda and T. Chikayama, "Concurrent Prolog compiler on top of Prolog", 1985 Symposium on Logic Programming, IEEE Computer Society, July, 1985, pp. 119-126.
[69] M.H. van Emden and R.A. Kowalski, "The semantics of predicate logic as a programming language", Journal of the ACM, Vol. 23, October, 1976, pp. 733-742.
[70] O. Viner, "Distributed constraint propagation", Weizmann Institute Technical Report CS84-24, 1984.
[71] D.H.D. Warren, "Logic programming and compiler writing", Software-Practice and Experience, Vol. 10, 1980, pp. 97-125.
[72] D.H.D. Warren, "An abstract Prolog instruction set", Technical Report 309, Artificial Intelligence Center, SRI International, 1983.

Prolog", Proceedings of the Logic Programming Workshop '82, Albufeira, Por- tugal, June, 1983, pp. 171-185. [60] A. Takeuchi and K. Furukawa, '¢Partial evaluation of Prolog programs and its application to meta programming", ICOT Technical Report TR-126, 1985. [61] H. Tamaki, "A distributed unification scheme for systolic logic programs", in Proceedings of the 1985 International Conference on Parallel Processing, pp. 552-559, IEEE, 1985. [62] J. Tanaka, T. Miyazaki and A. Takeuchi, "A sequential implementation of Concurrent Prolog - based on Lazy Copying scheme", The 1st National Con- ference of Japan Society for Software Science and Technology, 1984. [63] S.Taylor, L.Hellerstein, S.Safra and E.Shapiro "Notes on the Complexity of Systolic Programs", Weizmann Institute Technical Report CS86-??, 1986. [64] S.Taylor, E.Av-Ron and E.Y.Shapiro "Virtual Machines for Process and Code Mapping" Weizmann Institute Technical Report CS86-??, 1986. [65] K. Ueda, "Guarded Horn Clauses", ICOT Technical Report TR-103, 1985. [66] K. Ueda, "Concurrent Prolog re-examined", to appear as ICOT Technical Report. [67] K. Ueda and T. Chikayama, "Efficient stream/array processing in logic pro- gramming languages", Proceedings of the International Conference on 5th Generation Computer Systems, ICOT, 1984, pp. 317-326. [68] K. Ueda and T. Chikayama, "Concurrent Prolog compiler on top of Prolog', 1985 Symposium on Logic Programming, IEEE Computer Society, July, 1985, pp. 119-126. [69] M.H. van Emden and R.A. Kowalski, "The semantics of predicate logic as a programming language", Journal of the ACM, Vol. 23, October, 1976, pp. 733-742. [70] O. Viner, "Distributed constraint propagation", Weizmann Institute Techni- cal Report CS84-24, 1984. [71] D.H.D. Warren, "Logic programming and compiler writing", Software-Prac- tice and Experience, Vol. 10, 1980, pp. 97-125. [72] D.H.D. Warren, "An abstract Prolog instruction set", Technical Report 309, Artificial Intelligence Center, SRI International, 1983.