Logic and Logic Programming

J.A. Robinson

Logic has been around for a very long time [23]. It was already an old subject 23 centuries ago, in Aristotle's day (384-322 BC). While Aristotle was not its originator, despite a widespread impression to the contrary, he was certainly its first important figure. He placed logic on sound systematic foundations, and it was a major course of study in his own university in Athens. His lecture notes on logic can still be read today. No doubt he taught logic to the future Alexander the Great when he served for a time as the young prince's personal tutor. In Alexandria a generation later (about 300 BC), Euclid played a similar role in systematizing and teaching the geometry and number theory of that era. Both Aristotle's logic and Euclid's geometry have endured and prospered. In some high schools and colleges, both are still taught in a form similar to their original one. The old logic, however, like the old geometry, has by now evolved into a much more general and powerful form.

Modern ('symbolic' or 'mathematical') logic dates back to 1879, when Frege published the first version of what today is known as the predicate calculus [14]. This system provides a rich and comprehensive notation, which Frege intended to be adequate for the expression of all mathematical concepts and for the formulation of exact deductive reasoning about them. It seems to be so. The principal feature of the predicate calculus is that it offers a precise characterization of the concept of proof. Its proofs, as well as its sentences and its other formal expressions, are mathematically defined objects. They are intended not only to express ideas meaningfully (that is, to be used as one uses a language) but also to be the subject matter of mathematical analysis. They are also capable of being manipulated as the data objects of construction and recognition algorithms.

At the end of the nineteenth century, mathematics had reached a stage in which it was more than ready to exploit Frege's powerful new instrument. Mathematicians were opening up new areas of research that demanded much deeper logical understanding and far more careful handling of proofs than had previously been required. Some of these were David Hilbert's abstract axiomatic recasting of geometry and Giuseppe Peano's of arithmetic, as well as Georg Cantor's intuitive explorations of general set theory, especially his elaboration of the dazzling theory of transfinite ordinal and cardinal numbers. Others were Ernst Zermelo's axiomatic analysis of set theory, which followed the discovery of the logical and set-theoretic paradoxes, and the huge reductionist work Principia Mathematica by Bertrand Russell and Alfred North Whitehead. Bertrand Russell's set of all sets which are not members of themselves is one such paradox: by definition it both is, and also is not, a member of itself. All of these developments had either shown what could be done, or had revealed what needed to be done, with the help of this new logic. But it was necessary first for mathematicians to master its techniques and to explore its scope and its limits.

Significant early steps toward this end were taken by Leopold Löwenheim (1915, [29]) and Thoralf Skolem [45], who studied the symbolic "satisfiability" of formal expressions. They showed that sets of abstract logical conditions could be proved consistent by being given specific interpretations constructed from the very symbolic expressions in which they are formulated. Their work opened the way for Kurt Gödel (1930, [17]) and Jacques Herbrand (1930, [19]) to prove, in their doctoral dissertations, the first versions of what is now called the completeness of the predicate calculus. Gödel and Herbrand both demonstrated that the proof machinery of the predicate calculus can provide a formal proof for every logically true proposition. Indeed, they each gave a constructive method for finding the proof, given the proposition.

COMMUNICATIONS OF THE ACM/March 1992/Vol.35, No.3 41 Logic Programming

Gödel's more famous achievement, his discovery in 1931 of the amazing 'incompleteness theorems' about formalizations of arithmetic, has tended to overshadow this important earlier work of his. The completeness result is about pure logic. His incompleteness results, by contrast, are about certain applied theories (formal axiomatic theories of elementary number theory, and similar systems) and do not directly concern us here.

The completeness of the predicate calculus links the syntactic property of formal provability with the conceptually quite different semantic property of logical truth. It assures us that each property belongs to exactly the same sentences. Formal syntax and formal semantics are both needed. For a time, however, the spotlight was on formal syntax, and formal semantics had to wait until Alfred Tarski (1934, [46]) introduced the first rigorous semantical theory for the predicate calculus. He did so by precisely defining satisfiability, truth (in a given interpretation), logical consequence, and other related notions. Once it was filled out by the concepts of Tarski's semantics, the theory of the predicate calculus was no longer unbalanced. Shortly afterward Gerhard Gentzen (1936, [15]) further sharpened the syntactical results on provability. He showed that if a sentence can be proved at all, then it can be proved in a 'direct' way, without the need to introduce any extraneous 'clever' concepts; those occurring already in the sentence itself are always sufficient.

All of these positive discoveries of the 1920s and 1930s laid the foundations on which today's predicate calculus theorem-proving programs, and thus logic programming, have been built.

Not all the great logical discoveries of this period were positive. In 1936 Alonzo Church and Alan Turing (see [6, 47]) independently discovered a fundamental negative property of the predicate calculus. There had until then been an intense search for a positive solution to what Hilbert called the decision problem. This was the problem of devising an algorithm for the predicate calculus which would correctly determine, for any formal sentence B and any set A of formal sentences, whether or not B is a logical consequence of A. The proof procedure correctly recognizes (by constructing a proof of B from A) all cases where B is in fact a logical consequence of A. Church and Turing found that, despite the existence of this procedure, there is not and cannot be an algorithm which can similarly correctly recognize all cases in which B is not a logical consequence of A. Their discovery bears directly on all attempts to write theorem-proving software. It means that it is pointless to try to program a computer to answer 'yes' or 'no' correctly to every question of the form 'is this a logically true sentence?' The most that can be done is to identify useful subclasses of sentences for which a decision procedure can be found. Many such subclasses are known. They are called 'solvable subcases of the decision problem', but as far as I know none of them have turned out to be of much practical interest.

When World War II began in 1939, all the basic theoretical foundations of today's computational logic were in place. What was still lacking was any practical way of actually carrying out the vast symbolic computations called for by the proof procedure. Only the very simplest of examples could be done by hand. Already, however, there were those (Turing himself for one) who were making plans which would eventually fill this gap. Turing's method in negatively solving the decision problem had been to design a highly theoretical, abstract version of the modern stored-program, general-purpose universal digital computer (the 'universal Turing machine'). He then proved that no program for it could realize the decision procedure. His subsequent leading role in the war-time British code-breaking project included his participation in the actual design, construction and operation of several electronic machines of this kind. He must therefore surely be reckoned as one of the major pioneers in their early development.

Logic on the Computer

Apart from this enormously important cryptographic intelligence work and its crucial role in ballistic computations and nuclear physics simulations, the war-time development of electronic digital computing technology had relatively little impact on the outcome of the war itself. After the war, however, its rapid commercial and scientific exploitation quickly launched the current computer era. By 1950, much-improved versions of some of the war-time general-purpose electronic digital computers became available to industry, universities and research centers. By the mid-1950s it had become apparent to many logicians that, at last, sufficient computing power was now at hand to support computational experiments with the predicate calculus proof procedure. It was just a matter of programming it and trying it on some real examples. Several papers describing projects for doing this were given at a Summer School in Logic held at Cornell University in 1957. One of these [37, pp. 74-76] was by Abraham Robinson, the logician who later surprised the mathematical establishment by applying logical 'nonstandard' model theory to legitimize infinitesimals in the foundations of the integral and differential calculus. Other published accounts of results in the first wave of such experiments were [12, 16, 35, 49]. There had also been, in 1956, a strange experiment by [33] which attracted a lot of attention at the time. It has since been cited as a milestone of the early stages of artificial intelligence research. The authors designed their 'Logic Theory Machine' program to prove sentences of the propositional calculus (not the full predicate calculus). This is a very simple system of logic for which there had long existed well-known decision procedures. They nevertheless explicitly rejected the idea of using any algorithmic proof procedure. They aimed instead at making their program behave 'heuristically' as it cast about for a proof. This experiment was intended to model human problem-solving behavior, taking propositional calculus theorem-proving in particular as the problem-solving task. It did not set out to program the computer to prove propositional calculus theorems efficiently.

No sooner were the first computational proof experiments carried out than the severe combinatorial complexity of the full predicate calculus proof procedure came vividly into view. The procedure is, after all, essentially no more than a systematic exhaustive search through an exponentially expanding space of possible proofs. The early researchers were brought face-to-face with the inexorable 'combinatorial explosion' caused by conducting the search on nontrivial examples. These first predicate calculus proof-seeking programs may have inspired, and perhaps even deserved, the disparaging label 'British Museum method' (see [33]). That label was destined to be pinned on any merely generate-and-test procedure which blindly and undiscriminatingly tries all possible combinations in the hope that a winning one, or even an acceptable one, may eventually turn up.

The intrinsic exponential complexity of the predicate calculus proof procedure is to be expected, because of the nature of the search space. There is evidently little one can do to avoid its consequences. The only reasonable course is to look for ways to strengthen the proof procedure as much as possible, by simplifying the forms of expressions in the predicate calculus and by packing more power into its inference rules. This might at least make the search process more efficient, and permit it to find proofs of more interesting examples before it runs into the exponential barrier. Some limited progress has been made in this direction by reorganizing the predicate calculus in various 'machine-oriented' versions.

Evolution of Machine-Oriented Logic

The earliest versions of the predicate calculus proof procedure were all based on human-oriented reasoning patterns. These were types of inference which reflected formally the kind of 'small' reasoning steps which humans find comfortable. A well-known example of this is the modus ponens inference-scheme. In using modus ponens, one infers a conclusion B from two premises of the form A and (if A then B). Such human-oriented inference-schemes are adapted to the limitations (and also to the strengths) of the human information-processing system. They therefore tend to involve simple, local, small and perceptually immediate features of the state of the reasoning. In particular, they do not demand the handling of more than one such bundle of features at a time: they are designed for serial processing on a single processor. The massive parallelism in human brain processes is well below the level of conscious awareness. Yet it is of the essence of deductive reasoning that the human reasoner be fully conscious of the 'epistemological flow' of the proof and of its stepwise assembling of his or her assent and understanding. In logics based on such fine-grained serial inference patterns, proofs of interesting propositions will tend to be large assemblies of small steps. The search space for the corresponding proof procedures will accordingly tend to be dense and overcrowded with redundant alternatives at too low a level of detail.

By about 1960 it had become clear that it might be necessary to abandon this natural predilection for human-oriented inference patterns. Instead, one might have to look for new logics based on larger-scale, more complex, less local, and perhaps even highly parallel, machine-oriented types of reasoning. In contemplating these possible new logics, it was hoped their proofs would be shorter and (at the top level) simpler than those in the human-oriented logics. Of course, in the interior of any individual inference, there would presumably be a large amount of hidden structural detail. The global search space would be sparser, since it would need to contain only the top-level structure of proofs. The proof procedure itself would not need to be concerned with the copious details of the conceptual microstructure packaged within the inference steps.

This was the motivation behind the introduction, in the early 1960s, of a new logic based on two highly machine-oriented reasoning patterns: unification, and the various kinds of resolution which incorporate it.

Clausal Logic

The 1960 paper [12] had already drawn attention to the simplified clausal predicate calculus in which every sentence is a clause. (A clause is a sentence with a very simple form: it is just a possibly empty disjunction of literals. A literal, in turn, is just the application of an unnegated or negated predicate to a suitable list of terms as arguments.) In the same year, Dag Prawitz [34] had also forcefully advocated the use of the process which we now call unification. Along with Stig Kanger (see [34, footnote 11], p. 170), he apparently had independently rediscovered unification in the late 1950s. He apparently did not realize that it had already been introduced by Herbrand in his thesis of 1930 (albeit only in a brief and rather obscure passage). These were major steps in the right direction. Neither the Davis-Putnam nor the Prawitz improved proof procedures, however, went quite far enough in discarding human-oriented inference patterns, and their algorithms still
became bogged down too early in their searches to be useful.

This was the situation when I first became interested in mechanical theorem-proving in late 1960. From 1961 to 1964 I worked each summer as a visiting researcher at the Argonne National Laboratory's Applied Mathematics Division, which was then directed by William F. Miller. It was Bill Miller who, in early 1961, first introduced me to the engineering side of predicate calculus theorem-proving by pointing out to me the Davis and Putnam paper. He invited me to spend the summer of 1961 at Argonne as a visiting researcher in his division. The suggested assignment was to program the Davis-Putnam proof procedure on the IBM 704 and, more generally, to pursue mechanical theorem-proving research.

Reading the Davis-Putnam paper [12] in early 1961 really changed my life. Hilary Putnam had been one of my advisers when I was working on my doctoral thesis in philosophy at Princeton (1953-1956). Even so, my research had dealt with David Hume's theory of causation and had little or nothing to do with modern logic, to which I paid scant attention at that time. I did not find out about Putnam's interest in the predicate calculus proof procedure until I read this paper, four years after I had left Princeton. It is a very important paper. Davis and Putnam showed how, by relatively simple but ingenious algorithmic reorganization, the original naive predicate calculus proof procedure of Herbrand could be vastly improved.

In a 1963 paper I wrote about my 'combinatorial explosion' experience with programming and running the Davis-Putnam procedure in Fortran for the IBM 704 at Argonne [38, pp. 372-383]. Meanwhile, during my second research summer there (1962), an Argonne physicist named William Davidon, who was interested in and very knowledgeable about logic, had drawn my attention to the important 1960 paper by Dag Prawitz [34]. It was there that I first encountered the idea of unification. I had been struggling with the woeful combinatorial inefficiency of the instantiation-based procedure used by Davis and Putnam (and by everybody else at that time; it goes back to Herbrand's so-called 'Property B Method' developed in [19]). I was therefore immediately very impressed by the significance of this idea. It is essentially the idea underlying Herbrand's 'Property A Method', developed in the same thesis. Here was yet another paper showing that even vaster improvements than those flowing from the Davis and Putnam paper were possible over the 'naive' predicate calculus proof procedure. Instead of generating-and-testing successive instantiations (substitutions) in the hope of eventually hitting upon the right ones, Prawitz was describing a way of directly computing them. This was a breakthrough. It offered an elegant and powerful alternative to the blind, hopeless, enumerative 'British Museum' methodology. It also pointed the way to a new methodology featuring deliberate, goal-directed constructions.

The entire academic year of 1962-1963 was consumed in trying to figure out the best way to exploit this Herbrand-Kanger-Prawitz process effectively, so as to eliminate the generation of irrelevant instances in the proof search. Finally, in the early summer of 1963, I managed to devise a clausal logic with a single inference scheme. It combined the Herbrand-Kanger-Prawitz process (for which I proposed the name unification) with Gentzen's 'cut' rule. This combination produced a rather inhuman but very effective new inference pattern, for which I proposed the name resolution. Resolution permits the taking of arbitrarily large inference steps. These in general require very considerable computational effort to carry out (and in some cases even to understand and to verify). Most of the effort is concentrated on the unification involved. Preliminary investigations indicated that resolution-based theorem-provers would be significantly better than any which had been built previously.

I wrote about these ideas at Argonne at the end of the summer of 1963, and sent the paper to the Journal of the A.C.M. (JACM). It then apparently remained unread on some referee's desk for more than a year. It took some urging by the then editor of the Journal, Richard Hamming of Bell Laboratories, before the referee finally responded. The outcome was that the paper [39] was published only in January 1965. Meanwhile the manuscript had been circulating. In 1964 at Argonne, Larry Wos, George Robinson and Dan Carson programmed a resolution-based theorem prover for the clausal predicate calculus. To the basic process they added search strategies of their own devising (called unit preference and set of support), which further speeded the resolution proof process. Because of the refereeing delay, their paper [52] reached print before mine and could only cite it as 'to be published'.

Throughout the winter of 1963-64, while waiting for news of the paper's acceptance or rejection by JACM, I concentrated on trying to push the ideas further. I looked for ways of extending the resolution principle to accommodate even larger inference steps than those sanctioned by the original binary resolution pattern. One of these turned out to be particularly attractive. I gave it the name hyperresolution, meaning to suggest that it was an inference principle on a level above resolution. One hyperresolution was essentially a new integrated whole, a condensation of a deduction consisting of several resolutions. The paper describing hyperresolution was published at about the same time as the main resolution paper, and was later reprinted in [40, pp. 416-423].

It had been my guiding idea in this research that bigger and (computationally) better inference patterns might be obtained by somehow packaging entire deductions at
one level into single inferences at the next higher level. As I cast about for such patterns, I came across a quite restricted form of resolution, which I called 'P1-resolution'. I found I could prove that it was just as powerful as the original unrestricted binary resolution. The restriction in P1-resolution is that one of the two premises must be an unconditional clause. That is, it must be a clause in which there are no negative literals (or, what amounts to the same thing, a sentence of the form 'if antecedent then consequent' whose antecedent part is empty). It follows from this restriction that every P1-deduction (that is, a deduction in which every inference is a P1-resolution) can always be decomposed into a combination of what I called 'P2-deductions'. A P2-deduction is a P1-deduction which satisfies the extra restriction that its conclusion, and all of its premises except one, are unconditional clauses. Thus, exactly one conditional clause is involved as an 'external' clause in a P2-deduction. Suppose we ignore the internal inferences of a P2-deduction tree and deem its conclusion to have been directly obtained from its premises. We then obtain a single large inference, a hyperresolution. It is really a multi-inference deduction whose interior details are hidden from view inside a sort of logical black box.

Computational Logic: The Resolution Boom

After the publication of the paper in 1965, there began a sustained drive to program resolution-based proof procedures as efficiently as possible and to see what they could do. In Edinburgh, Bernard Meltzer's Computational Logic group and Donald Michie's Machine Intelligence group had by 1967 attracted many young researchers who have since become well known. At that time they worked on various theoretical and practical resolution issues. Among them were Robert Kowalski, Patrick Hayes, the late Donald Kuehner, Gordon Plotkin, Robert Boyer and J Moore, David H.D. Warren, Maarten van Emden, and Robert Hill. Bernard Meltzer had visited Rice University for two months in early 1965 in order to study resolution intensively. On his return to Edinburgh he set up one of the two seminal research groups which were to foster the birth of logic programming (the other being Alain Colmerauer's group in Marseille). Thus began my long and fruitful association with Edinburgh. By 1970 the resolution boom was in full swing. I recall that in that year Keith Clark and Jack Minker were among those attending a NATO Summer School organized by Bernard Meltzer and Nicolas Findler at Menaggio on Lake Como. There we preached the new 'resolution movement' for two weeks, and Clark and Minker decided to join it, soon becoming two notable contributors.

Meanwhile, however, in the U.S. the reaction was mostly muted, except for isolated pockets of enthusiasm at Argonne, Stanford, Rice and a few other places. Bill Miller had left Argonne to go to Stanford at the end of 1964. I accepted his invitation to spend the summers of 1965 and 1966 as a visiting researcher in his computation group at the Stanford Linear Accelerator Center. It was at Stanford, in the summer of 1965, that I met John McCarthy for the first time. I was astonished to learn that, having recently read the resolution paper, he had written and tested a complete resolution theorem-proving program in Lisp in a few hours. I was still programming in Fortran, and I was used to taking days and even weeks for such a task. In 1965, however, one could use Lisp easily in only a very few places, and neither Rice University nor Argonne National Laboratory was then among them.

Bertram Raphael, Nils Nilsson, and Cordell Green, at Stanford Research Institute, were building deductive databases for the 'STRIPS' planning software for their robot, and they were adopting resolution for this (see [36]). At New York University, Martin Davis and Donald Loveland were developing Davis's very closely related unification-based 'linked conjunct' method [10, pp. 315-330]. This work eventually led Loveland independently to his Model Elimination system [28], a linear reasoning method entirely similar to the linear resolution systems developed by the Edinburgh group and by David Luckham at Stanford [30]. Back at Argonne, Larry Wos and George Robinson had formed a very strong 'automated deduction' group. They broadened the applicability of unification by augmenting resolution with further inference rules specialized for equality reasoning (modulation, paramodulation), which further improved the efficiency of proof searches [43]. Today, the Argonne group is still flourishing and remains a major center of excellence in automated deduction.

In 1969 there began a series of noisy but interesting (and, it later turned out, fruitful) academic skirmishes between two camps. One was the then somewhat meagerly funded resolution community. The other was MIT's Artificial Intelligence Laboratory, led by Marvin Minsky and Seymour Papert. The MIT AI Laboratory at that time was (it seemed to us) comfortably, if not lavishly, supported by the Pentagon's Advanced Research Projects Agency (then ARPA, now DARPA). The issue was whether it was better to represent knowledge computationally, for AI purposes, in a declarative or in a procedural form. The declarative form had been originally proposed in 1959 by John McCarthy [31]. If it won out, then it would be the predicate calculus, and efficient proof procedures for it, that would play a central role in AI research. If the procedural form won out (e.g., see [51]), then a computational realization of knowledge would have to be a system of procedures organized 'heterarchically'. Each procedure could be invoked by any of the others, and indeed by itself. These procedures would be 'agents' that would both cooperate and compete in collectively accomplishing the various tasks comprising intelligent behavior and thought.

Minsky's book, The Society of Mind [32], elegantly summed up the MIT side of this debate. It eschewed polemics to outline a grand unified theory of the structure and function of the mind in the tradition of Freud and Piaget. The logic side of the debate has been definitively treated in [25]. That work eloquently sets forth the role of logic in the computational organization of knowledge. It banishes the procedural-declarative dichotomy with the insight that Horn clauses (that is, clauses containing at most one unnegated literal) can be interpreted as procedures, and thus can be activated and executed by a suitably designed processor. It is this insight that underlies what we now call logic programming.

The never-to-be-implemented but influential 'Planner' system by Carl Hewitt (his first paper on Planner is in [20]) epitomized the MIT procedural approach. Meanwhile, the QA ('Question-Answering') series of programs by [31] carried out McCarthy's logical 'Advice Taker' approach to AI and convinced many skeptics that it would really work. The work by [18] should now be seen and appreciated as the earliest demonstration of a logic programming system. That paper illustrated how to adapt a resolution-based proof procedure to provide an assertion-and-query facility in all essential respects like that provided by the later Prolog systems. Unfortunately, the system was built on the rapidly ramifying full resolution scheme, using unrestricted (rather than Horn) clauses, so that the program suffered from premature combinatorial explosiveness. Nevertheless, it was largely Green's pioneering work [18] that encouraged Kowalski and the Edinburgh group to fight off the MIT 'procedural-is-best' attack. They did so by developing the highly efficient, slowly ramifying linear resolution systems (LUSH, later called SLD) for the restricted case of Horn clauses [27, pp. 542-577].

The procedural-logical fight was really ended, in a delightfully unexpected way, by Kowalski's inspired procedural interpretation of the behavior of a Horn-clause linear resolution proof finder [24]. He pointed out that, in view of how Horn-clause linear-resolution proof-seeking processes behave, a collection of Horn clauses could be regarded as knowledge organized both declaratively and procedurally. It suddenly was hard to see what all the fuss had been about. Kowalski was led to this reconciliatory principle by Boyer and Moore's superb implementation of a 'structure-sharing' resolution theorem prover at Edinburgh [5, pp. 101-116]. That implementation suddenly completed the transformation of theorem-proving from generate-and-test searching to goal-directed stack-based computation. When restricted to Horn clauses, the Boyer-Moore approach becomes the obvious precursor of the first implementations of Prolog. David H.D. Warren's enormously influential later software and hardware refinements and advances clearly descend directly from the Boyer-Moore methodology [50].

Only the interaction of the Edinburgh group's ideas with the work of Colmerauer's Montreal [7] and Marseille [8] groups was required to open up logic programming and launch it on its meteoric career. The interesting story of this interaction was published by [26]. Logic programming is today in excellent health. After two decades of very rapid growth, the logic programming community has settled down to enjoy a steady, mature round of professional conferences and workshops. It also enjoys a plentiful flow of research and expository publication in books and in its own and other journals, and an exciting marketplace of new software and hardware enterprises. There are, besides, such majestic long-range national and international undertakings as Japan's Fifth Generation Project and those sponsored by the European Community.

A Closer Look at Unification and Resolution

What, then, is the resolution-based clausal predicate calculus, and what is unification and how does it work?

Clauses

Davis's and Putnam's clauses are quite expressive, despite their apparently restricted form. This is reflected in the many different but equivalent ways in which one can write them. In dealing with clauses computationally, however, it is best to keep them simple and to work with them abstractly.

A clause can in general be taken to be a sentence of the form 'if P then Q', which we will usually write as P → Q or sometimes the other way round, as Q ← P. The antecedent P is a set of conditions and the consequent Q is a set of conclusions. These conditions and conclusions are atomic sentences. Written versions of clauses must perforce present the atomic sentences in some order, but that order has no logical significance. There is usually no visible indication of the fact that the antecedent P is a conjunction of its conditions, while the consequent Q is a disjunction of its conclusions. Those two facts are assumed to hold by convention. In discussing inferences and manipulations involving clauses, the abstract view of P and Q as sets is both natural and convenient.

We can then classify a clause along three different dimensions. The first is whether its atomic sentences contain any variables or not. The second is whether or not it has any conditions. The third is whether or not it has any conclusions. A clause with no variables is said to be a ground clause. A clause with one or more variables is called a general clause. A general clause is understood to be a universally quantified sentence, each of its variables being tacitly universally quantified with the whole sentence as scope. A clause with one or more conclusions is said to be a positive clause, while one with no conclusions is said to be a
46 March 1992/Vol.35, No.3/COMMUNICATIONS OF THE ACM

Finally, a clause with one or more conditions is said to be a conditional clause, while one with no conditions is said to be an unconditional clause. (There is only one clause that is both unconditional and negative: it is known as the empty clause.)

Various Ways of Reading a Clause

Suppose the variables which occur in the atomic sentences of a clause are V1 ... Vk, that its conditions are P1 ... Pm, and that its conclusions are Q1 ... Qn. The various ways to read and write the clause will then depend on the values of k, m, and n, as follows:

1   for all V1 ... Vk: if P1 and ... and Pm then Q1 or ... or Qn    (k > 0, m ≥ 1, n > 1)
2   for all V1 ... Vk: Q1 or ... or Qn                              (k > 0, m = 0, n > 1)
3   if P1 and ... and Pm then Q1 or ... or Qn                       (k = 0, m ≥ 1, n > 1)
4   Q1 or ... or Qn                                                 (k = 0, m = 0, n > 1)
5   for all V1 ... Vk: if P1 and ... and Pm then Q1                 (k > 0, m ≥ 1, n = 1)
6   for all V1 ... Vk: Q1                                           (k > 0, m = 0, n = 1)
7   if P1 and ... and Pm then Q1                                    (k = 0, m ≥ 1, n = 1)
8   Q1                                                              (k = 0, m = 0, n = 1)
9   for all V1 ... Vk: not (P1 and ... and Pm)                      (k > 0, m ≥ 1, n = 0)
10  not (P1 and ... and Pm)                                         (k = 0, m ≥ 1, n = 0)
11  not true (or: false)                                            (k = 0, m = 0, n = 0)

Horn clauses are cases 5 onward (where n = 1 or n = 0). The clauses in cases 5 to 8 are positive Horn clauses (n = 1); those from 9 onwards are negative Horn clauses (n = 0). Cases 2, 4, 6, 8 and 11 are unconditional clauses (m = 0). The other cases (m ≥ 1) are conditional clauses.

Variants. Separation of Clauses

As we shall soon see, the choice of variables in a general clause is somewhat arbitrary: neither the essential syntactic structure nor the meaning of a clause is affected if we replace some or all of its variables by other variables. The only proviso is that the correspondence between old and new variables must be one-to-one. Two clauses which differ from each other only in this way are called variants of each other. If two clauses have no variables in common, they are said to be separated, or standardized apart. Thus E(x) D(x y) → A(F(x y)) and E(u) D(u y) → A(F(u y)) are variants; E(x) D(x z) → A(F(x z)) and E(u) D(y u) → A(F(u y)) are not variants. They are, however, separated.

In unification computations, and in the resolution inference and proof constructions based on them, we routinely replace a clause by a suitably chosen one of its variants: for example, when we need to ensure that all clauses in a set are separated. As we shall soon see, however, there are ways of representing expressions (as two-dimensional structures of a certain kind) in which this becomes irrelevant and unnecessary, because variables are nameless. The familiar one-dimensional notation, however, is the most convenient one for writing expressions, and it is in this representation that we have to be careful to avoid 'name-clashes' when choosing names for variables.

Atoms

Calling atomic sentences 'atoms' may run some risk of confusion with Lisp's usage of that word, but it is well established. There are two noncomposite atoms, the truth values true and false, but in general atoms are composite expressions with two components: a predicate and a list of arguments. The usual convention for writing a composite atom is to write its predicate immediately before its argument list, as for example:

MOTHER(MARY, THOMAS)
GREATER-THAN(SUM-OF(THREE, SIX), SEVEN)

The predicates are relational constants MOTHER, GREATER-THAN, and so on: just identifiers without any significant internal syntax of their own. In this discussion we will write them as upper-case identifiers. The arguments are terms. Noncomposite terms are variables: x, y, z, u1, and so on. In this discussion we will write variables as lower-case identifiers, possibly subscripted. Composite terms are like composite atoms in having two parts: an operator and a list of arguments.¹ Indeed, the common convention for writing a composite term is similar to that for writing composite atoms: to write the operator immediately before the list of arguments, as for example:

PLUS(THREE, SIX)
SUCCESSOR(SUCCESSOR(SUCCESSOR(ZERO)))

The operators are functional constants PLUS, SUCCESSOR, and so on. When the argument list of a term is empty, we usually skip explicitly writing the empty list, and write the term as if it consisted of its constant alone, as MARY, THOMAS, instead of MARY(), THOMAS(). Every relational and functional constant comes with an arity, which is a nonnegative integer and which is considered to be part of the constant's identity. A constant having arity n is said to be n-ary. Thus MARY is 0-ary, SUCCESSOR is 1-ary, GREATER-THAN is 2-ary, and so on. The basic formation rule for composite expressions (atoms or terms) is that an n-ary constant must always be followed immediately by a list of n arguments (except, as noted above, when n = 0, when the list can by convention be omitted).

¹ In writing a list, we may place a comma after each item (other than the last) to enhance readability. This is, however, optional, and is not part of the definition of a list.
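A minimal Python sketch can make the three classification dimensions, the reading cases 1 to 11, and the variant and separation tests concrete. Its encoding is an assumption of this sketch, not notation from the article: a variable is a lower-case string, an atom or composite term is a tuple whose first item is its upper-case constant (so A(F(x y)) becomes ("A", ("F", "x", "y"))), and a clause is a pair of atom tuples (conditions, conclusions). The names Clause, reading_case, are_variants and separated are hypothetical.

```python
from dataclasses import dataclass
from itertools import permutations


def is_var(e):
    """Variables are lower-case strings, as in this discussion."""
    return isinstance(e, str) and e[:1].islower()


def variables(e):
    """Set of variables occurring in an atom or term."""
    if is_var(e):
        return {e}
    return set().union(*(variables(a) for a in e[1:]))


@dataclass(frozen=True)
class Clause:
    conditions: tuple   # antecedent P, read as a conjunction
    conclusions: tuple  # consequent Q, read as a disjunction

    def variable_set(self):
        return set().union(*(variables(a) for a in self.conditions + self.conclusions))


def reading_case(c):
    """Row (1-11) of the table of readings, computed from k, m and n."""
    k, m, n = len(c.variable_set()), len(c.conditions), len(c.conclusions)
    if n == 0:
        return 11 if m == 0 else (9 if k > 0 else 10)
    base = 1 if n > 1 else 5
    return base + (0 if k > 0 else 2) + (0 if m >= 1 else 1)


def _match(a, b, fwd, back):
    """Extend a one-to-one variable correspondence so that a maps onto b."""
    if is_var(a) and is_var(b):
        return fwd.setdefault(a, b) == b and back.setdefault(b, a) == a
    if is_var(a) or is_var(b) or a[0] != b[0] or len(a) != len(b):
        return False
    return all(_match(x, y, fwd, back) for x, y in zip(a[1:], b[1:]))


def are_variants(c1, c2):
    """True if c2 is c1 with its variables renamed one-to-one (atom order ignored)."""
    if (len(c1.conditions), len(c1.conclusions)) != (len(c2.conditions), len(c2.conclusions)):
        return False
    for p in permutations(c2.conditions):
        for q in permutations(c2.conclusions):
            fwd, back = {}, {}
            pairs = list(zip(c1.conditions, p)) + list(zip(c1.conclusions, q))
            if all(_match(a, b, fwd, back) for a, b in pairs):
                return True
    return False


def separated(c1, c2):
    """Standardized apart: the two clauses share no variable."""
    return not (c1.variable_set() & c2.variable_set())


def mk(name):
    return lambda *args: (name, *args)


E, D, A, F = map(mk, "EDAF")
c1 = Clause((E("x"), D("x", "y")), (A(F("x", "y")),))
c2 = Clause((E("u"), D("u", "y")), (A(F("u", "y")),))
c3 = Clause((E("x"), D("x", "z")), (A(F("x", "z")),))
c4 = Clause((E("u"), D("y", "u")), (A(F("u", "y")),))

print(are_variants(c1, c2), separated(c1, c2))         # True False
print(are_variants(c3, c4), separated(c3, c4))         # False True
print(reading_case(c1), reading_case(Clause((), ())))  # 5 11
```

The brute-force search over atom orderings in are_variants reflects the view of P and Q as sets; it is exponential in clause length, which is acceptable only for illustration.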

The common underlying semantic idea is that of an applicative expression, which represents the result of applying a function or relation to a suitable tuple of arguments.

In the clausal predicate calculus, clauses are the only kind of sentence available in which to express the premises and desired conclusion of a proof problem. This is not as limiting as it sounds: it is in fact possible to translate (automatically) a proof problem from the full predicate calculus into the clausal predicate calculus. Detailed discussions of how to do this can be found in [12].

Substitution

Making the clausal predicate calculus more machine-oriented calls for a much closer analysis of the idea of instantiation. When an expression B can be obtained from another expression A by substituting terms for some or all of the variables in A, B is said to be an instance of A. For example, F(H(y z) G(H(y z) A(y)) A(y)) is an instance of F(x G(x y) y). Inspection confirms that F(H(y z) G(H(y z) A(y)) A(y)) can be obtained from F(x G(x y) y) by simultaneously replacing each occurrence of x by an occurrence of H(y z) and each occurrence of y by an occurrence of A(y). It is very interesting that this basic logical operation of substitution is essentially a parallel one: the y introduced by H(y z) is not itself replaced by A(y).

We can represent specific substitutions by sets of equations. For example, the preceding substitution can be represented by the set {x = H(y z), y = A(y)}. Unspecified substitutions are usually denoted by lower-case Greek letters (θ, λ, μ, σ), and the result of applying a substitution to an expression E is indicated by writing Eθ, Eλ, and so on. Therefore, if E is F(x G(x y) y) and θ is {x = H(y z), y = A(y)}, Eθ is F(H(y z) G(H(y z) A(y)) A(y)).

Unification

Let S be a set of expressions. When a substitution θ transforms every expression in S into the same expression, θ is said to unify S (or to be a unifier of S), and the set S is said to be unifiable.

For example, let θ be {x = H(P Q), y = D, u = P, v = Q, z = G(H(P Q) D)}. When we apply θ to the two expressions F(x G(x y)) and F(H(u v) z), both of them become the same expression, namely F(H(P Q) G(H(P Q) D)). Thus θ unifies the set {F(x G(x y)), F(H(u v) z)}.

This set, however, is also unified by the substitution σ = {x = H(u v), z = G(H(u v) y)}, which transforms both its members into the same expression: F(H(u v) G(H(u v) y)). This expression is not only a more general common instance of F(x G(x y)) and F(H(u v) z), but is actually a most general common instance, and so σ represents the most general way in which the set {F(x G(x y)), F(H(u v) z)} can be unified. We therefore say it is a most general unifier ('mgu') of {F(x G(x y)), F(H(u v) z)}. All other common instances of F(x G(x y)) and F(H(u v) z) are instances of the above most general one. In this particular case we have:

$$F(H(u\,v)\,z)\theta = F(x\,G(x\,y))\theta = F(H(P\,Q)\,G(H(P\,Q)\,D)) = (F(x\,G(x\,y))\sigma)\mu = F(H(u\,v)\,G(H(u\,v)\,y))\mu$$

where μ = {u = P, v = Q, y = D}. This suggests that θ is some kind of 'product' of the mgu σ and the substitution μ. We can write θ explicitly as θ = σ·μ, and we find that, indeed, this notion of the product of two substitutions can be naturally defined and is extremely useful. The product α·β of two substitutions α and β is the overall substitution which results from first performing α and then performing β. Thus we have E(α·β) = (Eα)β for all expressions E. This product operation is associative, and has an identity, namely the 'empty' substitution ε, which leaves every variable unchanged. However, it is not in general commutative.

It is no accident that in our example we can express the unifier θ as the product of the mgu σ and the 'specialization' substitution μ = {u = P, v = Q, y = D}. This is a defining characteristic of mgus. In fact, to say that σ is an mgu of a set S is to make the following two statements: (1) that σ unifies S, and (2) that any unifier λ of S whatsoever satisfies the condition λ = σ·μ, for some μ.

A unifiable set always has an mgu. Moreover, there are simple algorithms (unification algorithms, about which we shall say more later) which compute an mgu for any finite unifiable set, and detect the nonunifiability of a set which is not unifiable. These algorithms are best stated for the more general case in which we seek a substitution that unifies several disjoint finite sets of expressions simultaneously (or, as we shall say, which unifies a partition of a set of expressions). It is the unification of partitions that we shall be concerned with in the remainder of the discussion. The idea is virtually the same as that of unifying a single set: a substitution θ unifies a partition T = {S1, ..., Sk} of a set S of expressions if each of the sets S1θ, ..., Skθ is a singleton. A substitution σ most generally unifies T if (1) σ unifies T and (2) for every unifier λ of T we have λ = σ·μ for some μ.

We need to be able to compute a most general unifier efficiently, for any partition as input. There is now a rather large specialized literature on this topic, but for our present purposes we need not be concerned with many of the details.

Unification Algorithms

The computation of a most general unifier, when expressed in its most simple and natural form, is a highly parallel one. It was not at first seen to be so. The natural, inherent parallelism is most clearly seen if we think of expressions as being really directed labelled graphs, as follows:

• a variable is a graph with only one node, its root, which is unlabelled.
• a constant K is a graph with only one node, its root, which is labelled by the symbol K.
• an applicative expression K(E1, ..., En) is a graph whose root is unlabelled and has n + 1 out-arcs, which are labelled respectively by the integers 0 to n. The out-arc labelled by 0 points to the node which is the constant K. For i = 1, ..., n, the out-arc labelled i points to (the root of the graph which is) the term Ei.

If an out-arc goes from N to M and is labelled by j, we say that M is a jth immediate successor of N. The arity of a node is the largest integer which labels any of its out-arcs. So, for example, the expressions R(P G(x y) x y) and R(y z H(u K) u) are the two roots (nodes 1 and 2) of the graph in Figure 1, node 12 is a 2nd immediate successor of node 10, and the arity of node 5 is 2. In all there are 13 expressions in the graph, one for each node. The graph itself can be thought of as representing the set of these expressions.

FIGURE 1.

Note that in the graphical form of expressions we need no names for variables. Distinct variables are simply distinct unlabelled leaves (here, they are nodes 6, 7, 9 and 13, whose names in the linearly written expressions are respectively z, x, y and u). The use of the graphical form of expressions thus avoids the well-known complication of needing to rename variables in order to prevent unwanted identifications of two distinct variables which happen to have been given the same name.

Once we are given a set S of atoms and terms as a graph, we can represent a partition P of S by inserting one or more links (undirected arcs) between roots of distinct expressions which are in the same part of P. For example, by inserting a link between nodes 1 and 2 of the graph in Figure 1, we represent the 12-part partition

P = {{R(P G(x y) x y), R(y z H(u K) u)}, {G(x y)}, {H(u K)}, {R}, {P}, {G}, {H}, {K}, {x}, {y}, {z}, {u}}

by the graph in Figure 2.

FIGURE 2.

FIGURE 3.
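The substitution operations defined in this section (simultaneous application Eθ, the product α·β with E(α·β) = (Eα)β, and the factorization θ = σ·μ of the example unifier) can be checked mechanically. The Python sketch below is illustrative only: expressions use a hypothetical tuple encoding (variables are strings, other expressions are tuples headed by a constant), parse reads the article's space-separated notation, and apply_subst, product and subst are names chosen here.

```python
import re


def parse(text):
    """Parse the article's notation, e.g. 'F(x G(x y))'; commas are optional.
    Lower-case names are variables; upper-case names are constants."""
    tokens = re.findall(r"[A-Za-z][A-Za-z0-9-]*|[()]", text)
    pos = 0

    def expr():
        nonlocal pos
        name = tokens[pos]
        pos += 1
        if name[0].islower():
            return name
        args = []
        if pos < len(tokens) and tokens[pos] == "(":
            pos += 1
            while tokens[pos] != ")":
                args.append(expr())
            pos += 1
        return (name, *args)

    return expr()


def apply_subst(e, s):
    """Es: every variable is replaced simultaneously; results are not rescanned."""
    if isinstance(e, str):
        return s.get(e, e)
    return (e[0], *(apply_subst(a, s) for a in e[1:]))


def product(a, b):
    """The product a·b: first perform a, then b, so that E(a·b) == (Ea)b."""
    out = {v: apply_subst(t, b) for v, t in a.items()}
    for v, t in b.items():
        out.setdefault(v, t)
    return {v: t for v, t in out.items() if t != v}  # drop trivial bindings v = v


def subst(**equations):
    """Build a substitution from equations written as text, e.g. subst(x='H(u v)')."""
    return {v: parse(t) for v, t in equations.items()}


# Simultaneous substitution: y is replaced by A(y) exactly once.
E = parse("F(x G(x y) y)")
theta1 = subst(x="H(y z)", y="A(y)")
print(apply_subst(E, theta1) == parse("F(H(y z) G(H(y z) A(y)) A(y))"))  # True

# The unifier theta factors as the mgu sigma followed by the specialization mu.
theta = subst(x="H(P Q)", y="D", u="P", v="Q", z="G(H(P Q) D)")
sigma = subst(x="H(u v)", z="G(H(u v) y)")
mu = subst(u="P", v="Q", y="D")
s, t = parse("F(x G(x y))"), parse("F(H(u v) z)")
print(apply_subst(s, sigma) == apply_subst(t, sigma))  # True: sigma unifies the pair
print(product(sigma, mu) == theta)                     # True
print(product(mu, sigma) == theta)                     # False: not commutative
```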


If a part of a partition has more than two members, we do not need to put links between every two nodes in it. A part is represented by a cluster of nodes: a maximal set of nodes any two of which are connected by a path of such links. For example, the six-part partition {{A, B, C, D}, {E, F, G}, {H, J, K, L, M}, {X}, {Y}, {Z}} of the set {A, B, C, D, E, F, G, H, J, K, L, M, X, Y, Z} is represented by the graph in Figure 3.

Given a partition in the form of a graph, the problem of finding an mgu of the partition (or of detecting its nonunifiability) is solved by the following unification algorithm:

while there are clusters in the graph but no clashes do shrink the graph.

Shrinking a graph requires two steps:

Step 1. Each cluster C in the graph is "collapsed" into a single new node, which inherits all of the in-arcs, out-arcs, and labels of every node in C.

Step 2. New links are inserted between nodes which are equated by step 1. Two nodes are equated if they are both jth successors, for some j, of the same node.

A clash is a cluster in which there are nonvariable nodes which either (1) are labelled by distinct constants, or (2) are unlabelled, but have different arities.

Each iteration of the loop transforms a graph into another graph, which also in general contains links. For example, the first iteration transforms the graph in Figure 2 into the graph in Figure 4. The second iteration then transforms this into the graph in Figure 5, which is terminal, since there are now no links. The process in general continues until an iteration either creates no new links, or else creates a clash; whereupon it terminates. This must eventually happen, since each iteration produces a new graph with fewer nodes than the previous graph. If, after termination, the graph contains no clashes and is acyclic, the original partition is unifiable. Otherwise, not.

FIGURE 4.

FIGURE 5.

On termination, an mgu for a unifiable partition can be found by comparing the terminal graph with the initial graph. For each node representing a variable in the original graph, we find the node in the terminal graph which contains it. The mgu is represented by equating each such variable with the expression represented by the corresponding node in the terminal graph. Note that the nonterminal graphs generated during the process do not represent sets of expressions, since some of their nodes have more than one jth successor, for one or more j.

The graph-shrinking parallel unification algorithm is presented here in essentially the version that was recently developed, analyzed and efficiently implemented in [2].
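The graph-shrinking loop can be simulated sequentially to see how it behaves. The Python sketch below illustrates the steps under stated assumptions and is not the implementation of [2]: expressions use a hypothetical tuple encoding (variables are strings, other expressions are tuples headed by a constant), identical subexpressions share one node as in Figure 1, a union-find structure stands in for collapsing clusters, and (an assumption added here so that the encoding behaves correctly) a cluster holding both a constant leaf and an applicative node is also treated as a clash. Each pass of the while-loop is one iteration of "shrink the graph", and the function reports how many passes were needed.

```python
from collections import defaultdict


def is_var(e):
    return isinstance(e, str)


def unify_partition(parts):
    """Graph-shrinking unification of a partition given as a list of parts
    (each a list of expressions). Returns (mgu, iterations) or None."""
    kind, label, succ, index = [], [], [], {}

    def node(e):  # one node per distinct subexpression, as in Figure 1
        if is_var(e):
            key = ("var", e)
        elif len(e) == 1:
            key = ("const", e[0])
        else:  # out-arc 0 -> the constant, out-arcs 1..n -> the arguments
            key = ("app", node((e[0],)), *(node(a) for a in e[1:]))
        if key not in index:
            index[key] = len(kind)
            kind.append(key[0])
            label.append(key[1] if key[0] != "app" else None)
            succ.append(list(key[1:]) if key[0] == "app" else [])
        return index[key]

    links = [(node(part[0]), node(e)) for part in parts for e in part[1:]]
    parent = list(range(len(kind)))

    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v

    def clusters():
        groups = defaultdict(list)
        for v in range(len(kind)):
            groups[find(v)].append(v)
        return groups

    rounds = 0
    while links:                          # while there are links but no clashes
        rounds += 1
        for a, b in links:                # Step 1: collapse every cluster
            parent[find(a)] = find(b)
        links = []
        for group in clusters().values():
            consts = {label[v] for v in group if kind[v] == "const"}
            apps = [v for v in group if kind[v] == "app"]
            arities = {len(succ[v]) - 1 for v in apps}
            if len(consts) > 1 or len(arities) > 1 or (consts and apps):
                return None               # clash
            for v in apps[1:]:            # Step 2: link jth successors
                links += [(a, b) for a, b in zip(succ[apps[0]], succ[v])
                          if find(a) != find(b)]

    groups, state = clusters(), {}

    def acyclic(c):  # depth-first search for a cycle in the terminal graph
        if c in state:
            return state[c] == "done"
        state[c] = "open"
        ok = all(acyclic(find(s)) for v in groups[c] for s in succ[v])
        state[c] = "done"
        return ok

    if not all(acyclic(c) for c in list(groups)):
        return None                       # cyclic: the occurs check fails

    def build(c):  # expression represented by a node of the terminal graph
        apps = [v for v in groups[c] if kind[v] == "app"]
        if apps:
            head, *args = succ[apps[0]]
            return (label[head], *(build(find(a)) for a in args))
        consts = [v for v in groups[c] if kind[v] == "const"]
        return (label[consts[0]],) if consts else min(label[v] for v in groups[c])

    mgu = {label[v]: build(find(v)) for v in range(len(kind)) if kind[v] == "var"}
    return {x: t for x, t in mgu.items() if t != x}, rounds


def nest(t, depth=8):
    for _ in range(depth):
        t = ("F", t)
    return t


R1 = ("R", ("P",), ("G", "x", "y"), "x", "y")   # R(P G(x y) x y)
R2 = ("R", "y", "z", ("H", "u", ("K",)), "u")   # R(y z H(u K) u)
mgu, rounds = unify_partition([[R1, R2]])
print(rounds)  # 2
print(unify_partition([[nest("x"), nest(("A",))]]))  # ({'x': ('A',)}, 9)
print(unify_partition([["x", ("F", "x")]]))  # None
```

The Figure 2 partition terminates after two passes, matching the two iterations described above, with mgu x = H(P K), y = u = P, z = G(H(P K) P). The nested F example needs nine passes because each pass can add only one new link, which is the sequential worst case discussed next.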


The elegant data-parallel SIMD implementation for the Connection Machine exploits all the inherent parallelism in the process very effectively. The sequential version of this "fast unification" algorithm was hit upon independently by [4, 22, 42], improving an earlier formulation by [3]. As far as I know, the first version of a unification algorithm to be explicitly stated and accompanied by correctness and termination proofs was in [39]. Later, in [41], I formulated a more efficient version of the algorithm, using a tabular representation of the graph-representation to gain some of the same computational advantages which were brilliantly orchestrated on a much larger scale by [5] in their important structure-sharing resolution theorem-prover. This tabular representation [41] is also the point of departure for [2]. Herbrand's original (1930) version of the unification process is stated briefly, informally, and without proof (see [19]).

In 1984, [13] pointed out that in certain cases there is no opportunity for the parallel graph-shrinking algorithm to achieve any significant speed-up. Thus, for example, in finding the mgu {x = A} of the set

{F(F(F(F(F(F(F(F(x)))))))), F(F(F(F(F(F(F(F(A))))))))}

we can merge only one pair of nodes, and generate only one new link, at each iteration of the loop. These successive minimal modifications of the graph therefore comprise essentially a sequential process. However, such 'worst cases' are more pathological than typical, and experience suggests that they are rarely met in real applications.

Resolution

Once we can compute an mgu for any unifiable partition of a set of expressions (or show the partition not to be unifiable, if that is the case), we are ready to make inferences by resolution.

The fundamental resolution inference pattern is closely related to what logicians call the 'cut' inference. (In Prolog programming parlance, unfortunately, the word 'cut' has come to have another, quite different, meaning.) Cut inferences have the form:

$$\text{from } A \rightarrow (B + \{L\}) \text{ and } (\{L\} + C) \rightarrow D \text{ infer } (A \cup C) \rightarrow (B \cup D)$$

We can make a cut inference from two clauses if and only if there is some atom L which is in the antecedent of one clause and the consequent of the other. To form the conclusion of the inference, we first 'cut' out L from both places, and then merge the two antecedents into one and the two consequents into one. The 'disjoint union' notation X + Y denotes the union X ∪ Y, but also carries the further information that X ∩ Y = ∅.

Example 1. From the clauses A B → C D and D E → F G we can infer the clause A B E → C F G by a cut, eliminating the atom D.

The resolution inference pattern generalizes the cut inference pattern by bringing in unification. The resolution inference pattern has the form:

$$\text{from } A \rightarrow (B + M) \text{ and } (N + C) \rightarrow D \text{ infer } (A \cup C)\sigma \rightarrow (B \cup D)\sigma$$

where σ is an mgu of the one-part partition {M ∪ N}.

In making a resolution inference, we must first use unification to deduce a pair of instances of the two premises suitable for a cut to be applied. In the special case that M = N = {L}, the mgu of the partition {M ∪ N} is the identity substitution. So in this case, a resolution is the same as a cut.

Example 2. From → P(G(r s) r s) and P(x y u) P(y z v) P(x v w) → P(u z w) we infer P(r z r) → P(s z s) by a resolution in which M = {P(G(r s) r s)} and N = {P(x y u), P(x v w)}, since {M ∪ N} is unifiable with mgu {x = G(r s), y = v = r, u = w = s}.

Example 3. From P(x y u) P(y z v) P(x v w) → P(u z w) and P(a b c) P(b d e) P(c d f) → P(a e f) we infer P(x y a) P(y b v) P(x v c) P(b d e) P(c d f) → P(a e f) by a resolution in which M = {P(u z w)} and N = {P(a b c)}, since {M ∪ N} is unifiable with mgu {u = a, z = b, w = c}.

From two given clauses, only a finite number of clauses can be inferred by resolution: one for each choice of the 'cut' sets M and N for which the partition {M ∪ N} is unifiable. If there are no such choices of M and N, then nothing can be inferred from the two clauses by resolution.

Resolution Deductions and Proofs

A resolution deduction is a finite tree whose nodes are labeled by clauses, each nonleaf node being labeled by a clause which is inferred by a resolution inference from the clauses labeling its immediate successors. The conclusion of the deduction is the clause labeling its root, and the premises of the deduction are the clauses labeling its leaves. A resolution proof is a resolution deduction whose conclusion is false (= the empty clause). Such a proof establishes that the premises are contradictory (unsatisfiable). If S is any unsatisfiable set of clauses, there is always a resolution proof whose premises are all in S. This fact is the completeness of resolution (see [39]).

A resolution proof with n + 1 premises can be taken in n + 1 different ways as a proof of the negation of one of its premises from the other n premises. For example, a resolution proof with premises A, B, C can be taken as (1) a proof of not-A from the premises B and C, (2) a proof of not-B from the premises A and C, and (3) a proof of not-C from the premises A and B.
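Examples 1 to 3 can be reproduced with a small Python sketch of the resolution rule. Several things are assumptions of the sketch rather than the article's method: the clause notation "conditions -> conclusions" read by the hypothetical helper clause, the use of an ordinary recursive unifier with occurs check in place of the graph-shrinking algorithm, and the variable-to-variable tie-breaking convention noted in unify. The cut sets M and N are supplied by the caller, and the two premises are assumed to be separated, as they are in the examples.

```python
import re


def parse_all(text):
    """Parse a run of atoms in the article's notation; lower-case names are variables."""
    tokens = re.findall(r"[A-Za-z][A-Za-z0-9-]*|[()]", text)

    def expr(i):
        name, i = tokens[i], i + 1
        if name[0].islower():
            return name, i
        args = []
        if i < len(tokens) and tokens[i] == "(":
            i += 1
            while tokens[i] != ")":
                arg, i = expr(i)
                args.append(arg)
            i += 1
        return (name, *args), i

    out, i = [], 0
    while i < len(tokens):
        e, i = expr(i)
        out.append(e)
    return out


def parse(text):
    return parse_all(text)[0]


def clause(text):
    """'P(x y u) P(y z v) -> P(u z w)' becomes (conditions, conclusions)."""
    left, right = text.split("->")
    return frozenset(parse_all(left)), frozenset(parse_all(right))


def walk(t, s):
    while isinstance(t, str) and t in s:
        t = s[t]
    return t


def occurs(v, t, s):
    t = walk(t, s)
    return t == v if isinstance(t, str) else any(occurs(v, a, s) for a in t[1:])


def unify(a, b, s):
    """Recursive unification with occurs check; returns an extended s or None."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if isinstance(a, str) and isinstance(b, str):
        # Either direction gives an mgu; binding the alphabetically later
        # variable happens to reproduce the mgus printed in Examples 2 and 3.
        return {**s, max(a, b): min(a, b)}
    if isinstance(b, str):
        a, b = b, a
    if isinstance(a, str):
        return None if occurs(a, b, s) else {**s, a: b}
    if a[0] != b[0] or len(a) != len(b):
        return None
    for x, y in zip(a[1:], b[1:]):
        s = unify(x, y, s)
        if s is None:
            return None
    return s


def resolve_term(t, s):
    t = walk(t, s)
    return t if isinstance(t, str) else (t[0], *(resolve_term(a, s) for a in t[1:]))


def resolution(c1, c2, M, N):
    """From A -> (B + M) and (N + C) -> D infer (A ∪ C)σ -> (B ∪ D)σ, where σ is
    an mgu of the one-part partition {M ∪ N}; None if M ∪ N is not unifiable."""
    (A, BM), (NC, D) = c1, c2
    assert M <= BM and N <= NC, "M is cut from c1's conclusions, N from c2's conditions"
    first, *rest = list(M) + list(N)
    s = {}
    for other in rest:
        s = unify(first, other, s)
        if s is None:
            return None
    instance = lambda atoms: frozenset(resolve_term(x, s) for x in atoms)
    return instance(A | (NC - N)), instance((BM - M) | D)


c2 = clause("P(x y u) P(y z v) P(x v w) -> P(u z w)")
ex1 = resolution(clause("A B -> C D"), clause("D E -> F G"), {parse("D")}, {parse("D")})
ex2 = resolution(clause("-> P(G(r s) r s)"), c2, {parse("P(G(r s) r s)")},
                 {parse("P(x y u)"), parse("P(x v w)")})
ex3 = resolution(c2, clause("P(a b c) P(b d e) P(c d f) -> P(a e f)"),
                 {parse("P(u z w)")}, {parse("P(a b c)")})
print(ex1 == clause("A B E -> C F G"))                                       # True
print(ex2 == clause("P(r z r) -> P(s z s)"))                                 # True
print(ex3 == clause("P(x y a) P(y b v) P(x v c) P(b d e) P(c d f) -> P(a e f)"))  # True
```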


P1-Resolution

A resolution one of whose two premises is unconditional is called a P1-resolution. Example 2 is a P1-resolution, but Example 3 is not. It turns out that whenever a set of clauses is unsatisfiable, there is a P1-resolution proof from those premises (see [40]). In other words, P1-resolution is also complete: despite its restricted form, P1-resolution is just as strong as resolution, but its proof-space is sparser than that of unrestricted resolution.

Hyper-Resolution

We get an even sparser proof-space when we take as the only inference rule, instead of the two-premise P1-resolution, the (p + 1)-premise hyperresolution rule, in which exactly one of the premises is a conditional clause and all of the other p premises, together with the conclusion, are unconditional clauses. The hyperresolution rule is:

$$\text{from } \rightarrow (C_1 + M_1),\ \ldots,\ \rightarrow (C_p + M_p), \text{ and } N_1 + \cdots + N_p \rightarrow D \text{ infer } \rightarrow (C_1 \cup \cdots \cup C_p \cup D)\sigma$$

where σ is an mgu of the p-part partition {M1 ∪ N1, ..., Mp ∪ Np}.

The unifiable p-part partition that is the essential ingredient of a hyperresolution is called its kernel. The p + 1 premises and the kernel together uniquely determine the conclusion.

A hyperresolution inference is really a compacted reorganization of a P1-resolution deduction whose conclusion is unconditional. After the reorganization, the deduction has had all of its interior nodes suppressed and has become a single integrated transaction instead of a linked system of many transactions. By reorganizing the reasoning as a single inference, we are simply regarding its conclusion as having been obtained directly (or, to use a traditional logic expression, immediately, without any 'mediation') from its premises in one step, rather than 'mediately' as the eventual outcome of several linked P1-resolution steps.

Hyperresolution Deductions

A hyperresolution deduction is a finite tree each of whose nodes has a label and each of whose nonleaf nodes also has a justification. The labels are unconditional clauses, and the justifications are conditional clauses. The clause labeling a nonleaf node N is inferred by a hyperresolution whose unconditional premises are the clauses labeling the immediate successors of N, and whose conditional premise is the justification of N. The conclusion of the deduction is the clause labeling its root. The premises of the deduction are the labels of its leaf nodes and the justifications of its nonleaf nodes.

Hyperresolution deductions can yield only unconditional clauses. Moreover, they can yield only positive unconditional clauses, unless the justification of the root node is a negative conditional clause; in that case, but only in that case, the conclusion is a negative unconditional clause: indeed, it is the empty clause. Thus a hyperresolution deduction of the empty clause (a hyperresolution proof) always has exactly one negative conditional clause among its justifications. As we shall see, it is this feature which adumbrates logic programming.

Completeness and Local Finiteness of the Resolution Clausal Predicate Calculi

The resolution and hyperresolution versions of the clausal predicate calculus are all complete. Also, both systems are locally finite. This means that, in each system, there are only finitely many deductions of a given size (number of nodes) having a given set of premises (and this number is much smaller for hyperresolution than for resolution). By contrast, traditional predicate calculi are not even locally finite. This is one reason it is so difficult to make an efficient proof procedure for traditional predicate calculi. For example, most traditional predicate calculi contain the rule of specialization:

$$\text{from } \forall A \text{ infer } \forall (A\theta)$$

where θ is any substitution. (The sentence ∀S is the universal closure of the sentence S: the result of prefixing a universal quantifier to S for every free variable in S.) With this inference available, there are infinitely many deductions of size 2 which have the same premise ∀A: one for each different substitution θ.

Hyperresolution and Horn Clause Logic

The advantages of hyperresolution are quite striking in the Horn clause predicate calculus. In this subsystem of the clausal predicate calculus every clause is a Horn clause, namely, a clause having at most one conclusion. Hyperresolution then becomes much simpler. Recall the general definition of hyperresolution given above. When all clauses are restricted to having at most one conclusion, the 'cut' sets Mi can only be singletons (say, {Ai}), and the 'remainder' sets Ci must be empty. Consequently, the definition of hyperresolution for Horn clauses can be restated in the following much simpler form:

$$\text{from } \rightarrow A_1,\ \ldots,\ \rightarrow A_p \text{ and } B_1 \cdots B_p \rightarrow D \text{ infer } \rightarrow D\sigma$$

where σ is an mgu of the p-part partition {{A1, B1}, ..., {Ap, Bp}}.

In this restatement of the rule, D and the A's and B's are all atomic sentences. When we combine hyperresolution inferences into multi-inference deductions, we are in effect treating each particular application of this inference pattern as though it were a special inference rule, 'the {B1, ..., Bp} → D inference rule', stated as:

$$\text{from } \rightarrow A_1,\ \ldots,\ \rightarrow A_p \text{ infer } \rightarrow D\sigma$$

where σ is an mgu of {{A1, B1}, ..., {Ap, Bp}}. This is, however, just a pragmatic device to sharpen our understanding of the very special role that conditional Horn clauses play in logic programming.

Ultraresolutions: Horn Clause Hyperresolution Deductions as Single Inferences

We again apply the idea of making a single inference out of an entire deduction. In the case of hyperresolution, instead of thinking of the conclusion of an entire deduction (namely a deduction built from P1-resolution steps and having an unconditional conclusion) as being arrived at stepwise by the performance of each of its inferences separately, we think of the whole construction as one inference step involving a higher and larger-scale inference pattern. We will now treat Horn clause hyperresolution deductions in a similar way, and thereby arrive at a higher- and larger-scale inference pattern which we call ultraresolution.

There is really no need, pragmatically, to know the conclusion of every individual inference in a hyperresolution deduction, if all that we are after is the eventual conclusion of the whole deduction. We can instead characterize that eventual conclusion more directly, by a relationship based only on the structure of the premises of the deduction. By omitting in this way all of the interior stepwise conclusions, we turn the entire hyperresolution deduction into a single inference, which immediately yields its conclusion from the premises in one integrated step.

Ultraresolution

To every hyperresolution deduction D there corresponds an ultraresolution inference U which has the same premises and the same conclusion as D, and conversely. We define ultraresolution inferences directly, however, without reference to their corresponding hyperresolution deductions. The ultraresolution rule is (where A → B is a Horn clause and C is a set of Horn clauses):

$$\text{from } A \rightarrow B \text{ and the clauses in } C \text{ infer } \rightarrow B\sigma$$

where σ is an mgu of the kernel of a cover of A → B by C (covers and their kernels are defined below). The clause A → B is the main premise and the clauses in C are the covering premises.

Covers and Their Kernels

A cover of a clause A → B by a set C of clauses is a certain kind of finite tree with nodes labeled by clauses. The root of the tree is labeled by A → B, while the other nodes are labeled by variants of clauses in C. The extra condition that makes the tree a cover is that, for each node N in the tree, every atom in the antecedent of the clause labeling N is assigned to a distinct immediate successor of N. The kernel of the cover is the partition:

{{X, Y} | Y is the conclusion of the clause labelling the node to which X is assigned}.

Example 4. To illustrate the notions of a cover and its kernel, consider the clause

A(x0) B(x0) → C(x0)

and the set C of clauses

{E(x1) D(x1 y1) → A(F(x1 y1)), H(G(x2)) → B(x2), → H(G(x3)), → D(M N), → E(M)}.

The labeled tree given by the table:

node   label
1      A(x0) B(x0) → C(x0)
2      E(x1) D(x1 y1) → A(F(x1 y1))
3      H(G(x2)) → B(x2)
4      → H(G(x3))
5      → D(M N)
6      → E(M)

is a cover of A(x0) B(x0) → C(x0) by C, in view of the assignments given by the table:

atom        assigned to node
A(x0)       2
B(x0)       3
H(G(x2))    4
D(x1 y1)    5
E(x1)       6

and has the following partition as its kernel:

{{A(x0), A(F(x1 y1))}, {B(x0), B(x2)}, {E(x1), E(M)}, {D(x1 y1), D(M N)}, {H(G(x2)), H(G(x3))}}.

Since this kernel is unifiable, with mgu

σ = {x0 = x2 = x3 = F(M N), x1 = M, y1 = N},

we can infer the clause

→ C(x0)σ = → C(F(M N))

by an ultraresolution which has A(x0) B(x0) → C(x0) as its main premise and C as its set of covering premises.

The intuition behind the notion of a cover of a clause A → B is that it depicts exactly the pattern of organization of the given clauses. If the kernel of the cover is unifiable with mgu σ, it guarantees that we can easily relabel the tree so it turns into a hyperresolution deduction, from these clauses as premises, of the same unconditional clause → Bσ that the ultraresolution inference obtains directly from them in one step. In this relabeling, the new label on each leaf node of the tree is the same as the old label. The old label on each nonleaf node of the cover, however, is removed (it now becomes the justification of the hyperresolution inference at that same node), and the node's new label is the unconditional clause which is inferred by a hyperresolution from this justification clause together with the new labels on the immediate successors of the node.
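The kernel and its mgu in Example 4 can be recomputed with a short Python sketch. It is an illustration under assumptions: the cover is given as a hypothetical dictionary from node numbers to (antecedent atoms, conclusion atom), the assignment table becomes a dictionary keyed by (node, position of the antecedent atom), the clause variants in the cover are taken to be standardized apart already (as they are in Example 4), and a plain recursive unifier with occurs check unifies the kernel's pairs one after another.

```python
def mk(name):
    """mk('A')('x0') builds A(x0); variables are plain strings."""
    return lambda *args: (name, *args)


A, B, C, D, E, F, G, H = map(mk, "ABCDEFGH")
M, N = ("M",), ("N",)


def walk(t, s):
    while isinstance(t, str) and t in s:
        t = s[t]
    return t


def occurs(v, t, s):
    t = walk(t, s)
    return t == v if isinstance(t, str) else any(occurs(v, x, s) for x in t[1:])


def unify(a, b, s):
    """Recursive unification with occurs check; returns an extended s or None."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if isinstance(b, str):
        a, b = b, a
    if isinstance(a, str):
        return None if occurs(a, b, s) else {**s, a: b}
    if a[0] != b[0] or len(a) != len(b):
        return None
    for x, y in zip(a[1:], b[1:]):
        s = unify(x, y, s)
        if s is None:
            return None
    return s


def resolve(t, s):
    t = walk(t, s)
    return t if isinstance(t, str) else (t[0], *(resolve(x, s) for x in t[1:]))


def kernel(cover, assigned):
    """{{X, Y} | Y is the conclusion of the clause at the node X is assigned to}."""
    for n, (conditions, _) in cover.items():
        targets = [assigned[(n, i)] for i in range(len(conditions))]  # all assigned
        assert len(set(targets)) == len(targets), "successors must be distinct"
    return [(cover[n][0][i], cover[child][1]) for (n, i), child in assigned.items()]


def ultraresolution(cover, assigned, root=1):
    """Return (mgu of the kernel, B·mgu for the root clause), or None."""
    s = {}
    for x, y in kernel(cover, assigned):
        s = unify(x, y, s)
        if s is None:
            return None
    return {v: resolve(v, s) for v in s}, resolve(cover[root][1], s)


cover = {                                   # node -> (antecedent, conclusion)
    1: ([A("x0"), B("x0")], C("x0")),       # the main premise
    2: ([E("x1"), D("x1", "y1")], A(F("x1", "y1"))),
    3: ([H(G("x2"))], B("x2")),
    4: ([], H(G("x3"))),
    5: ([], D(M, N)),
    6: ([], E(M)),
}
assigned = {(1, 0): 2, (1, 1): 3, (3, 0): 4, (2, 1): 5, (2, 0): 6}

sigma, conclusion = ultraresolution(cover, assigned)
print(conclusion == C(F(M, N)))  # True: the clause -> C(F(M N))
```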

FIGURE 6.

The following example illustrates the relationship between a hyperresolution deduction and the corresponding ultraresolution inference.

Example 5. Figure 6 is a hyperresolution deduction of the unconditional clause UNCLE(TED ANN) ← from a subset of the following set of Horn clauses, which comprises a small 'family relationship' knowledge base. This knowledge base contains (as its 'definitions') the following conditional Horn clauses:

1 UNCLE(u x) ← BROTHER(u y) PARENT(y x)
2 UNCLE(u x) ← HUSBAND(u s) SISTER(s p) PARENT(p x)
3 PARENT(x y) ← CHILD(y x)
4 BROTHER(b x) ← SIBLING(b x) MALE(b)
5 SISTER(s x) ← SIBLING(s x) FEMALE(s)
6 SIBLING(x y) ← DIFFERENT(x y) FATHER(f x) FATHER(f y) MOTHER(m x) MOTHER(m y)
7 HUSBAND(h w) ← MARRIED(h w) MALE(h)
8 WIFE(w h) ← MARRIED(h w) FEMALE(w)
9 FATHER(f x) ← PARENT(f x) MALE(f)
10 MOTHER(m x) ← PARENT(m x) FEMALE(m)

and (as its 'facts') the following unconditional Horn clauses:

11 CHILD(JIM JOE) ←
12 CHILD(JOE MEG) ←
13 CHILD(JIM SUE) ←
14 CHILD(ANN JOE) ←
15 CHILD(JOE TOM) ←
16 CHILD(ANN SUE) ←
17 CHILD(PAT MEG) ←
18 CHILD(PAT TOM) ←
19 CHILD(TOD PAT) ←
20 CHILD(RON PAT) ←
21 CHILD(TOD TED) ←
22 CHILD(RON TED) ←
23 MALE(JIM) ←
24 MALE(JOE) ←
25 MALE(TOM) ←
26 MALE(TED) ←
27 MALE(TOD) ←
29 FEMALE(ANN) ←
30 FEMALE(SUE) ←
31 FEMALE(MEG) ←
32 FEMALE(PAT) ←
33 FEMALE(SAL) ←
35 MARRIED(TOM MEG) ←
36 MARRIED(JOE SUE) ←
37 MARRIED(TED PAT) ←
38 MARRIED(RON SAL) ←
39 MARRIED(JIM JAN) ←
40 DIFFERENT(a b) ← a ≠ b & a, b ∈ {JIM, JOE, TOM, TED, TOD, RON, ANN, SUE, MEG, PAT, SAL, JAN}

Premise 40 is a 'virtual' definition: it is simply a shorthand way of supplying 132 facts (such as DIFFERENT(JOE ANN)←) whose predicate is DIFFERENT and whose two arguments are distinct constants in the displayed set, one fact for each of the 12 × 11 = 132 ordered pairs.

From this knowledge base there are, for example, hyperresolution deductions of each of the following unconditional clauses:

41 UNCLE(TED ANN)←     42 UNCLE(TED JIM)←     43 UNCLE(JOE TOD)←     44 UNCLE(JOE RON)←
45 PARENT(TOM PAT)←    46 PARENT(TOM JOE)←    47 PARENT(MEG PAT)←    48 PARENT(MEG JOE)←
49 FATHER(TOM PAT)←    50 FATHER(TOM JOE)←    51 MOTHER(MEG PAT)←    52 MOTHER(MEG JOE)←
53 HUSBAND(TED PAT)←   54 SISTER(PAT JOE)←    55 SIBLING(PAT JOE)←   56 PARENT(JOE ANN)←

For example, clause 41, UNCLE(TED ANN)←, is the conclusion of the hyperresolution deduction shown in Figure 6. The label on each node is given in the diagram by its number next to the node, and at each nonleaf node it is followed by the number, in parentheses, of the clause which is the justification of the node.
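Because every premise of Example 5 is a positive Horn clause and the knowledge base is finite and function-free, repeatedly applying each conditional clause (1-10) to all the unconditional clauses derived so far, until nothing new appears, reaches every unconditional clause that hyperresolution can deduce here. The Python sketch below is such a naive bottom-up (forward-chaining) evaluation. Its representation is our own assumption: atoms are tuples, rule arguments are lowercase variable names, all facts are ground, and MALE(RON) and FEMALE(JAN) are our assumed readings of facts 28 and 34 (neither affects the UNCLE results). It illustrates the idea; it is not the procedure used to draw Figure 6.

```python
"""Naive bottom-up saturation of the Example 5 knowledge base (a sketch)."""

PEOPLE = "JIM JOE TOM TED TOD RON ANN SUE MEG PAT SAL JAN".split()

# Clauses 1-10, written "CONCLUSION : CONDITION, CONDITION, ...".
RULE_TEXT = """UNCLE u x : BROTHER u y, PARENT y x
UNCLE u x : HUSBAND u s, SISTER s p, PARENT p x
PARENT x y : CHILD y x
BROTHER b x : SIBLING b x, MALE b
SISTER s x : SIBLING s x, FEMALE s
SIBLING x y : DIFFERENT x y, FATHER f x, FATHER f y, MOTHER m x, MOTHER m y
HUSBAND h w : MARRIED h w, MALE h
WIFE w h : MARRIED h w, FEMALE w
FATHER f x : PARENT f x, MALE f
MOTHER m x : PARENT m x, FEMALE m"""

# Facts 11-39; MALE RON and FEMALE JAN are assumed readings of facts 28 and 34.
FACT_TEXT = """CHILD JIM JOE, CHILD JOE MEG, CHILD JIM SUE, CHILD ANN JOE, CHILD JOE TOM,
CHILD ANN SUE, CHILD PAT MEG, CHILD PAT TOM, CHILD TOD PAT, CHILD RON PAT,
CHILD TOD TED, CHILD RON TED, MALE JIM, MALE JOE, MALE TOM, MALE TED, MALE TOD,
MALE RON, FEMALE ANN, FEMALE SUE, FEMALE MEG, FEMALE PAT, FEMALE SAL, FEMALE JAN,
MARRIED TOM MEG, MARRIED JOE SUE, MARRIED TED PAT, MARRIED RON SAL, MARRIED JIM JAN"""

def atom(text):
    """'UNCLE u x' -> ('UNCLE', 'u', 'x')."""
    return tuple(text.split())

# (conclusion, [conditions]); every argument in clauses 1-10 is a variable.
RULES = [(atom(head), [atom(c) for c in body.split(",")])
         for head, body in (line.split(":") for line in RULE_TEXT.splitlines())]

# Facts 11-39 plus premise 40 expanded into its 132 DIFFERENT facts.
FACTS = {atom(f) for f in FACT_TEXT.split(",")}
FACTS |= {("DIFFERENT", a, b) for a in PEOPLE for b in PEOPLE if a != b}

def match(pattern, fact, env):
    """Extend env so the pattern's variables map to the fact's constants, or return None."""
    if pattern[0] != fact[0] or len(pattern) != len(fact):
        return None
    env = dict(env)
    for var, value in zip(pattern[1:], fact[1:]):
        if env.setdefault(var, value) != value:   # variable already bound to something else
            return None
    return env

def consequences(rule, by_pred):
    """Every ground instance of the rule's conclusion whose conditions are all known facts."""
    head, body = rule
    envs = [{}]
    for condition in body:
        envs = [e2 for e in envs for fact in by_pred.get(condition[0], [])
                if (e2 := match(condition, fact, e)) is not None]
    return {(head[0],) + tuple(env[v] for v in head[1:]) for env in envs}

def saturate(rules, facts):
    """Apply every rule to everything derived so far, until no new clause appears."""
    facts = set(facts)
    while True:
        by_pred = {}                               # index the facts by predicate name
        for f in facts:
            by_pred.setdefault(f[0], []).append(f)
        new = set().union(*(consequences(r, by_pred) for r in rules)) - facts
        if not new:
            return facts
        facts |= new

derived = saturate(RULES, FACTS)
print(sorted(f[1:] for f in derived if f[0] == "UNCLE"))
# [('JOE', 'RON'), ('JOE', 'TOD'), ('TED', 'ANN'), ('TED', 'JIM')]
```

Each pass of the loop adds one more layer of hyperresolution conclusions; indexing the facts by predicate only keeps the matching cheap and does not change what is derived.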

The cover of the ultraresolution inference corresponding to this hyperresolution deduction is shown in Figure 7.

Figure 9 displays the cover of this ultraresolution inference in more detail, and shows more clearly that its status as an inference is conceptually independent of the corresponding hyperresolution deduction. In Figure 9, each labeled node of the cover is represented by a box of one of the three types shown in Figure 8. These represent a node labeled, respectively, by a positive conditional clause Q ← P1 ... Pn, by a negative conditional clause ← P1 ... Pn, and by a positive unconditional clause Q←.

[Figure 7: the cover of the ultraresolution inference corresponding to the hyperresolution deduction of Figure 6.]

[Figure 8: the three kinds of box used in Figure 9, for a conditional positive clause, a conditional negative clause, and an unconditional positive clause.]

The thick lines in Figure 9 show the pairs of the unifiable kernel of the cover. That this kernel is unifiable is verified by an easy computation. Its mgu σ is:

$$
\begin{aligned}
\sigma = \{\, & a_0 = u_1 = h_2 = \mathrm{TED},\\
& b_0 = x_1 = y_8 = \mathrm{ANN},\\
& s_1 = w_2 = s_3 = x_4 = x_5 = y_7 = x_9 = y_{11} = \mathrm{PAT},\\
& p_1 = x_3 = y_4 = x_6 = x_8 = x_{10} = y_{12} = y_{13} = \mathrm{JOE},\\
& f_4 = f_5 = f_6 = x_7 = x_{13} = \mathrm{TOM},\\
& m_4 = m_9 = m_{10} = x_{11} = x_{12} = \mathrm{MEG} \,\}
\end{aligned}
$$

and applying σ to the conclusion of the root clause yields UNCLE(TED ANN)←.

Queries and Their Answers in Logic Programming

We can consider any collection K of positive Horn clauses as a knowledge base. A set of positive Horn clauses is necessarily consistent: one cannot deduce false from it by hyperresolution (or, what is the same, one cannot infer false from it by an ultraresolution) if it contains no negative conditional clause. By taking a negative clause not-Q as the premise together with a collection of variants of clauses from K, we may be able to infer false by an ultraresolution. That is, the set {not-Q} ∪ K may well be inconsistent and its inconsistency demonstrated by our inference. We can then turn this inconsistency to our advantage, by regarding not-Q as the negation of a query Q that we want answered, and digging out the answer from the details of the ultraresolution.


[Figure 9: the cover of the ultraresolution inference of Example 5, in detail. Recoverable box labels include HUSBAND(u1 s1), SISTER(s1 p1), PARENT(p1 x1), DIFFERENT(x4 y4), FATHER(f4 x4), FATHER(f4 y4), MOTHER(m4 x4), MOTHER(m4 y4) and DIFFERENT(PAT JOE).]
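The 'easy computation' that shows the kernel of Figure 9 is unifiable can be sketched with a textbook unification routine (with occurs check) applied to the kernel pairs one after another. Since Figure 9 itself is not recoverable, the pairs below are our own reconstruction from the variable subscripts in σ: each condition of a covered node is paired with the conclusion of the clause variant that covers it, and the pair that introduces a0 and b0 is left out. The term representation (tuples for atoms, lowercase strings for variables) is also an assumption of this sketch.

```python
"""Unify the pairs of a kernel and read off the mgu (a sketch for Example 5)."""

def is_var(t):
    # Assumed convention: variables are lowercase strings (u1, x4, ...),
    # constants are uppercase strings, atoms are tuples (PREDICATE, arg, ...).
    return isinstance(t, str) and t[:1].islower()

def walk(t, s):
    """Follow variable bindings in the substitution s."""
    while is_var(t) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    t = walk(t, s)
    return t == v or (isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:]))

def unify(x, y, s):
    """Return s extended so that x and y become identical, or None if impossible."""
    x, y = walk(x, s), walk(y, s)
    if x == y:
        return s
    if is_var(x):
        return None if occurs(x, y, s) else {**s, x: y}
    if is_var(y):
        return unify(y, x, s)
    if isinstance(x, tuple) and isinstance(y, tuple) and x[0] == y[0] and len(x) == len(y):
        for a, b in zip(x[1:], y[1:]):
            s = unify(a, b, s)
            if s is None:
                return None
        return s
    return None

def resolve(t, s):
    """Apply the substitution s to t completely."""
    t = walk(t, s)
    if isinstance(t, tuple):
        return (t[0],) + tuple(resolve(a, s) for a in t[1:])
    return t

def mgu(pairs):
    """Unify every pair in turn; None means the kernel is not unifiable."""
    s = {}
    for left, right in pairs:
        s = unify(left, right, s)
        if s is None:
            return None
    return s

def atom(text):
    return tuple(text.split())

# Hypothetical reconstruction of the kernel (condition, conclusion of covering variant).
KERNEL = [(atom(p), atom(q)) for p, q in [
    ("HUSBAND u1 s1", "HUSBAND h2 w2"), ("MARRIED h2 w2", "MARRIED TED PAT"),
    ("MALE h2", "MALE TED"), ("SISTER s1 p1", "SISTER s3 x3"),
    ("SIBLING s3 x3", "SIBLING x4 y4"), ("FEMALE s3", "FEMALE PAT"),
    ("DIFFERENT x4 y4", "DIFFERENT PAT JOE"),
    ("FATHER f4 x4", "FATHER f5 x5"), ("FATHER f4 y4", "FATHER f6 x6"),
    ("MOTHER m4 x4", "MOTHER m9 x9"), ("MOTHER m4 y4", "MOTHER m10 x10"),
    ("PARENT f5 x5", "PARENT x7 y7"), ("MALE f5", "MALE TOM"), ("CHILD y7 x7", "CHILD PAT TOM"),
    ("PARENT f6 x6", "PARENT x13 y13"), ("MALE f6", "MALE TOM"), ("CHILD y13 x13", "CHILD JOE TOM"),
    ("PARENT m9 x9", "PARENT x11 y11"), ("FEMALE m9", "FEMALE MEG"), ("CHILD y11 x11", "CHILD PAT MEG"),
    ("PARENT m10 x10", "PARENT x12 y12"), ("FEMALE m10", "FEMALE MEG"), ("CHILD y12 x12", "CHILD JOE MEG"),
    ("PARENT p1 x1", "PARENT x8 y8"), ("CHILD y8 x8", "CHILD ANN JOE"),
]]

sigma = mgu(KERNEL)
print(resolve(atom("UNCLE u1 x1"), sigma))   # ('UNCLE', 'TED', 'ANN')
```

Replacing a single fact in this kernel, for instance pairing CHILD(y8 x8) with CHILD(ANN SUE) instead of CHILD(ANN JOE), makes `mgu` return `None`: x8 would have to be both JOE and SUE, so that kernel is not unifiable.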

Suppose that not-Q is the negative clause ←{G1, ..., Gn}. Recall that we can read ←{G1, ..., Gn} as:

$$\text{not } \exists x_1 \ldots \exists x_m\,(G_1 \text{ and } \ldots \text{ and } G_n)$$

where $x_1 \ldots x_m$ are all of the variables occurring in the atoms $G_1, \ldots, G_n$. Then Q is

$$\exists x_1 \ldots \exists x_m\,(G_1 \text{ and } \ldots \text{ and } G_n)$$

and so an inference of false from {not-Q} ∪ K is an inference of Q from K. The usefulness of this fact for logic programming is that the mgu σ of the kernel of the inference can be used to supply directly the 'answer' $(x_1 \ldots x_m) = (x_1 \ldots x_m)\sigma$ to the 'query' $\exists x_1 \ldots \exists x_m\,(G_1 \text{ and } \ldots \text{ and } G_n)$.

There may be many different ultraresolution inferences of false which have the same negative clause as the main premise and whose covering premises are taken from the same knowledge base K. It is even possible that the covering premises also are the same, with only the underlying cover and kernel being different. In any case, these different inferences will, in general, have different covers and kernels, and will therefore provide different answers from K to the same query Q. To find all these answers, what is needed is a suitable way of finding all the different ultraresolution inferences of false whose main premise is the negative clause not-Q and whose covering premises are variants of clauses in K.

LUSH, Alias SLD, Resolution

The original Edinburgh solution to this tricky computational problem was simple and beautiful, and it led directly to Prolog. After much exploration, [27] devised a 'linear' binary resolution inference pattern which they called SL-resolution (for Selective Linear resolution). When restricted to Horn clauses, SL-resolution becomes, as [1] named it, SLD-resolution (for Selective Linear resolution for Definite clauses). A definite clause is simply another name for a positive Horn clause. However, [21] had already, in 1974, coined a more whimsical name for it: LUSH resolution, for Linear resolution with Unrestricted Selection function, for Horn clauses. It is not clear to me why [1] felt this name was unsuitable. Whatever we call it, this highly specialized and narrowly restricted resolution inference has the form²:

$$\text{from } A \rightarrow B \text{ and } G \rightarrow H \text{ infer } (A \cup {\downarrow}G)\sigma \rightarrow H\sigma,$$

if σ is a most general unifier of the 1-part partition $\{\{B, {\uparrow}G\}\}$. (In this rule the conditions are written to the left of the arrow and the conclusion to the right, so that G → H says the same as H ← G.) The clause G → H is the main premise of the inference, and the clause A → B is the side premise.

The novel feature of this inference rule is its use of the two functions, selection (↑) and remainder (↓), both of which operate on the set G of conditions of the main premise. The function ↑ yields the condition which is selected, while the function ↓ yields the set of conditions which are not selected.

² Actually, in the original version and the version contained in the logic programming literature, the conclusion H of the main premise is omitted, and thus the main premise is always a negative Horn clause. Here, for various reasons, one of which will shortly become evident, we permit the main premise to have a conclusion. In addition to its role as the 'answer template' in logic programming computations, the conclusion can be put to other good uses.
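A single LUSH/SLD inference can be sketched directly from the inference rule stated above. In this Python sketch (our own representation, not taken from any Prolog system) a clause is a pair (conditions, conclusion), the selection function returns the index of the selected condition ↑G, the remainder ↓G is the list of the other conditions, and the side premise is renamed apart by suffixing its variables with a tag, which is one simple way to obtain a variant.

```python
"""One LUSH/SLD resolution step (a sketch under assumed conventions).

A clause is (conditions, conclusion): (G, H) for the main premise and (A, B) for the
side premise. Atoms are tuples (PREDICATE, arg, ...); variables are lowercase strings.
"""

def is_var(t):
    return isinstance(t, str) and t[:1].islower()

def walk(t, s):
    while is_var(t) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    t = walk(t, s)
    return t == v or (isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:]))

def unify(x, y, s):
    """Return s extended so that x and y become identical, or None."""
    x, y = walk(x, s), walk(y, s)
    if x == y:
        return s
    if is_var(x):
        return None if occurs(x, y, s) else {**s, x: y}
    if is_var(y):
        return unify(y, x, s)
    if isinstance(x, tuple) and isinstance(y, tuple) and x[0] == y[0] and len(x) == len(y):
        for a, b in zip(x[1:], y[1:]):
            s = unify(a, b, s)
            if s is None:
                return None
        return s
    return None

def resolve(t, s):
    t = walk(t, s)
    if isinstance(t, tuple):
        return (t[0],) + tuple(resolve(a, s) for a in t[1:])
    return t

def rename(clause, tag):
    """A variant of clause: every variable v becomes v_tag."""
    def r(t):
        if is_var(t):
            return f"{t}_{tag}"
        if isinstance(t, tuple):
            return (t[0],) + tuple(r(a) for a in t[1:])
        return t
    conditions, conclusion = clause
    return [r(c) for c in conditions], r(conclusion)

def leftmost(goals):
    """A selection function: return the index of the selected condition."""
    return 0

def lush_step(main, side, tag, select=leftmost):
    """Infer (A + remainder)sigma -> H sigma, or return None if B and the
    selected condition do not unify."""
    goals, conclusion = main
    if not goals:
        return None                              # nothing left to select
    i = select(goals)
    selected, remainder = goals[i], goals[:i] + goals[i + 1:]
    side_conditions, side_conclusion = rename(side, tag)
    s = unify(side_conclusion, selected, {})
    if s is None:
        return None
    # The side premise's conditions are adjoined on the left of the remainder.
    return [resolve(g, s) for g in side_conditions + remainder], resolve(conclusion, s)

def atom(text):
    return tuple(text.split())

main = ([atom("UNCLE a b")], atom("ANSWER a b"))                          # ANSWER(a b) <- UNCLE(a b)
clause1 = ([atom("BROTHER u y"), atom("PARENT y x")], atom("UNCLE u x"))  # clause 1
print(lush_step(main, clause1, tag=1))
# ([('BROTHER', 'a', 'y_1'), ('PARENT', 'y_1', 'b')], ('ANSWER', 'a', 'b'))
```

Always selecting index 0 and placing the new conditions in front of the remainder gives the stack-like behaviour of Prolog; passing any other `select` function still yields a legitimate LUSH/SLD step.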

Because selection and remainder are functions of the set of conditions, at most one LUSH/SLD inference is possible from a given main premise and side premise, and its conclusion is unique to those two premises. Hence a LUSH/SLD deduction will necessarily have a linear structure, in which each successive LUSH/SLD resolution will have for its main premise the conclusion of the previous one.

The really interesting and useful, and at first acquaintance amazing, property of LUSH/SLD resolution is that the choice of the selection and remainder functions is completely unrestricted (whence the 'U' in the name 'LUSH'; more's the pity that the name 'SLD' lacks any acronymic reference to this feature). Thus, in particular, the selection and remainder functions can be chosen so as to make the sets of conditions in the successive main premises behave like a stack, provided we take seriously the order in which the conditions are written, and always form the conclusion by adjoining the new conditions, if any, on the left of the remainder, in their written order. The selection then yields the leftmost condition (the one at the 'top' of the 'stack'). A LUSH/SLD deduction then does indeed look very much like the trace of a stack-oriented 'computation'.

To compute all the answers to a given query $\exists x_1 \ldots \exists x_m\,(G_1 \text{ and } \ldots \text{ and } G_n)$, we initialize the state of the computation to the 'state' $Q_0$, setting it up to be the clause³

$$Q_0 = \mathrm{ANSWER}(x_1 \ldots x_m) \leftarrow G_1 \ldots G_n$$

whose antecedent consists of the initial set of 'goals' and whose conclusion is a special 'system' atom $\mathrm{ANSWER}(x_1 \ldots x_m)$ acting as the formal 'answer template'. We then begin a series of computation steps, each of which is a single LUSH/SLD resolution inference. In general the (t+1)st step transforms the tth state $Q_t$ by using it as the main premise, and a variant of one of the clauses from the knowledge base (or 'program') as the side premise, to make a LUSH/SLD inference whose conclusion is the (t+1)st state $Q_{t+1}$. A state is terminal if it is an unconditional clause.

³ The idea of using a formal answer template in this way was originated by Cordell Green in the QA systems described earlier in this essay.

[Figure 10: the cover of an ultraresolution inference whose main premise is ANSWER(t) ← P(K(t) t K(t)); recoverable node labels include P(u z w), P(x y u), P(y z v) and P(x v w).]

Thus each complete computation is a LUSH/SLD resolution proof of an unconditional clause $\mathrm{ANSWER}(t_1 \ldots t_m)\leftarrow$, thereby providing the computation with the answer $(t_1 \ldots t_m)$ as its output. The different possible computations are related as the branches of a tree, the LUSH/SLD computation tree, since after any step there is in general more than one choice of positive clause to take as the side premise for the next step. Each nonterminal state of the computation will in general, therefore, have more than one successor state. It is the complete tree of all possible computations for the given query which is the total 'internal' response of the logic programming engine to that query; but its 'external' response is simply (some representation of) the set of all answers to the query.

Example 5 (continued). The family knowledge base of Example 5 contains enough information to provide four different answers (a b) to the query ∃a∃b UNCLE(a b), namely: (TED ANN), (TED JIM), (JOE TOD), (JOE RON). This corresponds to the fact that the four unconditional clauses UNCLE(TED ANN)←, UNCLE(TED JIM)←, UNCLE(JOE TOD)← and UNCLE(JOE RON)← can all be deduced by hyperresolution, or, equivalently, inferred directly by an ultraresolution, from the knowledge base.

What is so beautiful about this Edinburgh scheme is that it turns out that the branches of the LUSH/SLD computation tree correspond, one-to-one, to all the different ultraresolution inferences whose main premise is the initial state of the computation. The entire tree of LUSH/SLD computations is thus a complete survey of all possible ultraresolution inferences from that premise and the given knowledge base.

This correspondence now makes it obvious why the selection and remainder functions are unrestricted. Once we see clearly that each LUSH/SLD proof is simply a node-by-node 'top down' or 'backward-chaining' construction of the cover of an ultraresolution inference, starting with the antecedent of its main premise, we can interpret each LUSH/SLD step as a further small increment in that construction. Since the node chosen by the LUSH/SLD selection function as the site of the next increment of construction is obviously arbitrary, there is no restriction on what that selection function is taken to be.
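The whole computation tree for Example 5 can be grown with steps of the kind just described. The Python sketch below (our own representation, not any Prolog system's internals) starts from Q0 = ANSWER(a b) ← UNCLE(a b), always selects the leftmost condition, tries every clause of the knowledge base as side premise, and harvests the answer template from each terminal state, growing the tree one level at a time. Premise 40 is expanded into its 132 DIFFERENT facts, and MALE(RON) and FEMALE(JAN) are our assumed readings of facts 28 and 34. Because this knowledge base is not recursive, the tree is finite and the loop terminates; the level-by-level order is our choice, and a depth-first order would find the same set of answers.

```python
"""Level-by-level growth of the LUSH/SLD computation tree for Example 5 (a sketch).

Assumed representation: a clause is (conditions, conclusion); atoms are tuples
(PREDICATE, arg, ...); variables are lowercase strings; constants are uppercase.
"""

def is_var(t):
    return isinstance(t, str) and t[:1].islower()

def walk(t, s):
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(x, y, s):
    # No occurs check: this knowledge base has no function symbols,
    # so a variable can never be bound to a term containing itself.
    x, y = walk(x, s), walk(y, s)
    if x == y:
        return s
    if is_var(x):
        return {**s, x: y}
    if is_var(y):
        return {**s, y: x}
    if isinstance(x, tuple) and isinstance(y, tuple) and x[0] == y[0] and len(x) == len(y):
        for a, b in zip(x[1:], y[1:]):
            s = unify(a, b, s)
            if s is None:
                return None
        return s
    return None

def resolve(t, s):
    t = walk(t, s)
    return (t[0],) + tuple(resolve(a, s) for a in t[1:]) if isinstance(t, tuple) else t

def rename(clause, tag):
    """A variant of clause: every variable v becomes v_tag."""
    def r(t):
        if is_var(t):
            return f"{t}_{tag}"
        return (t[0],) + tuple(r(a) for a in t[1:]) if isinstance(t, tuple) else t
    conditions, conclusion = clause
    return [r(c) for c in conditions], r(conclusion)

def atom(text):
    return tuple(text.split())

PEOPLE = "JIM JOE TOM TED TOD RON ANN SUE MEG PAT SAL JAN".split()
RULES = """UNCLE u x : BROTHER u y, PARENT y x
UNCLE u x : HUSBAND u s, SISTER s p, PARENT p x
PARENT x y : CHILD y x
BROTHER b x : SIBLING b x, MALE b
SISTER s x : SIBLING s x, FEMALE s
SIBLING x y : DIFFERENT x y, FATHER f x, FATHER f y, MOTHER m x, MOTHER m y
HUSBAND h w : MARRIED h w, MALE h
WIFE w h : MARRIED h w, FEMALE w
FATHER f x : PARENT f x, MALE f
MOTHER m x : PARENT m x, FEMALE m"""
# MALE RON and FEMALE JAN are assumed readings of facts 28 and 34.
FACTS = """CHILD JIM JOE, CHILD JOE MEG, CHILD JIM SUE, CHILD ANN JOE, CHILD JOE TOM,
CHILD ANN SUE, CHILD PAT MEG, CHILD PAT TOM, CHILD TOD PAT, CHILD RON PAT,
CHILD TOD TED, CHILD RON TED, MALE JIM, MALE JOE, MALE TOM, MALE TED, MALE TOD,
MALE RON, FEMALE ANN, FEMALE SUE, FEMALE MEG, FEMALE PAT, FEMALE SAL, FEMALE JAN,
MARRIED TOM MEG, MARRIED JOE SUE, MARRIED TED PAT, MARRIED RON SAL, MARRIED JIM JAN"""

PROGRAM = (
    [([atom(c) for c in body.split(",")], atom(head))
     for head, body in (line.split(":") for line in RULES.splitlines())]
    + [([], atom(f)) for f in FACTS.split(",")]
    # Premise 40, the 'virtual' definition, expanded into its 132 facts.
    + [([], ("DIFFERENT", a, b)) for a in PEOPLE for b in PEOPLE if a != b]
)

def answers_level_by_level(goals, template, program):
    """Grow the tree one level at a time; harvest ANSWER atoms from terminal states."""
    level, found, tag = [(goals, template)], [], 0
    while level:
        next_level = []
        for state_goals, state_answer in level:
            if not state_goals:                   # terminal state ANSWER(t1 ... tm)<-
                found.append(state_answer)
                continue
            selected, remainder = state_goals[0], state_goals[1:]   # leftmost selection
            for clause in program:                # each clause is a candidate side premise
                if clause[1][0] != selected[0]:
                    continue                      # different predicate: cannot unify
                tag += 1
                conditions, conclusion = rename(clause, tag)
                s = unify(conclusion, selected, {})
                if s is not None:
                    next_level.append(([resolve(g, s) for g in conditions + remainder],
                                       resolve(state_answer, s)))
        level = next_level
    return found

print(sorted(set(answers_level_by_level([atom("UNCLE a b")], atom("ANSWER a b"), PROGRAM))))
```

The harvested answers are the four listed in Example 5 (continued); no other branch of the tree reaches a terminal state.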

Serial vs. Parallel Computation in Logic Programming

The branches of the LUSH/SLD computation tree in the first logic programming (Prolog) systems were generated serially, in a depth-first, backtracking search. This tree-search method is subject to embarrassing 'depth-first' infinite runaways when nonterminating branches are present in the tree, but it is otherwise a simple, natural and effective way to search the complete LUSH/SLD computation tree and thus find the set of all answers to a query. The answers will be generated one at a time, as each terminal state is encountered. If a query has infinitely many answers (and the search tree therefore has infinitely many branches), then the set of all answers will simply (and correctly, in a reasonable sense) be generated as a nonterminating sequence.

It is surely clear, however, that the branches of the LUSH/SLD computation tree need not be constructed one at a time in this depth-first, backtracking manner. One can instead grow the tree breadth-first, with no backtracking, by computing successive sets of states, starting with the singleton set {Q0}, and continuing, in general, by computing the (t+1)st set as the set of all the immediate successors of all the states in the tth set. The different completed computations, together with their associated answers, will be harvested, at each level, as their corresponding terminal states turn up in these state sets. There is, of course, no logical significance to the order in which these answers are generated: the answers logically form a set, not a sequence.

It is easy to find examples of queries which have infinitely many answers. For example, if the knowledge base is the set of clauses

{NUMBER(0)←, NUMBER(S(x)) ← NUMBER(x)}

then the query ∃x NUMBER(x) has the set of answers

{x = 0, x = S(0), x = S(S(0)), ...}.

Each answer comes from an ultraresolution inference whose main premise is ANSWER(x) ← NUMBER(x). All these answers are given by covers which exhibit the same general pattern. The LUSH/SLD computation tree is an infinite binary tree which has only two states at each nonzero depth t. One of these two states is the clause ANSWER(S(S(... 0 ...)))←, with t − 1 occurrences of 'S', which produces the answer x = S(S(... 0 ...)); the other is the clause ANSWER(S(S(... S(x) ...))) ← NUMBER(x), with t occurrences of 'S', which has two successors, and so on.

General Queries and Answers

Queries can contain universally quantified variables, and so can their answers. Consider, for example, the knowledge base:

P(u z w) ← P(x y u) P(y z v) P(x v w)
P(x v w) ← P(x y u) P(y z v) P(u z w)
P(G(a b) a b)←
P(a H(a b) b)←

and the query ∃t∀k P(k t k). The negation of the query is ∀t∃k not P(k t k), so (as explained, for example, in [12]) in order to have a clause we must eliminate the existential quantifier. This is done by introducing a 'Skolem term' K(t) in place of the existential variable. The negated query then is ∀t not P(K(t) t K(t)), or in other words the negative clause ← P(K(t) t K(t)). Thus the initial state for the LUSH/SLD computation is the clause ANSWER(t) ← P(K(t) t K(t)).

An intuitive way to understand the clauses of this knowledge base is to interpret their variables as ranging over the elements of some set which is closed under a binary composition operation ·, and to interpret atoms P(a b c) as saying that a·b = c. The first two clauses then together assert that · is associative. The third says that the equation x·a = b always has the solution x = G(a b), while the fourth says that the equation a·x = b always has the solution x = H(a b). The query is then seen to be asking whether there is a t such that for all k, k·t = k, that is, whether there is a right identity element.

The kernel of the cover shown in Figure 10 is unifiable and has the mgu

$$\sigma = \{\, t = z = H(y\ y),\ x = G(y\ K(H(y\ y))),\ a = b = r = v = y,\ s = u = w = K(H(y\ y)) \,\}$$

and so the inference yields the unconditional clause ANSWER(H(y y))←, containing the universally quantified variable y. In effect, the response to the query 'are there right identity elements?' is the general proposition: yes, for all y, H(y y) is a right identity element.

Parallelism in Ultraresolution Inferences

The potential parallelism in the breadth-first growth of the LUSH/SLD tree is the kind which has come to be known as or-parallelism. It arises because each state may have several immediate-successor states, corresponding to the alternative possible choices of a positive clause as side premise for that state as main premise. As we have seen, the classical LUSH/SLD search (in its breadth-first version) is a clever way to compute all possible ultraresolution inferences which have a given conditional clause Q as main premise, with covering premises taken from a given fixed knowledge base P. So the or-parallel version of the LUSH/SLD process is a way of exploiting at least some of the potential parallelism of the ultraresolution inference scheme.

The challenge to the software and hardware designers of future logic programming systems, however, comes from the clear perception that there is more potential parallelism 'waiting there' than just the or-parallelism. The computation of the set of all ultraresolutions with main premise Q and covering premises in P is abstractly just a matter of generating all covers of Q and then checking the kernel of each to see if it is unifiable. This, however, is precluded by a 'combinatorial explosion' problem: there are simply too many covers. Even in the small family knowledge base we considered earlier, there are (as we noted) only four ultraresolutions with the main premise ANSWER(a b) ← UNCLE(a b) and covering premises in the knowledge base. This means that there are only four covers of its antecedent whose kernels are unifiable. There are, however, several billion covers of this antecedent whose kernels are not unifiable. Despite the large size of the space to be searched in this simple example, a breadth-first (quasi-or-parallel) LUSH/SLD computation generates a tree of about 140 states, level by level down to a depth of about 25, in order to produce all four answers and to show that there are no more.

The power of the LUSH/SLD search method rests in the fact that entire subtrees of these 'failures' are constantly being eliminated from the search. Its incremental unification process in effect detects a source of nonunifiability as soon as it appears and therefore never permits a partially grown cover containing that 'lethal gene' to 'breed' any progeny at all. Thus the LUSH/SLD pruning of the tree is as drastic as it can be. Delaying any of this failure detection 'until later' will only allow these sources of failure to propagate multiplicatively, so that the future computational cost (whether in the extent of time consumed or in the number of parallel resources needed) of detecting all of them will grow at the same rate. Postponing all of the unification analysis until the generation of the set of all covers is completed simply lets this effect maximize itself. Here, however, it is clear that we have arrived at a point where merely logical considerations must yield the center stage to highly technical questions of algorithm design, complexity analysis, and parallel computation, the discussion of which is outside the scope of this article.

Glimpses Beyond

In this article I have discussed only the historical and conceptual background of the logical origins of logic programming. I have concentrated on the resolution theorem-proving ideas which have been my main interest from 1960 until the present. In describing their development to the present, I have briefly sketched the overall framework within which today's specialists are seeking to exploit as much as possible of the potential parallelism which is clearly present in the fundamental processes. The rest of that story is now better left for others, more qualified, to tell.

Acknowledgments

I am grateful to Jacques Cohen, Jack Minker and Jonas Barklund for their excellent suggestions after reading an earlier version of this article. I have tried to follow all of them.

References

1. Apt, K.R. and van Emden, M.H. Contributions to the theory of logic programming. J. ACM 29 (1982), 841-862.
2. Barklund, J. Parallel unification. Ph.D. thesis, Computing Science Department, Uppsala University, 1990.
3. Baxter, L.D. An efficient unification algorithm. Tech. Rep. CS-73-23, Department of Computer Science, University of Waterloo, 1973.
4. Baxter, L.D. The complexity of unification. Ph.D. thesis, University of Waterloo, 1976.
5. Boyer, R.S. and Moore, J.S. The sharing of structure in theorem-proving programs. Mach. Intell. 7 (1972), 101-116.
6. Church, A. A note on the Entscheidungsproblem. J. Symbolic Logic 1 (1936), 40-41. (Reprinted in [11].)
7. Colmerauer, A. Les Systèmes-Q ou un formalisme pour analyser et synthétiser des phrases sur ordinateur. Rep. 43, Department of Computer Science, University of Montreal, 1970.
8. Colmerauer, A., Kanoui, H., Pasero, R. and Roussel, P. Un système de communication homme-machine en Français. Tech. Rep., Groupe d'Intelligence Artificielle, Université d'Aix-Marseille II, Luminy, France, 1973.
9. Davidon, W. Personal communication, 1962.
10. Davis, M. Eliminating the irrelevant from mechanical proofs. In Proceedings, Symposia of Applied Mathematics 15, American Mathematical Society, 1963, pp. 15-30. (Reprinted in [44], vol. 1, 315-330.)
11. Davis, M., Ed. The Undecidable: Basic Papers on Undecidable Propositions, Unsolvable Problems, and Computable Functions. Raven Press, 1965.
12. Davis, M. and Putnam, H. A computing procedure for quantification theory. J. ACM 7 (1960), 201-216. (Reprinted in [44], vol. 1, 125-139.)
13. Dwork, C., Kanellakis, P. and Mitchell, J.C. On the sequential nature of unification. J. Logic Program. 1 (1984), 35-50.
14. Frege, G. Begriffsschrift, a Formula Language, Modelled upon that of Arithmetic, for Pure Thought. English translation in [48], 1-82.
15. Gentzen, G. The Collected Papers of Gerhard Gentzen. M.E. Szabo, Ed., North-Holland, 1969.
16. Gilmore, P.C. A proof method for quantification theory: its justification and realization. IBM J. Res. Dev. 4 (1960), 28-35. (Reprinted in [44], vol. 1, 151-158.)
17. Gödel, K. The Completeness of the Axioms of the Functional Calculus of Logic, 1930. English translation, with commentary, in [48], 582-591.
18. Green, C.C. The application of theorem-proving to problem solving. In Proceedings of the First International Joint Conference on Artificial Intelligence (Washington, D.C., 1969), pp. 219-240.
19. Herbrand, J. Investigations in Proof Theory (1930). English translation of main parts, with commentary, in [48], 525-581, and of the entire thesis in Jacques Herbrand: Logical Writings (edited by Warren Goldfarb), Harvard, 1971, 44-202.


20. Hewitt, C. PLANNER: A language for proving theorems in robots. In Proceedings of the First International Joint Conference on Artificial Intelligence (Washington, D.C., 1969), pp. 295-301.
21. Hill, R. LUSH Resolution and its Completeness. DCL Memo 78, Department of Artificial Intelligence, University of Edinburgh, 1974.
22. Huet, G. Résolution d'équations dans des langages d'ordre 1, 2, ..., ω. Thèse d'État, Université Paris VII, 1976.
23. Kneale, W.C. and Kneale, M. The Development of Logic. Oxford, 1962.
24. Kowalski, R.A. Predicate calculus as a programming language. In Proceedings of Sixth IFIP Congress, North-Holland, 1974, pp. 569-574.
25. Kowalski, R.A. Logic for Problem Solving. North-Holland, 1979.
26. Kowalski, R.A. The early years of logic programming. Commun. ACM 31 (1988), 38-43.
27. Kowalski, R.A. and Kuehner, D. Linear resolution with selection function. Artificial Intelligence 2 (1971), 227-260. (Reprinted in [44], vol. 1, 542-577.)
28. Loveland, D.W. Mechanical theorem proving by model elimination. J. ACM 15 (1968), 236-251. (Reprinted in [44], vol. 2, 117-134.)
29. Löwenheim, L. On Possibilities in the Calculus of Relatives, 1915. English translation in [48], 228-251.
30. Luckham, D. Refinement Theorems in Resolution Theory. Lecture Notes in Mathematics 125, Springer, 1970, 163-190.
31. McCarthy, J. Programs with Common Sense. In Proceedings of a Symposium on the Mechanization of Thought. H.M. Stationery Office, London, 1959. (Reprinted in Semantic Information Processing, MIT Press, 1968.)
32. Minsky, M. The Society of Mind. Simon and Schuster, 1985.
33. Newell, A., Shaw, J.C. and Simon, H.A. Empirical explorations with the logic theory machine: A case study in heuristics. In Proceedings of the Western Joint Computer Conference (1957), pp. 218-239. (Reprinted in [44], vol. 1, 49-73.)
34. Prawitz, D. An Improved Proof Procedure. Theoria 26 (1960), 102-139. (Reprinted in [44], vol. 1, 159-199, with a preface by the author.)
35. Prawitz, D., Prawitz, H. and Voghera, N. A mechanical proof procedure and its realization in an electronic computer. J. ACM 7 (1960), 102-128. (Reprinted in [44], vol. 1, 200-228.)
36. Raphael, B. Programming a robot. In Proceedings of Fourth IFIP Congress, North-Holland, 1968, 135-139.
37. Robinson, A. Proving a theorem (as done by man, logician, or machine). Summaries of talks presented at the Summer Institute for Symbolic Logic. Communications Research Division, Institute for Defense Analysis, Princeton, 1957. (Reprinted in [44], vol. 1, 74-76.)
38. Robinson, J.A. Theorem proving on the computer. J. ACM 10 (1963), 163-174. (Reprinted in [44], vol. 1, 372-383.)
39. Robinson, J.A. A machine-oriented logic based on the resolution principle. J. ACM 12 (1965), 23-41. (Reprinted in [44], vol. 1, 397-415.)
40. Robinson, J.A. Automatic deduction with hyper-resolution. Int. J. Comput. Math. 1 (1965), 227-234. (Reprinted in [44], vol. 1, 416-423.)
41. Robinson, J.A. Computational logic: the unification computation. Machine Intell. 6 (1970), 63-72.
42. Robinson, J.A. Fast Unification. Tagung über Automatisches Beweisen, Mathematisches Forschungsinstitut Oberwolfach, 1976.
43. Robinson, G.A. and Wos, L.T. Paramodulation and theorem proving in first-order theories with equality. Machine Intell. 4 (1969), 103-133.
44. Siekmann, J.H. and Wrightson, G., Eds. Automation of Reasoning: Classical Papers on Computational Logic (1957-1966). Two volumes, Springer, 1983.
45. Skolem, T. Logico-combinatorial Investigations in the Satisfiability or Provability of Mathematical Propositions (1920). English translation, with commentary, in [48], 252-263.
46. Tarski, A. Logic, Semantics, Metamathematics: Papers from 1923 to 1938, translated by J.H. Woodger, Oxford, 1956.
47. Turing, A.M. On computable numbers, with an application to the Entscheidungsproblem. In Proceedings of the London Mathematical Society, 1937. (Reprinted in [11].)
48. van Heijenoort, J., Ed. From Frege to Gödel: A Source Book in Mathematical Logic, 1879-1931. Harvard University Press, 1967.
49. Wang, H. Towards mechanical mathematics. IBM J. Res. Dev. 4 (1960), 2-22. (Reprinted in [44], vol. 1, 244-264.)
50. Warren, D.H.D. Implementing PROLOG: compiling predicate logic programs (Res. Reps. 39 and 40), and Logic programming and compiler writing (Res. Rep. 44). Department of Artificial Intelligence, University of Edinburgh, 1977.
51. Winograd, T. Understanding Natural Language. Academic Press, 1973.
52. Wos, L.T., Carson, D. and Robinson, G.A. The unit preference strategy in theorem proving. AFIPS Conference Proceedings 26, Spartan Books, Washington, D.C., 1964, pp. 615-621. (Reprinted in [44], vol. 1, 387-393.)
53. Wos, L.T., Carson, D. and Robinson, G.A. Efficiency and completeness of the set of support strategy in theorem proving. J. ACM 12 (1965), 536-541. (Reprinted in [44], vol. 1, 484-489.)

CR Categories and Subject Descriptors: D.1.6 [Programming Techniques]: Logic Programming; D.3.2 [Programming Languages]: Language Classifications--PROLOG; F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems--complexity of proof procedures, pattern matching; F.4.1 [Mathematical Logic and Formal Languages]: Mathematical Logic--computational logic, logic programming, mechanical theorem proving, proof theory; I.2.3 [Artificial Intelligence]: Deduction and Theorem Proving--answer/reason extraction, deduction, logic programming, metatheory, resolution; I.2.4 [Artificial Intelligence]: Knowledge Representation Formalisms and Methods--predicate logic

Additional Key Words and Phrases: Unification

Further Reading

Several excellent and recent further sources are available. J.W. Lloyd's Foundations of Logic Programming (second, extended edition, Springer-Verlag, 1987) and Logic, Programming and Prolog by Ulf Nilsson and Jan Maluszynski (Wiley, 1990) provide rigorous but readable accounts not only of much of the material covered in the present article but also of many noteworthy later developments. Among these are:

• the addition of imperative control features such as the cut;
• the elegant negation as failure technique by which all modern Prolog systems permit negative conditions in both positive and negative conditional clauses;
• the inclusion of arithmetical, list-processing, metalinguistic and other applied predicates and operators among the atoms and terms;
• alternative logic programming paradigms, such as concurrent logic programming, constraint logic programming, and higher-order logic programming.

For the reader who wishes to learn more about applications and methodology of logic programming, about Prolog, and about exploiting the potential parallelism in logic, I also recommend the following recent books:

• The Art of Prolog by L. Sterling and E. Shapiro (MIT Press, 1986);
• Prolog Programming for Artificial Intelligence by I. Bratko (second edition, Addison-Wesley, 1990);
• The Craft of Prolog by R. O'Keefe (MIT Press, 1990);
• Essentials of Logic Programming by C.J. Hogger (Oxford: Clarendon Press, 1990);
• Parallelism in Logic: its potential for performance and program development by Franz Kurfess (Braunschweig, Vieweg, 1991);
• Parallel Logic Programming by Evan Tick (MIT Press, 1991).

About the Author:
J.A. ROBINSON teaches philosophy at the University of Syracuse, where he is now University Professor. His research interests include computational logic and automated deduction. He is currently working on a massively parallel logical computation system combining the lambda calculus (for functional programming) with the predicate calculus (for logic programming) at the University of Tokyo, where he is on a year's leave.
Author's Present Address: Office of the University Professor, Syracuse University, Syracuse, NY 13244-2010.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© ACM 0002-0782/92/0300-040 $1.50
