A DEFINITE CLAUSE VERSION OF CATEGORIAL GRAMMAR

Remo Pareschi," Department of Computer and Information Science, University of Pennsylvania, 200 S. 33 rd St., Philadelphia, PA 19104,t and Department of Artificial Intelligence and Centre for Cognitive Science, University of Edinburgh, 2 Buccleuch Place, Edinburgh EH8 9LW, Scotland remo(~linc.cis.upenn.edu

ABSTRACT problem by adopting an intuitionistic treatment of implication, which has already been proposed We introduce a first-order version of Catego- elsewhere as an extension of for implement- rial Grammar, based on the idea of encoding syn- ing hypothetical reasoning and modular logic pro- tactic types as definite clauses. Thus, we drop gramming. all explicit requirements of adjacency between combinable constituents, and we capture word- order constraints simply by allowing subformu- 1 Introduction lae of complex types to share variables ranging over string positions. We are in this way able Classical Categorial Grammar (CG) [1] is an ap- to account for constructiods involving discontin- proach to natural language syntax where all lin- uous constituents. Such constructions axe difficult guistic information is encoded in the lexicon, via to handle in the more traditional version of Cate- the assignment of syntactic types to lexical items. gorial Grammar, which is based on propositional Such syntactic types can be viewed as expressions types and on the requirement of strict string ad- of an implicational calculus of propositions, where jacency between combinable constituents. atomic propositions correspond to atomic types, We show then how, for this formalism, and implicational propositions account for com- can be efficiently implemented as theorem proving. plex types. A string is grammatical if and only Our approach to encoding types:as definite clauses if its syntactic type can be logically derived from presupposes a modification of standard Horn logic the types of its words, assuming certain inference syntax to allow internal implications in definite rules. clauses. This modification is needed to account for In classical CG, a common way of encoding the types of higher-order functions and, as a con- word-order constraints is by having two symmet- sequence, standard Prolog-like Horn logic theorem ric forms of "directional" implication, usually in- proving is not powerful enough. We tackle this dicated with the forward slash / and the backward * I am indebted to Dale Miller for help and advice. I slash \, constraining the antecedent of a complex am also grateful to Aravind Joshi, Mark Steedman, David type to be, respectively, right- or left-adjacent. A x,Veir, Bob Frank, Mitch Marcus and Yves Schabes for com- word, or a string of words, associated with a right- ments and discussions. Thanks are due to Elsa Grunter and Amy Feh.y for advice on typesetting. Parts of this research (left-) oriented type can then be thought of as a were supported by: a Sloan foundation grant to the Cog- right- (left-) oriented function looking for an ar- nitive Science Program, Univ. of Pennsylvania; and NSF gument of the type specified in the antecedent. A grants MCS-8219196-GER, IRI-10413 AO2, ARO grants DAA29-84-K-0061, DAA29-84-9-0027 and DARPA grant convention more or less generally followed by lin- NOOO14-85-K0018 to CIS, Univ. of Pezmsylvani& guists working in CG is to have the antecedent and t Address for correspondence the consequent of an implication respectively on

270 the right and on tile left of the connective. Thus, sal and existential quantifiers. Let A be a syntactic tile type-assignment (1) says that the ditransitive variable ranging over the set of atoms, i. e. the set verb put is a function taking a right-adjacent ar- of atomic first-order formulae, and let D and G be gulnent of type NP, to return a function taking a syntactic variables ranging, respectively, over the right-adjacent argument of type PP, to return a set of definite clauses and the set of goal clauses. function taking a left-adjacent argument of type We introduce the notions of definite clause and NP, to finally return an expression of the atomic of goal clause via the two following mutually re- type S. cursive definitions for the corresponding syntactic variables D and G: (1) put: ((b~xNP)/PP)/NP The Definite Clause Grammar (DCG) framework • D:=AIG--AIVzDID1AD2 [14] (see also [13]), where phrase-structure gram- mars can be encoded as sets of definite clauses • G:=AIG1AG=I3~:GID~G (which are themselves a subset of Horn clauses), We call ground a clause not containing variables. and the formalization of some aspects of it in [15], We refer to the part of a non-atomic definite clause suggests a more expressive alternative to encode coming on the left of the implication connective word-order constraints in CG. Such an alterna- as to the body of the clause, and to the one on tive eliminates all notions of directionality from the right as to the head. With respect to standard the logical connectives, and any explicit require- Horn logic syntax, the main novelty in the defini- ment of adjacency between functions and argu- tions above is that we permit implications in goals ments, and replaces propositions with first-order and in the bodies of definite clauses. Extended • formulae. Thus, atomic types are viewed as atomic Horn logic syntax of this kind has been proposed formulae obtained from two-place predicates over to implement hypothetical reasoning [3] and mod- string positions represented as integers, the first ules [7] in logic programming. We shall first make and the second argument corresponding, respec- clear the use of this extension for the purpose of tively, to the left and right end of a given string. linguistic description, and we shall then illustrate Therefore, the set of all sentences of length j its operational meaning. generated from a certain lexicon corresponds to the type S(0,j). Constraints over the order of constituents are enforced by sharing integer in- dices across subformulae inside complex (func- 3 First-order tional) types. Categorial Grammar This first-order version of CG can be viewed as a logical reconstruction of some of the ideas behind 3.1 Definite Clauses as Types the recent trend of Categorial Unification Gram- mars [5, 18, 20] 1. A strongly analogous develop- We take CONN (for "connects") to be a three- ment characterizes the systems of type-assignment place predicate defined over lexical items and pairs for the formal languages of Combinatory Logic and of integers, such that CONN(item, i,j) holds if Lambda Calculus, leading from propositional type and only if and only if i = j - 1, with the in- systems to the "formulae-as-types" slogan which is tuitive meaning that item lies between the two behind the current research in type theory [2]. In consecutive string positions i and j. Then, a this paper, we show how syntactic types can be en- most direct way to translate in first-order logic coded using an extended version of standard Horn the type-assignment (1) is by the type-assignment logic syntax. (2), where, in the formula corresponding to the as- signed type, the non-directional implication con- nective --, replaces the slashes. 2 Definite Clauses with In- ternal Implications (2) put : VzVyYzVw[CONN(put, y - 1, y) --* Let A and ---* be logical connectives for conjunc- (NP(y, z) -- tion and implication, and let V and 3 be the univer- (PP(z, w) -- 1Indeed, Uszkoreit [18] mentions the possibility of en- coding order constraints among constituents via variables (NP(z, y - 1) --* ranging over string positions in the DCG style. s(=, ~o))))]

271 A definite clause equivalent of tile formula in (2) 3.3 Arithmetic Predicates is given by the type-assignment (3) 2 . The fact that we quantify over integers allows us to use arithmetic predicates to determine sub- sets of indices over which certain variables must VzVyVzVw[CONN(put, y -- 1, y) A (3) put: range. This use of arithmetic predicates charac- NP(y, z) ^ terizes also Rounds' ILFP notation [15], which ap- PP(z, w) A pears in many ways interestingly related to the gP(z, y - 1) --* S(x, w)] framework proposed here. We show here below how this capability can be exploited to account for a case of extraction which is particularly prob- Observe that the predicate CONNwill need also lematic for bidirectional propositional CG. to be part of types assigned to "non-functional" lexical items. For example, we can have for the noun-phrase Mary the type-assignment (4). 3.3.1 Non-perlpheral Extraction Both the propositional type (5) and the first- order type (6) are good enough to describe the (4) Mary : Vy[OONN(Mary, y- 1,y) .-.-* kind of constituent needed by a relative pronoun NP(y - 1, y)] in the following right-oriented case of peripheral extraction, where the extraction site is located at one end of the sentence. (We indicate the extrac- 3.2 Higher-order Types and Inter- tion site with an upward-looking arrow.) nal Implications

Propositional CG makes crucial use of func- which [Ishallput a book on T ] tions of higher-order type. For example, the type- assignment (5) makes the relative pronoun which However, a case of non.peripheral extraction, into a function taking a right-oriented function where the extraction site is in the middle, such from noun-phrases to sentences and returning a as relative clause 3. This kind of type-assignment has been used by several linguists to provide attractive which [ I shall put T on the table ] accounts of certain cases of extraction [16, 17, 10].

is difficult to describe in bidirectional proposi- tional CG, where all functions must take left- or (5) which: REL/(S/NP) right-adjacent arguments. For instance, a solution like the one proposed in [17] involves permuting In our definite clause version of CG, a similar the arguments of a given function. Such an opera- assignment, exemplified by (6), is possible, since tion needs to be rather cumbersomely constrained • implications are allowed in the. body of clauses. in an explicit way to cases of extraction, lest it Notice that in (6) the noun-phrase needed to fill should wildly overgenerate. Another solution, pro- the extraction site is "virtual", having null length. posed in [10],is also cumbersome and counterintu- itive, in that involves the assignment of multiple types to wh-expressions, one for each site where (6) which: VvVy[CONN(which, v - 1, v) ^ extraction can take place. (NP(y, y) --* S(v, y)) --* On the other hand, the greater expressive power REL(v - 1, y)] of first-order logic allows us to elegantly general- ize the type-assignment (6) to the type-assignment (7). In fact, in (7) the variable identifying the ex- 2 See [2] for a pleasant formal characterization of first- order definite clauses as type declarations. traction site ranges over the set of integers in be- aFor simplicity sake, we treat here relative clauses as tween the indices corresponding, respectively, to constituents of atomic type. But in reality relative clauses the left and right end of the sentence on which are noun modifiers, that is, functions from nouns to nouns. Therefore, the propositional and the first-order atomic type the rdlative pronoun operates. Therefore, such a for relative clauses in the examples below should be thought sentence can have an extraction site anywhere be- of as shorthands for corresponding complex types. tween its string boundaries.

272 coordination, in principle nothing prevents its use to analyze strings not characterized by such phe- (7) which : VvVyVw[CONN(which, v - 1, v) A nomena, which would be analyzable in terms of (NP(y, y) --.* S(v, w)) A functional application alone. Tentative solutions of this problem have been recently discussed in v

273 Kripke models. search, and unification like a Prolog interpreter. (An example of such an implemen- 4.1.1 Logic Programs tation, as a metainterpreter on top of Lambda- Prolog, is given in [9].) Observe however that We take a logic program or, simply, a program an important difference of such a theorem prover 79 to be any set of definite clauses. We formally from a standard Prolog interpreter is in the wider represent the fact that a goal clause G is logically distribution of "logical" variables, which, in the derivable from a program P with a sequent of the logic programming tradition, stand for existen- form 79 =~ G, where 79 and G are, respectively, the tially quantified variables within goals. Such vari- antecedent and the succedent of the sequent. If 7~ ables can get instantiated in the course of a Prolog is a program then we take its substitution closure proof, thus providing the procedural ability to re- [79] to be the smallest set such that turn specific values as output of the computation. Logical variables play the same role in the pro- • 79 c_ [79] gramming language we are considering here; more- • if O1 A D2 E [7~] then D1 E [79] and D2 E [7~] over, they can also occur in program clauses, since subformulae of goal clauses can be added to pro- • ifVzD E [P] then [z/t]D E [7~] for all terms t, grams via proof rule (V). where [z/t] denotes the result of substituting t for free occurrences of t in D 4.2 How Strings Define Programs

4.1.2 Proof Rules Let a be a string a, ... an of words from a lex- icon Z:. Then a defines a program 79a = ra tJ Aa We introduce now the following proof rules, such that which define the notion of proof for our logic pro- gramrning language: • Fa={CONN(ai,i-l,i) ll

(I) 79=G ifaE[7 )] • Aa={Dlai:DEZ:andl, _>. (Prolog's evaluation mechanism treats arithmetic expressions in a similar way.) Then, 7~U {O} =~ G under this approach a string a is of type Ga if and (V) P ~ D--. G only if there is a proof for the sequent 7)aU.4 ::~ Ga according to rules (I) - (V). In the inference figures for rules (II) - (V), the sequent(s) appearing above the horizontal line are the upper sequent(s), while the sequent appearing 4.3 An Example below is the lower sequent. A proof for a sequent We give here an example of a proof which deter- 7) =~ G is a tree whose nodes are labeled with mines a corresponding type-assignment. Consider sequents such that (i) the root node is labeled with the string 79 ~ G, (ii) the internal nodes are instances of one of proof rules (II) - (V) and (iii) the leaf nodes are whom John loves labeled with sequents representing proof rule (I). The height of a proof is the length of the longest Such a sentence determines a program 79 with path from the root to some leaf. The size of a the following set F of ground atoms: proof is the number of nodes in it. Thus, proof rules (I)-(V) provide the abstract { CONN(whom, O, I), specification of a first-order theorem prover which CONN(John, I, 2), can then be implemented in terms of depth-first CONN(loves, 2, 3)}

274 \,Ve assume lexical type assignments such that hypotheses, so as to account for word-order con- the remaining set of clauses A is as follows: straints. Under our approach, word-order con- straints are obtained declaratively, via sharing of {VxVz[CONN(whom, x - 1, x) A string positions, and there is no strict adjacency (NP(y, y) --* S(x, y)) --* requirement. In proof-theoretical terms, this di- REL(x - 1, y)], rectly translates in viewing programs as unordered sets of hypotheses. gx[CONN(John, x - 1, x) -* NP(x - 1, x)], 5.2 Trading Contraction against W:VyVz[CONN(Ioves, y - 1, y) A NP(y, z) A NV(x, y - 1) --~ Decidability s(x, z)l} The logic defined by rules (I)-(V) is in general undecidable, but it becomes decidable as soon as The clause assigned to the relative pronoun Contraction is disallowed. In fact, if a given hy- whom corresponds to the type of a higher-order pothesis can be used at most once, then clearly the function, and contains an implication in its body. number of internal nodes in a proof tree for a se- Figure 1 shows a proof tree for such a type- quent 7~ =~ G is at most equal to the total number assignment. The tree, which is represented as of occurrences of--*, A and 3 in 7~ =~ G, since these growing up from its root, has size 11, and height are the logical constants for which proof rules with 8. corresponding inference figures have been defined. Hence, no proof tree can contain infinite branches and decidability follows. 5 'Structural Rules Now, it seems a plausible conjecture that the programs directly defined by input strings as in We now briefly examine the interaction of struc. Section 4.2 never need Contraction. In fact, each tural rules with parsing. In intuitionistic sequent time we use a hypothesis in the proof, either we systems, structural rules define ways of subtract- consume a corresponding word in the input string, ing, adding, and reordering hypotheses in sequents or we consume a "virtual" constituent correspond- during proofs. We have the three following struc- ing to a step of hypothesis introduction deter- tural rules: mined by rule (V) for implications. (Construc- • Intercha~,ge, which allows to use hypotheses tions like parasitic gaps can be accounted for by as- in any order sociating specific lexical items with clauses which determine the simultaneous introduction of gaps of • Contraction, which allows to use a hypothesis the same type.) If this conjecture can be formally more than once confirmed, then we could automate our formalism via a metalnterpreter based on rules (I)-(V), but • Thinning, which says that not all hypotheses implemented in such a way that clauses are re- need to be used moved from programs as soon as they are used. Being based on a decidable fragment of logic, such a metainterpreter would not be affected by the 5.1 Programs as Unordered Sets of kind of infinite loops normally characterizing DCG Hypotheses parsing. All of the structural rules above are implicit in proof rules (I)-(V), and they are all needed to ob- 5.3 Thinning and Vacuous Abstrac- tain intuitionistic soundness and completeness as tion in [7]. By contrast, Lambek's propositional calcu- lus does not have any of the structural rules; for Thinning can cause problems of overgeneratiou, instance, Interchange is not admitted, since the as hypotheses introduced via rule (V) may end up hypotheses deriving the type of a given string must as being never used, since other hypotheses can be also account for the positions of the words to which used instead. For instance, the type assignment they have been assigned as types, and must obey the strict string adjacency requirement between (7) which : VvVyVw[CONN(which, v - 1, v) A functions and arguments of classical CG. Thus, (gP(y, y) ~ S(v, w)) A Lambek's calculus must assume ordered lists of v<_yAy<_w--.

275 U {NP(3,3)} ~ CONN(John, ],2) (If) T'U {NP(3,3)} = NP(I,2) PU {NP(3,3)} = NP(3,3) (III) P U {NP(3, 3)} ~ CONN(Ioves, 2, 3) 7) U {NP(3, 3)) =~ NP(1, 2) A NP(3, 3) (III) 7) U {NP(3,3)} =# CONN(loves, 2,3) A NP(I,2) A NP(3, 3) (II) 7)U {NP(3,3)} => S(1,3) 7) => CONN(whom, O,1) P =~ NP(3,3) --* S(1,3) (V) , (ziz) 7) =# CONN(whom, O, I) A (NP(3, 3) -- S(I, 3)) (II) 7) ~ REL(O, 3)

Figure h Type derivation for whom John loves

REL(v- 1, w) ] site is hypothesized, where the last ungrammatical example corresponds to a case of vacuous abstrac- can be used to account for tile well-formedness of tion: both • Az put([a book], [on x], I) which [Ishallput a book on r ] • Az put(x, [on the table], I) and • Az put([a book], [on the table], I) which [ I shall put : on the table ] Constraints for filtering proof terms character- ized by vacuous abstraction can be defined in but will also accept the ungrammatical a straightforward manner, particularly if we are working with a metainterpreter implemented on which [ I shall put a bookon the table ] top of a language based on Lambda terms, such as Lambda-Prolog [8, 9]. Beside the desire to main- In fact, as we do not have to use all the hy- tain certain well defined proof-theoretic and se- potheses, in this last case the virtual noun-phrase mantic properties of our inference system, there corresponding to the extraction site is added to are other reasons for using this strategy instead the program but is never used. Notice that our of disallowing Thinning. Indeed, our target here conjecture in section 4.4.2 was that Contraction seems specifically to be the elimination of vacuous is not needed to prove the theorems correspond- Lambda abstraction. Absence of vacuous abstrac- ing to the types of grammatical strings; by con- tion has been proposed by Steedman [17] as a uni- trast, Thinning gives us more theorems than we versal property of human languages. Morrill and want. As a consequence, eliminating Thinning Carpenter [11] show that other well-formedness would compromise the proof-theoretic properties constraints formulated in different grammatical of (1)-(V) with respect to intuitionistic logic, and theories such as GPSG, LFG and GB reduce to the corresponding Kripke models semantics of our this same property. Moreover, Thinning gives us programming language. a straightforward way to account for situations of There is however a formally well defined way to lexical ambiguity, where the program defined by a account for the ungrammaticaiity of the example certain input string can in fact contain hypothe- above without changing the logical properties of ses which are not needed to derive the type of the our inference system. We can encode proofs as string. terms of Lambda Calculus and then filter certain kinds of proof terms. In particular, a hypothesis introduction, determined by rule (V), corresponds References to a step of A-abstraction, wllile a hypothesis elim- ination, determined by one of rules (I)-(II), cor- [1] Bar-Hillel, Yehoslma. 1953. responds to a step of functional application and A Quasi-arithmetical Notation for Syntactic A-contraction. Hypotheses which are introduced Description. Language. 29. pp47-58. but never eliminated result in corresponding cases of vacuous abstraction. Thus, the three examples [2] Huet, Gerard 1986. Formal Structures for above have the three following Lambda encodings Computation and Deduction. Unpublished of the proof of the sentence for which an extraction lecture notes. Carnegie-Mellon University.

276 [3] Gabbay, D. M., and U. Reyle. 1984. N-Prolog: [14] Pereira, Fernando C. N. and David II. D. An Extension of Prolog with lIypothetical Im- Warren. 1980. Definite Clauses for Language plications. I The Journal of Logic Program- Analysis. Artificial Intelligence. 13. pp231- ruing. 1. pp319-355. 278.

[4] Joshi, Aravind. 1987. Word.order Variation [15] Rounds, William C. 1987. LFP: A Logic for in Natural Language Generation. In Proceed- Linguistic Descriptions and an Analysis of lts ings of the National Conference on Artificial Complexity. Technical Report No. 9. The Uni- Intelligence (AAAI 87), Seattle. versity of Michigan. To appear in Computa- tional Linguistics. [5] Karttunen, Lauri. 1986. Radical Lexicalism. Report No. CSLI-86-68. CSLI, Stanford Uni- [16] Steedman, Mark J. 1985. Dependency and versity. Coordination in the Grammar of Dutch and English. Language, 61, pp523-568 [6] Lambek, Joachim. 1958. The Mathematics of [17] Steedman, Mark J. 1987. Combinatory Gram- Sentence Structure. American Mathematical mar and Parasitic Gaps. To appear in Natu- Monthly. 65. pp363-386. • rat Language and Linguistic Theory. [7] Miller, Dale. 1987. A Logical Analysis of Mod. [18] Uszkoreit, Hans. 1986. Categorial" Unification ules in Logic Programming. To appear in the Grammar. In Proceedings of the 11th Inter- Journal of Logic Programming. national Conference of Computational Lin- guistics, Bonn. [8] Miller; Dale and Gopalan Nadathur. 1986. Some Uses of Higher.order Logic in Com- [19] Wittenburg, Kent. 1987. Predictive Combina- putational Linguistics. In Proceedlngs of the tots for the Efficient Parsing of Combinatory 24th Annual Meeting of the Association for Grammars. In Proceedings of the 25th An- Computational Linguistics, Columbia Uni- nual Meeting of tile Association for Compu- versity. tational Linguistics, Stanford University. [9] Miller, Dale and Gopalan Nadathur. 1987. A [20] Zeevat, H., Klein, E., and J. Calder. 1987. An Logic Programming Approach to Manipulat- Introduction to Unification Categorial Gram- ing Formulas and Programs. Paper presented mar. In N. Haddock et al. (eds.), Edinburgh at the IEEE Fourth Symposium on Logic Pro- Working Papers in Cognitive Science, 1: Cat- gramming, San Francisco. egorial Grammar, Unification Grammar, and Parsing. [10] Moortgat, Michael. 1987. Lambek Theorem Proving. Paper presented at the ZWO work- shop Categorial Grammar: Its Current State. June 4-5 1987, ITLI Amsterdam.

[11] Morrill, Glyn and Bob Carpenter 1987. Compositionality, Implicational Logic and Theories of Grammar. Research Paper EUCCS/RP-11, University of Edinburgh, Centre for Cognitive Science.

[12] Pareschi, Remo and Mark J. Steedman. 1987. A Lazy Way to Chart-parse with Categorial Grammars. In Proceedings of the 25th An- nual Meeting of the Association for Compu- tational Linguistics, Stanford University.

[13] Pereira, Fernando C. N. and Stuart M. Shieber. 1987. Prolog and Natural Language Analysis. CSLI Lectures Notes No. 10. CSLI, Stanford University.

277