
EFFICIENT PROCESSING OF FLEXIBLE CATEGORIAL GRAMMAR

Gosse Bouma
Research Institute for Knowledge Systems
Postbus 463, 6200 AL Maastricht
The Netherlands
e-mail (earn): exriksgb@hmarl5

ABSTRACT*

From a processing point of view, however, flexible categorial systems are problematic, since they introduce spurious ambiguity. In this paper, we present a flexible categorial grammar which makes extensive use of the product-operator, first introduced by Lambek (1958). The grammar has the property that for every reading of a sentence, a strictly left-branching derivation can be given. This leads to the definition of a subset of the grammar, for which the spurious ambiguity problem does not arise and efficient processing is possible.

* I would like to thank Esther König, Erik-Jan van der Linden, Michael Moortgat, Adriaan van Paassen and the participants of the Edinburgh Categorial Grammar Weekend, who made useful comments on earlier presentations of this material. All remaining errors and misconceptions are of course my own.

1. Flexibility vs. Ambiguity

Categorial grammars owe much of their popularity to the fact that they allow for various degrees of flexibility with respect to constituent structure. From a processing point of view, however, flexible categorial systems are problematic, since they introduce spurious ambiguity.

The best known example of a flexible categorial grammar is a grammar containing the reduction rules application and composition, and the category-changing rule raising.¹

¹ Throughout this paper we will be using the notation of Lambek (1958), in which A/B and B\A are a right-directional and a left-directional functor respectively, looking for an argument of category B.

(1) application:  A/B B ==> A
                  B B\A ==> A
    composition:  A/B B/C ==> A/C
                  C\B B\A ==> C\A
    raising:      A ==> (B/A)\B
                  A ==> B/(A\B)

With this grammar many alternative constituent structures for a sentence can be generated, even where this does not correspond to semantic ambiguity. From a linguistic point of view, this has many advantages. Various kinds of arguments for giving up traditional conceptions of constituent structure can be given, but the most convincing and well-documented case in favour of flexible constituent structure is coordination (see Steedman (1985), Dowty (1988), and Zwarts (1986)).

The standard assumption in generative grammar is that coordination always takes place between constituents. Right-node raising constructions and other instances of non-constituent conjunction are problematic, because it is not clear what the status of the coordinated elements in these constructions is. Flexible categorial grammar presents an elegant solution for such cases, since, next to canonical constituent structures, it also admits various other constituent structures. Therefore, the sentences in (2) can be considered to be ordinary instances of coordination (of two categories s/np and (vp/np)\vp, respectively).

(2) a. John sold and Mary bought a book
       s/vp vp/np     s/vp vp/np    np
       ----------     ----------
       s/np           s/np

    b. J. loves Mary madly and Sue wildly
       vp/np  np  vp\vp       np  vp\vp

              Mary madly
              np  vp\vp
              ----------
              (vp/np)\vp      (vp/np)\vp

A somewhat different type of argument for flexible phrase structure is based on the way humans process natural language. In Ades & Steedman (1982) it is pointed out that humans process natural language in a left-to-right, incremental manner. This processing aspect is accounted for in a flexible categorial system, where constituents can be built for any part of a sentence. Since syntactic rules operate in parallel with semantic interpretation rules, building a syntactic structure for an initial part of a sentence implies that a corresponding semantic structure can also be constructed.

These and other arguments suggest that there is no such thing as a fixed constituent structure, but that the order in which elements combine with each other is rather free.

From a parsing point of view, however, flexibility appears to be a disadvantage. Flexible categorial grammars produce large numbers of, often semantically equivalent, derivations for a given phrase. This spurious ambiguity problem (Wittenburg (1986)) makes efficient processing of flexible categorial grammar problematic, since quite often there is an exponential growth of the number of possible derivations, relative to the length of the string to be parsed.

There have been two proposals for eliminating spurious ambiguity from the grammar. The first is Wittenburg (1987). In this paper, a categorial grammar with composition and heavily restricted versions of raising (for subject np's only) is considered. Wittenburg proposes to eliminate spurious ambiguity by redefining composition. His predictive composition rules apply only in those cases where they are really needed to make a derivation possible. A disadvantage of this method, noticed by Wittenburg, is that one may have to add special predictive composition rules for all general combinatory rules in the grammar. Some careful rewriting of the original grammar has to take place before things work as desired.

Pareschi & Steedman (1987) propose an efficient chart-parsing algorithm for categorial grammars with spurious ambiguity. Instead of the usual strategy, in which all possible subconstituents are added to the chart, Pareschi & Steedman restrict themselves to adding only those constituents that may lead to a difference in interpretation. Thus, in (3) only the underlined constituents are in the chart. The "---" constituent is not.

(3) John loves Mary madly
    s/vp vp/np np  vp\vp

Combining 'madly' with the rest would be impossible or lead to backtracking in the normal case. Here, the Pareschi & Steedman algorithm starts looking for a constituent left-adjacent to madly which contains an element X/vp as a leftmost category. If such a constituent can be found, it can be concluded that the rest of that constituent must (implicitly) be a vp, and thus the validity of combining vp\vp with this constituent has been established. Therefore, Pareschi & Steedman are able to work with only a minimal number of items in the chart.

Both Wittenburg and Pareschi & Steedman work with categorial grammars which contain restricted versions of composition and raising. Although they can be processed efficiently, there is linguistic evidence that they are not fully adequate for the analysis of such phenomena as coordination. Since atomic categories can in general not be raised in these grammars, sentence (2b) (in which the category np has to be raised) cannot be derived. Furthermore, since composition is not generalized, as in Ades & Steedman (1982), a sentence such as John sold but Mary donated a book to the library would not be derivable. The possibilities for left-to-right, incremental processing are also limited. Therefore, there is reason to look for a more flexible system, for which efficient parsing is still possible.
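The spurious ambiguity at issue is easy to observe directly. The following is a minimal Prolog sketch (the predicate names reduce/3, raise/2 and derive/2 are illustrative choices, not part of any of the systems discussed): it encodes the reduction rules of (1) and tries every bracketing of a sequence of categories.

    % assumed operator declaration: '\' as an infix category constructor
    % ('/' is already an infix operator in standard Prolog)
    :- op(400, yfx, \).

    % the reduction rules of (1)
    reduce(A/B, B,   A).      % right application
    reduce(B,   B\A, A).      % left application
    reduce(A/B, B/C, A/C).    % forward composition
    reduce(C\B, B\A, C\A).    % backward composition

    % the category-changing rule raising; stated for completeness, but not
    % used in derive/2, since unrestricted raising makes the search infinite
    raise(A, B/(A\B)).
    raise(A, (B/A)\B).

    % derive(+Categories, ?Result): reduce a list of categories to a single
    % category, trying every possible bracketing
    derive([C], C).
    derive(Cs, Result) :-
        append(Front, [X, Y | Back], Cs),
        reduce(X, Y, Z),
        append(Front, [Z | Back], NewCs),
        derive(NewCs, Result).

Already for the unambiguous John loves Mary, the query ?- findall(yes, derive([s/vp, vp/np, np], s), Ds), length(Ds, N). gives N = 2: one derivation composes s/vp with vp/np, the other applies vp/np to np. For longer strings the number of such semantically equivalent derivations grows rapidly.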

2. Structural Completeness

In the next section we present a grammatical calculus which is more flexible than the systems considered by Wittenburg (1987) and Pareschi & Steedman (1987), and therefore is attractive for linguistic purposes. At the same time, it offers a solution to the spurious ambiguity problem.

Spurious ambiguity causes problems for parsing in the systems mentioned above, because there is no systematic relationship between syntactic structures and semantic representations. That is, there is no way to identify in advance, for a given sentence S, a proper subset of the set of all possible syntactic structures and associated semantic representations, for which it holds that it will contain all possible semantic representations of S.

Consider now a grammar for which the following property holds:

(4) Structural Completeness
    If a sequence of categories X1 .. Xn reduces to Y, there is a reduction to Y for any bracketing of X1..Xn into constituents. (Moortgat, 1987:5)²

² Buszkowski (1988) provides a slightly different definition in terms of functor-argument structures.

Structurally complete grammars are interesting linguistically, since they are able to handle, for instance, all kinds of non-constituent conjunction, and also because they allow for strict left-to-right processing (see Moortgat, 1988).

The latter observation has consequences for parsing as well, since, if we can parse every sentence in a strict left-to-right manner (that is, we produce only strictly left-branching syntax trees), the parsing algorithm can be greatly simplified. Notice, however, that such a parsing strategy is only valid if we also guarantee that all possible readings of a sentence can be found in this way. Thus, instead of (4), we are looking for grammars with the following, slightly stronger, property:

(5) Strong Structural Completeness
    If a sequence of categories X1 .. Xn reduces to Y, with semantics Y', there is a reduction to Y, with semantics Y', for any bracketing of X1..Xn into constituents.

Grammars with this property can potentially circumvent the spurious ambiguity problem, since for these grammars we only have to inspect all left-branching trees to find all possible readings. This method will only fail if the set of left-branching trees itself would contain spuriously ambiguous derivations. In section 4 we will show that these can be eliminated from the calculus presented below.
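Property (4) can be stated operationally as a small Prolog sketch (the predicate names bracketing/2, reduces_to/2, structurally_complete/2 and red/3 are illustrative, and forall/2 is assumed to be available, as in most Prolog systems): it enumerates every binary bracketing of a sequence and checks that each of them reduces to the target category.

    :- op(400, yfx, \).       % assumed infix category constructor, as before

    % bracketing(+Cats, -Tree): all binary bracketings of a non-empty sequence
    bracketing([C], C).
    bracketing(Cs, t(L, R)) :-
        append(Front, Back, Cs),
        Front \= [], Back \= [],
        bracketing(Front, L),
        bracketing(Back, R).

    % reduces_to(+Tree, ?Cat): a bracketing reduces to Cat if its two halves
    % reduce to categories that red/3 can combine
    reduces_to(C, C) :- C \= t(_, _).
    reduces_to(t(L, R), Cat) :-
        reduces_to(L, A),
        reduces_to(R, B),
        red(A, B, Cat).

    % structurally_complete(+Cats, +Cat): property (4) for this particular
    % sequence and target category
    structurally_complete(Cats, Cat) :-
        forall(bracketing(Cats, T), reduces_to(T, Cat)).

    % for illustration, red/3 contains application only
    red(A/B, B,   A).
    red(B,   B\A, A).

With application only, ?- structurally_complete([s/vp, vp/np, np], s). fails, because the bracketing ((s/vp vp/np) np) cannot be reduced; adding a composition clause red(A/B, B/C, A/C) makes it succeed. The calculus presented below is structurally complete in this sense for arbitrary sequences.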

3. The P-calculus

The P(roduct)-calculus is a categorial grammar, based on Lambek (1958), which has the property of strong structural completeness.

In Lambek (1958), the foundations of flexible categorial grammar are formulated in the form of a calculus for syntactic categories. Well-known categorial rules, such as application, composition and category-raising, are theorems of this calculus. A largely neglected aspect of this calculus is the use of the product-operator.

The calculus we present below was developed as an alternative for Moortgat's (1988) M-system. The M-system is a subset of the Lambek-calculus, which uses, next to application, only a very general form of composition. Since it has no raising, it seems to be an attractive candidate for investigating the possibilities of left-associative parsing for categorial grammar. It is not completely satisfactory, however, since structural completeness is not fully guaranteed, and also since it is unknown whether the strong structural completeness property holds for this system. In our calculus, we hope to overcome these problems by using product-introduction and -elimination rules instead of composition.

The kernel of the P-calculus is right- and left-application, as usual. Next to these, we use a rule for introducing the product-operator, and two inference rules for eliminating products:

(6) RA : A/B B => A
    LA : B B\A => A

    (product) introduction:

    I : A B => A*B

    inference rules:

    P :  A B => C,  D C => E
         -------------------
         D*A B => E

    P' : A B => C,  C D => E
         -------------------
         A B*D => E

We can use this calculus to produce left-branching syntax trees for any given (grammatical) sentence. (7) is a simple example.³

³ To improve readability, we assume that the operators / and \ take precedence over * (X*Y/Z should be read as X*(Y/Z)).

(7) John loves Mary madly
    s/vp vp/np np vp\vp
      I
    s/vp*vp/np
      (a)
    s/vp*vp
      (b)
    s

(a) vp/np np => vp,  s/vp vp => s/vp*vp
    -----------------------------------
    s/vp*vp/np np => s/vp*vp

(b) vp vp\vp => vp,  s/vp vp => s
    -----------------------------
    s/vp*vp vp\vp => s

The first step in the derivation of (7) is the application of rule I. The other two reductions ((a) and (b)) are instantiations of the inference rule P. As the example shows, the *-operator (more in particular its use in I) does something like concatenation, but whereas such operations are normally associated with particular grammatical rules (i.e. you may concatenate two elements of category NP and VP, respectively, if there is a rule S --> NP VP), or (in CG) with a reduction rule (application or composition, for instance), we now have the freedom to concatenate arbitrary categories, completely irrespective of their internal structure.

The P-calculus is structurally complete. To prove this, we prove that for any four categories A, B, C, D, it holds that

    (AB)C --> D  <==>  A(BC) --> D,

where --> is the derivability relation. From this, structural completeness may be concluded, since any bracketing (or branching of syntax trees) can be obtained by applying this equivalence an arbitrary number of times.

Proof: From (AB)C --> D it follows that there exists a category E such that AB --> E and EC --> D. BC --> B*C, by I. Now A(B*C) --> D, by P', since AB --> E and EC --> D. Therefore, by transitivity of -->, A(BC) --> D. To prove that A(BC) --> D ==> (AB)C --> D, use P instead of P'.

Semantics can be added to the grammar by giving a semantic counterpart (in lower case) for each of the rules in (6):

(8) RA : A/B:a B:b => A:a(b)
    LA : B:b B\A:a => A:a(b)

    (product) introduction:

    I : A:a B:b => A*B:a*b

    inference rules:

    P :  A:a B:b => C:c,  D:d C:c => E:e
         -------------------------------
         D*A:d*a B:b => E:e

    P' : A:a B:b => C:c,  C:c D:d => E:e
         -------------------------------
         A:a B*D:b*d => E:e

We can now include semantics in the proof given above, and from this we may conclude that strong structural completeness holds for the P-calculus as well.
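As an illustration (assuming john', loves', mary' and madly' as the lexical meanings of the words in (7)), the derivation in (7) can be annotated with the semantic counterparts in (8) as follows:

    John loves Mary madly
    s/vp:john'  vp/np:loves'  np:mary'  vp\vp:madly'
      I
    s/vp*vp/np : john'*loves'
      (a)
    s/vp*vp : john'*(loves'(mary'))
      (b)
    s : john'(madly'(loves'(mary')))

The resulting term john'(madly'(loves'(mary'))) is identical to the semantics of the canonical derivation in which madly' combines with loves'(mary') first, as strong structural completeness requires.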

4. Eliminating Spurious Ambiguity

In this section we outline a subset of the P-calculus, for which efficient processing is possible. As was noted above, in the P-calculus there is always a strictly left-branching derivation for any reading of a sentence S. The restrictions we add in this section are needed to eliminate spurious ambiguities from these left-branching derivations.

Restricting a parser so that it will only accept left-branching derivations will not directly lead to an efficient parsing procedure for the P-calculus. The reason is twofold.

First, nothing in the P-calculus excludes spurious ambiguity which occurs within the set of left-branching analysis trees. Consider again example (7). This sentence is unambiguous, but nevertheless we can give a left-branching derivation for it which differs from the one given earlier:

(9) John loves Mary madly
    s/vp vp/np np vp\vp
      I
    s/vp*vp/np
      I
    (s/vp*vp/np)*np
      (**)
    s

The inference step (**) can be proven to be valid if we use P' as well as P.

An even more serious problem is caused by the interaction between I and P, P'.

(10)               A B => A*B   A*B C => D
                   ------------------------ P'
      B C => B*C   A B*C => D
      ------------------------ P
      A*B C => D

If we try to prove that A*B and C can be reduced to a category D, we could use P, with I in the left premise. To prove the second premise, we could use P', also with I in the left premise. But now the right premise of P' is identical to our initial problem, and thus we have made a useless loop, which could even lead to an infinite regress.

These problems can be eliminated if we restrict the grammar in two ways. First of all, we consider only derivations of the form C1,...,Cn ==> S, where C1,...,Cn and S do not contain the product-operator. This means we require that the start symbol of the grammar and the set of lexical categories must be product-free. Notice that this restriction can be easily made, since most categorial lexicons do not contain the product-operator anyway.

Given this restriction, the inference rule P can be restricted: we require that the left premise of this rule is always an instance of either left- or right-application. Consider what would happen if we used I here:

(11)  B C => B*C   A B*C => D   ***
      ----------------------------- P
      A*B C => D

Since the lexicon is product-free, and we are interested in strictly left-branching derivations only, we know that C must be a product-free category. If we combine B and C through I, we are faced with the problem in ***. At this point we could use I again, for instance, thereby instantiating D as A*(B*C). But this will lead to a spurious ambiguity, since we know that:

    A*(B*C) E => F  iff  (A*B)*C E => F.⁴

A category (A*B)*C can be obtained by applying I directly to A*B and C.

⁴ In the P-calculus, this follows from the fact that E must be product-free. It is a theorem of the Lambek-calculus as well.

If we apply P' at point ***, we find ourselves trying to find a solution for A B => E, and then E C => D. But this is nothing else than trying to find a left-branching derivation for A,B,C => D, and therefore the inference step in (11) has not led to anything new.

In fact, given that the lexicon is product-free and only application may be used in the left premise of P, P' is never needed to derive a left-branching tree.

As a result, we get (12), where we have made a distinction between reduction rules (right- and left-application) and other rules. This enables us to restrict the left premise of P. The fact that every reduction rule is also a general rule of the grammar is expressed by R. P' has been eliminated.

(12) RA : A/B B -> A
     LA : B B\A -> A

     (product) introduction:

     I : A B => A*B

     inference rules:

     P : A B -> C,  D C => E
         -------------------
         D*A B => E

     R : A B -> C
         --------
         A B => C

The system in (12) is a subset of the P-calculus, which is able to generate a strictly left-branching derivation for every reading of a given sentence of the grammar.

The Prolog fragment in (13) shows how the restricted system in (12) can be used to define a simple left-associative parsing algorithm.

(13) % assumed operator declarations, needed to load the fragment:
     :- op(700, xfx, ==>).
     :- op(400, yfx, \).

     % parse(+Categories ==> ?Category): strictly left-associative reduction
     parse([C] ==> C) :- !.
     parse([C1,C2|Rest] ==> S) :-
         rule([C1,C2] ==> C3),
         parse([C3|Rest] ==> S).

     % R:
     rule(X ==> Y) :-
         reduction_rule(X ==> Y).

     % P:
     % '+' is used instead of '*' to avoid unnecessary bracketing
     rule([X+Y,Z] ==> W) :-
         reduction_rule([Y,Z] ==> V),
         rule([X,V] ==> W).

     % I:
     rule([X,Y] ==> X+Y).

     % application:
     reduction_rule([X/Y,Y] ==> X).
     reduction_rule([Y,Y\X] ==> X).

5. Shift-reduce parsing

It has sometimes been noted that a derivation tree in categorial grammar (such as (7)) does not really reflect constituent structure in the traditional sense, but that it reflects a particular parse process. This may be true for categorial systems in general, but it is particularly clear for the P-calculus. Consider for instance how one would parse a sentence like (7) using a shift-reduce parsing technique, and having only right- and left-application as syntax rules.

(14)    (remaining) input       stack
        s/vp,vp/np,np,vp\vp     _
     S  vp/np,np,vp\vp          s/vp
     S  np,vp\vp                s/vp,vp/np
     S  vp\vp                   s/vp,vp/np,np
     R  vp\vp                   s/vp,vp
     S  _                       s/vp,vp,vp\vp
     R  _                       s/vp,vp
     R  _                       s

     (S = shift, R = reduce)

Shifting an element onto the stack (apart from the first one maybe) seems to be equivalent to combining elements by means of I. The stack is after all nothing but a somewhat different representation of the product types we used earlier. The fact that adding one element to the stack (vp\vp) induces two reduction steps is comparable to the fact that the inference rule P may have the effect of eliminating more than one slash at a time.

The similarity between shift-reduce parsing and the derivations in P brings in another interesting aspect. The shift-reduce algorithm is a correct parsing strategy, because it will produce all (syntactic) ambiguities for a given input string. This means that in the example above, a shift-reduce parser would only produce one syntax tree (assuming that the grammar has only application).

If the input is potentially ambiguous, as in (15), there are two different derivations.

(15) a/a a a\a

It is after shifting a on the stack that a difference arises. Here, one can either reduce or shift one more step. The first choice leads to the left-branching derivation, the second to the right-branching one.

The choice between shifting or reducing has a categorial equivalent. In the P-calculus, one can either produce a left-branching derivation tree for (15) by using application only, or as indicated in (16).

(16) a/a a a\a
       I
     a/a*a
       P
     a
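The two analyses of (15) can also be obtained mechanically. The following sketch extends the fragment in (13) with the semantic counterparts of (8); the predicate names sem_parse/1, sem_rule/1 and sem_reduction/1, the pairing of a category and a term by means of '-', and the term constructor app/2 are illustrative choices.

    % assumed operator declarations, as for (13)
    :- op(700, xfx, ==>).
    :- op(400, yfx, \).

    % sem_parse(+Pairs ==> ?Pair): left-associative parsing over
    % Category-Semantics pairs, following (12) and the semantics of (8)
    sem_parse([P] ==> P).
    sem_parse([P1, P2 | Rest] ==> S) :-
        sem_rule([P1, P2] ==> P3),
        sem_parse([P3 | Rest] ==> S).

    % R: every reduction rule is also a general rule
    sem_rule(X ==> Y) :-
        sem_reduction(X ==> Y).

    % P: a reduction in the left premise, any rule in the right premise
    sem_rule([(X+Y)-(F*G), Z-A] ==> W-V) :-
        sem_reduction([Y-G, Z-A] ==> C-CSem),
        sem_rule([X-F, C-CSem] ==> W-V).

    % I: product introduction pairs the categories and the semantics
    sem_rule([X-F, Y-G] ==> (X+Y)-(F*G)).

    % application, with functional application app/2 as its semantics
    sem_reduction([(X/Y)-F, Y-A] ==> X-app(F, A)).
    sem_reduction([Y-A, (Y\X)-F] ==> X-app(F, A)).

The query ?- findall(Sem, sem_parse([(a/a)-f, a-x, (a\a)-g] ==> a-Sem), Sems). yields the two terms app(g, app(f, x)) and app(f, app(g, x)): both readings of (15) are found by strictly left-associative parses, and no additional, spurious, analysis is produced.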

Note that the P-calculus thus is able to find genuine syntactic (or potentially semantic) ambiguities without producing a different branching phrase structure. The correspondence to shift-reduce parsing already suggests this of course, since we should consider the phrase structure produced by a structurally complete grammar much more as a record of the parse process than as a constituent structure in the traditional sense.

6. Coordination

The P-calculus is structurally complete, and therefore all the arguments that have been presented in favour of a categorial analysis of coordination hold for the P-calculus as well. Coordination introduces polymorphism in the grammar, however, and this leads to some complications for the restricted P-calculus presented in (12).

Adding a category X\(X/X) for coordinators to the P-calculus enables us to handle non-constituent conjunction, as is exemplified in (17) and (18).

(17) John loves  and      Peter adores  Sue
     s/vp vp/np  X\(X/X)  s/vp  vp/np   np
       I                    I
     s/vp*vp/np           s/vp*vp/np
          s/vp*vp/np
          s

(18) J. loves  Mary madly   and      Sue wildly
        vp/np  np   vp\vp   X\(X/X)  np  vp\vp
                 I                     I
               np*vp\vp              np*vp\vp
                    np*vp\vp
        P'
        vp

The restricted calculus of (12) was designed to enable efficient left-associative parsing. We assumed that lexical categories would always be product-free, but this assumption no longer holds if we add X\(X/X) to the grammar (since X can be instantiated, for instance, as s/vp*vp/np). This means that left-associative derivations are not always possible for coordinated sentences. Our solution to this problem is to add rules such as (19) to the grammar, which can transform certain product-categories into product-free categories.

(19) A/(B*C) => (A/C)/B

A number of such rules are needed to restore left-associativity.

Next to syntactic additions, some modifications to the semantic part of the inference rule P had to be made, in order to cope with the polymorphic semantics proposed for coordination by Partee & Rooth (1983).

7. Concluding remarks

The spurious ambiguity problem has been solved in this paper in a rather paradoxical manner. Whereas Wittenburg (1987) tries to do away with ambiguous phrase structure as much as possible (it only arises where you need it) and Pareschi & Steedman (1987) use a chart parsing technique to recover implicit constituents efficiently, the strategy in this paper has been to go for complete ambiguity. It is in fact this massive ambiguity which trivializes constituent structure to such an extent that one might as well ignore it, and choose a constituent structure that fits one's purposes best (left-branching in this case). It seems that as far as processing is concerned, the half-way flexible systems of Steedman (having generalized composition, and heavily restricted forms of raising) are in fact the hardest case. Simple AB-grammars are in all respects similar to CF-grammars, and can directly be parsed by any bottom-up algorithm. For strongly structurally complete systems such as P, spurious ambiguity can be eliminated by inspecting left-branching trees only. For flexible but not structurally complete systems, it is much harder to predict which derivations are interesting and which ones are not, and therefore the only solution is often to inspect all possibilities.

References

Ades, Antony & Mark Steedman (1982), On the Order of Words. Linguistics & Philosophy 4, 517-558.

Buszkowski, Wojciech (1988), Generative Power of Categorial Grammars. In R. Oehrle, E. Bach, and D. Wheeler (eds.), Categorial Grammars and Natural Language Structures, Reidel, Dordrecht, 69-94.

Dowty, David (1988), Type Raising, Functional Composition, and Non-constituent Conjunction. In R. Oehrle, E. Bach, and D. Wheeler (eds.), Categorial Grammars and Natural Language Structures, Reidel, Dordrecht, 153-197.

Lambek, Joachim (1958), The mathematics of sentence structure. American Mathematical Monthly 65, 154-170.

Moortgat, Michael (1987), Lambek Categorial Grammar and the Autonomy Thesis. INL working papers 87-03.

Moortgat, Michael (1988), Categorial Investigations: Logical and Linguistic Aspects of the Lambek Calculus. Dissertation, University of Amsterdam.

Pareschi, Remo and Mark Steedman (1987), A Lazy Way to Chart-Parse with Categorial Grammars. Proceedings of the 25th Annual Meeting of the Association for Computational Linguistics, Stanford University, 81-88.

Partee, Barbara and Mats Rooth (1983), Generalized Conjunction and Type Ambiguity. In R. Bäuerle, C. Schwarze and A. von Stechow (eds.), Meaning, Use, and Interpretation of Language, de Gruyter, Berlin, 361-383.

Steedman, Mark (1985), Dependency and Coordination in the Grammar of Dutch and English. Language 61, 523-568.

Wittenburg, Kent (1986), Natural Language Parsing with Combinatory Categorial Grammars in a Graph-Unification-Based Formalism. Ph.D. dissertation, University of Texas at Austin.

Wittenburg, Kent (1987), Predictive Combinators: A Method for Efficient Processing of Combinatory Categorial Grammars. Proceedings of the 25th Annual Meeting of the Association for Computational Linguistics, Stanford University, 73-80.

Zwarts, Frans (1986), Categoriale Grammatica en Algebraische Semantiek. Dissertation, State University Groningen.