
EFFICIENT PROCESSING OF FLEXIBLE CATEGORIAL GRAMMAR

Gosse Bouma
Research Institute for Knowledge Systems
Postbus 463, 6200 AL Maastricht
The Netherlands
e-mail (earn): exriksgb@hmarl5

ABSTRACT*

From a processing point of view, however, flexible categorial systems are problematic, since they introduce spurious ambiguity. In this paper, we present a flexible categorial grammar which makes extensive use of the product-operator, first introduced by Lambek (1958). The grammar has the property that for every reading of a sentence, a strictly left-branching derivation can be given. This leads to the definition of a subset of the grammar, for which the spurious ambiguity problem does not arise and efficient processing is possible.

* I would like to thank Esther König, Erik-Jan van der Linden, Michael Moortgat, Adriaan van Paassen and the participants of the Edinburgh Categorial Grammar Weekend, who made useful comments on earlier presentations of this material. All remaining errors and misconceptions are of course my own.

1. Flexibility vs. Ambiguity

Categorial grammars owe much of their popularity to the fact that they allow for various degrees of flexibility with respect to constituent structure. From a processing point of view, however, flexible categorial systems are problematic, since they introduce spurious ambiguity.

The best known example of a flexible categorial grammar is a grammar containing the reduction rules application and composition, and the category-changing rule raising.¹

¹ Throughout this paper we will be using the notation of Lambek (1958), in which A/B and B\A are a right-directional and a left-directional functor respectively, looking for an argument of category B.

(1) application:  A/B B ==> A
                  B B\A ==> A
    composition:  A/B B/C ==> A/C
                  C\B B\A ==> C\A
    raising:      A ==> (B/A)\B
                  A ==> B/(A\B)

With this grammar many alternative constituent structures for a sentence can be generated, even where this does not correspond to semantic ambiguity. From a linguistic point of view, this has many advantages. Various kinds of arguments for giving up traditional conceptions of constituent structure can be given, but the most convincing and well-documented case in favour of flexible constituent structure is coordination (see Steedman (1985), Dowty (1988), and Zwarts (1986)).

The standard assumption in generative grammar is that coordination always takes place between constituents. Right-node raising constructions and other instances of non-constituent conjunction are problematic, because it is not clear what the status of the coordinated elements in these constructions is. Flexible categorial grammar presents an elegant solution for such cases, since, next to canonical constituent structures, it also admits various other constituent structures. Therefore, the sentences in (2) can be considered to be ordinary instances of coordination (of two categories s/np and (vp/np)\vp, respectively).

(2) a. John sold and Mary bought a book
       s/vp vp/np     s/vp vp/np    np
       ----------     ----------
       s/np           s/np

    b. J. loves Mary madly and Sue wildly
       vp/np  np  vp\vp       np  vp\vp

              Mary madly
              np  vp\vp
              ----------
              (vp/np)\vp      (vp/np)\vp

A somewhat different type of argument for flexible phrase structure is based on the way humans process natural language. In Ades & Steedman (1982) it is pointed out that humans process natural language in a left-to-right, incremental manner. This processing aspect is accounted for in a flexible categorial system, where constituents can be built for any part of a sentence. Since syntactic rules operate in parallel with semantic interpretation rules, building a syntactic structure for an initial part of a sentence implies that a corresponding semantic structure can also be constructed.

These and other arguments suggest that there is no such thing as a fixed constituent structure, but that the order in which elements combine with each other is rather free.

From a parsing point of view, however, flexibility appears to be a disadvantage. Flexible categorial grammars produce large numbers of, often semantically equivalent, derivations for a given phrase. This spurious ambiguity problem (Wittenburg (1986)) makes efficient processing of flexible categorial grammar problematic, since quite often there is an exponential growth of the number of possible derivations, relative to the length of the string to be parsed.

There have been two proposals for eliminating spurious ambiguity from the grammar. The first is Wittenburg (1987). In this paper, a categorial grammar with composition and heavily restricted versions of raising (for subject np's only) is considered. Wittenburg proposes to eliminate spurious ambiguity by redefining composition. His predictive composition rules apply only in those cases where they are really needed to make a derivation possible. A disadvantage of this method, noticed by Wittenburg, is that one may have to add special predictive composition rules for all general combinatory rules in the grammar. Some careful rewriting of the original grammar has to take place before things work as desired.

Pareschi & Steedman (1987) propose an efficient chart-parsing algorithm for categorial grammars with spurious ambiguity. Instead of the usual strategy, in which all possible subconstituents are added to the chart, Pareschi & Steedman restrict themselves to adding only those constituents that may lead to a difference in interpretation. Thus, in (3) only the underlined constituents are in the chart. The "---" constituent is not.

(3) John loves Mary madly
    s/vp vp/np np  vp\vp

Combining 'madly' with the rest would be impossible or lead to backtracking in the normal case. Here, the Pareschi & Steedman algorithm starts looking for a constituent left-adjacent to madly which contains an element X/vp as a leftmost category. If such a constituent can be found, it can be concluded that the rest of that constituent must (implicitly) be a vp, and thus the validity of combining vp\vp with this constituent has been established. Therefore, Pareschi & Steedman are able to work with only a minimal number of items in the chart.

Both Wittenburg and Pareschi & Steedman work with categorial grammars which contain restricted versions of composition and raising. Although they can be processed efficiently, there is linguistic evidence that they are not fully adequate for the analysis of such phenomena as coordination. Since atomic categories can in general not be raised in these grammars, sentence (2b) (in which the category np has to be raised) cannot be derived. Furthermore, since composition is not generalized, as in Ades & Steedman (1982), a sentence such as John sold but Mary donated a book to the library would not be derivable. The possibilities for left-to-right, incremental processing are also limited. Therefore, there is reason to look for a more flexible system, for which efficient parsing is still possible.
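The spurious ambiguity at issue is easy to observe directly. The following is a minimal Prolog sketch (the predicate names reduce/3, raise/2 and derive/2 are illustrative choices, not part of any of the systems discussed): it encodes the reduction rules of (1) and tries every bracketing of a sequence of categories.

    % assumed operator declaration: '\' as an infix category constructor
    % ('/' is already an infix operator in standard Prolog)
    :- op(400, yfx, \).

    % the reduction rules of (1)
    reduce(A/B, B,   A).      % right application
    reduce(B,   B\A, A).      % left application
    reduce(A/B, B/C, A/C).    % forward composition
    reduce(C\B, B\A, C\A).    % backward composition

    % the category-changing rule raising; stated for completeness, but not
    % used in derive/2, since unrestricted raising makes the search infinite
    raise(A, B/(A\B)).
    raise(A, (B/A)\B).

    % derive(+Categories, ?Result): reduce a list of categories to a single
    % category, trying every possible bracketing
    derive([C], C).
    derive(Cs, Result) :-
        append(Front, [X, Y | Back], Cs),
        reduce(X, Y, Z),
        append(Front, [Z | Back], NewCs),
        derive(NewCs, Result).

Already for the unambiguous John loves Mary, the query ?- findall(yes, derive([s/vp, vp/np, np], s), Ds), length(Ds, N). gives N = 2: one derivation composes s/vp with vp/np, the other applies vp/np to np. For longer strings the number of such semantically equivalent derivations grows rapidly.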

2. Structural Completeness

In the next section we present a grammatical calculus which is more flexible than the systems considered by Wittenburg (1987) and Pareschi & Steedman (1987), and therefore is attractive for linguistic purposes. At the same time, it offers a solution to the spurious ambiguity problem.

Spurious ambiguity causes problems for parsing in the systems mentioned above, because there is no systematic relationship between syntactic structures and semantic representations. That is, there is no way to identify in advance, for a given sentence S, a proper subset of the set of all possible syntactic structures and associated semantic representations, for which it holds that it will contain all possible semantic representations of S.

Consider now a grammar for which the following property holds:

(4) Structural Completeness
    If a sequence of categories X1 .. Xn reduces to Y, there is a reduction to Y for any bracketing of X1..Xn into constituents. (Moortgat, 1987:5)²

² Buszkowski (1988) provides a slightly different definition in terms of functor-argument structures.

Structurally complete grammars are interesting linguistically, since they are able to handle, for instance, all kinds of non-constituent conjunction, and also because they allow for strict left-to-right processing (see Moortgat, 1988).

The latter observation has consequences for parsing as well, since, if we can parse every sentence in a strict left-to-right manner (that is, we produce only strictly left-branching syntax trees), the parsing algorithm can be greatly simplified. Notice, however, that such a parsing strategy is only valid if we also guarantee that all possible readings of a sentence can be found in this way. Thus, instead of (4), we are looking for grammars with the following, slightly stronger, property:

(5) Strong Structural Completeness
    If a sequence of categories X1 .. Xn reduces to Y, with semantics Y', there is a reduction to Y, with semantics Y', for any bracketing of X1..Xn into constituents.

Grammars with this property can potentially circumvent the spurious ambiguity problem, since for these grammars we only have to inspect all left-branching trees to find all possible readings. This method will only fail if the set of left-branching trees itself would contain spuriously ambiguous derivations. In section 4 we will show that these can be eliminated from the calculus presented below.
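Property (4) can be stated operationally as a small Prolog sketch (the predicate names bracketing/2, reduces_to/2, structurally_complete/2 and red/3 are illustrative, and forall/2 is assumed to be available, as in most Prolog systems): it enumerates every binary bracketing of a sequence and checks that each of them reduces to the target category.

    :- op(400, yfx, \).       % assumed infix category constructor, as before

    % bracketing(+Cats, -Tree): all binary bracketings of a non-empty sequence
    bracketing([C], C).
    bracketing(Cs, t(L, R)) :-
        append(Front, Back, Cs),
        Front \= [], Back \= [],
        bracketing(Front, L),
        bracketing(Back, R).

    % reduces_to(+Tree, ?Cat): a bracketing reduces to Cat if its two halves
    % reduce to categories that red/3 can combine
    reduces_to(C, C) :- C \= t(_, _).
    reduces_to(t(L, R), Cat) :-
        reduces_to(L, A),
        reduces_to(R, B),
        red(A, B, Cat).

    % structurally_complete(+Cats, +Cat): property (4) for this particular
    % sequence and target category
    structurally_complete(Cats, Cat) :-
        forall(bracketing(Cats, T), reduces_to(T, Cat)).

    % for illustration, red/3 contains application only
    red(A/B, B,   A).
    red(B,   B\A, A).

With application only, ?- structurally_complete([s/vp, vp/np, np], s). fails, because the bracketing ((s/vp vp/np) np) cannot be reduced; adding a composition clause red(A/B, B/C, A/C) makes it succeed. The calculus presented below is structurally complete in this sense for arbitrary sequences.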

3. The P-calculus

The P(roduct)-calculus is a categorial grammar, based on Lambek (1958), which has the property of strong structural completeness.

In Lambek (1958), the foundations of flexible categorial grammar are formulated in the form of a calculus for syntactic categories. Well-known categorial rules, such as application, composition and category-raising, are theorems of this calculus. A largely neglected aspect of this calculus is the use of the product-operator.

The calculus we present below was developed as an alternative for Moortgat's (1988) M-system. The M-system is a subset of the Lambek-calculus, which uses, next to application, only a very general form of composition. Since it has no raising, it seems to be an attractive candidate for investigating the possibilities of left-associative parsing for categorial grammar. It is not completely satisfactory, however, since structural completeness is not fully guaranteed, and also since it is unknown whether the strong structural completeness property holds for this system. In our calculus, we hope to overcome these problems by using product-introduction and -elimination rules instead of composition.

The kernel of the P-calculus is right- and left-application, as usual. Next to these, we use a rule for introducing the product-operator, and two inference rules for eliminating products:

(6) RA : A/B B => A
    LA : B B\A => A

    (product) introduction:

    I : A B => A*B

    inference rules:

    P :  A B => C,  D C => E
         -------------------
         D*A B => E

    P' : A B => C,  C D => E
         -------------------
         A B*D => E

We can use this calculus to produce left-branching syntax trees for any given (grammatical) sentence. (7) is a simple example.³

³ To improve readability, we assume that the operators / and \ take precedence over * (X*Y/Z should be read as X*(Y/Z)).

(7) John loves Mary madly
    s/vp vp/np np vp\vp
      I
    s/vp*vp/np
      (a)
    s/vp*vp
      (b)
    s

(a) vp/np np => vp,  s/vp vp => s/vp*vp
    -----------------------------------
    s/vp*vp/np np => s/vp*vp

(b) vp vp\vp => vp,  s/vp vp => s
    -----------------------------
    s/vp*vp vp\vp => s

The first step in the derivation of (7) is the application of rule I. The other two reductions ((a) and (b)) are instantiations of the inference rule P. As the example shows, the *-operator (more in particular its use in I) does something like concatenation, but whereas such operations are normally associated with particular grammatical rules (i.e. you may concatenate two elements of category NP and VP, respectively, if there is a rule S --> NP VP), or (in CG) with a reduction rule (application or composition, for instance), we now have the freedom to concatenate arbitrary categories, completely irrespective of their internal structure.

The P-calculus is structurally complete. To prove this, we prove that for any four categories A, B, C, D, it holds that

    (AB)C --> D  <==>  A(BC) --> D,

where --> is the derivability relation. From this, structural completeness may be concluded, since any bracketing (or branching of syntax trees) can be obtained by applying this equivalence an arbitrary number of times.

Proof: From (AB)C --> D it follows that there exists a category E such that AB --> E and EC --> D. BC --> B*C, by I. Now A(B*C) --> D, by P', since AB --> E and EC --> D. Therefore, by transitivity of -->, A(BC) --> D. To prove that A(BC) --> D ==> (AB)C --> D, use P instead of P'.

Semantics can be added to the grammar by giving a semantic counterpart (in lower case) for each of the rules in (6):

(8) RA : A/B:a B:b => A:a(b)
    LA : B:b B\A:a => A:a(b)

    (product) introduction:

    I : A:a B:b => A*B:a*b

    inference rules:

    P :  A:a B:b => C:c,  D:d C:c => E:e
         -------------------------------
         D*A:d*a B:b => E:e

    P' : A:a B:b => C:c,  C:c D:d => E:e
         -------------------------------
         A:a B*D:b*d => E:e

We can now include semantics in the proof given above, and from this we may conclude that strong structural completeness holds for the P-calculus as well.
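As an illustration (assuming john', loves', mary' and madly' as the lexical meanings of the words in (7)), the derivation in (7) can be annotated with the semantic counterparts in (8) as follows:

    John loves Mary madly
    s/vp:john'  vp/np:loves'  np:mary'  vp\vp:madly'
      I
    s/vp*vp/np : john'*loves'
      (a)
    s/vp*vp : john'*(loves'(mary'))
      (b)
    s : john'(madly'(loves'(mary')))

The resulting term john'(madly'(loves'(mary'))) is identical to the semantics of the canonical derivation in which madly' combines with loves'(mary') first, as strong structural completeness requires.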

4. Eliminating Spurious Ambiguity

In this section we outline a subset of the P-calculus, for which efficient processing is possible. As was noted above, in the P-calculus there is always a strictly left-branching derivation for any reading of a sentence S. The restrictions we add in this section are needed to eliminate spurious ambiguities from these left-branching derivations.

Restricting a parser so that it will only accept left-branching derivations will not directly lead to an efficient parsing procedure for the P-calculus. The reason is twofold.

First, nothing in the P-calculus excludes spurious ambiguity which occurs within the set of left-branching analysis trees. Consider again example (7). This sentence is unambiguous, but nevertheless we can give a left-branching derivation for it which differs from the one given earlier:

(9) John loves Mary madly
    s/vp vp/np np vp\vp
      I
    s/vp*vp/np
      I
    (s/vp*vp/np)*np
      (**)
    s

The inference step (**) can be proven to be valid if we use P' as well as P.

An even more serious problem is caused by the interaction between I and P, P'.

(10)               A B => A*B   A*B C => D
                   ------------------------ P'
      B C => B*C   A B*C => D
      ------------------------ P
      A*B C => D

If we try to prove that A*B and C can be reduced to a category D, we could use P, with I in the left premise. To prove the second premise, we could use P', also with I in the left premise. But now the right premise of P' is identical to our initial problem, and thus we have made a useless loop, which could even lead to an infinite regress.

These problems can be eliminated if we restrict the grammar in two ways. First of all, we consider only derivations of the form C1,...,Cn ==> S, where C1,...,Cn and S do not contain the product-operator. This means we require that the start symbol of the grammar and the set of lexical categories must be product-free. Notice that this restriction can be easily made, since most categorial lexicons do not contain the product-operator anyway.

Given this restriction, the inference rule P can be restricted: we require that the left premise of this rule is always an instance of either left- or right-application. Consider what would happen if we used I here:

(11)  B C => B*C   A B*C => D   ***
      ----------------------------- P
      A*B C => D

Since the lexicon is product-free, and we are interested in strictly left-branching derivations only, we know that C must be a product-free category. If we combine B and C through I, we are faced with the problem in ***. At this point we could use I again, for instance, thereby instantiating D as A*(B*C). But this will lead to a spurious ambiguity, since we know that:

    A*(B*C) E => F  iff  (A*B)*C E => F.⁴

A category (A*B)*C can be obtained by applying I directly to A*B and C.

⁴ In the P-calculus, this follows from the fact that E must be product-free. It is a theorem of the Lambek-calculus as well.

If we apply P' at point ***, we find ourselves trying to find a solution for A B => E, and then E C => D. But this is nothing else than trying to find a left-branching derivation for A,B,C => D, and therefore the inference step in (11) has not led to anything new.

In fact, given that the lexicon is product-free and only application may be used in the left premise of P, P' is never needed to derive a left-branching tree.

As a result, we get (12), where we have made a distinction between reduction rules (right- and left-application) and other rules. This enables us to restrict the left premise of P. The fact that every reduction rule is also a general rule of the grammar is expressed by R. P' has been eliminated.

(12) RA : A/B B -> A
     LA : B B\A -> A

     (product) introduction:

     I : A B => A*B

     inference rules:

     P : A B -> C,  D C => E
         -------------------
         D*A B => E

     R : A B -> C
         --------
         A B => C

The system in (12) is a subset of the P-calculus, which is able to generate a strictly left-branching derivation for every reading of a given sentence of the grammar.

The Prolog fragment in (13) shows how the restricted system in (12) can be used to define a simple left-associative parsing algorithm.

(13) % assumed operator declarations, needed to load the fragment:
     :- op(700, xfx, ==>).
     :- op(400, yfx, \).

     % parse(+Categories ==> ?Category): strictly left-associative reduction
     parse([C] ==> C) :- !.
     parse([C1,C2|Rest] ==> S) :-
         rule([C1,C2] ==> C3),
         parse([C3|Rest] ==> S).

     % R:
     rule(X ==> Y) :-
         reduction_rule(X ==> Y).

     % P:
     % '+' is used instead of '*' to avoid unnecessary bracketing
     rule([X+Y,Z] ==> W) :-
         reduction_rule([Y,Z] ==> V),
         rule([X,V] ==> W).

     % I:
     rule([X,Y] ==> X+Y).

     % application:
     reduction_rule([X/Y,Y] ==> X).
     reduction_rule([Y,Y\X] ==> X).

5. Shift-reduce parsing

It has sometimes been noted that a derivation tree in categorial grammar (such as (7)) does not really reflect constituent structure in the traditional sense, but that it reflects a particular parse process. This may be true for categorial systems in general, but it is particularly clear for the P-calculus. Consider for instance how one would parse a sentence like (7) using a shift-reduce parsing technique, and having only right- and left-application as syntax rules.

(14)    (remaining) input       stack
        s/vp,vp/np,np,vp\vp     _
     S  vp/np,np,vp\vp          s/vp
     S  np,vp\vp                s/vp,vp/np
     S  vp\vp                   s/vp,vp/np,np
     R  vp\vp                   s/vp,vp
     S  _                       s/vp,vp,vp\vp
     R  _                       s/vp,vp
     R  _                       s

     (S = shift, R = reduce)

Shifting an element onto the stack (apart from the first one maybe) seems to be equivalent to combining elements by means of I. The stack is after all nothing but a somewhat different representation of the product types we used earlier. The fact that adding one element to the stack (vp\vp) induces two reduction steps is comparable to the fact that the inference rule P may have the effect of eliminating more than one slash at a time.

The similarity between shift-reduce parsing and the derivations in P brings in another interesting aspect. The shift-reduce algorithm is a correct parsing strategy, because it will produce all (syntactic) ambiguities for a given input string. This means that in the example above, a shift-reduce parser would only produce one syntax tree (assuming that the grammar has only application).

If the input is potentially ambiguous, as in (15), there are two different derivations.

(15) a/a a a\a

It is after shifting a on the stack that a difference arises. Here, one can either reduce or shift one more step. The first choice leads to the left-branching derivation, the second to the right-branching one.

The choice between shifting or reducing has a categorial equivalent. In the P-calculus, one can either produce a left-branching derivation tree for (15) by using application only, or as indicated in (16).

(16) a/a a a\a
       I
     a/a*a
       P
     a
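The two analyses of (15) can also be obtained mechanically. The following sketch extends the fragment in (13) with the semantic counterparts of (8); the predicate names sem_parse/1, sem_rule/1 and sem_reduction/1, the pairing of a category and a term by means of '-', and the term constructor app/2 are illustrative choices.

    % assumed operator declarations, as for (13)
    :- op(700, xfx, ==>).
    :- op(400, yfx, \).

    % sem_parse(+Pairs ==> ?Pair): left-associative parsing over
    % Category-Semantics pairs, following (12) and the semantics of (8)
    sem_parse([P] ==> P).
    sem_parse([P1, P2 | Rest] ==> S) :-
        sem_rule([P1, P2] ==> P3),
        sem_parse([P3 | Rest] ==> S).

    % R: every reduction rule is also a general rule
    sem_rule(X ==> Y) :-
        sem_reduction(X ==> Y).

    % P: a reduction in the left premise, any rule in the right premise
    sem_rule([(X+Y)-(F*G), Z-A] ==> W-V) :-
        sem_reduction([Y-G, Z-A] ==> C-CSem),
        sem_rule([X-F, C-CSem] ==> W-V).

    % I: product introduction pairs the categories and the semantics
    sem_rule([X-F, Y-G] ==> (X+Y)-(F*G)).

    % application, with functional application app/2 as its semantics
    sem_reduction([(X/Y)-F, Y-A] ==> X-app(F, A)).
    sem_reduction([Y-A, (Y\X)-F] ==> X-app(F, A)).

The query ?- findall(Sem, sem_parse([(a/a)-f, a-x, (a\a)-g] ==> a-Sem), Sems). yields the two terms app(g, app(f, x)) and app(f, app(g, x)): both readings of (15) are found by strictly left-associative parses, and no additional, spurious, analysis is produced.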

Note that the P-calculus thus is able to find genuine syntactic (or potentially semantic) ambiguities without producing a different branching phrase structure. The correspondence to shift-reduce parsing already suggests this of course, since we should consider the phrase structure produced by a structurally complete grammar much more as a record of the parse process than as a constituent structure in the traditional sense.

6. Coordination

The P-calculus is structurally complete, and therefore all the arguments that have been presented in favour of a categorial analysis of coordination hold for the P-calculus as well. Coordination introduces polymorphism in the grammar, however, and this leads to some complications for the restricted P-calculus presented in (12).

Adding a category X\(X/X) for coordinators to the P-calculus enables us to handle non-constituent conjunction, as is exemplified in (17) and (18).

(17) John loves  and      Peter adores  Sue
     s/vp vp/np  X\(X/X)  s/vp  vp/np   np
       I                    I
     s/vp*vp/np           s/vp*vp/np
          s/vp*vp/np
          s

(18) J. loves  Mary madly   and      Sue wildly
        vp/np  np   vp\vp   X\(X/X)  np  vp\vp
                 I                     I
               np*vp\vp              np*vp\vp
                    np*vp\vp
        P'
        vp

The restricted calculus of (12) was designed to enable efficient left-associative parsing. We assumed that lexical categories would always be product-free, but this assumption no longer holds if we add X\(X/X) to the grammar (since X can be instantiated, for instance, as s/vp*vp/np). This means that left-associative derivations are not always possible for coordinated sentences. Our solution to this problem is to add rules such as (19) to the grammar, which can transform certain product-categories into product-free categories.

(19) A/(B*C) => (A/C)/B

A number of such rules are needed to restore left-associativity.

Next to syntactic additions, some modifications to the semantic part of the inference rule P had to be made, in order to cope with the polymorphic semantics proposed for coordination by Partee & Rooth (1983).

7. Concluding remarks

The spurious ambiguity problem has been solved in this paper in a rather paradoxical manner. Whereas Wittenburg (1987) tries to do away with ambiguous phrase structure as much as possible (it only arises where you need it) and Pareschi & Steedman (1987) use a chart parsing technique to recover implicit constituents efficiently, the strategy in this paper has been to go for complete ambiguity. It is in fact this massive ambiguity which trivializes constituent structure to such an extent that one might as well ignore it, and choose a constituent structure that fits one's purposes best (left-branching in this case). It seems that as far as processing is concerned, the half-way flexible systems of Steedman (having generalized composition, and heavily restricted forms of raising) are in fact the hardest case. Simple AB-grammars are in all respects similar to CF-grammars, and can directly be parsed by any bottom-up algorithm. For strongly structurally complete systems such as P, spurious ambiguity can be eliminated by inspecting left-branching trees only. For flexible but not structurally complete systems, it is much harder to predict which derivations are interesting and which ones are not, and therefore the only solution is often to inspect all possibilities.

References

Ades, Antony & Mark Steedman (1982), On the Order of Words. Linguistics & Philosophy 4, 517-558.

Buszkowski, Wojciech (1988), Generative Power of Categorial Grammars. In R. Oehrle, E. Bach, and D. Wheeler (eds.), Categorial Grammars and Natural Language Structures, Reidel, Dordrecht, 69-94.

Dowty, David (1988), Type Raising, Functional Composition, and Non-constituent Conjunction. In R. Oehrle, E. Bach, and D. Wheeler (eds.), Categorial Grammars and Natural Language Structures, Reidel, Dordrecht, 153-197.

Lambek, Joachim (1958), The mathematics of sentence structure. American Mathematical Monthly 65, 154-170.

Moortgat, Michael (1987), Lambek Categorial Grammar and the Autonomy Thesis. INL working papers 87-03.

Moortgat, Michael (1988), Categorial Investigations: Logical and Linguistic Aspects of the Lambek Calculus. Dissertation, University of Amsterdam.

Pareschi, Remo and Mark Steedman (1987), A Lazy Way to Chart-Parse with Categorial Grammars. Proceedings of the 25th Annual Meeting of the Association for Computational Linguistics, Stanford University, 81-88.

Partee, Barbara and Mats Rooth (1983), Generalized Conjunction and Type Ambiguity. In R. Bäuerle, C. Schwarze and A. von Stechow (eds.), Meaning, Use, and Interpretation of Language, de Gruyter, Berlin, 361-383.

Steedman, Mark (1985), Dependency and Coordination in the Grammar of Dutch and English. Language 61, 523-568.

Wittenburg, Kent (1986), Natural Language Parsing with Combinatory Categorial Grammars in a Graph-Unification-Based Formalism. Ph.D. dissertation, University of Texas at Austin.

Wittenburg, Kent (1987), Predictive Combinators: A Method for Efficient Processing of Combinatory Categorial Grammars. Proceedings of the 25th Annual Meeting of the Association for Computational Linguistics, Stanford University, 73-80.

Zwarts, Frans (1986), Categoriale Grammatica en Algebraische Semantiek. Dissertation, State University Groningen.