PHRASE STRUCTURE GRAMMARS AND NATURAL LANGUAGES

Gerald Gazdar
Cognitive Studies Programme
University of Sussex
Brighton BN1 9QN

ABSTRACT

During most of the last two decades, computational linguists and AI researchers working on natural language have assumed that phrase structure grammars, despite their computational tractability, were unsatisfactory devices for expressing the syntax of natural languages. However, during the same period, they have come to realize that transformational grammars, whatever their linguistic merits, are computationally intractable as they stand. The assumption, unchallenged for many years, that PSG's were inadequate for natural languages is based on arguments originally advanced by transformational linguists in the late 1950's and early 1960's, but recent work has shown that none of those arguments were valid. The present paper draws on that work to argue that (i) there is no reason, at the present time, to think that natural languages are not context-free languages, (ii) there are good reasons to think that the notations needed to capture significant syntactic generalizations will characterize phrase structure grammars or some minor generalization of them, and (iii) there are good reasons for believing that such grammars, and the monostratal representations they induce, provide the necessary basis for the semantic interpretation of natural languages. If these arguments are valid, then the prospects for a fruitful interaction between theoretical linguistics and AI are much brighter than they would otherwise be.

I INTRODUCTION

Consider the following quotations:

    As already mentioned, a context-free phrase structure grammar is not sufficient to describe or analyze the whole range of syntactic constructions which occur in natural language texts (cf. Chomsky 1957). Even if one disregards the theoretical linguist's demand for satisfactory descriptions of the syntactic structure of sentences, there are strong reasons to design a more powerful parser than the one described above. Cases in point are the phenomenon of agreement within noun phrases, the correspondence in the verb phrase between the form of the main verb and the type of the auxiliary, and subject verb agreement. The argument structure of predicates, that is, their various types of objects and complements, represents another kind of context-sensitivity in natural language. ... The best strategy seems to be to take care of the particular types of context-sensitivity recognized by linguistic theory by means of special procedures which act as a superstructure of the algorithm for context-free analysis. ... In addition to the above-mentioned drawbacks a context-free phrase structure grammar has difficulty handling word order variation in a natural way.
    (Welin 1979: 62-63)

    One significant use of the general context-free methods is as part of a system of processing natural languages such as English. We are not suggesting that there is a context-free grammar for English. It is probably more appropriate to view the grammar/parser as a convenient control structure for directing the analysis of the input string. The overall analysis is motivated by a linguistic model which is not context free, but which can frequently make use of structures determined by the context-free grammar.
    (Graham, Harrison & Ruzzo 1980: 415-416)

These two passages have a number of things in common, of which three are relevant here. Firstly, the issue of whether natural languages (NL's) are context-free languages (CFL's) and are susceptible to analysis by context-free phrase structure grammars (CF-PSG's) is one the authors take to be relevant to parsing. Secondly, both passages assume that this issue has already been resolved, and resolved in the negative. Thirdly, I did not have to look for them, I merely bumped into them, as it were, in the course of recent reading. However, I am sure that if I had been in the business of finding passages with this kind of flavour in the parsing literature of the past 20 years, then I could have found dozens, probably hundreds.

The purpose of the present paper is simply to draw the attention of computational linguists to the fact that the issue of the status of NL's with respect to the CFL's and CF-PSG's is not resolved, and to the fact that all the published arguments seeking to establish that NL's are not CFL's, or that CF-PSG's are not adequate for the analysis of NL's, are completely without force. Of course, this does not entail that NL's are CFL's or that CF-PSG's constitute the appropriate formal theory of NL grammars. But it does have as a consequence that computational linguists should not just give up on CF-PSG's on the grounds that theoretical linguistics has demonstrated their inadequacy. No such demonstration exists.

In assessing whether some formal theory of grammar is an adequate theory for NL's, at least the following three criteria are relevant, and have been historically. (i) Does it permit NL's qua sets of strings to be generated? (ii) Does it permit significant generalizations to be expressed? (iii) Does it support a semantics, that is, does it provide a basis on which meanings can be assigned to NL expressions in a satisfactory manner?

In the remainder of this paper, I shall consider these three criteria in turn with reference to the adequacy of CF-PSG's as grammars for NL's. The issues are large, and space is limited, so my discussion will take the form, for the most part, of annotated references to the literature where the various issues are properly dealt with.

II GENERATING NATURAL LANGUAGE STRING SETS

The belief that CF-PSG's cannot cope with syntactic concord and long-distance dependencies, and hence that NL's are not CFL's, but, say, properly context-sensitive languages, is well entrenched. One textbook goes so far as to assert that 'the grammatical phenomenon of Subject Predicate Agreement is sufficient to guarantee the accuracy of [the statement that] English is not a CF-PSG language' (Grinder & Elgin 1973: 59). The phenomenon guarantees no such thing, of course. Nor is the character of the problem changed when agreement is manifested across unbounded distances in strings (pace Bach 1974: 77, Bresnan 1978: 38). Indeed, finite state languages can exhibit such dependencies (see Pullum & Gazdar, 1982).

The introductory texts and similar expository works in the field of generative grammar offer nothing that could be taken seriously as an argument that NL's are not CFL's. However, five putatively non-specious arguments to this effect are to be found in the more technical literature. These are based on the following phenomena:

(a) English comparative clauses (Chomsky 1963: 378-9),
(b) the decimal expansion of pi (Elster 1978: 43-44),
(c) 'respectively' (Bar-Hillel & Shamir 1960: 96, Langendoen 1977: 4-5),
(d) Dutch subordinate clauses (Huybregts 1976),
(e) Mohawk noun incorporation (Postal 1964).

Pullum & Gazdar (1982) show that (a) is based on a false empirical claim and a false claim about formal languages, (b) has no bearing on English or any other natural language since it depends on a [...] such as they are, are relevant to semantics rather than syntax in any case, (d) provides no basis for any string set argument [1], and (e) Postal crucially failed to take account of one class of permissible incorporations - once these are recognized, the formal basis of his argument collapses.

Thus, Pullum & Gazdar (1982) demonstrate that every published argument purporting to show that one or another NL is not a CFL is invalid, either formally, or empirically, or both. Whether any NL, construed as a string set, falls outside the class of CFLs remains an open question, just as it was twenty five years ago.

III CAPTURING SIGNIFICANT GENERALIZATIONS

Argumentation purporting to show that CF-PSG's will miss significant generalizations about some NL phenomenon has been woefully inadequate. Typically it consists simply of providing or alluding to some CF-PSG which obviously misses the generalization in question. But, clearly, nothing whatever follows from such an exhibition. Any framework capable of handling some phenomenon at all will typically make available indefinitely many ugly analyses of the phenomenon. But this fact is neither surprising nor interesting. What is surprising, and rather disturbing, is that arguments of this kind (beginning, classically, in chapter 5 of Chomsky 1957) have been taken so seriously for so long.

Capturing significant generalizations is largely a matter of notation. But CF-PSG's, taken as a class of mathematical objects, have properties which are theirs independently of the notations that might be used to define them. Thus they determine a certain set of string sets, they determine a certain set of tree sets, they stand in particular equivalence relations, and so on. An analogy from logic is pertinent here: the truth function material implication just is material implication whether you notate it with an arrow, or a hook, or the third letter of the alphabet, and whether you use prefix, infix, or postfix positioning of the symbol.

Over its 25 year history, transformational grammar developed a whole armoury of linguistically useful notations, and many of these can just as well be used in characterizing CF-PSG's. Three such notational devices merit individual mention: (a) complex symbols, (b) rule schemata, and (c) mappings from one set of rules into another (metarules).

Harman (1963) deserves the credit for first seeing the potential of PSG's incorporating complex symbols. The use of a finite set of complex symbols, in place of the traditional finite set of monadic symbols, leaves the mathematical properties of CF-PSG's unchanged. Every CF-PSG employing complex symbols generates a tree set that is isomorphic to the tree set generated by some CF-PSG not employing complex symbols.
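The point about complex symbols can be made concrete with a small sketch. The toy grammar, the single NUM feature, and the names `FEATURES`, `SCHEMATIC_RULES`, `atomize` and `expand` are all illustrative assumptions, not anything taken from Harman (1963) or from the analyses cited here. The sketch states subject-verb agreement once, using complex symbols that share a feature variable, and then rewrites each complex symbol as a single atomic name, which yields an ordinary CF-PSG over monadic symbols.

```python
from itertools import product

# Hypothetical toy feature system: one feature, NUM, with two values.
FEATURES = {"NUM": ("sg", "pl")}

# Rules over complex symbols. A symbol is (category, feature_names).
# A feature name that recurs within one rule must take the same value
# throughout that rule, so agreement is stated once, not once per value.
SCHEMATIC_RULES = [
    (("S", ()), [("NP", ("NUM",)), ("VP", ("NUM",))]),
    (("NP", ("NUM",)), [("Det", ()), ("N", ("NUM",))]),
    (("VP", ("NUM",)), [("V", ("NUM",))]),
]


def atomize(symbol, binding):
    """Rename a complex symbol as one atomic (monadic) symbol."""
    category, feats = symbol
    return category + "".join(f"[{f}={binding[f]}]" for f in feats)


def expand(rules, features):
    """Instantiate every feature name, giving a plain CF-PSG rule set."""
    plain = set()
    for lhs, rhs in rules:
        # Collect the feature names mentioned anywhere in this rule.
        used = sorted({f for _, feats in [lhs, *rhs] for f in feats})
        # One plain rule per consistent assignment of values.
        for values in product(*(features[f] for f in used)):
            binding = dict(zip(used, values))
            plain.add((atomize(lhs, binding),
                       tuple(atomize(s, binding) for s in rhs)))
    return plain


PLAIN_RULES = expand(SCHEMATIC_RULES, FEATURES)

for lhs, rhs in sorted(PLAIN_RULES):
    print(lhs, "->", " ".join(rhs))
```

Because the feature set is finite, the expansion always terminates in a finite rule set, and each atomic name corresponds to exactly one instantiated complex symbol. In this toy case, at least, the trees of the two grammars differ only in how their node labels are written.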