Modular Logic Grammars
Total Page:16
File Type:pdf, Size:1020Kb
MODULAR LOGIC GRAMMARS Michael C. McCord IBM Thomas J. Watson Research Center P. O. Box 218 Yorktown Heights, NY 10598 ABSTRACT scoping of quantifiers (and more generally focalizers, McCord, 1981) when the building of log- This report describes a logic grammar formalism, ical forms is too closely bonded to syntax. Another Modular Logic Grammars, exhibiting a high degree disadvantage is just a general result of lack of of modularity between syntax and semantics. There modularity: it can be harder to develop and un- is a syntax rule compiler (compiling into Prolog) derstand syntax rules when too much is going on in which takes care of the building of analysis them. structures and the interface to a clearly separated semantic interpretation component dealing with The logic grammars described in McCord (1982, scoping and the construction of logical forms. The 1981) were three-pass systems, where one of the main whole system can work in either a one-pass mode or points of the modularity was a good treatment of a two-pass mode. [n the one-pass mode, logical scoping. The first pass was the syntactic compo- forms are built directly during parsing through nent, written as a definite clause grammar, where interleaved calls to semantics, added automatically syntactic structures were explicitly built up in by the rule compiler. [n the two-pass mode, syn- the arguments of the non-terminals. Word sense tactic analysis trees are built automatically in selection and slot-filling were done in this first the first pass, and then given to the (one-pass) pass, so that the output analysis trees were actu- semantic component. The grammar formalism includes ally partially semantic. The second pass was a two devices which cause the automatically built preliminary stage of semantic interpretation in syntactic structures to differ from derivation trees which the syntactic analysis tree was reshaped to in two ways: [I) There is a shift operator, for reflect proper scoping of modifiers. The third pass dealing with left-embedding constructions such as took the reshaped tree and produced logical forms English possessive noun phrases while using right- in a straightforward way by carrying out modification rezursive rules (which are appropriate for Prolog of nodes by their daughters using a modular system parsing). (2) There is a distinction in the syn- of rules that manipulate semantic items -- consist- tactic formalism between strong non-terminals and ing of logical forms together with terms that de- weak non-terminals, which is important for distin- termine how they can combine. guishing major levels of grammar. The CHAT-80 system (Pereira and Warren, 1982, Pereira, 1983) is a three-pass system. The first I. INTRODUCTION pass is a purely syntactic component using an extrapositJon grammar (Pereira, 1981) and producing l'he term logic grammar will be used here, in syntactic analyses in righ~ost normal form. The the context of natural language processing, to mean second pass handles word sense selection and slot- a logic programming system (implemented normally filling, and =he third pass handles some scoping in P£olog), which associates semantic represent- phenomena and the final semantic interpretation. ations Cnormally in some version of preaicate logic) One gets a great deal of modularity between syntax with natural language text. Logic grammars may have and semantics in that the first component has no varying degrees on modularity in their treatments elements of semantic interpretation at all. of syntax and semantics. Th,ere may or may not be an isolatable syntactic component. In McCocd (1984) a one-pass semantic inter- pretation component, SEM, for the EPISTLE system In writing metamorpilosis grammars (Colmerauer, {Miller, Heidorn and Jensen, 1981) was described. 1978), or definite clause grammars, DCG's, (a spe- SEM has been interfaced both to the EPISTLE NLP cial case of metamorphosis grammars, Pereira and grammar (Heidorn, 1972, Jensen and Heidorn, 1983), Warren. 1980), it is possible to build logical forms as well as to a logic grammar, SYNT, written as a directly in the syntax rules by letting non- DCG by the author. These grammars are purely syn- terminals have arguments that represent partial tactic and use the EPISTLE notion (op. cir.) of logical forms being manipulated. Some of the ear- approximate parse, which is similar to Pereira's ties= logic grammars (e.g., Dahl, 1977) used this notzon of righ~s~ normal form, but was developed approach. There is certainly an appeal in being independently. Thus SYNT/SEM is a two-pass system dicect, but there are some disadvantages in this with a clear modularity between syntax and seman- lack of modularity. One disadvantage is that it tics. seems difficulZ to get an adequate treatment of the 104 In DCG's and extraposition grammars, the matically produced) syntactic analysis trees building of analysis structures .(either logical much more readable and natural linguistically. forms or syntactic trees) must be specified ex- In the absence of shift constructions, these plicitly in the syntax rules. A certain amoun~ of trees are like derivation trees, but only with modularity is then lost, because the grammar writer nodes corresponding to strong non-terminals. must be aware of manipulating these structures, and the possibility of using the grammar in different [n an experimental MLG, the semantic component ways is reduced. [n Dahl and McCord (1983), a logic handles all the scoping phenomena handled by grammar formalism was described, modifier structure that in McCord (1981) and more than the semantic grammars (HSG's), in which structure-building (of component in McCord (1984). The logical form annotated derivation trees) is implicit in the language is improved over that in the previous formalism. MSG's look formally like extraposition systems. grammars, with the additional ingredient that se- mantic items (of the type used in McCord (1981)) The MLG formalism allows for a great deal of modu- can be indicated on the left-hand sides of rules, larity in natural language grammars, because the and contribute automatically to the construction syntax rules can be written with very little of a syntactico-semantic tree much like that in awareness of semantics or the building of analysis HcCord (1981). These MSG's were used interpretively structures, and the very same syntactic component in parsing, and then (essentially) the two-pass can be used in either the one-pass or the two-pass semantic interpretation system of McCord (1981) was mode described above. used to get logical forms. So, totally there were three passes in this system. Three other logic grammar systems designed with modularity in mind are Hirschman and Puder (1982), [n this report, [ wish to describe a logic Abramson (1984) and Porto and Filgueiras (198&). grammar system, modular logic grammars (MLG's), These will be compared with MLG's in Section 6. with the following features: There is a syntax rule compiler which takes care 2. THE MLG SYNTACTIC FORMALISM of the building of analysis structures and the interface to semantic interpretation. The syntactic component for an MLG consists of a declaration of the strong non-terminals, fol- There is a clearly separated semantic inter- lowed by a sequence of MLG syntax rules. The dec- pretation component dealing with scoping and [aration of strong non-terminals is of the form the construction of logical forms. strongnonterminals(NTI.NT2 ..... NTn.nil). The whole system (syntax and semantics) can work optionally in either a one-pass mode or a two- where the NTi are the desired strong non-terminals pass mode. (only their principal functors are indicated). Non-terminals that are not declared strong are In the one-pass mode, no syntactic structures called weak. The significance of the strong/weak are built, but logical forms are built directly distinction will be explained below. during parsing through interleaved calls to the semantic interpretation component, added auto- MLG syntax rules are of the form matically by the rule compiler. A ~---> B in the two-pass mode, the calls to the semantic interpretation component are not interleaved, where A is a non-terminal and B is a rule body. A but are made in a second pass, operating on rule body is any combination of surlCace terminals, syntactic analysis trees produced (automat- logical terminals, goals, shifted non-terminals, ically) in the first pass. non-tprminals, the symbol 'nil', and the cut symbol '/', using the sequencing operator ':' and the 'or' The syntactic formalism includes a t device, symbol 'l' (We represent left-to-right sequencing called the shift operator, for dealing with with a colon instead of a comma, as is often done left-embedding constructions such as English in logic grammars.) These rule body elements are possessive noun phrases ("my wife's brother's Prolog terms (normally with arguments), and they friend's car") and Japanese relative clauses. are distinguished formally as follows. ~ne shift operator instructs the rule compiler to build the structures appropriate for left- A su~e terminal is of the form +A, where A embedding. These structures are not derivation is any Prolog term. Surface terminals corre- trees, because the syntax rules are right-re- spond to ordinary terminals in DCG's (they match cursive, because of the top-down parsing asso- elements of the surface word string), and the ciated with Prolo E. notation is often [A] in DCG's. There is a distinction in the syntactic A logical terminal is of the form 0p-L~, where formalism between strong non-terminals and weak Op is a modification operator and LF is a logical non-terminals, which is important for distin- form. Logical terminals are special cases of guishing major levels of grammar and which semantic items, the significance of which will simplifies the.