AS TREE TRAVERSAL Dale Gerdemann* Seminar ffir Sprachwissenschaft, Universiti t T bingen t

ABSTRACT be eliminated, however, by employing a version of Greibach Normal Form which This paper presents a unified approach is extended to handle argument instan- to parsing, in which top-down, bottom- tiations in definite clause . up and left-corner parsers m:e related The resulting parsers resemble the to preorder, postorder and inorder tree standard Prolog versions of versions of traversals. It is shown that the sim- such parsers. One can then go one step plest bottom-up and left-corner parsers further and partially execute the parser are left recursive and must be con- with respect to a particular --- verted using an extended Greibach nor- as is normally done with definite clause mal form. With further partial exe- gra,,nn~a,'s (Per(,ir~ ~ Warren [JO]). a cution, the bottom-up and left-corner surprising result of this partial execu- parsers collapse togethe~ as in the I]IJP tion is l.ha.t the bottom-up and left- parser of Matsumoto. corner parsers become identical when they are 1)oth partially executed. This 1 INTRODUCTION may explain why the BUP parser of ~/lil.tSllll]OtO eta]. [6] [71 was ,'eferre.d tO In this paper, I present a unified ap- as a bottona-u I) parser even though it proach to parsing, in which top-down, clearly follows a left-corner strategy. bottom-up and left-corner parsers are related to preorder, postorder and in- TREE TRAVERSAL order tree traversals. To some extent, this connection is already clear since PRO G RAM S for each parsing strategy the nodes of Following O'Keefe [8], we can imple- the parse tree are constructed accord- ment i)reorder, postorder and inorder ing to the corresponding tree traversal. tree tra.versals as I)CCs, which will then It is somewhat trickier though, to ac- 1)e converted directly into top-down tually use a tree traversa.l program as ])otl.om-u 1) and heft-corner l)arsers, re- a parser since the resulting pa.rser may spectively. The general schema is: be left recursive. This can

*The research presented in this paper was x ._o d e r(']'t'ee) --* partially sponsored by Teilprojekt Bd "Con- straints on Grammar for Efficient Generation" (x_ordered node labels in Tree). of the Sonderforschungsbereich 340 of the Deutsche Forschungsgemeinschaft. I wouhl Note tha.t in this case, since we are also like to thank Guido Minnen and Dieter most likely to call x_order with the Martini for helpflfl comments. All mistakes are of course my own. Tree va.riable instantiated, we are us- ?KI. Wilhelmstr. 113, D-72074 T(ibingen, ing the DCG in generation mode rather Germany, [email protected]. tha.n as a parser. When used as a parser

396 on the stringlS , the procedure will re- non-bi uary . 1 turn all trees whose x_order traw~rsal produces S. The three, instantiations of % top-down parser this procedure are as ['ollows: td(node(PreTerm,lf(Word))) --> [Word], {word(PreTerm,Word)}. Z preorder traversal td(node(Mother,Left,Right)) --> pre(empty) --> []. {rule(Mother,Left,Right)}, pre(node(Mother,Left,Right)) --> gd(Left), [Mother], td(Right). pre(Left), pre(Right). bottom-up parser bu(node(PreTerm,lf(Word))) --> postorder traversal [Word], post(empty) --> []. {word(PreTerm,Word)}. post(node(Mother,Left,Right)) --> bu(node(Mother,Left,Right)) --> post(Left), bu(Left), post(Right), bu(Right), [Mother]. {rule(Mother,Left,Right)}. inorder traversal Y, left-corner parser in(empty) --> []. ic(node(PreTerm,lf (Word))) --> in(node(Mother,Left,Right)) --> [Word] , in(Left), {word (Pr eTerm, Word) }. [Mother], ic (node (Mother, Left ,Right) ) --> in(Right). ic(Lef%), {rule (Mother, Left, Right) }, 2.1 DIRECT ENCODING OF ic (Right). PARSING STRATEGIES iks seen here the on]y difference be- Analogous to these three tl'aversal pro- tween the t]lree strategies concerns |,he. grams, there are three parsing strage- choice of when to select a struc- gies, which differ from the tree traversal ture rule. 2 Do you start with a. rule and programs in only two respects. First, then try to satisfy it as iu the top-down the base case for a parser should be to apl~roa.ch , or do you parse the (laugh- parse a lexical item rathe,: than to parse t(ers of a. rule. first before selecting the an empty string. And second, in the re- rule as in the bottom-up approach, or cursive clauses, the mother care.gory fits do you l,al(e an inte,'mediate strategy as into the parse tree and is licensed by the in the left-corner al)l)roach. auxiliary rule/3 but it does lq'he only ln'oblematic ease is for left corner not figure into the string that is parsed. since the corresponding tre.e traw~'rsal inorder As was the case for the three tree is normally defined only for bina,'y trees. But traversal programs, the three parsers inorder is easily extended to non-binary trees differ from each other only with respect as follows: i. visit the left daughter in inorder, to the right hand side order. ])'or sim- ii. visit the mot, her, iii. visit the rest; of the. daughters in inorder. plicity, I assume that phrase structure eAs opposed to, say, ~t choice of whether to rules are binary branching, though the use operations of expanding and matching or approach can easily be generalized to operations of and reducing.

397 GREIBACH NORMAL % EGNF bottom-up FORM PARSERS bu(node(PreTerm,lf(Word))) --> [Word], While this approach reflects the logic {word(PreTerm,Word)}. of the top-down, bottom-up and left- bu(Node) --> corner parsers in a clear way, the result- [Word], ing programs are not all usable in Pro- {word(PreTerm,Word)}. log since the bottom-up and the left- b(node(PreTerm,lf(Word)),Node). corner parsers are left-recursive. There exists, however, a general technique for b(L,node(Mother,L,R)) --> removal of left-recursion, namely, con- bu(R), version to Oreibach normal form. The {rule(gother,L,R)}. standard Oreibach normal form conver- b(L,Node) --> sion, however, does not allow for I)CG bu(R), type rules, but we can easily take care {rule(Mother,L,g)}, of the Prolog arguments by a technique b(node(Mother,L,R),Node). suggested by Problem 3.118 of Pereira This, however is not very ef[icient & Shieber [9] to produce what I will since the two clauses of both bu and call Extended Greibach Normal Form b differ only in whether or not there (ECINF). 3 Pereira & Shieber's idea has is a final call to b. ~Ve can reduce been more formally presented in the l.he a.mount of by encod- Generalized Greibaeh Normal Form of ing this optiolmlity in the b procedure Dymetman ([1] [2]), however, the sim- itself. plicity of the parsers here does not jus- tify the extra complication in Dymet- % Improved EGNF bottom-up man's procedure. Using this transfor- bu(Node) --> mation, the bottom-up parser then be- [Word], comes as follows: 4 {word(PreTerm,Word)}, aEGNF is similar to normal GNF except b(node(PreTerm,lf(Word)),Node). that the arguments attached to non-terminals must be manipulated so that the original in- b(Node,Node) --> []. stantiations are preserved. For specific gram- b(L,Node) --> mars, it is pretty e~y to see that such a ma- bu(R), nipulation is possiMe. It is nmch more dif- tlcult (and beyond the scope of this paper) {rule(Mother,L,R)}, to show that there is a general rule tbr such b(node(Mother,L,R),Node). manipulations. 4The Greibach NF conversion introduces l~y tile same I",GNI: transform~Ltion one auxiliary predicate, which (following and improvement, s, tile resulting left- IIopcroft & Ulhnan [4]) I have called b. Of corner parser is only minimally different course, the GNF conversion also does not tell us what to do with the auxiliary procedures in from the bottom-up parser: curly brackets. What I've done here is silnply to put these auxiliary procedures in the trans- formed grammar in positions corresponding to Improved EGNF Left-corner where they occurred in the original grammar. Ic(Node) --> It's not clear that one can always find such a "corresponding" position, though in the case [Word], of the bottom-up and left-corner parsers such {word(PreTerm,Word)}, a position is easy to identify. b(node(PreTerm,lf(Word)),Node).

398 parsers are described in terms of pairs b(Node,Node) --> []. )f op(wations such a.s expand/ma(,c]l, b(L,Node) --> shift/reduce or sprout/nlatch, l{tlt it {rule(Mother,L,g)}, is enl, irely unclear wha.(, expa.nding and Xc(R), matching has to do with shifting, re- b(node(Hother,L,R),Node). ducing or sprouting. By relating pars- ing (.o tree tri~versal, however, it b(:- comes much clearer how these three ap- 4 PARTIAL EXECUTION proac]ms 1,o parsing rcbd;e to each other. This is a natural comparison, since The improved ECNF bottom-np altd clearly t, he l)OSSiloh: orders in which a left-corner parsers (lilIhr now only in the tree can be traversed should not dif position of the auxiliary l)redicate in curly brackets. If this auxiliary pred- f(H' frolll the possible orders in which a parse I,ree can be constructed. ~Vhltt's icate is partially executed out with re- new in this paper, however, is tile idea spect to a particular gramlnar, the two gha.(, such tree traversal programs could pltrsers will become identical. For ex- ample, if we have a rule of the ['orl)l: be translated into p~trsers usillg ex- tended (',reibach Nor,ha.1 Form. s(tree(s,NP,VP)) --> Such a unified approach to parsing is np(RP), mostly useful simply (,o understand how vp(VP). the different l>arsers are related. It is sm'prising Co see, for examph:, that with For either parser, this will result in partial executiol L the bottom-up and one b clause of the form: ]el't-cornc.r parsers be('ome, the same. The similarity bel;weeu t>ot(,om-u 1) and b(np(NP),Node) --> h:ft-corner pa.rsing ha.s caused a certain lc(vp(VP)), all/Ollllt (If (:onI'usion in the literature. b(node(s(tree(s,NP,VP)), l"or example, (,It('.so-calh'd "botton>ui)" np(RP),vp(VP)),Node). chart i)arse.r l)resenl,ed (among other l)laces) in Cazda.r "~ Me.llish [3] in fact This is essentially eqtfivalent to the uses a left-corner strategy. This was kind of rules produced by Matsumoto pointed out by Wiren [ll] but has not et al. ([6] [7])in their "bottom-up" receive(l much attention in the litera- l)arser BUI). s As seen here, Mal, sumo(.o I.ure. It is hoped I.ha.1, the unifi('.d ap- et al were not wrong to call their parser proa.ch to parsing l)re.seifix:d h(:re will bottom-ui) , but they could have just as hel l) 1,o clear u I> ol, her such confusions. well called it left-corner. Finally, one Inight )nentiol)a co)l-- heel.ion to C, ovcrnm('.nt-llinding parsingj 5 CONCLUSION a.s presented ill ,Iolmson & Stabhn' [5]. These a.uthors present a generate amd In most standard presentations, simple test approa.(:h, in which X-bar struc- top-down, bottom-up and h'.ft-corner l, lli'es ~llTe ramlomly generated m~d then tesl, ed agldnst lIB principles. Once (,he aThis rule is not precis('.ly the same as (.he logic of the program is expressed in such rules used in BUP since Matsumoto et al. con> pile their rules a lltth! further to take adv~tll- a ma.uner, cfIi('iency considerations are tage of the first argument and predicate name used in order to fold the testing pro- indexing used in Prolog. cedures into the generation procedure.

399 One could view the strategy takel~ in [7] Yuji Matsumoto. Natwral Lan- this paper as rather similar. Running guage Parsin 9 Systems baaed on a tree traversal program in reverse is Logic Programming. PM) thesis, like randomly generating phrase struc- Kyoto University, 1989. ture. Then these randomly generated structures are tested against the con- Is] Richard O'Keefe. The Craft of straints, i.e., the . Prolog. MIT Press, Cambridge, What I have shown here, is that the de- Mass, 1990. cision as to where to fold in the con- Fernando C. N. Pereira and Stu- straints is very significant. Folding in art Shieber. ProIo 9 and Natural the constraints at different positions ac- Language Analysis. CSLI Lecture tually gives completely different parsing Notes No. 10. Chicago University strategies. Press, Chicago, 1987.

References [10] Fernando C. N. Pereira and David lI. 1). W~m:en. Definite clause [1] Marc Dymetman. A generalized grammars-a surw'.y of the formal- greibach normal form for definit;e ism and a comparison with aug- clause grammars. In COLING-92 mented transition networks. ArliJi- vol. I, pages 366-372, 1992. cial ['ntelligence , 13:231-278, 1980. Also in Grosz et. al., :1986. [2] Marc Dymetman. Tra'asforma- tions de Grammaires logiques. Ap- [11] IVlats \Viren. A comparison of rule- plicatios au probIeThc de la re- invocation strategies in context- versibilite~n Traduclion A~do'ma- free chart parsing. In EACL tique. PhD thesis, Uniw;rsite/le Proceedings, 3lh Annual Meeting, Grenoble, Grenoble, France, 1992. l)ages 226-233, 11987. The.~e d'Etat.

[3] Gerald Gazdar and Chris Mel- lish. Natural Lang~tage Processi.ng in Prolo9. Addison-Wesley, Read- ing, Mass, 1989.

[4] John Itopcroft and .)effrcy lJlhmm. Introduction to Automata 7'h,c- ory and Computation. Addison- Wesley, Reading, Mass, 197!). [5] Mark Johnson and Edward Sta- bler, 1993. Lecture Notes for Course taught at the LSA Summer School in Columbus Ohio.

[6] Y. Matsumoto, H. tIirakawa., I{ Miyoshi, and I1 Yasukawa. Bup: A bottom-up parser embedded in prolog. New Ceneration Comp~tl- ing, 1(2):145-158, 11983.

400