PARSING AS TREE TRAVERSAL Dale Gerdemann* Seminar Ffir Sprachwissenschaft, Universiti T T Bingen T

PARSING AS TREE TRAVERSAL Dale Gerdemann* Seminar ffir Sprachwissenschaft, Universiti t T bingen t ABSTRACT be eliminated, however, by employing a version of Greibach Normal Form which This paper presents a unified approach is extended to handle argument instan- to parsing, in which top-down, bottom- tiations in definite clause grammars. up and left-corner parsers m:e related The resulting parsers resemble the to preorder, postorder and inorder tree standard Prolog versions of versions of traversals. It is shown that the sim- such parsers. One can then go one step plest bottom-up and left-corner parsers further and partially execute the parser are left recursive and must be con- with respect to a particular grammar--- verted using an extended Greibach nor- as is normally done with definite clause mal form. With further partial exe- gra,,nn~a,'s (Per(,ir~ ~ Warren [JO]). a cution, the bottom-up and left-corner surprising result of this partial execu- parsers collapse togethe~ as in the I]IJP tion is l.ha.t the bottom-up and left- parser of Matsumoto. corner parsers become identical when they are 1)oth partially executed. This 1 INTRODUCTION may explain why the BUP parser of ~/lil.tSllll]OtO eta]. [6] [71 was ,'eferre.d tO In this paper, I present a unified ap- as a bottona-u I) parser even though it proach to parsing, in which top-down, clearly follows a left-corner strategy. bottom-up and left-corner parsers are related to preorder, postorder and in- TREE TRAVERSAL order tree traversals. To some extent, this connection is already clear since PRO G RAM S for each parsing strategy the nodes of Following O'Keefe [8], we can imple- the parse tree are constructed accord- ment i)reorder, postorder and inorder ing to the corresponding tree traversal. tree tra.versals as I)CCs, which will then It is somewhat trickier though, to ac- 1)e converted directly into top-down tually use a tree traversa.l program as ])otl.om-u 1) and heft-corner l)arsers, re- a parser since the resulting pa.rser may spectively. The general schema is: be left recursive. This left recursion can *The research presented in this paper was x ._o r d e r(']'t'ee) --* partially sponsored by Teilprojekt Bd "Con- straints on Grammar for Efficient Generation" (x_ordered node labels in Tree). of the Sonderforschungsbereich 340 of the Deutsche Forschungsgemeinschaft. I wouhl Note tha.t in this case, since we are also like to thank Guido Minnen and Dieter most likely to call x_order with the Martini for helpflfl comments. All mistakes are of course my own. Tree va.riable instantiated, we are us- ?KI. Wilhelmstr. 113, D-72074 T(ibingen, ing the DCG in generation mode rather Germany, [email protected]. tha.n as a parser. When used as a parser 396 on the stringlS , the procedure will re- non-bi uary branching. 1 turn all trees whose x_order traw~rsal produces S. The three, instantiations of % top-down parser this procedure are as ['ollows: td(node(PreTerm,lf(Word))) --> [Word], {word(PreTerm,Word)}. Z preorder traversal td(node(Mother,Left,Right)) --> pre(empty) --> []. {rule(Mother,Left,Right)}, pre(node(Mother,Left,Right)) --> gd(Left), [Mother], td(Right). pre(Left), pre(Right). bottom-up parser bu(node(PreTerm,lf(Word))) --> postorder traversal [Word], post(empty) --> []. {word(PreTerm,Word)}. post(node(Mother,Left,Right)) --> bu(node(Mother,Left,Right)) --> post(Left), bu(Left), post(Right), bu(Right), [Mother]. {rule(Mother,Left,Right)}. inorder traversal Y, left-corner parser in(empty) --> []. ic(node(PreTerm,lf (Word))) --> in(node(Mother,Left,Right)) --> [Word] , in(Left), {word (Pr eTerm, Word) }. [Mother], ic (node (Mother, Left ,Right) ) --> in(Right). ic(Lef%), {rule (Mother, Left, Right) }, 2.1 DIRECT ENCODING OF ic (Right). PARSING STRATEGIES iks seen here the on]y difference be- Analogous to these three tl'aversal pro- tween the t]lree strategies concerns |,he. grams, there are three parsing strage- choice of when to select a phrase struc- gies, which differ from the tree traversal ture rule. 2 Do you start with a. rule and programs in only two respects. First, then try to satisfy it as iu the top-down the base case for a parser should be to apl~roa.ch , or do you parse the (laugh- parse a lexical item rathe,: than to parse t(ers of a. rule. first before selecting the an empty string. And second, in the re- rule as in the bottom-up approach, or cursive clauses, the mother care.gory fits do you l,al(e an inte,'mediate strategy as into the parse tree and is licensed by the in the left-corner al)l)roach. auxiliary predicate rule/3 but it does lq'he only ln'oblematic ease is for left corner not figure into the string that is parsed. since the corresponding tre.e traw~'rsal inorder As was the case for the three tree is normally defined only for bina,'y trees. But traversal programs, the three parsers inorder is easily extended to non-binary trees differ from each other only with respect as follows: i. visit the left daughter in inorder, to the right hand side order. ])'or sim- ii. visit the mot, her, iii. visit the rest; of the. daughters in inorder. plicity, I assume that phrase structure eAs opposed to, say, ~t choice of whether to rules are binary branching, though the use operations of expanding and matching or approach can easily be generalized to operations of shifting and reducing. 397 GREIBACH NORMAL % EGNF bottom-up FORM PARSERS bu(node(PreTerm,lf(Word))) --> [Word], While this approach reflects the logic {word(PreTerm,Word)}. of the top-down, bottom-up and left- bu(Node) --> corner parsers in a clear way, the result- [Word], ing programs are not all usable in Pro- {word(PreTerm,Word)}. log since the bottom-up and the left- b(node(PreTerm,lf(Word)),Node). corner parsers are left-recursive. There exists, however, a general technique for b(L,node(Mother,L,R)) --> removal of left-recursion, namely, con- bu(R), version to Oreibach normal form. The {rule(gother,L,R)}. standard Oreibach normal form conver- b(L,Node) --> sion, however, does not allow for I)CG bu(R), type rules, but we can easily take care {rule(Mother,L,g)}, of the Prolog arguments by a technique b(node(Mother,L,R),Node). suggested by Problem 3.118 of Pereira This, however is not very ef[icient & Shieber [9] to produce what I will since the two clauses of both bu and call Extended Greibach Normal Form b differ only in whether or not there (ECINF). 3 Pereira & Shieber's idea has is a final call to b. ~Ve can reduce been more formally presented in the l.he a.mount of backtracking by encod- Generalized Greibaeh Normal Form of ing this optiolmlity in the b procedure Dymetman ([1] [2]), however, the sim- itself. plicity of the parsers here does not jus- tify the extra complication in Dymet- % Improved EGNF bottom-up man's procedure. Using this transfor- bu(Node) --> mation, the bottom-up parser then be- [Word], comes as follows: 4 {word(PreTerm,Word)}, aEGNF is similar to normal GNF except b(node(PreTerm,lf(Word)),Node). that the arguments attached to non-terminals must be manipulated so that the original in- b(Node,Node) --> []. stantiations are preserved. For specific gram- b(L,Node) --> mars, it is pretty e~y to see that such a ma- bu(R), nipulation is possiMe. It is nmch more dif- tlcult (and beyond the scope of this paper) {rule(Mother,L,R)}, to show that there is a general rule tbr such b(node(Mother,L,R),Node). manipulations. 4The Greibach NF conversion introduces l~y tile same I",GNI: transform~Ltion one auxiliary predicate, which (following and improvement, s, tile resulting left- IIopcroft & Ulhnan [4]) I have called b. Of corner parser is only minimally different course, the GNF conversion also does not tell us what to do with the auxiliary procedures in from the bottom-up parser: curly brackets. What I've done here is silnply to put these auxiliary procedures in the trans- formed grammar in positions corresponding to Improved EGNF Left-corner where they occurred in the original grammar. Ic(Node) --> It's not clear that one can always find such a "corresponding" position, though in the case [Word], of the bottom-up and left-corner parsers such {word(PreTerm,Word)}, a position is easy to identify. b(node(PreTerm,lf(Word)),Node). 398 parsers are described in terms of pairs b(Node,Node) --> []. c)f op(wations such a.s expand/ma(,c]l, b(L,Node) --> shift/reduce or sprout/nlatch, l{tlt it {rule(Mother,L,g)}, is enl, irely unclear wha.(, expa.nding and Xc(R), matching has to do with shifting, re- b(node(Hother,L,R),Node). ducing or sprouting. By relating parsing (.o tree tri~versal, however, it b(:- comes much clearer how these three ap- 4 PARTIAL EXECUTION proac]ms 1,o parsing rcbd;e to each other. This is a natural comparison, since The improved ECNF bottom-np altd clearly t, he l)OSSiloh: orders in which a left-corner parsers (lilIhr now only in the tree can be traversed should not dif position of the auxiliary l)redicate in curly brackets. If this auxiliary pred- f(H' frolll the possible orders in which a parse I,ree can be constructed. ~Vhltt's icate is partially executed out with re- new in this paper, however, is tile idea spect to a particular gramlnar, the two gha.(, such tree traversal programs could pltrsers will become identical.

PARSING AS TREE TRAVERSAL Dale Gerdemann* Seminar Ffir Sprachwissenschaft, Universiti T T Bingen T

Syntax-Directed Translation, Parse Trees, Abstract Syntax Trees

Derivatives of Parsing Expression Grammars

Parsing 1. Grammars and Parsing 2. Top-Down and Bottom-Up Parsing 3

Abstract Syntax Trees & Top-Down Parsing

Hardware Pattern Matching for Network Traffic Analysis in Gigabit

Compiler Construction

Advanced Parsing Techniques

Lecture 3: Recursive Descent Limitations, Precedence Climbing

Top-Down Parsing & Bottom-Up Parsing I

Comprehensive Examinations in Computer Science 1872 - 1878

Parse Trees (=Derivation Tree) by Constructing a Derivation

Technical Correspondence Techniques for Automatic Memoization with Applications to Context-Free Parsing