Efficient Tabular LR Parsing

Mark-Jan Nederhof
Faculty of Arts, University of Groningen
P.O. Box 716, 9700 AS Groningen, The Netherlands
markjan@let.rug.nl

Giorgio Satta
Dipartimento di Elettronica ed Informatica, Università di Padova
via Gradenigo 6/A, I-35131 Padova, Italy
satta@dei.unipd.it

Abstract

We give a new treatment of tabular LR parsing, which is an alternative to Tomita's generalized LR algorithm. The advantage is twofold. Firstly, our treatment is conceptually more attractive because it uses simpler concepts, such as grammar transformations and standard tabulation techniques, also known as chart parsing. Secondly, the static and dynamic complexity of parsing, both in space and time, is significantly reduced.

1 Introduction

The efficiency of LR(k) parsing techniques (Sippu and Soisalon-Soininen, 1990) is very attractive from the perspective of natural language processing applications. This has stimulated the computational linguistics community to develop extensions of these techniques to general context-free grammar parsing.

The best-known example is generalized LR parsing, also known as Tomita's algorithm, described by Tomita (1986) and further investigated by, for example, Tomita (1991) and Nederhof (1994a). Despite appearances, the graph-structured stacks used to describe Tomita's algorithm differ very little from parse tables; in other words, generalized LR parsing is one of the so-called tabular parsing algorithms, among which also the CYK algorithm (Harrison, 1978) and Earley's algorithm (Earley, 1970) can be found. (Tabular parsing is also known as chart parsing.)

In this paper we investigate the extension of LR parsing to general context-free grammars from a more general viewpoint: tabular algorithms can often be described by the composition of two constructions. One example is given by Lang (1974) and Billot and Lang (1989): the construction of pushdown automata from grammars and the simulation of these automata by means of tabulation yield different tabular algorithms for different such constructions. Another example, on which our presentation is based, was first suggested by Leermakers (1989): a grammar is first transformed, and then a standard tabular algorithm along with some filtering condition is applied using the transformed grammar. In our case, the transformation and the subsequent application of the tabular algorithm result in a new form of tabular LR parsing.

Our method is more efficient than Tomita's algorithm in two respects. First, reduce operations are implemented in an efficient way, by splitting them into several, more primitive, operations (a similar idea has been proposed by Kipps (1991) for Tomita's algorithm). Second, several paths in the computation that must be simulated separately by Tomita's algorithm are collapsed into a single computation path, using state minimization techniques. Experiments on practical grammars have indicated that there is a significant gain in efficiency, with regard to both space and time requirements.

Our grammar transformation produces a so-called cover for the input grammar, which together with the filtering condition fully captures the specification of the method, abstracting away from algorithmic details such as data structures and control flow. Since this cover can easily be precomputed, implementing our LR parser simply amounts to running the standard tabular algorithm. This is very attractive from an application-oriented perspective, since many actual systems for natural language processing are based on these kinds of parsing algorithms.

The remainder of this paper is organized as follows. In Section 2 some preliminaries are discussed. We review the notion of LR automaton in Section 3 and introduce the notion of 2LR automaton in Section 4. Then we specify our tabular LR method in Section 5, and provide an analysis of the algorithm in Section 6. Finally, some empirical results are given in Section 7, and further discussion of our method is provided in Section 8.

2 Definitions

Throughout this paper we use standard formal language notation. We assume that the reader is familiar with context-free grammar parsing theory (Harrison, 1978).

A context-free grammar (CFG) is a 4-tuple G = (Σ, N, P, S), where Σ and N are two finite disjoint sets of terminal and nonterminal symbols, respectively, S ∈ N is the start symbol, and P is a finite set of rules. Each rule has the form A → α with A ∈ N and α ∈ V*, where V denotes N ∪ Σ. The size of G, written |G|, is defined as Σ_{(A→α)∈P} |Aα|; by |α| we mean the length of a string of symbols α. We generally use symbols A, B, C, … to range over N, symbols a, b, c, … to range over Σ, symbols X, Y, Z to range over V, symbols α, β, γ, … to range over V*, and symbols v, w, x, … to range over Σ*. We write ε to denote the empty string.

A CFG is said to be in binary form if α ∈ {ε} ∪ V ∪ N² for all of its rules A → α. (The binary form does not limit the (weak) generative capacity of context-free grammars (Harrison, 1978).) For technical reasons, we sometimes use the augmented grammar associated with G, defined as G† = (Σ†, N†, P†, S†), where S†, ▷ and ◁ are fresh symbols, Σ† = Σ ∪ {▷, ◁}, N† = N ∪ {S†}, and P† = P ∪ {S† → ▷S◁}.

A pushdown automaton (PDA) is a 5-tuple A = (Σ, Q, T, q_in, q_fin), where Σ, Q and T are finite sets of input symbols, stack symbols and transitions, respectively; q_in ∈ Q is the initial stack symbol and q_fin ∈ Q is the final stack symbol.¹ Each transition has the form δ₁ ↦ᶻ δ₂, where δ₁, δ₂ ∈ Q*, 1 ≤ |δ₁|, 1 ≤ |δ₂| ≤ 2, and z = ε or z = a. We generally use symbols q, r, s, … to range over Q, and the symbol δ to range over Q*.

¹ We dispense with the notion of state, traditionally incorporated in the definition of PDA. This does not affect the power of these devices, since states can be encoded within stack symbols and transitions.

Consider a fixed input string v ∈ Σ*. A configuration of the automaton is a pair (δ, w) consisting of a stack δ ∈ Q* and the remaining input w, which is a suffix of the input string v. The rightmost symbol of δ represents the top of the stack. The initial configuration has the form (q_in, v), where the stack is formed by the initial stack symbol. The final configuration has the form (q_in q_fin, ε), where the stack is formed by the final stack symbol stacked upon the initial stack symbol.

The application of a transition δ₁ ↦ᶻ δ₂ is described as follows. If the top-most symbols of the stack are δ₁, then these symbols may be replaced by δ₂, provided that either z = ε, or z = a and a is the first symbol of the remaining input. Furthermore, if z = a then a is removed from the remaining input. Formally, for a fixed PDA A we define the binary relation ⊢ on configurations as the least relation satisfying (δδ₁, w) ⊢ (δδ₂, w) if there is a transition δ₁ ↦ᵉ δ₂, and (δδ₁, aw) ⊢ (δδ₂, w) if there is a transition δ₁ ↦ᵃ δ₂. The recognition of a certain input v is obtained if, starting from the initial configuration for that input, we can reach the final configuration by repeated application of transitions, or, formally, if (q_in, v) ⊢* (q_in q_fin, ε), where ⊢* denotes the reflexive and transitive closure of ⊢.

By a computation of a PDA we mean a sequence (q_in, v) ⊢ (δ₁, w₁) ⊢ … ⊢ (δ_n, w_n), n ≥ 0. A PDA is called deterministic if for all possible configurations at most one transition is applicable. A PDA is said to be in binary form if, for all transitions δ₁ ↦ᶻ δ₂, we have |δ₁| ≤ 2.

3 LR automata

Let G = (Σ, N, P, S) be a CFG. We recall the notion of LR automaton, which is a particular kind of PDA. We make use of the augmented grammar G† = (Σ†, N†, P†, S†) introduced in Section 2, and of the set of dotted items I_LR = {A → α • β | (A → αβ) ∈ P†}.

We introduce the function closure from 2^I_LR to 2^I_LR and the function goto from 2^I_LR × V to 2^I_LR. For any q ⊆ I_LR, closure(q) is the smallest set such that

(i) q ⊆ closure(q); and
(ii) (B → α • Aβ) ∈ closure(q) and (A → γ) ∈ P† together imply (A → • γ) ∈ closure(q).

We then define

goto(q, X) = {A → αX • β | (A → α • Xβ) ∈ closure(q)}.

We construct a finite set R_LR as the smallest collection of sets satisfying the conditions:

(i) {S† → ▷ • S◁} ∈ R_LR; and
(ii) for every q ∈ R_LR and X ∈ V, we have goto(q, X) ∈ R_LR, provided goto(q, X) ≠ ∅.

Two elements from R_LR deserve special attention: q_in = {S† → ▷ • S◁}, and q_fin, which is defined to be the unique set in R_LR containing (S† → ▷S • ◁); in other words, q_fin = goto(q_in, S).
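To make the construction above concrete, here is a small illustrative sketch (our own code, not the authors'; the toy grammar, the symbol names, and the encoding of items as (lhs, rhs, dot) triples are all assumptions) that computes closure, goto and the collection R_LR for the augmented toy grammar S† → ▷S◁, S → Sa | a:

```python
# Sketch of the closure/goto construction of Section 3 for a toy
# augmented grammar.  An item (A, rhs, dot) encodes "A -> rhs[:dot] . rhs[dot:]".

GRAMMAR = {
    "S'": [(">", "S", "<")],   # S' -> > S <   (augmentation rule)
    "S": [("S", "a"), ("a",)],
}
NONTERMINALS = set(GRAMMAR)

def closure(q):
    """Smallest superset of q closed under predicting rules of the
    nonterminal that follows the dot."""
    result = set(q)
    changed = True
    while changed:
        changed = False
        for (lhs, rhs, dot) in list(result):
            if dot < len(rhs) and rhs[dot] in NONTERMINALS:
                for alt in GRAMMAR[rhs[dot]]:
                    item = (rhs[dot], alt, 0)
                    if item not in result:
                        result.add(item)
                        changed = True
    return result

def goto(q, x):
    """Move the dot over symbol x in all items of closure(q)."""
    return frozenset((lhs, rhs, dot + 1)
                     for (lhs, rhs, dot) in closure(q)
                     if dot < len(rhs) and rhs[dot] == x)

# Build R_LR: start from {S' -> > . S <} and close under goto.
q_in = frozenset({("S'", (">", "S", "<"), 1)})
states, todo = {q_in}, [q_in]
symbols = {"a", "<", ">"} | NONTERMINALS
while todo:
    q = todo.pop()
    for x in symbols:
        q2 = goto(q, x)
        if q2 and q2 not in states:
            states.add(q2)
            todo.append(q2)

q_fin = goto(q_in, "S")
assert ("S'", (">", "S", "<"), 2) in q_fin
print(len(states))
```

For this toy grammar the construction yields five item sets, with q_fin = goto(q_in, S) among them.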

For A ∈ N, an A-redex is a string q₀q₁q₂⋯q_m, m ≥ 0, of elements from R_LR, satisfying the following conditions:

(i) (A → α •) ∈ closure(q_m), for some α = X₁X₂⋯X_m; and
(ii) goto(q_{k-1}, X_k) = q_k, for 1 ≤ k ≤ m.

Note that in such an A-redex, (A → • X₁X₂⋯X_m) ∈ closure(q₀), and (A → X₁⋯X_k • X_{k+1}⋯X_m) ∈ q_k, for 0 < k ≤ m.

The LR automaton associated with G is now introduced.

Definition 1 A_LR = (Σ, Q_LR, T_LR, q_in, q_fin), where Q_LR = R_LR, q_in = {S† → ▷ • S◁}, q_fin = goto(q_in, S), and T_LR contains:

(i) q ↦ᵃ q q′, for every a ∈ Σ and q, q′ ∈ R_LR such that q′ = goto(q, a);
(ii) qδ ↦ᵉ q q′, for every A ∈ N, A-redex qδ, and q′ ∈ R_LR such that q′ = goto(q, A).

Transitions in (i) above are called shift, transitions in (ii) are called reduce.

4 2LR automata

The automata A_LR defined in the previous section are deterministic only for a subset of the CFGs, called the LR(0) grammars (Sippu and Soisalon-Soininen, 1990), and behave nondeterministically in the general case. When designing tabular methods that simulate nondeterministic computations of A_LR, two main difficulties are encountered:

• A reduce transition in A_LR is an elementary operation that removes from the stack a number of elements bounded by the size of the underlying grammar. Consequently, the time requirement of tabular simulation of A_LR computations can be onerous, for reasons pointed out by Sheil (1976) and Kipps (1991).

• The set R_LR can be exponential in the size of the grammar (Johnson, 1991). If in such a case the computations of A_LR touch upon each state, then time and space requirements of tabular simulation are obviously onerous.

The first issue above is solved here by recasting A_LR in binary form. This is done by considering each reduce transition as a sequence of "pop" operations which affect at most two stack symbols at a time. (See also Lang (1974), Villemonte de la Clergerie (1993) and Nederhof (1994a), and for LR parsing specifically Kipps (1991) and Leermakers (1992b).) The following definition introduces this new kind of automaton.

Definition 2 A′_LR = (Σ, Q′_LR, T′_LR, q_in, q_fin), where Q′_LR = R_LR ∪ I_LR, q_in = {S† → ▷ • S◁}, q_fin = goto(q_in, S), and T′_LR contains:

(i) q ↦ᵃ q q′, for every a ∈ Σ and q, q′ ∈ R_LR such that q′ = goto(q, a);
(ii) q ↦ᵉ q (A → α •), for every q ∈ R_LR and (A → α •) ∈ closure(q);
(iii) q (A → αX • β) ↦ᵉ (A → α • Xβ), for every q ∈ R_LR and (A → αX • β) ∈ q;
(iv) q (A → • α) ↦ᵉ q q′, for every q, q′ ∈ R_LR and (A → α) ∈ P† such that q′ = goto(q, A).

Transitions in (i) above are again called shift, transitions in (ii) are called initiate, those in (iii) are called gathering, and transitions in (iv) are called goto. The role of a reduce step in A_LR is taken over in A′_LR by an initiate step, a number of gathering steps, and a goto step. Observe that these steps involve the new stack symbols (A → α • β) ∈ I_LR, which are distinguishable from possible stack symbols {A → α • β} ∈ R_LR.

We now turn to the second above-mentioned problem, regarding the size of the set R_LR. The problem is in part solved here as follows. The number of states in R_LR is considerably reduced by identifying two states if they become identical after items A → α • β from I_LR have been simplified to only the suffix β of the right-hand side. This is reminiscent of techniques of state minimization for finite automata (Booth, 1967), as they have been applied before to LR parsing, e.g., by Pager (1970) and Nederhof and Sarbo (1993).

Let G† be the augmented grammar associated with a CFG G, and let I_2LR = {β | (A → αβ) ∈ P†}. We define variants of the closure and goto functions from the previous section as follows. For any set q ⊆ I_2LR, closure′(q) is the smallest set such that

(i) q ⊆ closure′(q); and
(ii) (Aβ) ∈ closure′(q) and (A → γ) ∈ P† together imply (γ) ∈ closure′(q).

Also, we define

goto′(q, X) = {β | (Xβ) ∈ closure′(q)}.

We now construct a finite set R_2LR as the smallest set satisfying the conditions:
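The suffix-based variants can be sketched analogously (again our own illustrative code, with the same toy grammar S† → ▷S◁, S → Sa | a as an assumption): states now hold bare right-hand-side suffixes, so distinct dotted items with equal suffixes collapse into one element:

```python
# Sketch of the closure'/goto' construction of Section 4.
# A state is a set of right-hand-side suffixes (tuples of symbols).

GRAMMAR = {
    "S'": [(">", "S", "<")],
    "S": [("S", "a"), ("a",)],
}
NONTERMINALS = set(GRAMMAR)

def closure_prime(q):
    """Smallest superset of q such that a suffix starting with a
    nonterminal A pulls in every right-hand side of A."""
    result = set(q)
    changed = True
    while changed:
        changed = False
        for beta in list(result):
            if beta and beta[0] in NONTERMINALS:
                for alt in GRAMMAR[beta[0]]:
                    if alt not in result:
                        result.add(alt)
                        changed = True
    return result

def goto_prime(q, x):
    """Strip the leading symbol x from the suffixes in closure'(q)."""
    return frozenset(beta[1:] for beta in closure_prime(q)
                     if beta and beta[0] == x)

# Build R_2LR from the initial state {S<}, the suffix of S' -> > . S <.
q_in = frozenset({("S", "<")})
states, todo = {q_in}, [q_in]
symbols = {"a", "<", ">"} | NONTERMINALS
while todo:
    q = todo.pop()
    for x in symbols:
        q2 = goto_prime(q, x)
        if q2 and q2 not in states:
            states.add(q2)
            todo.append(q2)
print(len(states))
```

For this toy grammar the suffix construction produces three states where the item-based construction of Section 3 produces five: the three completed states for S → a•, S → Sa• and S† → ▷S◁• all collapse into the single suffix state containing only the empty suffix.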

(i) {S◁} ∈ R_2LR; and
(ii) for every q ∈ R_2LR and X ∈ V, we have goto′(q, X) ∈ R_2LR, provided goto′(q, X) ≠ ∅.

The 2LR automaton associated with G is now introduced.

Definition 3 A_2LR = (Σ, Q_2LR, T_2LR, q_in, q_fin), where Q_2LR = (V† × R_2LR) ∪ I_2LR, q_in = (▷, {S◁}), q_fin = (S, goto′({S◁}, S)), and T_2LR contains:

(i) (X, q) ↦ᵃ (X, q) (a, q′), for every a ∈ Σ and (X, q), (a, q′) ∈ Q_2LR such that q′ = goto′(q, a);
(ii) (X, q) ↦ᵉ (X, q) (ε), for every (X, q) ∈ Q_2LR such that ε ∈ closure′(q);
(iii) (X, q) (β) ↦ᵉ (Xβ), for every (X, q) ∈ Q_2LR and (β) ∈ q;
(iv) (X, q) (α) ↦ᵉ (X, q) (A, q′), for every (X, q), (A, q′) ∈ Q_2LR and (A → α) ∈ P† such that q′ = goto′(q, A).

Note that in the case of a reduce/reduce conflict with two grammar rules sharing some suffix in the right-hand side, the gathering steps of A_2LR will treat both rules simultaneously, until the parts of the right-hand sides are reached where the two rules differ. (See Leermakers (1992a) for a similar sharing of computation for common suffixes.)

An interesting fact is that the automaton A_2LR is very similar to the automaton A_LR constructed for a grammar transformed by the transformation two given by Nederhof and Satta (1994).²

² For the earliest mention of this transformation, we have encountered pointers to Schauerte (1973). Regrettably, we have as yet not been able to get hold of a copy of this paper.

5 The algorithm

This section presents a tabular LR parser, which is the main result of this paper. The parser is derived from the 2LR automata introduced in the previous section. Following the general approach presented by Leermakers (1989), we simulate computations of these automata by means of the tabular algorithm.

Algorithm 1 Let G = (Σ, N, P, S) be a CFG in binary form, let pred be a function from 2^N to 2^N, let A_init be a distinguished element from N, and let v = a₁a₂⋯a_n ∈ Σ* be an input string. We compute the least (n+1) × (n+1) table U such that A_init ∈ U_{0,0} and

(i) A ∈ U_{j-1,j} if (A → a_j) ∈ P and A ∈ pred(U_{j-1});
(ii) A ∈ U_{j,j} if (A → ε) ∈ P and A ∈ pred(U_j);
(iii) A ∈ U_{i,j} if B ∈ U_{i,k}, C ∈ U_{k,j}, (A → BC) ∈ P, and A ∈ pred(U_i);
(iv) A ∈ U_{i,j} if B ∈ U_{i,j}, (A → B) ∈ P, and A ∈ pred(U_i).

Here U_j denotes the set ∪_i U_{i,j} of all elements occurring in entries with right boundary j. The string has been accepted when S ∈ U_{0,n}.

We now specify a grammar transformation, based on the definition of A_2LR.

Definition 4 Let A_2LR = (Σ, Q_2LR, T_2LR, q_in, q_fin) be the 2LR automaton associated with a CFG G. The 2LR cover associated with G is the CFG C_2LR(G) = (Σ, Q_2LR, P_2LR, q_fin), where the rules in P_2LR are given by:

(i) (a, q′) → a, for every (X, q) ↦ᵃ (X, q) (a, q′) ∈ T_2LR;
(ii) (ε) → ε, for every (X, q) ↦ᵉ (X, q) (ε) ∈ T_2LR;
(iii) (Xβ) → (X, q) (β), for every (X, q) (β) ↦ᵉ (Xβ) ∈ T_2LR;
(iv) (A, q′) → (α), for every (X, q) (α) ↦ᵉ (X, q) (A, q′) ∈ T_2LR.

Observe that there is a direct, one-to-one correspondence between transitions of A_2LR and productions of C_2LR(G).

The accompanying function pred is defined as follows (q, q′, q″ range over the stack elements):

pred(τ) = {q | q′ q″ ↦ᵉ q ∈ T_2LR} ∪
          {q | q′ ∈ τ, q′ ↦ᶻ q′ q ∈ T_2LR} ∪
          {q | q′ ∈ τ, q′ q″ ↦ᵉ q′ q ∈ T_2LR}.

The above definition implies that only the tabular equivalents of the shift, initiate and goto transitions are subject to actual filtering; the simulation of the gathering transitions does not depend on elements in τ.

Finally, the distinguished nonterminal from the cover used to initialize the table is q_in. Thus we start with (▷, {S◁}) ∈ U_{0,0}.

Parse trees can be recovered from the table. The set of trees tree(Θ, i, j), for entries Θ in U, is recursively defined by:

(i) tree((a, q′), i, j) is the set {a}. This set contains a single tree consisting of a single node labelled a.

(ii) tree((ε), i, i) is the set {ε}. This set consists of an empty list of trees.

(iii) tree((Xβ), i, j) is the union of the sets T^k_{(Xβ),i,j}, where i ≤ k ≤ j, (β) ∈ U_{k,j}, and there is at least one (X, q) ∈ U_{i,k} with (Xβ) → (X, q) (β) in C_2LR(G), for some q. For each such k, select one such q. We define T^k_{(Xβ),i,j} = {t·ts | t ∈ tree((X, q), i, k) ∧ ts ∈ tree((β), k, j)}. Each t·ts is a list of trees, with head t and tail ts.

(iv) tree((A, q′), i, j) is the union of the sets T^{(α)}_{(A,q′),i,j}, where (α) ∈ U_{i,j} is such that (A, q′) → (α) in C_2LR(G). We define T^{(α)}_{(A,q′),i,j} = {glue(A, ts) | ts ∈ tree((α), i, j)}. The function glue constructs a tree from a fresh root node labelled A and the trees in list ts as immediate subtrees.

We emphasize that in the third clause above, one should not consider more than one q for given k, in order to prevent spurious ambiguity. (In fact, for fixed X, i, k and for different q such that (X, q) ∈ U_{i,k}, tree((X, q), i, k) yields the exact same set of trees.) With this proviso, the degree of ambiguity, i.e. the number of parses found by the algorithm for any input, is reduced to exactly that of the source grammar.

A practical implementation would construct the parse trees on-the-fly, attaching them to the table entries, allowing packing and sharing of subtrees (cf. the literature on parse forests (Tomita, 1986; Billot and Lang, 1989)). Our algorithm actually only needs one (packed) subtree for several (X, q) ∈ U_{i,k} with fixed X, i, k but different q. The resulting parse forests would then be optimally compact, contrary to some other LR-based tabular algorithms, such as the one by Rekers (1992).

6 Analysis

For stacks δ and strings u, w, we write (δ, uw) ⊢⁺ (δδ′, w) if (δθ₀, uw) ⊢ (δθ₁, w₁) ⊢ ⋯ ⊢ (δθ_m, w), where θ₀ = ε, θ_m = δ′, m ≥ 1, and |θ_k| > 0 for all k, 1 ≤ k ≤ m. Informally, we have (δ, uw) ⊢⁺ (δδ′, w) if configuration (δδ′, w) can be reached from (δ, uw) without the bottom-most part δ of the intermediate stacks being affected by any of the transitions; furthermore, at least one element is pushed on top of δ.

The following characterization relates the automaton A_2LR and Algorithm 1 applied to the 2LR cover. Symbol q ∈ Q_2LR is eventually added to U_{i,j} if and only if for some δ:

(q_in, a₁⋯a_n) ⊢* (δ, a_{i+1}⋯a_n) ⊢⁺ (δq, a_{j+1}⋯a_n).

In words, q is found in entry U_{i,j} if and only if, at input position j, the automaton would push element q on top of some lower part δ of the stack that remains unaffected while the input from i to j is being read.

The above characterization, whose proof is not reported here, is the justification for calling the resulting algorithm tabular LR parsing. In particular, for a grammar for which A_2LR is deterministic, i.e. for an LR(0) grammar, the number of steps performed by A_2LR and the number of steps performed by the above algorithm are exactly the same. In the case of grammars which are not LR(0), the tabular LR algorithm is more efficient than, for example, a backtrack realisation of A_2LR.

For determining the order of the time complexity of our algorithm, we look at the most expensive step, which is the computation of an element (Xβ) ∈ U_{i,j} from two elements (X, q) ∈ U_{i,k} and (β) ∈ U_{k,j}, through (X, q) (β) ↦ᵉ (Xβ) ∈ T_2LR. In a straightforward realisation of the algorithm, this step can be applied O(|T_2LR| · |v|³) times (once for each i, k, j and each transition), each step taking a constant amount of time. We conclude that the time complexity of our algorithm is O(|T_2LR| · |v|³).

As far as space requirements are concerned, each set U_{i,j} or U_i contains at most |Q_2LR| elements. (One may assume an auxiliary table storing each U_i.) This results in a space complexity O(|Q_2LR| · |v|²). The entries in the table represent single stack elements, as opposed to pairs of stack elements following Lang (1974) and Leermakers (1989). This has been investigated before by Nederhof (1994a, p. 25) and Villemonte de la Clergerie (1993, p. 155).

[Table 1: The test material: the four grammars (ALGOL 68, CORRie, Deltra, Alvey) and some of their dimensions, and the average length of the test sentences (20 sentences of various length for each grammar).]

                A′_LR             A_2LR
  G             space     time    space     time
  ALGOL 68        327      375      234      343
  CORRie         7548    28028     5131    22414
  Deltra        11772    94824     6526    70333
  Alvey           599     1147      354      747

Table 2: Dynamic requirements: average space and time per sentence.
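A direct realisation of Algorithm 1 is short. The sketch below is our own illustrative code, not the paper's implementation; the toy grammar and the permissive pred function are assumptions. It computes the table U by naive fixpoint iteration over clauses (i)-(iv); plugging in the rules of the 2LR cover C_2LR(G), the pred function of Section 5, and A_init = q_in would turn it into the tabular LR parser:

```python
# Sketch of Algorithm 1: a CYK-style recognizer for a CFG in binary
# form with a filtering function pred.  Rules are pairs (lhs, rhs)
# with rhs of length 0, 1 or 2; pred maps the set U_i of elements
# with right boundary i to the nonterminals allowed to start there.

def tabular_parse(rules, a_init, pred, word):
    """Least table U with a_init in U[0][0], closed under clauses
    (i)-(iv) of Algorithm 1 (naive fixpoint iteration)."""
    nonterminals = {lhs for lhs, _ in rules}
    n = len(word)
    U = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    U[0][0].add(a_init)
    changed = True
    while changed:
        changed = False
        # cols[i] is the filter set U_i, the union of column i.
        cols = [set().union(*(U[k][i] for k in range(n + 1)))
                for i in range(n + 1)]
        for lhs, rhs in rules:
            for i in range(n + 1):
                if lhs not in pred(cols[i]):
                    continue  # filtered out at left boundary i
                for j in range(i, n + 1):
                    if lhs in U[i][j]:
                        continue
                    if (# clause (i): terminal rule matching word[i]
                        (len(rhs) == 1 and rhs[0] not in nonterminals
                         and j == i + 1 and rhs[0] == word[i])
                        # clause (ii): epsilon rule
                        or (len(rhs) == 0 and j == i)
                        # clause (iv): unit rule over a nonterminal
                        or (len(rhs) == 1 and rhs[0] in nonterminals
                            and rhs[0] in U[i][j])
                        # clause (iii): binary rule, split at some k
                        or (len(rhs) == 2 and any(
                            rhs[0] in U[i][k] and rhs[1] in U[k][j]
                            for k in range(i, j + 1)))):
                        U[i][j].add(lhs)
                        changed = True
    return U

# Toy binary CFG: S -> S T | a ; T -> a   (S derives "a", "aa", ...)
rules = [("S", ("S", "T")), ("S", ("a",)), ("T", ("a",))]
U = tabular_parse(rules, "S", lambda t: {"S", "T"}, "aaa")
print("S" in U[0][3])
```

The naive iteration here repeatedly rescans the whole table; the O(|T_2LR| · |v|³) bound discussed above assumes an agenda-driven realisation instead.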

7 Empirical results

We have performed some experiments with Algorithm 1 applied to A_2LR and A′_LR for four practical context-free grammars. For A′_LR a cover was used analogous to the one in Definition 4; the filtering function remains the same.

The first grammar generates a subset of the programming language ALGOL 68 (van Wijngaarden et al., 1975). The second and third grammars generate a fragment of Dutch, and are referred to as the CORRie grammar (Vosse, 1994) and the Deltra grammar (Schoorl and Belder, 1990), respectively. These grammars were stripped of their arguments in order to convert them into context-free grammars. The fourth grammar, referred to as the Alvey grammar (Carroll, 1993), generates a fragment of English and was automatically generated from a unification-based grammar.

The test sentences have been obtained by automatic generation from the grammars, using the Grammar Workbench (Nederhof and Koster, 1992), which uses a random generator to select rules; therefore these sentences do not necessarily represent input typical of the applications for which the grammars were written. Table 1 summarizes the test material.

Our implementation is merely a prototype, which means that the absolute duration of the parsing process is little indicative of the actual efficiency of more sophisticated implementations. Therefore, our measurements have been restricted to implementation-independent quantities, viz. the number of elements stored in the parse table and the number of elementary steps performed by the algorithm. In a practical implementation, such quantities will strongly influence the space and time complexity, although they do not represent the only determining factors. Furthermore, all optimizations of time and space efficiency have been left out of consideration.

Table 2 presents the costs of parsing the test sentences. The first and third columns give the number of entries stored in table U, the second and fourth columns give the number of elementary steps that were performed. An elementary step consists of the derivation of one element in Q′_LR or Q_2LR from one or two other elements. The elements that are used in the filtering process are counted individually. We give an example for the case of A′_LR. Suppose we derive an element q′ ∈ U_{i,j} from an element (A → • α) ∈ U_{i,j}, warranted by two elements q₁, q₂ ∈ U_i, q₁ ≠ q₂, through pred, in the presence of q₁ (A → • α) ↦ᵉ q₁ q′ ∈ T′_LR and q₂ (A → • α) ↦ᵉ q₂ q′ ∈ T′_LR. We then count two parsing steps, one for q₁ and one for q₂.

Table 2 shows that there is a significant gain in space and time efficiency when moving from A′_LR to A_2LR.

                    A′_LR                            A_2LR
  G          |R_LR|  |Q′_LR|   |T′_LR|      |R_2LR|  |Q_2LR|  |T_2LR|
  ALGOL 68      434    1,217    13,844          109      724   12,387
  CORRie        600    1,741    22,129          185      821   15,569
  Deltra        856    2,785    54,932          260    1,089   37,510
  Alvey       3,712    8,784 1,862,492          753    3,065  537,852

Table 3: Static requirements.

Apart from the dynamic costs of parsing, we have also measured some quantities relevant to the construction and storage of the two types of tabular LR parser. These data are given in Table 3. We see that the number of states is strongly reduced with regard to traditional LR parsing. In the case of the Alvey grammar, moving from |R_LR| to |R_2LR| amounts to a reduction to 20.3%. Whereas time- and space-efficient computation of R_LR for this grammar is a serious problem, computation of R_2LR will not be difficult on any modern computer. Also significant is the reduction from |T′_LR| to |T_2LR|, especially for the larger grammars. These quantities correlate with the amount of storage needed for a naive representation of the respective automata.

8 Discussion

Our treatment of tabular LR parsing has two important advantages over the one by Tomita:

• It is conceptually simpler, because we make use of simple concepts, such as a grammar transformation and the well-understood CYK algorithm, instead of a complicated mechanism working on graph-structured stacks.

• Our algorithm requires fewer LR states. This leads to faster parser generation, to smaller parsers, and to reduced time and space complexity of parsing itself.

The conceptual simplicity of our formulation of tabular LR parsing allows comparison with other tabular parsing techniques, such as Earley's algorithm (Earley, 1970) and tabular left-corner parsing (Nederhof, 1993), based on implementation-independent criteria. This is in contrast to experiments reported before (e.g. by Shann (1991)), which treated tabular LR parsing differently from the other techniques.

The reduced time and space complexities reported in the previous section pertain to the tabular realisation of two parsing techniques, expressed by the automata A′_LR and A_2LR. The tabular realisation of the former automata is very close to a variant of Tomita's algorithm by Kipps (1991). The objective of our experiments was to show that the automata A_2LR provide a better basis than A′_LR for tabular LR parsing with regard to space and time complexity. Parsing algorithms that are not based on the LR technique have, however, been left out of consideration, and so were techniques for unification grammars and techniques incorporating finite-state processes.³

³ As remarked before by Nederhof (1993), the algorithms by Schabes (1991) and Leermakers (1989) are not really related to LR parsing, although some notation used in these papers suggests otherwise.

Theoretical considerations (Leermakers, 1989; Schabes, 1991; Nederhof, 1994b) have suggested that for natural language parsing, LR-based techniques may not necessarily be superior to other parsing techniques, although convincing empirical data to this effect has never been shown. This issue is difficult to resolve because so much of the relative efficiency of the different parsing techniques depends on particular grammars and particular input, as well as on particular implementations of the techniques. We hope the conceptual framework presented in this paper may at least partly alleviate this problem.

Acknowledgements

The first author is supported by the Dutch Organization for Scientific Research (NWO), under grant 305-00-802. Part of the present research was done while the second author was visiting the Center for Language and Speech Processing, Johns Hopkins University, Baltimore, MD.

We received kind help from John Carroll, Job Honig, Kees Koster, Theo Vosse and Hans de Vreught in finding the grammars mentioned in this paper. Generous help with locating relevant literature was provided by Anton Nijholt, Rockford Ross, and Arnd Rußmann.

References

Billot, S. and B. Lang. 1989. The structure of shared forests in ambiguous parsing. In 27th Annual Meeting of the ACL, pages 143-151.

Booth, T.L. 1967. Sequential Machines and Automata Theory. Wiley, New York.

Carroll, J.A. 1993. Practical unification-based parsing of natural language. Technical Report No. 314, University of Cambridge, Computer Laboratory, England. PhD thesis.

Earley, J. 1970. An efficient context-free parsing algorithm. Communications of the ACM, 13(2):94-102.

Harrison, M.A. 1978. Introduction to Formal Language Theory. Addison-Wesley.

Johnson, M. 1991. The computational complexity of GLR parsing. In Tomita (1991), chapter 3, pages 35-42.

Kipps, J.R. 1991. GLR parsing in time O(n³). In Tomita (1991), chapter 4, pages 43-59.

Lang, B. 1974. Deterministic techniques for efficient non-deterministic parsers. In Automata, Languages and Programming, 2nd Colloquium, LNCS 14, pages 255-269, Saarbrücken. Springer-Verlag.

Leermakers, R. 1989. How to cover a grammar. In 27th Annual Meeting of the ACL, pages 135-142.

Leermakers, R. 1992a. A recursive ascent Earley parser. Information Processing Letters, 41(2):87-91.

Leermakers, R. 1992b. Recursive ascent parsing: from Earley to Marcus. Theoretical Computer Science, 104:299-312.

Nederhof, M.J. 1993. Generalized left-corner parsing. In Sixth Conference of the European Chapter of the ACL, pages 305-314.

Nederhof, M.J. 1994a. Linguistic Parsing and Program Transformations. Ph.D. thesis, University of Nijmegen.

Nederhof, M.J. 1994b. An optimal tabular parsing algorithm. In 32nd Annual Meeting of the ACL, pages 117-124.

Nederhof, M.J. and K. Koster. 1992. A customized grammar workbench. In J. Aarts, P. de Haan, and N. Oostdijk, editors, English Language Corpora: Design, Analysis and Exploitation, Papers from the thirteenth International Conference on English Language Research on Computerized Corpora, pages 163-179, Nijmegen. Rodopi.

Nederhof, M.J. and J.J. Sarbo. 1993. Increasing the applicability of LR parsing. In Third International Workshop on Parsing Technologies, pages 187-201.

Nederhof, M.J. and G. Satta. 1994. An extended theory of head-driven parsing. In 32nd Annual Meeting of the ACL, pages 210-217.

Pager, D. 1970. A solution to an open problem by Knuth. Information and Control, 17:462-473.

Rekers, J. 1992. Parser Generation for Interactive Environments. Ph.D. thesis, University of Amsterdam.

Schabes, Y. 1991. Polynomial time and space shift-reduce parsing of arbitrary context-free grammars. In 29th Annual Meeting of the ACL, pages 106-113.

Schauerte, R. 1973. Transformationen von LR(k)-Grammatiken. Diplomarbeit, Universität Göttingen, Abteilung Informatik.

Schoorl, J.J. and S. Belder. 1990. Computational linguistics at Delft: A status report. Report WTM/TT 90-09, Delft University of Technology, Applied Linguistics Unit.

Shann, P. 1991. Experiments with GLR and chart parsing. In Tomita (1991), chapter 2, pages 17-34.

Sheil, B.A. 1976. Observations on context-free parsing. Statistical Methods in Linguistics, pages 71-109.

Sippu, S. and E. Soisalon-Soininen. 1990. Parsing Theory, Vol. II: LR(k) and LL(k) Parsing. Springer-Verlag.

Tomita, M. 1986. Efficient Parsing for Natural Language. Kluwer Academic Publishers.

Tomita, M., editor. 1991. Generalized LR Parsing. Kluwer Academic Publishers.

van Wijngaarden, A. et al. 1975. Revised report on the algorithmic language ALGOL 68. Acta Informatica, 5:1-236.

Villemonte de la Clergerie, E. 1993. Automates à Piles et Programmation Dynamique. DyALog : une application à la programmation en logique. Ph.D. thesis, Université Paris VII.

Vosse, T.G. 1994. The Word Connection. Ph.D. thesis, University of Leiden.
