cmp-lg/9405025 26 May 94 as has recognized the rst symbol b oth to the timecomplexityamounts and to much the nondeterminism which spacecomplexity is detrimental b egin with the samegrammar few contains grammars symbols eg vantage very ecient algorithm related to Tomitas ecient which cannot handle leftoften recursion used and as is an generallyat alternative the less to same topdown time TDSince is parsing LC able parsing to is deal a with very left simple recursion parsing technique it and is where of lo okahead can handle the class of LC which In this pap er we relate a number of parsing puter science has b een used in dierent guises in various areas of com Leftcorner represent the same sub derivation rithms parsing search NWO under grant p ossible prop erty that theing table namely will how to contain obtain asquestion a which little parsing has entries algorithm come with up as of the others in the areap of ortunity to tabular improve pars some algorithms based on features underlying ideas We show that these Nondeterministic By relating existing ideas we hop e to provide an op  next Supp orted by the Dutch Organisation for Scientic Re AN have tabular theory however step is but A second purp ose of this pap er is to answer a and not LC b een OPTIMAL without predict Deterministic LC parsing with Earleys the and algorithms A parsing developed which Introduction many rules empty string LC algorithms are based on the same which Abstract all the University of Nijmegen Department of parsing is algorithm b ecomes 1 aforementioned is j p ossibility orooiveldToerno include and in a 2 parsing whose X very j a noticeable of such an the After deterministic TABULAR parallel It dierent that righthand foundation of a strategy has k an grammars rules two MarkJan one algorithm LC k when markjancskunnl areas symbols it will entries parser disad which algo sides This ED the of Nijmegen The Netherlands Nederhof PARSING tive in case we havethe eg lefthand the sides rules of the rules are the same neous pro cessing where again then a In case the stackthe algorithm is grammar very can nondeterministic b e for handled almost deterministically of sub derivations that each state representsalization the computation This of can a bnumber e number of informally explained entries by increases the the the fact number of states may lead to an increasing as a tabular realisation ofpaths nondeterministic LR parsing allows sharing of computation b etween dierent search cussed parsing algorithms mentioned ab ove have not b een dis each tabular parsing which canand b e ELR describ parsing ed are as obvious follows solutions to a problem of lead purp ose symbols realisation tions b etween these algorithms then rithm represent b een describ ed ingorithm their from own the right existingof literature tabular but parsing they algorithms havemon and not prexes a parallel parsinggorithms al capable of simultaneous pro cessingELR of and all commonprex com CPtherefore parsing go which one are step al further and discuss extended LR of its tabular realisation A rst attempt to solve this problem is to use predic certain At A To In LR second to PLR parsing will not improve the general parsing it this the For example Tomitas algorithm can b e seen which together ecient is PLR of o ccurring Computer Science b est an grammar p oint  inevitable using this purp ose algorithm working overlapping p owerful also ELR and CP parsing are the foundation is not the empty string but now of parsing pap er we in tabular a of a common prex the If the number of states o ccur parse on the use however of this ALGORITHM that authors is the algorithms working existing the parse table of the tabular re parsing to as PLR table collection at stack term make pap er entries A sophistication some parsing knowledge where literature on algorithms of state explicit is in p oint a of the to 1 stack the the provided that allows to sub derivations and eciency show that original indicate some the is increased the parse parse on B The there which However provided simulta connec A a various states stack table main table algo is For CP We the in B 2 a

notation of rules A A with the same which may lead to work b eing rep eated during parsing

1 2

lhs is often simplied to A j j Furthermore the parse forest a compact representa

1 2

A rule of the form A is called an epsilon rule tion of all parse trees which is output by a tabular

algorithm may in this case not b e optimally dense

We assume grammars do not have epsilon rules unless

stated otherwise

We conclude that we have a tradeo b etween the case

The relation P is extended to a relation on V V

that the grammar allows almost deterministic parsing

as usual The reexive and transitive closure of is

and the case that the stack algorithm is very nondeter

denoted by

ministic for a certain grammar In the former case so

We dene B A if and only if A B for some

phistication leads to less entries in the table and in the

The reexive and transitive closure of is denoted by

latter case sophistication leads to more entries pro

and is called the leftcorner relation

vided this sophistication is realised by an increase in

We say two rules A and B have a com

the number of states This is corrob orated by empirical

1 2

mon prex if and for some

data from which deal with tabular LR parsing

1 1 2 2 1

and where

2

As we will explain CP and ELR parsing are more

A recognition algorithm can b e sp ecied by means

deterministic than most other parsing algorithms for

of a pushdown automaton A T Alph Init Fin

many grammars but their tabular realizations can

which manipulates congurations of the form v

never compute the same sub derivation twice This rep

where Alph is the stack constructed from left

resents an optimum in a range of p ossible parsing algo

to right and v T is the remaining input

rithms

The initial conguration is Init w where Init

This pap er is organized as follows First we discuss

Alph is a distinguished stack symbol and w is the input

nondeterministic leftcorner parsing and demonstrate

The steps of an automaton are sp ecied by means of the

how common prexes in a grammar may b e a source of

relation Thus v v denotes that v

bad p erformance for this technique

is obtainable from v by one step of the automaton

Then a multitude of parsing techniques which ex

The reexive and transitive closure of is denoted by

hibit b etter treatment of common prexes is dis

The input w is accepted if Init w Fin

cussed These techniques including nondeterministic

where Fin Alph is a distinguished stack symbol

PLR ELR and CP parsing have their origins in theory

of deterministic parallel and tabular parsing Subse

LC parsing

quently the application to parallel and tabular parsing

is investigated more closely

For the denition of leftcorner LC recognition we

need stack symbols items of the form A

Further we briey describ e how rules with empty

where A is a rule and Remember that

righthand sides complicate the parsing pro cess

we do not allow epsilon rules The informal meaning

The ideas describ ed in this pap er can b e generalized

of an item is The part b efore the dot has just b een

to headdriven parsing as argued in

recognized the rst symbol after the dot is to b e rec

We will take some lib erty in describing algorithms

ognized next For technical reasons we also need the

from the existing literature since using the original de

items S S and S S where S is a fresh

scriptions would blur the similarities of the algorithms

symbol Formally

to one another In particular we will not treat the use

of lo okahead and we will consider all algorithms work

LC y

I fA j A P A S g

ing on a stack to b e nondeterministic We will only

y

describ e recognition algorithms Each of the algorithms

where P represents the augmented set of rules consist

can however b e easily extended to yield parse trees as

ing of the rules in P plus the extra rule S S

a sideeect of recognition

Algorithm Leftcorner

LC LC

A T I Init Fin Init S S Fin

The notation used in the sequel is for the most part

S S Transitions are allowed according to the

standard and is summarised b elow

following clauses

A contextfree grammar G T N P S consists of

two nite disjoint sets N and T of nonterminals and

B C av

terminals resp ectively a start symbol S N and a

B C A a v

y

nite set of rules P Every rule has the form A

where there is A a P such that A C

where the lefthand side lhs A is an element from N

A a av A a v

and the righthand side rhs is an element from V

B C A v

where V denotes N T P can also b e seen as a

B C D A v

relation on N V

y

where there is D A P such that D C

We use symbols A B C to range over N symbols

B A A v B A v

a b c to range over T symbols X Y Z to range over

V symbols to range over V and v w x in the The conditions using the leftcorner relation

to range over T We let denote the empty string The rst and third clauses together form a feature which is

called topdown TD ltering TD ltering makes sure for some grammar G with LC parsing for some gram

that sub derivations that are b eing computed b ottom mar G which results after applying a transformation

up may eventually grow into sub derivations with the re called leftfactoring

quired ro ot TD ltering is not necessary for a correct Leftfactoring consists of replacing two or more rules

algorithm but it reduces nondeterminism and guar A j j with a common prex by the rules

1 2

antees the correctprex property which means that in A A and A j j where A is a fresh non

1 2

case of incorrect input the parser do es not read past the terminal The eect on LC parsing is that a choice

rst incorrect character b etween rules is p ostp oned until after all symbols of

are completely recognized Investigation of the next k

Example Consider the grammar with the following

symbols of the remaining input may then allow a choice

rules

b etween the rules to b e made deterministically

The PLR algorithm is formalised in by trans

E E T j T E j T

forming a PLRk grammar into an LLk grammar

T T F j T F j F

and then assuming the standard realisation of LLk

F a

parsing When we consider nondeterministic topdown

parsing instead of LLk parsing then we obtain the

It is easy to see that E E T E T T F T

new formulation of nondeterministic PLR parsing

contains The relation but from the reexive closure

b elow

F and from the transitive closure it also contains F

We rst need to dene another kind of item viz of

it also contains F E

the form A such that there is at least one rule of

The recognition of a a is realised by

the form A for some Formally

E E a a

PLR y

I fA j A P A S g

E E F a a

E E T F a

PLR

Informally an item A I represents one or

E E T T F a

LC

more items A I

E E T T F a

Algorithm Predictive LR E E T T F F a

PLR PLR

A T I Init Fin Init S Fin E E T T F

S S and dened by

E E E T

E E

B av B A a v

y

where there are A a B C P such that

Note that since the automaton do es not use any lo oka

A C

head Step may also have replaced T F by

any other item b esides T T F whose rhs starts

A av A a v

y

with T and whose lhs satises the condition of top

where there is A a P

down ltering with regard to E ie by T T F

B A v B D A v

y

E T E or E T 2

where A P and where there are D

y

C A B C P such that D

LC parsing with k symbols of lo okahead can handle

B A v B A v

deterministically the so called LCk grammars This

1 y

class of grammars is formalized in How LC pars where A P and where there is B A

y

P

ing can b e improved to handle common suxes e

ciently is discussed in in this pap er we restrict our

Example Consider the grammar from Example

attention to common prexes

Using Predictive LR recognition of a a is realised by

E a a

PLR ELR and CP parsing

E F a a

In this section we investigate a number of algorithms

E T F a

which exhibit a b etter treatment of common prexes

E T T a

E T T a

Predictive LR parsing

Predictive LR PLR parsing with k symbols of lo oka

E E

head was introduced in as an algorithm which yields

ecient parsers for a subset of the LRk grammars

Comparing these congurations with those reached by

and a sup erset of the LCk grammars How determin

the LC recognizer we see that here after Step the

istic PLR parsing succeeds in handling a larger class

stack element T T represents b oth T T F

of grammars the PLRk grammars than the LCk

and T T F so that nondeterminism is reduced

grammars can b e explained by identifying PLR parsing

Still some nondeterminism remains since Step could

1

also have replaced T F by E T which repre

In a dierent denition of the LCk grammars may

sents b oth E T E and E T 2 b e found which is not completely equivalent

LC

Extended LR parsing in I and one for nonkernel items the others These

are dened by

An extended contextfree grammar has righthand sides

consisting of arbitrary regular expressions over V This

goto Q X

1

requires an LR parser for an extended grammar an

closure fA X j A X Q

ELR parser to b ehave dierently from normal LR

A S g

parsers

goto Q X

2

The b ehaviour of a normal LR parser up on a reduc

closure fA X j A X Q A S g

tion with some rule A is very simple it p ops jj

states from the stack revealing say state Q it then

At each shift where X is some terminal and each re

pushes state goto Q A We identify a state with its

duce with some rule A where X is A we may non

corresp onding set of items

deterministically apply goto which corresp onds with

1

For extended grammars the b ehaviour up on a reduc

case a or goto which corresp onds with case b Of

2

tion cannot b e realised in this way since the regular

course one or b oth may not b e dened on Q and X

expression of which the rhs is comp osed may describ e

b ecause goto Q X may b e for i f g

i

strings of various lengths so that it is unknown how

Now remark that when using goto and goto each

1 2

many states need to b e p opp ed

reachable set of items contains only items of the form

In this problem is solved by forcing the parser to

A for some xed string plus some nonkernel

decide at each call goto Q X whether

items We will ignore the nonkernel items since they

can b e derived from the kernel items by means of the

a X is one more symbol of an item in Q of which some

closure function

symbols have already b een recognized or whether

This suggests representing each set of items by a new

b X is the rst symbol of an item which has b een

kind of item of the form fA A A g which

1 2 n

introduced in Q by means of the closure function

represents all items A for some and A

In the second case a state which is a variant of

fA A A g Formally

1 2 n

goto Q X is pushed on top of state Q as usual In

ELR

y

I f j fA j A P g

the rst case however state Q on top of the stack is

fS gg

replaced by a variant of goto Q X This is safe since

where we use the symbol to range over sets of non

we will never need to return to Q if after some more

terminals

steps we succeed in recognizing some rule corresp ond

ing with one of the items in Q A consequence of the

Algorithm Extended LR

ELR ELR

action in the rst case is that up on reduction we need

A T I Init Fin Init fS g Fin

to p op only one state o the stack

fS g S and dened by

Further work in this area is rep orted in which

av a v

treats nondeterministic ELR parsing and therefore do es

y

where fA j A a B C P B

not regard it as an obstacle if a choice b etween cases a

A C g is nonempty

and b cannot b e uniquely made

av a v

We are not concerned with extended contextfree

y

where fA j A a P g is nonempty

grammars in this pap er However a very interesting

v A v

algorithm results from ELR parsing if we restrict its ap

y

where there is A P with A and

plication to ordinary contextfree grammars We will

y

fD j D A B C P B D C g is

maintain the name extended LR to stress the origin

nonempty

of the algorithm This results in the new nondetermin

v A v

istic ELR algorithm that we describ e b elow derived

y

where there is A P with A and

from the formulation of ELR parsing in

y

fB j B A P g is nonempty

First we dene a set of items as

Note that Clauses and corresp ond with goto and

y

2

I fA j A P g

that Clauses and corresp ond with goto

1

LC

Note that I I If we dene for each Q I

Example Consider again the grammar from Exam

ple Using the ELR algorithm recognition of a a is

closure Q

realised by

Q fA j B C Q A C g

fE g a a

fE g fF g a a

then the goto function for LR parsing is dened by

fE g fT g F a

goto Q X

fE g fT E g T a

fE g fT g T a

closure fA X j A X Qg

For ELR parsing however we need two goto func

tions goto and goto one for kernel items ie those

fE g E

1 2

a a a

Comparing these congurations with those reached by

a a a

the PLR recognizer we see that here after Step the

F a a

stack element fT E g T represents b oth T T

T a a

F and T T F but also E T and

E a a

E T E so that nondeterminism is even further

E a a

reduced 2

E a a

A simplied ELR algorithm which we call the pseudo

E F a

ELR algorithm results from avoiding reference to in

E T a

Clauses and In Clause we then have a simplied

E T a

denition of viz fA j A a B C

E T a

y

P A C g and in the same way we have in Clause

We see that in Step the rst incorrect symbol is read

the new denition fD j D A B C

but recognition then continues Eventually the recog

y

P D C g Pseudo ELR parsing can b e more easily

nition pro cess is blo cked in some unsuccessful congu

realised than full ELR parsing but the correctprex

ration which is guaranteed to happ en for any incorrect

prop erty can no longer b e guaranteed Pseudo ELR

4

input In general however after reading the rst incor

parsing is the foundation of a tabular algorithm in

rect symbol the algorithm may p erform an unbounded

number of steps b efore it halts Imagine what happ ens

Commonprex parsing

for input of the form a a a a a a 2

One of the more complicated asp ects of the ELR algo

rithm is the treatment of the sets of nonterminals in

Tabular parsing

the lefthand sides of items A drastically simplied

Nondeterministic pushdown automata can b e realised

algorithm is the basis of a tabular algorithm in

eciently using parse tables A parse table consists

Since in the algorithm itself is not describ ed but

of sets T of items for i j n where a a

ij 1 n

2

only its tabular realisation we take the lib erty of giv

represents the input The idea is that an item is only

ing this algorithm our own name commonprex CP

stored in a set T if the item represents recognition of

ij

parsing since it treats all rules with a common prex

the part of the input a a

i+1 j

3

simultaneously

We will rst discuss a tabular form of CP parsing

The simplication consists of omitting the sets of

since this is the most simple parsing technique discussed

nonterminals in the lefthand sides of items

ab ove We will then move on to the more dicult ELR

CP y

technique Tabular PLR parsing is fairly straightfor

I f j A P g

ward and will not b e discussed in this pap er

Algorithm Commonprex

CP CP

Tabular CP parsing

A T I Init Fin Init Fin S

and dened by

CP parsing has the following tabular realization

Algorithm Tabular commonprex

av a v

CP

y

Sets T of the table are to b e subsets of I Start

where there are A a B C P such that

ij

with an empty table Add to T Perform one of

A C

00

the following steps until no more items can b e added

av a v

y

Add a to T for a a and T

where there is A a P

i1i i ji1

y

where there are A a B C P such that

v A v

y

A C

where there are A D A B C P

such that D Add a to T for a a and T C

ji i ji1

y

where there is A a P

v A v

y

where there are A B A P

Add A to T for T and T

ji ji hj

y

where there are A D A B C P

The simplication which leads to the CP algorithm

such that D C

inevitably causes the correctprex prop erty to b e lost

Add A to T for T and T

hi ji hj

Example Consider again the grammar from Exam

y

where there are A B A P

ple It is clear that a a a is not a correct string

Rep ort recognition of the input if S T

0n

according to this grammar The CP algorithm may go

through the following sequence of congurations

For an example see Figure

Tabular CP parsing is related to a variant of CYK

2

An attempt has b een made in but this pap er do es

parsing with TD ltering in A form of tabular

not describ e the algorithm in its full generality

3 4

The original algorithm in applies an optimization unless the grammar is cyclic in which case the parser

concerning unit rules irrelevant to our discussion may not terminate b oth on correct and on incorrect input

(0) (1) (5)

a E E T

(2)

F E

Consider again the grammar from

(3)

T

Example and the incorrect in

(4)

E

put a a a After execution

of the tabular commonprex al

gorithm the table is as given here

The sets T are given at the j th

(6) (9)

a T T E

ji

row and ith column

(7)

F

The items which corresp ond with

(8)

T

those from Example are lab elled

with These lab els also

indicate the order in which these

(10)

a

items are added to the table

F

T

E

Figure Tabular CP parsing

CP parsing without topdown ltering ie without the However for certain i there may b e many

checks concerning the leftcorner relation T for some j and each may give rise to a dierent is the

ji1

which is nonempty In this way Clause may add main algorithm in

several items a to T some p ossibly with Without the use of topdown ltering the references

i1i

overlapping sets Since items represent computation to in Clauses and are clearly not of much use

of sub derivations the algorithm may therefore compute any more When we also remove the use of these items

the same sub derivation several times then these clauses b ecome

We prop ose an optimization which makes use of the

Add a to T for a a

i1i i

fact that all p ossible items T are already

y

ji1

where there is A a P

present when we compute items in T we compute

i1i

Add A to T for T

ji ji

one single item a where is a large set com

y

where there are A D A P

puted using all T for any j A similar

ji1

optimization can b e made for the third clause

In the resulting algorithm no set T dep ends on any

ij

set T with g i In this fact is used to construct

g h

Algorithm Tabular extended LR

a parallel parser with n pro cessors P P with

ELR

0 n1

Sets T of the table are to b e subsets of I Start

ij

each P pro cessing the sets T for all j i The ow

i ij

with an empty table Add fS g to T For

00

of data is strictly from right to left ie items computed

i n in this order p erform one of the following

by P are only passed on to P P

i 0 i1

steps until no more items can b e added

Add a to T for a a

Tabular ELR parsing

i1i i

where fA j j T A a B

ji1

The tabular form of ELR parsing allows an optimiza

y

C P B A C g is nonempty

tion which constitutes an interesting example of how a

tabular algorithm can have a prop erty not shared by its

Add a to T for a a and

ji i

5

nondeterministic origin

T

ji1

y

First note that we can compute the columns of a

where fA j A a P g is nonempty

parse table strictly from left to right that is for xed i

Add A to T for T

ji ji

we can compute all sets T b efore we compute the sets

ji

y

where there is A P with A and

T

ji+1

fD j h T D A B C

hj

If we formulate a tabular ELR algorithm in a naive

y

P B D C g is nonempty

way analogously to Algorithm as is done in then

Add A to T for T and

for example the rst clause is given by

hi ji

T

hj

Add a to T for a a and

i1i i

y

where there is A P with A and

T

ji1

y

fB j B A P g is nonempty

y

where fA j A a B C P B

A Rep ort recognition of the input if fS g S T C g is nonempty

0n

5

Informally the topdown ltering in the rst and

This is reminiscent of the admissibility tests which

third clauses is realised by investigating all left corners

are applicable to tabular realisations of logical pushdown

D of nonterminals C ie D C which are exp ected automata but not to these automata themselves

from a certain input p osition For input p osition i these We conclude that we have a family consisting of LC

nonterminals D are given by PLR ELR and LR parsing which are increasingly de

terministic In general the more deterministic an algo

S fD j j T

i ji

rithm is the more parser states it requires For exam

y

B C P B D C g

ple the LC algorithm requires a number of states the

LC

items in I which is linear in the size of the gram

Provided each set S is computed just after comple

i

mar By contrast the LR algorithm requires a number

tion of the ith column of the table the rst and third

of states the sets of items which is exponential in the

clauses can b e simplied to

size of the grammar

Add a to T for a a

i1i i

The dierences in the number of states complicates

y

where fA j A a P g S is nonempty

i1

the choice of a tabular algorithm as the one giving op

timal b ehaviour for all grammars If a grammar is very

Add A to T for T

ji ji

y

simple then a sophisticated algorithm such as LR may

where there is A P with A and

y

allow completely deterministic parsing which requires a

fD j D A P g S is nonempty

j

linear number of entries to b e added to the parse table

which may lead to more practical implementations

measured in the size of the grammar

Note that we may have that the tabular ELR algo

If on the other hand the grammar is very ambigu

rithm manipulates items of the form which

ous such that even LR parsing is very nondeterministic

would not o ccur in any search path of the nondeter

then the tabular realisation may at worst add each state

ministic ELR algorithm b ecause in general such a

to each set T so that the more states there are the

ij

is the union of many sets of items which

more work the parser needs to do This favours sim

would b e manipulated at the same input p osition by the

ple algorithms such as LC over more sophisticated ones

nondeterministic algorithm in dierent search paths

such as LR Furthermore if more than one state repre

With minor dierences the ab ove tabular ELR algo

sents the same sub derivation then computation of that

rithm is describ ed in A tabular version of pseudo

sub derivation may b e done more than once which leads

ELR parsing is presented in Some useful data

to parse forests compact representations of collections

structures for practical implementation of tabular and

of parse trees which are not optimally dense

nontabular PLR ELR and CP parsing are describ ed

Schabes prop oses to tune a parser to a grammar or

in

in other words to use a combination of parsing tech

niques in order to nd an optimal parser for a certain

Finding an optimal tabular algorithm

7

grammar This idea has until now not b een realised

In Schabes derives the LC algorithm from LR pars

However when we try to nd a single parsing algorithm

ing similar to the way that ELR parsing can b e derived

which p erforms well for all grammars then the tabu

from LR parsing The LC algorithm is obtained by not

lar ELR algorithm we have presented may b e a serious

only splitting up the goto function into goto and goto

1 2

candidate for the following reasons

but also splitting up goto even further so that it non

2

deterministically yields the closure of one single kernel

For all i j and at most one item of the form

item This idea was describ ed earlier in and more

is added to T Therefore identical sub

ij

recently in

derivations are not computed more than once This

Schabes then argues that the LC algorithm can b e

is a consequence of our optimization in Algorithm

determinized ie made more deterministic by manip

Note that this also holds for the tabular CP algo

ulating the goto functions One application of this idea

rithm

is to take a xed grammar and choose dierent goto

ELR parsing guarantees the correctprex prop erty

functions for dierent parts of the grammar in order

contrary to the CP algorithm This prevents com

to tune the parser to the grammar

putation of all sub derivations which are useless with

In this section we discuss a dierent application of

regard to the already pro cessed input

this idea we consider various goto functions which are

global ie which are the same for all parts of a grammar

ELR parsing is more deterministic than LC and PLR

One example is ELR parsing as its goto function can

2

parsing b ecause it allows shared pro cessing of all

b e seen as a determinized version of the goto function

2

common prexes It is hard to imagine a practical

of LC parsing In a similar way we obtain PLR parsing

parsing technique more deterministic than ELR pars

Traditional LR parsing is obtained by taking the full

ing which also satises the previous two prop erties

determinization ie by taking the normal goto function

In particular we argue in that renement of the

6

which is not split up

LR technique in such a way that the rst prop erty

ab ove holds whould require an impractically large 6

Schabes more or less also argues that LC itself can b e

number of LR states

obtained by determinizing TD parsing In lieu of TD pars

ing he mentions Earleys algorithm which is its tabular

7

realisation This is reminiscent of the idea of optimal cover

MJ Nederhof Generalized leftcorner parsing In Epsilon rules

Sixth Conference of the European Chapter of the

Epsilon rules cause two problems for b ottomup pars

ACL

ing The rst is nontermination for simple realisations

of nondeterminism such as backtrack parsing caused

MJ Nederhof A multidisciplinary approach to

by hidden left recursion The second problem o ccurs

a parsing algorithm In K Sikkel and A Ni

when we optimize TD ltering eg using the sets S it

jholt editors Natural Language Parsing Methods

i

is no longer p ossible to completely construct a set S b e

and Formalisms Pro c of the sixth Twente Work

i

fore it is used b ecause the computation of a derivation

shop on Language Technology University

deriving the empty string requires S for TD ltering

of Twente

i

but at the same time its result causes new elements to

MJ Nederhof and G Satta An extended theory

b e added to S Both problems can b e overcome

i

of headdriven parsing In this pro ceedings

P Oude Luttighuis and K Sikkel Generalized LR

Conclusions

parsing and attribute evaluation In Third Inter

We have discussed a range of dierent parsing algo

national Workshop on Parsing Technologies

rithms which have their ro ots in construction

Tilburg The Netherlands and Durbuy Bel

expression parsing and natural language pro cessing

gium August

We have shown that these algorithms can b e describ ed

PW Purdom Jr and CA Brown Parsing

in a common framework

extended LRk grammars Acta Informatica

We further discussed tabular realisations of these al

gorithms and concluded that we have found an opti

mal algorithm which in most cases leads to parse tables

J Rekers Parser Generation for Interactive Envi

containing fewer entries than for other algorithms but

ronments PhD thesis University of Amsterdam

which avoids computing identical sub derivations more

than once

DJ Rosenkrantz and PM Lewis I I Deterministic

left corner parsing In IEEE Conference Record

Acknowledgements

of the th Annual Symposium on Switching and

Automata Theory

The author acknowledges valuable corresp ondence with

Klaas Sikkel Rene Leermakers Francois Barthelemy

Y Schabes Polynomial time and space shift

Giorgio Satta Yves Schabes and FredericVoisin

reduce parsing of arbitrary contextfree grammars

In th Annual Meeting of the ACL

References

K Sikkel and M Lankhorst A parallel b ottom

S Billot and B Lang The structure of shared

up Tomita parser In Konferenz Verarbeitung

forests in ambiguous parsing In th Annual Meet

Nat urlicherSprache N urnberg Octob er

ing of the ACL

SpringerVerlag

M Johnson The computational complexity of

S Sippu and E SoisalonSoininen Parsing The

GLR parsing In M Tomita editor Generalized

ory Vol II LRk and LLk Parsing EATCS

LR Parsing chapter Kluwer Academic

Monographs on Theoretical Computer Science

Publishers

volume SpringerVerlag

B Lang Complete evaluation of Horn clauses

E SoisalonSoininen and E Ukkonen A metho d

An automata theoretic approach Rapp ort de

for transforming grammars into LLk form Acta

Recherche Institut National de Recherche en

Informatica

Informatique et en Automatique Ro cquencourt

M Tomita Ecient Parsing for Natural Lan

France November

guage Kluwer Academic Publishers

M Lankhorst An empirical comparison of gener

F Voisin CIGALE A to ol for interactive grammar

alized LR tables In R Heemels A Nijholt and

construction and expression parsing Science of

K Sikkel editors Tomitas Algorithm Extensions

Computer Programming

and Applications Pro c of the rst Twente Work

F Voisin A b ottomup adaptation of Earleys

shop on Language Technology University of

parsing algorithm In Programming Languages

Twente September Memoranda Informatica

Implementation and Logic Programming Interna

tional Workshop LNCS Orleans

R Leermakers How to cover a grammar In th

France May SpringerVerlag

Annual Meeting of the ACL

F Voisin and JC Raoult A new b ottomup

R Leermakers A recursive ascent Earley

general parsing algorithm BIGRE

parser Information Processing Letters

September February