, ,

Computer Science/Department Science/Department Computer

Centrum Centrum

Centre Centre

of of

Software Software

for for

Mathematics Mathematics

voor voor

Technology Technology

Wiskunde Wiskunde

Incremental Incremental

Centrum Centrum

J. J.

and and

Heering, Heering,

Report Report

ypor ypor

Computer Computer

en en

generation generation

Wisl~unde Wisl~unde

Biblk>tlleek Biblk>tlleek

Am~tel>dam Am~tel>dam

P. P.

CS-R8822 CS-R8822

lnformatica lnformatica

Klint, Klint,

en en

J.G. J.G.

Science Science

of of

lnformatk:a lnformatk:a

parsers parsers

Rekers Rekers May May The Centre for Mathematics and Computer Science is a research institute of the Stichting Mathematisch Centrum, which was founded on February 11, 1946, as a nonprofit institution aim­ ing at the promotion of mathematics, computer science, and their applications. It is sponsored by the Dutch Government through the Netherlands Organization for the Advancement of Pure Research (Z.W.0.).

q\ '

Copyright (t::Stichting Mathematisch Centrum, Amsterdam 1

Incremental Generation of Parsers

J. Heering Department of Software Technology, Centre for Mathematics and Computer Science P.O. Box 4079, 1009 AS Amsterdam, The Netherlands

P. Klint Department of Software Technology, Centre for Mathematics and Computer Science P.O. Box 4079, 1009 AS Amsterdam, The Netherlands and Programming Research Group, University of Amsterdam P.O. BOX 41882, 1009 DB Amsterdam, The Netherlands

J. Rekers Department of Software Technology, Centre for Mathematics and Computer Science P.O. Box 4079, 1009 AB Amsterdam, The Netherlands

A parser checks whether a text is a sentence in a language. Therefore, the parser is provided with the grammar of the language, and it usually generates a structure () that represents the text according to that grammar. Most present-day parsers are not directly driven by the grammar but by a 'parse table', which is generated by a parse table generator. A table based parser wolks more efficiently than a grammar based parser does, and provided that the parser is used often enough, the cost of gen­ erating the parse table is outweighed by the gain in efficiency. However, we need a parsing system for an environment where language definitions are developed

~(and modified) interactively; here we do not have enough time to entirely generate the parse table be­ fore parsing starts, and because of the frequent modifications of the grammar, it is not obvious that the extra efficiency of a table based parser makes up for the cost of generating the parse table. We propose a lazy and incremental parser generation technique: (1) The parse table is generated during parsing, but only those parts of it are generated that are really needed to parse the input sen­ tences at hand. (2) If the grammar is modified, the existing parse table is modified incrementally, rather than regenerated completely. Using this technique we have eliminated the disadvantages of a parse table generator, while we retained the advantages of a table based parser. We use a parsing algorithm based on parallel LA parsing which is capable of handling arbitrary context-free grammars. We present all required algorithms and give measurements comparing their performance with that of conventional techniques.

Keywords & Phrases: lazy and incremental parser generation, program generation, incomplete program, LA parser generator, parallel LA parsing, parsing of context-free languages. 1987 CR Categories: 0.1.2, 0.3.1, 0.3.4, F.4.2. 1985 Mathematics Subject Classification: G8N20. Note: Partial support received from the European Communities under ESPRIT project 348 (Generation of Interactive Programming Environments - GIPE).

Report C$>-R8822 Centre for Mathematics and Computer Science P.O. Box 4079, 1009 AB Amsterdam,The Netherlands

efficiency efficiency

tion tion

ing ing

environment environment

parser parser

tor tor

extend extend

For For

scanner scanner

algorithm. algorithm.

dynamic dynamic

We We

The The

many many

generate generate

1. 1.

2 2

• •

• •

• •

• •

• •

• •

INTRODUCTION INTRODUCTION

[Log88] [Log88]

The The

The The

The The

is is

The The

interactive interactive

the the

already already

small. small.

phase, phase,

sentences sentences

re-used. re-used.

editing editing out out

class class

the the

a a

module module There There

module module

-

(LITHE (LITHE

-

- sufficiently sufficiently

grammar, grammar,

When When

component. component.

the the

describe describe

3 3

design design

small small

In In

no no

The The

It It

to to

that that

ing ing

In In

times. times.

are are

we we

it it

parser parser

parser. parser.

that that

general general

parsing parsing

parser parser

efficiency efficiency

parser parser

may may

the the

section section

generator generator

general general

once once

efficient efficient

of of

applications applications

delay delay

corresponding corresponding a

present present

measurements, measurements,

generated generated

is is

a a

of of

but but

time time

We We

part part

of of

generated generated

context-free context-free

in in

of of

with with

defines defines

language language

parser; parser;

of of

[San82], [San82],

a a

Hence, Hence,

for for

the the

happen happen

a a

If If

do do

have have

a a

these these

the the

again again

comparison comparison

language language

this this

generator generator

trend trend

lazy lazy

parser parser

time time is is

often. often.

extend extend

principles principles

needed needed

algorithm algorithm

of of

this this

(completely) (completely)

the the

2 2

parser parser

only only

The The

not not

the the

parsers. parsers.

generated generated

parser parser

the the

of of

we we

the the

environment environment

been been

ISG ISG

the the

languages, languages,

ASF/SDF ASF/SDF

is is

its its

due due

and and

on on

into into

the the

towards towards

use use

the the

that that

(visible) (visible)

OBJ OBJ

sketched sketched

parser. parser.

is is

response response

parsing parsing

a a

discuss discuss

grammar, grammar,

Three Three

generators generators

indeed indeed

this this

effort effort

to to

the the

generator. generator.

own own

small small

definition definition

effort effort

being being

is is

generated, generated,

grammars, grammars,

parsing parsing

to to

incremental incremental

is is

is is

the the

an an

some some

and and

parse parse

is is

underlying underlying

modification modification

There There

[FGJM85], [FGJM85],

with with

incremental. incremental.

generated generated

fly fly

described. described.

algorithm algorithm

the the

incremental incremental

capable capable

in in

whole whole

observations observations

syntax, syntax,

programming/specification programming/specification

spent spent

section section

new new

Parts Parts

part part

the the

related related

algorithm algorithm

designed, designed,

from from

put put

specification specification

syntax syntax

above: above:

parser parser

a a

parts parts

time time

it it

conventional conventional

is is

the the

process process

a a

situations situations are

lazy lazy

case, case,

is is

is is

parametrized parametrized

systems. systems.

of of

in in

smoother smoother

parser parser

on on

the the

grammar, grammar,

of of

and and

input input

the the

of of usually usually

of of

of of

of of

generating generating

the the

parser parser and and

contains contains 8

by by

into into

algorithms algorithms

our our

generation generation

the the

of of

fashion fashion

such such

Cigale Cigale

parser parser

The The

a a

handling handling

great great

SDF SDF

in in

the the

the the

that that

parser parser

used used

its its

need need

itself itself

A A

grammar grammar

sophisticated, sophisticated,

the the

is is

method, method,

must must

parser parser

each each

can can

the the

a a

change change

grammar grammar

parts parts by

grammar grammar

language language

not not

editor editor

lazy lazy

combination combination

response response

generator generator

can can

based based

definition. definition.

importance importance

imported imported

techniques, techniques,

parser, parser,

work work

will will

[Voi86], [Voi86],

by by

be be

while while

remains remains

from from

however, however,

generation generation

be be

only only

import import

with with

some some

them them

incrementally incrementally

that that

arbitrary arbitrary

parser parser

and and

made made

phase phase

us. us.

be be

see see

is is

generated, generated,

is is

in in

on on

is is

parsing parsing

the the

the the

determined determined

modified. modified.

are are

a a

rather rather

are are

show show

is is

acceptable, acceptable,

as as

Section Section

mentioned mentioned

the the

is is

is is

saved saved

concluding concluding

[HKR87b]. [HKR87b].

IPG, IPG,

the the

syntax syntax

unaffected, unaffected,

ASF/SDF ASF/SDF

here: here:

possibly possibly

of of

module. module.

and and

not not

parser parser

generation generation

efficient efficient

achieved achieved

not not

grammar. grammar.

by by unaffected

where where

not not

languages languages

to to

grammar grammar

the the

ISG/IPG ISG/IPG

context-free context-free

algorithm. algorithm.

a a

assumption assumption

how how

yet yet

than than

parsing parsing

use use

which which

input. input.

thrown thrown

on on

needed needed

module module

overllead overllead

but but

4 4

written written

generator generator

incorporate incorporate

completely completely

this this

One One

describes describes the the

our our

a a

inefficient, inefficient,

generating generating

by by

above. above.

as as

For For

even even

[Hen87,HK88]). [Hen87,HK88]).

remarks. remarks.

there there

than than

parser parser

in in

In In

is is

produces produces

algorithm algorithm

When When

is is

generation generation

conventionally conventionally a

There There

assumption assumption

the the

can can

that that

away. away.

technique technique

would would

by by

the the

extends extends

in in

efficient efficient

specially specially

[HKR87a] [HKR87a]

Finally, Finally,

used used

in in

grammars. grammars.

the the

that that

though though

introduced introduced

efficiency efficiency

is is

any any

SDF, SDF,

start start

The The

is is

allow allow

sense sense

the the

generator generator

typical typical

a a

no no

is is

modification modification

modifications modifications

wasted. wasted.

fixed fixed

This This

to to like

an an

parser parser

the the

conventional conventional

in in

of of

a a

conventional conventional

no no

immediately. immediately.

universal universal

guarantee guarantee

in in

and and

the the

section section

entirely entirely

corresponding corresponding

evolved evolved

process process parsing parsing

that that

the the

an an

the the

general general

tailored tailored

does does

generated generated

section section

separate separate

clearly clearly

a a

input input

uses uses

of of

After After

syntax syntax

exploit exploit

interactive interactive

by by

generator generator

sentences sentences

lazy/incremental lazy/incremental

lexical lexical

once once

that that

In In

generated generated

not not

the the

this this

7 7

such such

new new

ISG/IPG ISG/IPG

syntax-directed syntax-directed

sentences sentences

as as

and and

user-defined user-defined

from from

in in

towards towards

that that

5. 5.

gives gives

each each

has has

can can

parser, parser,

apply: apply: all all

of of

parser parser

parser parser

a a

lazy lazy

of of

this this

parsers parsers

case, case,

the the

scanner scanner

In In

When When

syntax-directed syntax-directed

whole. whole.

one. one.

needed needed

the the

languages languages

advantages advantages

it it

handle handle

can can

actually actually them. them.

the the

change change

change change

the the

development development

section section

grammar grammar

fact fact

will will

parser. parser.

technique technique

grammar grammar

but but

since since

as as

generation generation

generation generation

the the

be be

need need

importing importing

results results

the the

are are

by by

its its

and and

parts parts

be be

It It

also also

In In

used used

a a

lexical lexical

syntax syntax

highly highly

in in

of of

given given

pars­

input input

6 6

there there

mak­

large large

turns turns

each each

only only

used used

used used

sec­

edi­

the the

are are

the the

we we

for for

the the

by by

of of

of of

in in

is is to to 3

2. THE CHOICE OF A PARSING ALGORITHM

2.1. A comparison of algorithms We compare some existing parsing algorithms with our own algorithm from the perspective of highly dynamic applications like the ones discussed in the previous section: • LR(k) and LALR(k) algorithms [ASU86, eh. 4.7] These algorithms are controlled by a parse table that is constructed beforehand by a table genera­ tor. The table is constructed top-down, while the parser itself wotks bottom-up. The parser wotks in linear time. When the look-ahead k is increased, the class of recognizable languages becomes larger (but will always be limited to non-ambiguous grammars), and the table generation time increases exponentially. With conventional LR or LALR table generation algorithms it is impossi­ ble to update an already generated parse table incrementally, if the grammar is modified. • Recursive descent and LL(k) algorithms [ASU86, eh. 4.4] A generator constructs a parsing program, whereas an LL generator con­ structs a parse table that is interpreted by a fixed parser. In both algorithms the parsers work top­ down. The class of accepted languages depends on the look-ahead k, but is always limited to non-left-recursive, non-ambiguous grammars. • Earley's general context-free parsing algorithm [Ear70] Earley's algorithm can handle all context-free grammars. It wotks by attaching to each symbol in the input a set of 'dotted rules'. A dotted rule consists of a syntax rule with a cursor (dot) in it and the position in the input where the recognition of the rule started. The set of dotted rules for sym­ bol n+l is computed at parse time from the set for symbol n. Earley's algorithm does not have a separate generation phase, so it adapts easily to modifications in the grammar. It is this same lack of a generation phase that makes the algorithm too inefficient for interactive purposes. • Cigale [Voi86] Cigale uses a parsing algorithm that is specially tailored to expression parsing. It builds a trie for the grammar in which production rules with the same prefix share a path. During parsing this trie is recursively traversed. A trie can easily be extended with new syntax rules and tries for different grammars can be combined just like modules. The class of grammars is only somewhat larger , than LR(O) grammars, because the parser does not use look-ahead in a general manner and cannot backtrack. • OBJ [FGJM85] OBJ uses a recursive descent parsing technique with . OBJ itself does not allow ambiguous grammars, but the backtrack parser does detect all ambiguous parses. This makes the parsing system suitable for finitely ambiguous grammars, but as mentioned in [FGJM85, p. 60] ''parsing can be expensive for complex expressions'', which makes the algorithm less suitable for large input sentences. • Tomita's pseudo-parallel parsing algorithm [Tom85,Rek87,Lan74] This is an extended LR parsing algorithm, that requires a conventional (but possibly ambiguous) LR(O) or LR(l) parse table. The parser starts as an LR parser, but when it hits a conflict in the parse table, it splits up in several LR parsers that work in parallel. Grammars are restricted to the class of finitely ambiguous context-free grammars. As it uses the same table generation phase as conventional LR algorithms, modifying the grammar is an expensive operation with Tomita's algorithm. • Incremental parser generator IPG We developed this method on the basis of Tomita's parsing algorithm, but provided the algorithm with an incremental LR(O) parse table generator. Parsing starts with an empty parse table, which is expanded by need during parsing. A change in the grammar is handled incrementally by remov­ ing the parts of the parse table that are affected by the change; these parts are recomputed for the piodifi.ed grammar when the parser needs them again. The parse table is constructed during pars­ ing, so after a certain time, depending on the input given to it, the system will become as fast as a conventionally controlled Tomita parser. Fig. 2.1 gives an overview of the following characteristics of these algorithms: capable of handling

explained explained

The The

same same efficient efficient

the the

table table

puted puted

dependent dependent

and and

parser parser

and and

The The algorithm. algorithm.

a a

information information

the the

adapts adapts

2.2. 2.2.

arbitrary arbitrary modifications modifications

4 4

specialised specialised

grammar. grammar.

Tomita's Tomita's

grammar. grammar.

latter latter the the

simplest simplest

Evolution Evolution

lacy lacy

driven driven

Fig. Fig.

In In

Another Another

in in

These These

generation generation

easily easily

table table

a a

the the

technique technique

context-free context-free

can can

in in

table table

2.3(b) 2.3(b)

separate separate

part, part,

These These

driven driven

must must

Fig. Fig.

section section

system system

general general

parser parser

parser parser

driven driven

scheme scheme

parsing parsing

grammar grammar

remove remove

to to

grammar grammar

This This

of of

frequently frequently

An An

of of

Tomita Tomita the the

IPG IPG

generator generator

Cigale Cigale

OBI OBI algorithm algorithm

Earley Earley

recursive recursive

LR(k), LR(k),

modifications modifications

2.2: 2.2:

the the

shows shows

phase. phase.

be be

parsers parsers

parser parser

parsing parsing

example example

technique technique

as as

table table

parse parse

5. 5.

is is

is is

parser parser

shown shown

(re (re

grammar grammar

grammar grammar

(a) (a)

for for

algorithm. algorithm.

parts parts

grammars grammars

generated generated

that that

controlled controlled

)computed )computed

LALR(k) LALR(k)

an an

Grammar Grammar

a a

generation generation

used used

are are

descent, descent,

table, table,

and and

(general) (general)

together together of of

algorithms algorithms

of of

incremental incremental

in in

Fig. Fig.

of of

requirement requirement

will will

more more

the the

driven driven

Fig. Fig.

(flexible). (flexible).

Fig. Fig.

the the

organisation organisation

in in

this this

and and

2.1: 2.1:

The The

for for

(powerful), (powerful),

generated generated

by by

be be

the the

same same

from from

LL(k) LL(k)

driven driven

2.2(c). 2.2(c).

2.3(a) 2.3(a)

efficient efficient

technique technique

form form

parser parser

phase. phase.

a a

a a

explained explained

Comparison Comparison

parsing parsing

a a

algorithms algorithms

grammar, grammar,

given given

grammar grammar

parse parse

the the

parse parse

table table

and and

a a

the the

parser, parser,

is is

This This

specialised specialised

is is

grammar. grammar.

parse parse

Examples Examples

powerful powerful

specialised specialised

because because

grammar. grammar.

efficient efficient

modular modular

algorithms algorithms

grammar grammar

given given

table table

table table

shown shown

driven driven

table table

is is

parser parser

++ ++

++ ++

in in

++ ++

+ +

independent independent

to to

but but

is is

Earley's Earley's

(b) (b)

section section

of of

table table

be be

the the

generation generation

that that in in

generator generator

various various

is is

in in

specialised specialised

the the

parser, parser,

on on

composition composition

presented presented

Fig. Fig.

of of

parser parser

lazy lazy

that that

An An

inefficient inefficient

Fig. Fig.

have have

is is

parsing parsing

large large

6. 6.

this this

algorithm algorithm

2.2(a), 2.2(a),

generated generated

example example

fast fast

++ ++

++ ++

have have

++ ++

++ ++

2.2(c), 2.2(c),

parsing parsing

parse parse

but but

part, part,

evolved evolved

technique technique

for for

for for

input input

phase phase

in in

parser, parser,

these these

information information

become become

the the

where where

modifiable modifiable

the the

because because

of of

table table

section section

where where

of of

algorithms. algorithms.

sentences sentences

input input

in in

grammar, grammar,

parsers parsers

is is

by by

into into

flexible flexible table table

are are

this this

its its

(c) (c)

made made

are are

++ ++

++ ++

generation generation

the the

need. need.

+ + + +

incorrect incorrect

+ +

the the

extended extended

3 3

the the

pure pure

table table

scheme scheme

for for

driven driven

LL LL

parser parser

and and

specialised specialised

should should

is is

grammars. grammars.

parser parser

scheme scheme

part part

(fast), (fast),

table table

each each

and and

table table

computed computed

and and

The The

form. form.

4 4

driven driven

grammar grammar

parser parser

all all

modular modular

due due

of of

is is

parser. parser.

is is

generator generator

the the

technique technique

LR LR

be be

with with

is is

driven driven

parse parse

parser parser

efficient efficient

++ ++

++ ++

controlled controlled

fall fall

the the

the the

+ +

of of

split split

to to

This This

possible possible

parser parser

parse parse

parser. parser.

parsing parsing

Fig. Fig.

in in

recursive recursive

a a

It It

parser. parser.

a a

only only

step step

The The

table table

modification modification

into into

consists consists

this this

uses uses

kind kind

2.2(b 2.2(b

table table

processing processing

that that

once once

parse parse

(modular). (modular).

a a

directly directly

all all

category. category.

algorithms algorithms

corrector. corrector.

the the

grammar grammar

of of

Here Here

tree tree

), ),

is is

descent descent

will will

parsing parsing

of of

where where parser parser

in in

same same

com­

table table

the the

the the

the the

be be

by by

in in of of

ACTION ACTION

3.1. 3.1.

LR-PARSE: LR-PARSE:

An An

action action

3. 3.

In In

ning ning

Output: Output:

field/, field/,

Description: Description:

Input: Input:

new(T) new(T)

Algorithm: Algorithm:

[Tom85, [Tom85, for for

'reduce' 'reduce'

LR LR

this this

stacks stacks

ACTION(state, ACTION(state,

action action

sentence sentence

of of

means means

but but

trees trees

action action

section section

can can

our our

We We

'LRparser' with with 'LRparser'

'accept' 'accept'

LR LR

Conventional Conventional

in in

In In

A A

the the

for for

that that

PARSING PARSING

parallel. parallel.

~parallel ~parallel

needs needs

only only

adapted adapted

to to

'true' 'true'

section section

parser parser

start start

action, action,

the the

Rek87]. Rek87].

and and

table table

we we

set set

parser, parser,

the the

'(reduce '(reduce

that that

create create

field field

can can

A A

means means

LR-P LR-P

algorithms algorithms

The The

is is

handle handle

use use

state state

we we

and and

later later

if if

parser parser

simple simple

determines, determines,

the the

a a

grammar grammar

is is

action action handle

the the

parser parser

sentence sentence

can can

the the

we we

We We

list list

LR LR

push(e, push(e,

do do

ARSE(start-state, ARSE(start-state,

an an

means means

and and

It It

Fig. Fig.

controlled controlled

parser parser

ALGORITHMS ALGORITHMS

start-state start-state

on: on:

version version

rule)' rule)'

symbol) symbol)

that that

LR LR

parser parser

a a

original original

of of

be be

basically basically

sets sets

therefore therefore

object object

discuss discuss

not not

parse parse

to to

lazy lazy

generator generator

grammar grammar

LR LR

single single

the the

parser parser

2.3: 2.3:

( (

terminals terminals

accessed accessed

:= :=

parsing parsing

driven driven

which which

1) 1)

perform. perform.

the the

keep keep

that that

of of

has has

is is

parser. parser.

S), S),

table table

means means

head head

new(LRparser) new(LRparser)

ACTION ACTION

uses uses

algorithm algorithm

requires requires

(a) (a)

on on

of of

grammatically grammatically

whole whole

at at

with with

field field

LR LR

pop(S), pop(S),

the the

advanced advanced

by by

a a

and and

sets sets

consists consists

parser parser

recall recall

the the

symbols symbols

type type

Lazy Lazy

follow follow

most most

the the

symbol symbol

simplified simplified

by by

parsing parsing

input input

terminated terminated

its its

that that

the the

stack stack

a a

An An

of of

basis basis

input input

GOTO GOTO

sentence sentence

a a

sentence): sentence):

O.f O.f

T, T,

returns returns

parser parser

parse parse

the the

one one

arbitrary arbitrary

uses uses

current current

dynamically dynamically

visible visible

finished finished

and and

unknown unknown

the the

we we

action action

of of

read read

tree tree

and and

as as

one one

on on

of of

of of

details details

has has

algorithm, algorithm,

For For

action action

parser parser

a a

need need

top(S). top(S).

the the

a a

table table

correct, correct,

the the

the the

table, table,

generation, generation,

the the

copy(O) copy(O)

version version

(dynamically (dynamically

so so

a a

step step

stack stack

by by

sentence. sentence.

been been

can can

lists lists

state state

set set

parse parse

far far

length. length.

input input

current current

some some

parse parse

the the

of of

to to

correctly, correctly,

has has

in in

of of

be be

which which

we we

of of

can can

ordinary ordinary

recognized, recognized,

varying varying

determine determine

and and

recognizing recognizing

end-marker end-marker

'false' 'false'

possible possible

stack stack

of of

as as

to to

recognized recognized

states. states.

input input

sentence sentence

either either

low-level low-level

stack, stack,

use use

never never

(2) (2)

state state

(b) (b)

given given

current current

make make

Start-state Start-state

Tomita's Tomita's

has has

varying) varying)

the the

of of

otherwise. otherwise.

......

To To

but but

incremental incremental

number number

modifiable, modifiable,

LR LR

a a

The The

but but

of of

become become

the the

actions actions

functions functions

what what

a a

for for

'shift', 'shift',

an an

keep keep

is is

the the

symbol. symbol.

and and

the the

parsing parsing

functions. functions.

$. $.

copy copy

a a

algorithm. algorithm.

only only

the the

state state

its its

ACTION ACTION

instance instance

syntax syntax

(pseudo-)parallel (pseudo-)parallel

is is

parallel parallel

number number

to to

parser parser

of of

the the

things things

current current

syntax syntax

rather rather

the the

a a

of of

'reduce', 'reduce',

do do

states. states.

them. them.

table table

on on

grammar grammar

sentence sentence

parser parser

first. first.

head(L), head(L),

error error

object object

parser parser

Basically, Basically,

state state

with with

rule rule

top top

and and

simple, simple,

in in

For For

driven driven

LR-parser LR-parser

than than

of of

and and

rule rule

LR-PARSE LR-PARSE

input input

of of

action action

and and

[ASU86, [ASU86,

(3) (3)

in in

the the

conventional conventional

generation. generation.

the the

objects objects

0. 0.

driven driven

'accept', 'accept',

the the

of of

a a

a a

which which

rule rule

tail(L), tail(L),

We We

recognized recognized

must must

single single

symbol. symbol.

current current

GOTO GOTO

we we

the the

···················+--• ···················+--•

When When

the the

stack stack

is is

LR LR

completely, completely,

use use

parser parser

do do

discussed discussed

language language

denoted denoted

we we

section section

go go action action

the the

uses uses

action. action.

or or

and and

parsing parsing

is is

not not

an an

an an

input input

to to

component. component.

use use

parser parser

The The

LR LR

the the

'error'. 'error'.

only only

state state

object object

non-terminal. non-terminal.

generate generate

length(L), length(L),

object object

tree tree

'(shift '(shift

the the

4.7], 4.7],

by by

current current

parsers parsers

symbol, symbol,

any any

parser parser

LR-P LR-P

in in

starts, starts,

one one

the the

state', state',

algorithm algorithm

an an

functions functions

the the

0 0

more. more.

of of

After After

a a

state')' state')'

empty empty

action action

ARSE ARSE

stack:, stack:,

parse parse

bit bit

has has

state state

calls calls

type type

next next

run­

The The

and and

and and

the the

the the

to to

5 5

a a a a 6

push(start-state. parser.stack) symbol, sentence := head(sentence ), tail(sentence) while true do state := top(parser.stack) if3action E ACTION(state, symbol) then if action = (shift state') then push(state', parser.stack) symbol, sentence:= head(sentence), tail(sentence) elseif action = (reduce A ::= j3) then for 1 · · · length(j3) do pop(parser.stack) od state':= top(parser.stack) push(GOTO(state', A), parser.stack) elseif action= (accept) then return true fi else return false

od

3.2. (Pseudo-)parallel LR parsing The parallel LR parsing algorithm starts as a simple (conventional) LR parser, but splits up in multiple parsers when ACTION(state, symbol) returns multiple actions. All simple LR parsers are synchronized on their shift actions in such a way, that only when all parsers have shifted on the current input symbol, the next symbol is processed PAR-PARSE: Parse a sentence with (pseudo-)parallel running LR parsers. Input: A start state start-state and a sentence sentence. Start-state is the state in which the first simple LR parser starts, and sentence is a list of tenninals terminated by the end-marker $. Output: 'true' if sentence is grammatically correct, 'false' otherwise. Description: The synchronization on shift actions is expressed in the algorithm by using two pools of parsers, this-sweep and next-sweep. The parsers in this-sweep still have to shift on the current input symbol, the parsers in next-sweep are waiting for the next symbol. Only when this-sweep is empty (and if there are parsers left in next-sweep), the next symbol is read from the input sentence. When both pools of parsers are empty, this means that all parsers died in an accepting or rejecting configuration. PAR-PARSE accepts its input if at least one simple parser accepts it. For each input symbol parsers are taken from this-sweep, until this-sweep is empty. The parser that is taken from this-sweep is removed from it, because for each action a copy of the parser is made and the action is perfonned on this copy. So, when there are no actions to be perfonned, the parser just disap­ pears. Shift actions place the copy in the next-sweep pool, reduce actions put the copy back in this­ sweep. Accept actions set the variable accepted to indicate that a simple parser has accepted the input. It is important for the lazy parser generator that the implementation of the copy operation for parsers is such that the parse stacks become different objects which share the states on them. Algorithm: PAR-PARSE(start-state, sentence): accepted := false start-parser := new(LRparser) push (start-state, start-parser .stack) next-sweep:= {start-parser} while next-sweep :;:.0 do " symbol, sentence:= head(sentence), tail(sentence) this-sweep, next-sweep:= next-sweep, 0 while 3parser E this-sweep do this-sweep := this-sweep - {parser} tion The extension. PAR-PARSE erators.

larger

parsing 4.

built which

represents thus labeled mar,

4.2

A

table

move along the

set

CONVENTIONAL

shows

stacked

kernel

transitions

describes

parse

the

its by

generator.

of

The

Each

We

parser

the

algorithm,

parse

edges

the

•tabeled

notion

items

The

parser

rule. Each

transition. symbol

often

lazy

the

table

a

table

As

path

generator.

box

state

an

control

kernel

is

table

moves how

is

transition

of

far

parse

edge underlying

in

controlled

generated

od

return

speak

generation

is

with

an

of

in

But,

of

the

as

that

a

a

this

and

states,

the

object

The

od

the

parse

field

structure

terminal

labeled

the

of

table

graph.

a

as

of

state/set

This

control

for

actions

its

state

symbol.

od

graph

accepted

parser,

a parser

PARSER

transition

in

parsing

one

a

of

table

and

with by

graph

generation

both

parser':=

by

elseif

if

elseif

'V

fi

parser

the

algorithm

internal

a

with

action

:=

can

action

During

an

the

from

of

start-state

next

set

push(state',

structure

push(GOTO(state',

accepted state':=

for next-sweep

this-sweep:=

and

the

transitions

:=

can

of

top(parser.stack)

the

when

LR

Transitions

of

algorithm

action

action

Fig.

transition

argue

a

ACTION(state.

of

($

GENERATION

items.

generator,

1

following

each

which

item

a

parse

tenninal

be

=

parse

structure e

·

accept)

move parsing,

items

copy(parser)

·

algorithm

4.l(c)

we

parsing

(shift

actions

seen

·

top(parser'.stack)

that the

can

=

=(accept)

length(p)

sets.

state

:=

table

and

ACTION

describe

The

table

(reduce

:=

field

parser'.stack)

itself

true

is

forward

contains

as

state')

be

have

symbol;

is

this-sweep

is

fields:

next-sweep

while

a is

is

the the

the

a program

do

dots

and

a

generated

shift

a

generator

generated

equivalent

of

a

discussed

special

is

set

sentence

parser

symbol)

'directed the results

A

do

A),

then

then

a

the

in

concerned,

and

along

'

in

the

action;

::=

set

a

of

pop(parser'.stack)

form

this

'

parser'.stack)

the

parsing

reduce

indicate

GOTO

v

p)

case,

items.

rules

of

v

moves

running

of

by

is

an

parse

section

in

LR

{parser'}

then

to

(symbol

'true

graph

items

{parser'}

if a

ACTION

a

edge

section

a

tabular

the

that

it

parser

action case,

states.

there

receive

set

The

tables

how

through

is

or

accept

on

of

contains

labeled is

of

a

are

false'.

itemset),

we item

non-terminal

is

an

arrows

5

the

generator.

first

items.

representation

far

Fig.

and

potentially

are

will

their

no

LR-parsing

actually

action.

od

conventional

the

the

sets'.

causes

with

difference

interpreted

4.1

GOTO.

an

tum

information

The

between

parser

graph:

with

edge

gives

a

Each

out

ought

a

non-tenninal graph

the

being

itemset

move

machine.

to Start-state

has

to

a

of

an

between

row

sets

transition

by

shift

LR(O)

another

be

an

to

of example

progressed

too.

recognized

back

a

a

internal

speak

in

these

a

of

hard-wired

straightforward

action

set

the

The

items

algorithm,

the

symbol.

according

set

is

of

item

is

parse

of

of

part

next

two

items.

structure

causes

a

of

a

a

in

are

GOTO

by

sets

gram­

parse

items

table

each

gen­

of

sec­

LR­

Fig

the

the

of

to

is

If

7

a

a 8

no. rule

0 B ::=true 1 B ::=false B~ true false 2 B ::= B orB 3 B ::= B andB 4 START::=B \n/=*·j~,?l.]

...... ~-=.-~----...~ true true false false ACTION GOTO true false or and $ B state 0 s2 s3 1 accept 1 s5 s4 ace 2 rO rO rO rO rO 3 rl rl rl rl rl 4 s2 s3 6 5 s2 s3 7 6 r3 r3 s5/r3 s4/r3 r3 7 r2 r2 s5/r2 s4/r2 r2

Fig. 4.1: (a) Grammar of the Booleans, (b) parse table, (c) graph of item sets.

B ::=B • andB B ::= B • or B B ::= B • or B Fig. 4.2: The parsing of 'true or false'. • reductions The reductions field of a set of items contains the syntax rules that have been recognized com­ pletely in this state/set of items. These rules may be reduced by the parser. In diagrams we indi­ cate reductions by underlining a rule in the corresponding kernel field (reductions of rules which are not also in the kernel field can not be represented in these diagrams). • type The value of the type field of a set of items may be 'initial' or 'complete'. If it is 'initial', the fields transitions and reductions have not yet been computed. In diagrams we indicate 'complete' sets of items with a black circle and 'initial' sets of items with an open circle. The number in the circle serves as a unique reference to each set of items. The parse table in Fig 4.l(b) is a tabular representation of the graph of item sets of Fig 4.l(c). This representation is normally used for an LR parse table. It contains the results of the functions ACTION(state, symbol) and GOTO(state, symbol) with a row for each state and a column for each sym­ bol. We shall not use these parse tables further, because the lazy parser generator also needs the kernel field Qf each set of items during parsing. The correspondence between the behaviour of ACTION and GOTO for a given state/set of items and the transitions and reductions fields is described by the following algorithms.

Description: Description:

Input: Input:

EXP EXP

Algorithm: Algorithm:

Output: Output:

Description: Description:

GENERATE-PARSER: GENERATE-PARSER:

Input: Input:

the the

Algorithm: Algorithm:

ACTION: ACTION:

GOTO: GOTO: Output: Output:

Description: Description:

Input: Input:

Algorithm: Algorithm:

Description: Description:

Output: Output:

Input: Input:

SURE SURE

On On

itemset. itemset.

transforms transforms ltemsets ltemsets

mar. mar.

symbol symbol is is

grammar grammar

add add

itemset itemset

are are

mar. mar.

zero zero

that that

tions tions

tions tions

AND: AND:

composed composed

The The

A A

A A

A A

A A

shifting shifting

expanded, expanded,

initial initial

The The

there there

The The

The The

set set

or or

state state

START START

grammar grammar

state state

field. field.

ACTION ACTION

and and

to to

Transform Transform

graph graph

it it

of of

more more

of of

This This

GENERATE-PARSER( GENERATE-PARSER(

contains contains

state state

This This

GOTO(state, GOTO(state,

ACTION(state, ACTION(state,

new new

actions actions

The The

This This

The The

generate generate

by by

returns returns

reductions reductions

is is

sets sets

state state

the the

state state

items items

an an

S S

Because Because

exactly exactly

may may

return return

start-itemset.type start-itemset.type ltemsets ltemsets

start-itemset start-itemset

od od

start-itemset.kernel start-itemset.kernel

while while

the the

return return

return return

result result

of of

(or (or

terminals terminals

of of

extended extended

state state

argument argument

argument argument routine routine

in in

right-hand right-hand

initial initial

routine routine

of of

as as

Grammar, Grammar,

and and

and and

all all and and

EXP EXP

routine routine

item item

which which

the the

itemset itemset

reducing reducing

not not

is is

items items

well well

all all

an an

3itemset 3itemset

an an

for for

rules rules

:= :=

GOTO GOTO

Build Build

the the

start-itemset start-itemset

a a

one one

state'[(symbol state'[(symbol

a a

result result

parser parser

AND(itemset) AND(itemset)

we we

initial initial

set set

fields. fields.

sets sets

be be

:= :=

computes computes

sets sets

non-terminal non-terminal extension extension

tenninal tenninal

{(accept) {(accept)

{(reduce {(reduce

{(shift {(shift

the the

generates generates

symbol): symbol):

as as

parsing parsing

kernel kernel

to to

state state

and/or and/or

GENERATE-PARSER: GENERATE-PARSER:

state state

used used

state state

transition transition

{start-itemset} {start-itemset}

of of

assume assume

with with

in in

symbol): symbol): side. side.

of of

:= :=

a a

for for

ltemsets. ltemsets.

parser parser

from from

which which

to to

can can

can can

items items

set set

graph graph

Grammar Grammar

new(itemset) new(itemset)

e e

items items

in in

S), S),

in in

is is

is is

state') state')

searching searching

type type

is is

:= :=

ltemsets[itemset.type ltemsets[itemset.type

of of

non-terminals. non-terminals.

symbol symbol

access access

must must

perform perform

which which

Grammar): Grammar):

the the a a

A A

a a

the the

which which

I I

of of

the the

the the

:={START::= :={START::=

partitioned partitioned

after after

initial initial

a a

is is

items items

into into

complete complete

complete complete

(symbol (symbol

of of

::= ::=

for for

created created

'initial'. 'initial'.

state') state')

graph graph

the the

symbol. symbol.

right-hand right-hand

transitions transitions

a a

graph graph

parser parser

item item

start. start.

I I

set set

jJ) jJ)

symbol symbol

other other a a

reducing reducing

with with

(symbol (symbol

parsing parsing

symbol. symbol.

ACTION ACTION

kernel kernel

into into

complete complete

in in

for for

I I

of of

sets sets

of of

e e

A A

accept) accept)

during during

state. state.

of of

will will

START START

set set

set set

sets sets

complete complete a

sets sets

syntax syntax

in in

state.transitions] state.transitions]

item item

::= ::=

item item

in in

for for

containing containing

1 1

side side

starts starts The The

of of

of of

subsets subsets

and and

state') state')

.p .p

a a

have have

state.transitions. state.transitions.

p p

of of

of of

and and

rule rule

a a

items, items,

items, items,

the the

one. one.

sets sets

as as

sets sets

I I

e e

e e

of of

items items

grammar. grammar.

non-terminal non-terminal

reductions reductions

rules rules

items items

START::= START::=

=initial] =initial]

and and

state.reductions} state.reductions}

state.transitions} state.transitions}

GOTO GOTO

advanced advanced

left-hand left-hand

generation generation

that that a a

e e

for for

of of

to to

set set

While While

syntax syntax

and and

and and

state.transitions} state.transitions}

is is

A A

in in

have have

all all

rules rules

that that

delivered delivered

the the

of of

the the

::=a, ::=a,

this this

the the

the the

obtain obtain

rules rules

items. items.

do do

are are

fields fields

p p

expanding expanding

given given

rule. rule.

been been

side, side,

root root

one one

having having

graph. graph.

return return

return return

START START

e e

process. process.

with with

not not

that that

Grammar} Grammar}

step step

their their

of of

symbol symbol

generated generated

with with

of of

grammar. grammar.

yet yet

the the

u u

the the

value value

A A

value value

may may

itemset. itemset.

The The

is is

recognizing recognizing

information information

a a

the the

a a

complete. complete.

the the

u u

graph graph

It It

same same

non-terminal non-terminal

set set

in in

become become

kernel kernel

is is

is is

is is

dot dot

symbol symbol start

state state

correctly, correctly,

of of

used used

deduced deduced

deduced deduced

It It

The The

symbol symbol

of of

placed placed

items, items,

starts starts

field field

state. state.

item item

a a

when when

applicable applicable

set set

Routine Routine

is is

rule rule

generated generated

of of

by by

S S

from from

before before

sets sets

from from

we we

of of

EXPAND EXPAND

and and

after after

sets sets

in in

of of

start-itemset start-itemset

using using

items items

can can

for for

a a

the the

the the

EXPAND EXPAND

its its

its its

of of

the the the the

a a

in in

be be

Gram­

subset subset

transi­

transi­

CW­

gram­

list list

items items

start­

state state

from from

may may

dot. dot.

sure sure

first first

of of 9 9 10

associated with S. For each S the associated subset is transformed into a new kernel kernel' by moving the dot over the S. When a set of items itemset' with kernel kernel' does not yet exist, it is generated as an initial set of items. The transition (S itemset') is now added to itemset.transitions. A rule in the extented kernel having its dot at the end has been recognized completely. It depends on the left-hand side of the rule if this means an accept or a reduce action. Algorithm: EXPAND(itemset): cl-items := CWSURE(itemset.kernel) itemset.transitions, itemset.reductions := 0, 0 for "i/S e {S I A ::= a.sp e cl-items} do

kernel':= {A ::=aS .~IA ::=a.S~e cl-items} if3itemset' e ltemsets[itemset'.kernel = kerne/1 then itemset.transitions := itemset.transitions u { (S itemset')} else itemset' := new(itemset) itemset'.type, itemset'.kernel :=initial, kernel' ltemsets := ltemsets u { itemset'} itemset.transitions := itemset.transitions u { (S itemset')}

od for 'VA ::= p. e cl-items do if A =START then itemset.transitions := itemset.transitions u { ($ accept)} else itemset.reductions := itemset.reductions u {A ::= p}

od itemset.type := complete

CWSURE: Input: A set of dotted rules kernel. Output: The same set of dotted rules, extended with all rules that can also become applicable. Description: CLOSURE extends a given kernel with all rules that may become applicable. If there is a rule A ::= a.Bp in the kernel it means that non-terminal B may become applicable. Hence, the kernel can be extended with all rules B ::= .y, because when a B can be recognized, all rules that derive B can also be recognized. Algorithm: CWSURE(kernel): closure := kernel while 3 A, B, a, p,y [A ::=a.BP e closure A B ::= r E Grammar A

B ::= •Y ~closure]do closure:= closure u {B ::= •"(} od return closure

5. LAZY PARSER GENERATION The parser generation algorithm described in the previous section generates the parser completely before it is use

S.1. An algorithm for lazy parser generation We only have to adjust the LR(O) parser generator of the previous section a little to transform it to a lazy parser generator. We move the parser generation phase into the parsing phase by moving the expansion of initial sets of items from GENERATE-PARSER to ACTION. This means that the state with which ACTION is called, cannot only be a complete set of items but also an initial one. When it is still initial, it is expanded first by EXPAND. GENERATE-PARSER now only generates start-itemset as an initial set of items, the rest of the parser generation will be taken care of by ACTION. GENERATE-PARSER: Build the first part of a graph of item sets from a grammar. Input: A grammar Grammar, which is a set of syntax rules A ::=ex. Output: The state in which parsing must start. Description: This routine constructs the set of items start-itemset: the root of the graph of item sets for the given grammar. ACTION and GOTO can access other sets of items in this graph. The kernel field of start-itemset is composed of all rules in Grammar with STARTas left-hand side, with the dot placed before the first symbol of the right-hand side. Algorithm: GENERATE-PARSER(Grammar): start-itemset := new(itemset) start-itemset.type := initial start-itemset.kernel := (START::= .13I START::= 13e Grammar} Itemsets := (start-itemset} return start-itemset

ACTION: Input: A state state and a terminal symbol. Output: The actions the parser can perform in state. Description: When state is an initial set of items it must be expanded first. The complete set of items is then used to deduce the return value from the transitions and reductions fields. Algorithm: ACTION(state, symbol): if state .type = initial then EXPAND(state)

result := {(reduce A ::= 13)I A ::= 13e state.reductions} u {(shift state') I (symbol state') E state.transitions} u {(accept) I (symbol accept) e state.transitions} return result

Like ACTION, GOTO uses information that is only available in complete sets of items, so one might be inclined to think that the same test for initial sets of items has to be added to GOTO as well. However, due to the the particular way in which the parsing algorithm works, GOTO will only be called with sets of items that have already been completed. This is proved in Appendix A. The implementation of the lazy parser generator has to treat variables Itemsets and Grammar of GENERATE-PARSER as global variables, because they are needed during the expansion of sets of items. Furthermore, several complete sets of items can point to the same initial set of items. When expanding an initial set of items, the implementation has to take care that all sets of items that originally pointed to the initial set of items now point to the completed one.

the the

The The

again again

lazy lazy

some some

needed needed

parse parse distributed distributed

has has

generation generation

grammar grammar

5.3. 5.3.

again. again.

graph graph

start-state. start-state.

oarsed. oarsed.

graph graph

5.2. 5.2.

When When

generator generator Consider Consider

12 12

kernels kernels

to to

parser parser

overhead overhead

The The

An An

extra extra

when when

We We

In In

the the

All All

of of

shown shown

be be

the the

The The

during during

example example

item item

contrast contrast

done done

cost cost

like like

the the

SDF SDF

sentences sentences

initially initially

considered considered

parser parser

technique technique

over over

time. time.

generator generator

when when

advantage advantage

the the

~ ~

Hence, Hence,

Fig. Fig.

Fig. Fig.

in in

grammar grammar

sets. sets.

that that

of of

in in

as as

parsing, parsing,

definition definition

grammar grammar

Fig. Fig.

parsing, parsing,

lazy lazy

time time

all all

5.2: 5.2:

before. before.

is is

5.1: 5.1:

to to

of of

consists consists

of of

Only Only

that that

ACTION ACTION

5.l(b). 5.l(b).

given given

sets sets

lazy lazy

clearly clearly

the the

uses uses

SDF SDF

making making

The The

parser parser

introduced introduced

of of

The The

of of

the the

only only

of of

the the

is is

will will

accept accept

conventional conventional

of of

for for

parser parser

Only Only

the the

more more

graph graph

(given (given

its its

only only

graph graph

modified. modified.

items items

lazy lazy

Fig. Fig.

SDF SDF

has has

lazy lazy

sentences sentences

generation generation

contain contain

not not

the the

booleans booleans

first first

is is

the the

memory memory

of of

of of

advantages. advantages.

parser parser

called called

5.2 5.2

....,,._....._-=r--="-"'i.-' ....,,._....._-=r--="-"'i.-'

of of

generation generation

technique technique

lazy lazy

by by

have have

in in

increase, increase,

itself itself

B B

B B the the

B B

sentence, sentence,

item item

test test

item item

shows shows

appendix appendix

::=Band ::=Band

::= ::=

::=B ::=B

this this

'and' 'and'

start start

parser parser

been been

containing containing

technique, technique,

in in

of of

with with

B B

generator generator

(see (see

sets sets

than than

sets sets

lazy lazy

•and •and

• •

ACTION ACTION

Fig. Fig.

set set

orB orB

the the

since since

and and

is is

expanded, expanded,

section section

after after

its its

initial initial

after after

generator generator

B B

a a

rather rather

technique technique

B) B)

of of

4.l(a). 4.l(a).

graph graph

B B

comparable comparable

• •

'true', 'true',

first first

B B

B B

items items

STAR STAR

even even

only only

the the

also also

B B

(a) (a)

'false' 'false'

true true

where where

I I

~ ~

::=B ::=B

::=B ::=B

which which

set set

B,'=~· B,'=~·

small small

7 7

step step

of of

sentence sentence

generation, generation,

will will

60 60

for for

in in

The The

needs needs

(with (with

but but

even even

of of

1 1

item item

• •

• •

is is

::= ::=

true true

the the

or or

will will

only only

orB orB

andB andB

percent percent

determines determines

items items

for for

all all

small. small.

the the

now now

graph graph

conventional conventional

B B

'or', 'or',

sets sets

type type

more more

the the

worst worst

measurements). measurements).

the the

• •

false false

be be

incremental incremental

'true 'true

the the

be be

false false

start-state start-state

B B

after after

kernel kernel

the the

I I

(b) (b)

grammar grammar

initial) initial)

~ ~

to to

of of

of of

The The

1} 1}

lazy lazy

"""Cld. """Cld.

parsed parsed

ACTION ACTION

case case

and and

item item

ask ask

the the

graph graph

the the

the the

the the

total total

than than

true true

true' true'

fields fields

,,ill,.1 ,,ill,.1

exactly exactly

parse parse

first first

what what

shown shown

one. one.

type type

sets sets

sentence sentence

without without

parser parser

of of

of of

which which

generation generation

it it

call call

and and

has has

the the

item item

of of

generated generated

of of

One One

already already

table table

actions actions

In In

false false

in in

the the

a a

I I

been been

the the

to to

booleans, booleans,

generator generator

GOTO GOTO

this this

[;,,ill,.1 [;,,ill,.1

Fig. Fig.

further further

is is

given given

sets sets

could could

'true 'true

ACTION. ACTION.

had had

same same

sets sets

expanded expanded

case case

it it

parsed. parsed.

is: is:

5.l(a). 5.l(a).

time, time,

has has

and and

to to

has has

by by

set set

decide decide

of of

information information

it it

expansion expansion

amount amount

be be

to to

the the

but but

will will

is is

the the

items. items.

of of

true' true'

which which

to to

be be

generated generated

unnecessary unnecessary

items items

lazy lazy

for for

perform perform

lazy lazy

first first

need need

to to

expanded expanded

has has

of of

a a

remove remove

is is

So So

parser parser

parser parser

larger larger

of of

to to

takes takes

work work

them them

been been

now now

was was

the the

the the

the the

to to in in The

actions overhead to depends the is the has 6.

item both

chances

of

is of for

grammar reasons

Transitions hne. Unfortunately, changed.

severe

already

addition tences

grammar,

accept

shown

expand modified,

its

the

INCREMENTAL

kernel?)

the

conventional

already

lazy

grammars

sets

graph

In

We

new

Consider,

old

implications

for

'a

to

are

been this

on

in

parser

of

that

incurred

with

an first

b'

there

believe

one. grammar.

the

Fig generated

of

that

the

the

have it

turned section

entire

and

can

generated.

is

show the item

specific

will

6.1.

'size'

rule

is

generator

large

Is

for

likely

method,

been

still

'c

rule

(For

no

o Consider, so.

the

set

sets. out

have

than that instance,

we

A

b',

guarantee

parts

of

and

When

PARSER

be

of

'B

old ::=

that

what

added

to

symbol

describe

Fig.

but

extension

the

large

Suppose

in

items

this

it used

This

be

::=unknown'.

B

b

can

by

re true true

In,,.$.

graph

IFA~--~1

of

is a

the

to

6.1:

it

symbols

modification,

so

the

restarting

relatively

is

the

it

still to

parts

only

is

in the grammar. the

grammar

at

large

previous that

not

have with

graph

GENERATION

an

some

(a)

true

of

grammar

for

the

the

once,

a

rather

of

always

incremental react

rule

item

the in

The

instance,

which

that

to

a graph

has

smallest

false

common.

sets

of

grammar

from

large

original graph

since

be

is

false

is original wasteful.

sets

The case.

the

to

no

item

1~3L but

added

removed.

of

the

of

a

for

ACTION

modifications

net

scratch.

false

set

part

in

complicated only

graph

This

Fig.

also

items,

grammar

the

case.

sets

We

the

some

parser

gain

of

does

graph,

to

of

6.2(a)

grammar that

example

on

items

We

modified

is

show

a

of

it

in

but

grammar.

already Although

was

not

sense

is

how

item

part

generator

stays

would

efficiency

(b)

a

we

for

existing

already

in

correspond

I

subgraph

of

way called.

has

much

sets

the

shows

a

Fig

of

which could

the

grammar.

the

highly

subgraph

like

the

to

update

to

the

for

6.3

Oearly,

same,

that

been

grammar

be of

transitions

was

describe

However,

to

think

booleans

that the

lazy

the

of

the

the

expanded

specialized

retains

exploit

in

exp.~ed?

the to

of

graph

and

booleans

even

of

How

modifications

a

graph

technique

of

the

be

the

ef

straightforward

~

graph

the

by

that unknown unknown

a

for

expected.

those of

old

graph. the

or

if of

the much

language

new

throwing

had

that

Fig.

rules

which

the

kernels item

for

grammar

fact

towards

has

additional

parts What

is

one

already is

has

graphs

the

4.l(a),

now

sets

that when

an

are

needed

of

an

as

with

modified

to

of

did

improvement

away

was

only the

the

of

way

to

update

is

well?

be

the

of

and been

administrative

not

Fig.

be a

graph

the

only

old

to

thrown

item

added

subgrammar

the

old

to

updated,

deduce

extend

a

have

closure

6.2(b) generated There

extension

grammar. grammar,

grammar

has

parser

~ graph

sets

after

to

away

to

more

over

sen­

this

the

are has

for

13

the

of

of

be

as an

it 14

The

6.1. initial, used

will ing items which

there added re-expanded,

A sitions should to

ADD-RULE: transition in

expand

re-expanded.

Des'iription: DELETE-RULE Input:

Algorithm:

::=

the

the

the

then, Add

incremental

Grammar

An

in

was

would Suppose

.~ Similarly

These

transitions

to recognition

that The

field.

exisisting have

closure

A::=b

the

E::=cC

C::=B

START::=E B::=b D::=

START::=D A::=B

algorithm

them

syntax

the

in

when

(A a

were

rule

graph

rule

the

A

contain

ADD-RULE( closure

sets

MODIFY

itemset')

aA Add but

This

when

::=

of a

rule

needed,

rule

when closure

with

(from

MODIFY(rule,

parser

use

rule

of

complete

field every

graph

of

a of

.~in

for

can

items,

rule

A that

when

the

item

needed.

its

the

A

we

in

the

of :

is

incremental

:= generator

re-expand

kernel?

dot before

of

the ::=~is

be same

in

has to

their

new

called

delete

the

rule):

sets

viewpoint

Fig.

which

that

their

the ones

~· which

achieved

closure

Fig.

to

set

rule

transitions

routine

for

Because

How grammar

6.3:

be

closure

with

added

v) a

kernel. present

Initial

6.2:

of

are

retains

added.

rule

the

these

should

recognition

an

The

of

items

can

parser

of

operator

(a)A

the

simply

A

MODIFY

new

their

A

to

sets

contains

the

modification

in

addition

sets

we a

and

These

first field.

::= only

the

start.

in

problem.

grammar,

the

grammar.

new

of

kernel

find

generation

of

~ question.

update

by

grammar.

ones

'v'

closure, items

that

from

of

items

are, to

grammar)

at

In

these

making

and

to

this update

least by

part

affected

the

the

the

can

similar

Fortunately,

perform (b)

b

of

according

fn~-

the

It

deletion

sets

EXP

new

rule

So

of

corresponding

the

grammar,

one

its

easily

does

We

them

presence

the

the

expanded

we

of

AND

graph

graph

by

graph

started.

to

dotted

then

the graph

items

this

can

(possibly

the be

addition, initial

of

to

J

must

actual

we

of

of

dealt

have

the by

the recognize we

a modification

of

rule

in

of

item

Fig.

incorrectly.

These rule

can

making

and

a

new the closure

have

already

item

graph

with

update.

transition to

with

incomplete)

the

6.2(b

sets. be

existing

are

let

find

grammar.

sets.

to

are

sure because

complete

of all its

the all

of

). find

have so

the

item

of dot

complete

the

the

sets

that The

similar,

(A

lazy

graph

the

the

states

added

before

sets

sets.

kernel

graph

they

itemse()

of

A

lazy

grammar, states

sets

parser

::=

items

without

of

(sets

do

sets

a

ADD-RULE

parser

an

that

of

of

.~will

transition

items

(sets

not

A.

generator

in their

of

these items

in

of

can

have recomput­

have

items the

But generator

items)

of

that

only

still

sets

with

items)

graph

when

for

to

tran­

to

that

had

and

be

re­

be be

of

be

in

A

a DELETE-RULE:

Input:

Description: MODIFY:

Algorithm:

Input: A

Description:

Algorithm:

tial,

When,

transformed

item

established,

When

A

again.

(A modified When

according

re-expand

because

sets

::=

1be

items

the

for

rule

-~in

for

Modify

A

rule

example,

lazy

et)

DELETE-RULE(rule

MODIFY

MODIFY(A

and

into

MODIFY

A

grammar.

the

is

them

to

they

Fig.

rule its

::=~and

in

MODIFY(rule.

Grammar

if else

the

fi

parser

the

grammar

the

Delete

the

kernel.

A

their

a

6.4:

accept

when

had

that

for

start-itemset.kernel start-itemset.type

od

grammar

start-symbol

initial

=

unconnected

modification

the

is

START

uses

transitions

Graph

generator Vitemset

itemset.type

has

a

This

a

called

rule

::=

a

needed

rule

When

of

modification

transition :=Grammar

set

to

global

~.

B B B

Fig.

is

then

'B

and

of

from

of

be an

::= ::=

::=

[itemset.type

with

D

) -

(A

done

it

items ::=unknown'

item

by

):

deleted.

E

B B START B

update

graph

6.l(a)

):

now

and

field.

is

itemsef)

6

variables

the

andB

Itemsets

operator'-'

:=

the

not,

or

andB

for

:=

by

sets

the

8

initial

expands

grammar

B B

operator

:=

parser.

of

initial

is

D

These

is

the

making

of

we

'B'

start-itemset.kernel

for

graph

Fig.

updated

generated

{A

=

the

E

corresponding

search

Grammar,

~

the

in

complete

itemset.transitions]

is

::= sets

6.4.

0

::=

to

grammar,

and

their

of

D

all

boo

added

~}

again,

perform

by

B

of

Itemsets

item

(which

incorrectly

update with

leans

items

MODIFY,

or

transitions

A

to

Itemsets,

sets

B

its

kernel

after

the

we graph

may

the

are

the

for

former

is

D

grammar

know

actual

made

corresponding

addition

expanded

all reduced

be

'B

the

of

{A

and

do

field.

complete

'u'

::=unknown

item

connections

sets

::=

initial

update.

that

start-itemset.

or'-').

of

of

.~}

to

of

sets.

The

sets

only

'B

the

items

a

to

sets

graph

:

let

graph

graph

:=

of

booleans,

start-itemset

of

with

unknown'.

the

0,

items

'.

items

that

4,

of

The

Grammar

lazy

of

and

I,

item

in

item

is

resulting

and

2,

parser

with

5

Itemsets

correct

sets.

and

are

the

can

sets

a

is

generator

transition

made

3

graph

updated

graph

contain

for

is

are

initial

thus

ini­

the

15

re­

of

is

stand stand

added added set set

erating erating of of

it it

sible sible

used used words, words, new new

refers. refers.

Furthermore, Furthermore,

have have

stay stay

to to parser. parser. the the

There There

6.2. 6.2. expanded, expanded, A A

by by When When too too

it it

retaining retaining

to to

to to

affected affected

shown shown

16 16

has has

takes takes

compromise compromise

an an

the the

it. it.

items items

definition, definition,

of of

much much

modified modified

forever forever

rule rule

Garbage Garbage

again. again.

to to

part part

initial initial

and and

been been

Routine Routine

The The

items items

The The

MODIFY MODIFY

grammar grammar

to to

MODIFY MODIFY

is is

in in

the the

and and

}Vhen }Vhen

be be

This This

time, time,

by by

yet yet

the the

that that

is is

unused unused

Fig. Fig.

more more

of of

is is

dilemma dilemma

example example

the the

regenerated regenerated

expanded expanded

removal removal

discarded discarded

correcting correcting

this this

set set

with with

in in

thrown thrown

another another

the the

grammar grammar

would would

MODIFY MODIFY

while while

initial initial

grammar, grammar, will will

6.5. 6.5.

transitions transitions

the the

EXPAND EXPAND

collection collection

Itemsets. Itemsets.

solution solution

of of

robust. robust.

of of

modification. modification.

is is

graph graph

makes makes

sets sets

a a

items items

reference reference

the the not

almost almost

Fig. Fig.

history history

accept accept

of of

there there

be be

of of

Fig. Fig.

sets sets

is is

away, away,

problem problem

the the

of of

again). again).

of of

a a

Fig. Fig.

thus: thus:

of of

unused unused

in in

(this (this

6.l(a), 6.l(a),

should should

a a

is is

items items

with with

graph graph

new new

of of

6.5: 6.5:

sets sets

ref ref

the the

to to

set set

is is

item item

contradiction contradiction

best best

certainly certainly

to to

(its (its

but but

6.2 6.2

items items

count count

8 8

no no

B B

B B

B B

would would

when when

attach attach

count count

of of

booleans, booleans,

kernel kernel

The The

and and

references references

and and

to to

Frequent Frequent

an an

in in

::=B ::=B

::= ::=

set set

::=B ::=B

of of

old old

possible possible

make make

sets sets

sets. sets.

when when and and

guarantee guarantee

items items

ltemsets ltemsets

be be

item item

B B

do do

graph graph

increments increments

field field

of of

9 9

transitions transitions

all all

6 6

of of

to to

occur occur

6.3 6.3

of of

• •

andB andB

• •

never never

solved solved

will will

{ {

In In

items items

not not

sets sets

or or

andB andB

everything everything

B B

initial, initial,

each each

unreachable unreachable

a a

sets sets

items items

will will

sets sets

processed processed is

B B B B

particular, particular,

algorithm algorithm

::=b ::=b

modification modification

with with

of of

set set

have have

be be

to to

of of

the the in

is is

• •

that that

be be

are are

Fig. Fig.

3 3

set set

in in

have have

reinstated, reinstated,

of of

of of

essential, essential,

items items

these these

is is

the the

is is

•. •.

B B

the the

lb@. lb@.

~ ~

the the

used used

field). field).

the the

a a

the the

items items

of of

clearly clearly

items items

postponed postponed

made made

6.4 6.4

A A

example example

transitions transitions

transitions transitions

::= ::=

is is

lazy lazy

been been

items items

ref ref

incremental incremental

in in

re-expanded re-expanded

::=b. ::=b.

'dirty' 'dirty'

sets sets

partial partial

sets sets

retained, retained,

again. again.

B B

after after

true true

correctly correctly

ount ount

the the

l, l,

of of

It It

becomes becomes

initial, initial,

philosophy. philosophy.

• •

otherwise otherwise

decreased decreased

separated. separated.

but but

6, 6,

may may

or or

a a

is is

of of

a a

sense sense

} }

ref ref

re-expansion re-expansion

of of

rather rather

fields fields

. .

and and

B B

expanded expanded

grammar grammar

re-expansion re-expansion

until until

the the

items items

For For

field). field).

Set Set

false false

of of

count count

Fig. Fig.

because because

(but (but

we we

I I

by by

7 7

that that

transition transition

parser parser

that that

~,,~,,~~~j ~,,~,,~~~j

zero, zero,

of of

example, example,

sets sets

the the

of of

than than

will will

up up end

major major

6.4). 6.4).

of of

are are

MODIFY. MODIFY.

need need

This This

field, field,

items items

unknown unknown

When When

set set

the the

Also, Also,

it it

chance chance

those those

can can

of of

in in

never never

initial. initial.

it it

removed removed

it it

does does

generator, generator,

of of

sets sets

On On

of of

items items

makes makes

parts parts

the the

is is

has has

not) not)

telling telling

cause cause

of of

7 7

with with

to to

items items

in in

the the

set set

sets sets

when when

removed removed

and and

the the

not not

same same

be be

sets sets

7 7

of of

a a

is is

MODIFY MODIFY

A A

be be

of of

When When

of of

will will

on on

transition transition

set set

too too

re-used re-used

better better

items items

the the

of of

other other

the the

many many

always always

immediately, immediately,

how how

dirty dirty

to to

the the

of of

items items

the the

created. created.

b b

namely namely

of of

way way

items items

much much

others others

ever ever

algorithm algorithm

transition transition

is is

items items

graph graph

items items

the the

from from

many many

hand, hand,

rule rule

it it

set set

replaced replaced

that that

useless useless

as as

0. 0.

(unless, (unless,

the the

retain retain

creates creates

be be

to to

rule rule

for for

grubage grubage

of of

an an

disappear disappear

is is

grubage grubage

'B 'B

they they

ltemsets. ltemsets.

of of

which which

is is

On On

functions functions

sets sets

there there

used used

items items

A. A.

avoided, avoided,

initial initial

A A

re-expanded re-expanded

of of

easier easier

item item

::= ::=

sets sets

the the

it it

by by

the the

transitions transitions

will will

of of

When When

::=bis ::=bis

of of

2 2

is is

again again

are are

B B

it it

largest largest

in in

to to

is is

a a

of of

course, course,

items items

sets sets

collection collection

likely likely

set set

one one

no no

xor xor

transition transition

(because, (because,

to to

never never

ltemsets. ltemsets.

an an

7 7

also also

In In

items items

for for

because because

3 3

are are

longer longer

under­

by by

would would

added added

initial initial

hand, hand,

After After

other other

B' B'

is is

refer refer

gen­

pos­

that that sets sets

not not

the the

the the

for for

re­

to. to.

be be

to to is is

8. 8.

Although Although

tion. tion.

low low

the the

In In

syntax-directed syntax-directed

all all

tems tems

The The

parsers parsers

in in

bage bage

The The

not not which which

each each addition addition

definitions definitions

Yacc to to

We We

18 18

• •

•PG: •PG:

• •

• •

• •

• •

• •

our our

the the

CONCLUSIONS CONCLUSIONS

be be

algorithms algorithms

grammar grammar

IPG: IPG:

parse parse

parse parse

modify modify

construct construct

print print

Yacc: Yacc:

measured measured

by by

results results

measurements measurements

and and

collections collections

case case

IPG IPG

generates generates

To To

able able

LeLisp LeLisp

adds adds

opinion, opinion,

building building

was was

parsing parsing

erating erating

erates erates

The The

erator erator

the the

In In

generation generation

has has

than than

tables, tables,

PG. PG.

environment environment

'Exam.self' 'Exam.self'

generation generation

Yacc Yacc

or or

(orinSDF: (orinSDF:

other other

it. it.

prevent prevent

the the

an an

incremental incremental

by by

uses uses

of of

to to

this this

deletion deletion

of of

the the

an an

second second

a a

input input

great great

All All

fact fact

same same

adding adding

of of

a a

that that

The The

which which

use use

the the

stream stream

the the

generates generates

environment environment

for for

parsers parsers

highly highly

generates generates

element element

twice twice

parse parse

grammar grammar

case case

whereas whereas

editor), editor),

the the

affected affected

C-code, C-code,

SDF SDF

the the

were were

measurements measurements

the the

0. 0.

7.6 7.6

1.3 1.3

the the

that that

measurements measurements

times times

of of

incremental incremental

sentence sentence

the the

second second

advantages, advantages,

sentence sentence

7 7

have have

time time

measurements measurements

parse parse

of of

incremental incremental time time

after after

the the

the the

the the

"(" "("

Yacc. Yacc.

input. input.

sec sec

sec sec

sec sec

table table

lexical lexical

as as

and and

AND AND

of of

dynamic dynamic

only only

generation generation

This This

PG PG

same same

in in

::="("+")?" ::="("+")?"

a a

in in

we we

used used

much much

parsers parsers

smallest smallest

grammar grammar

rule rule

lexical lexical

parts parts time time

which which

and and

taken taken

Yacc Yacc

been been

for for

for for

for for

C, C,

the the

the the

for for

input input

reason reason

CF-ELEM+ CF-ELEM+

always always

generates generates

priority priority

for for

allowed allowed

generation generation

FUTURE FUTURE

(SDF (SDF

consider consider

and and

while while

twice. twice.

There There

input input

scanner scanner The The

the the

the the

Y Y

roughly roughly

reconstruct reconstruct

by by

needed needed

modification modification

needed needed

a a

SDF; SDF;

of of

parse parse

ace ace

repeated repeated

generates generates

parser parser

by by

applications. applications.

as as

was was

have have

tokens tokens

sentences sentences

Yacc Yacc

that that

C C

C C

were were

the the

are are

generator generator

the the

has has

of of

that that

definition) definition)

modification modification

rule rule

takes takes

is is

PG PG PG, PG,

to to

sentences sentences

compiler compiler

compiler compiler

and and

convincingly convincingly

are are

between between

three three

LR LR

compiled compiled

time time

given given

parsers parsers

shown shown

parse parse

are are

generate generate

and and

IPG IPG

15 15

parts parts

takes takes

been been

for for

generation generation

PG PG

and and

compiled compiled

") ")

parser parser

WORK WORK

time time

indicating indicating

to to

function function

already already

on on

parse parse

less less

about about

lines lines

the the

LALR(l) LALR(l)

?" ?"

the the

constructing constructing

parser parser

as as

to to

reasons reasons

uses uses

IPG IPG

to to

table. table.

in in

input input

of of

carried carried

on on

the the

of of

consists consists

by by

to to

to to

Yacc, Yacc,

in in

parse parse

twice; twice;

be be

measurements. measurements.

time time

file file

again again

-> ->

Fig. Fig.

be be

the the

is is

and and

tables tables the the

the the

in in

top top twice twice

link link

compile compile

same same

use use

less less

the the

in in

the the

time time

an an

declarations. declarations.

by by

generators generators quite quite

unacceptably unacceptably

system system

texts texts

show show

CF-ELEM CF-ELEM

68020 68020

in in

SDF SDF

that that

parser parser

7 7

than than

the the

parse parse

memory, memory,

for for

of of

table; table;

out out

but but

excellent excellent

LR(O) LR(O)

same same

.1. .1.

the the

after after

relatively relatively

tables. tables.

the the

generation generation

of of

this this

as as

may may

time. time.

used used

the the

largest largest

this this

only only

of of

small small

the the

since since

They They

compiled compiled

on on grammar grammar

the the

the the

fast fast

the the

LeLisp LeLisp

machine machine

table table

from from

in in

paper. paper.

different different

the the

lazy lazy

(Lisp) (Lisp)

seem seem

tables tables

a a

parse parse

difference: difference:

parser; parser;

by by

benefits benefits

first first

C; C;

The The

a a

as as

and and

and and

), ),

SUN SUN

(the (the

show show

we we

choice choice

modification. modification.

part part

142 142

small small

influencing influencing

while while

one, one,

the the

We We

IPG IPG

high high

time time

a a

compiler compiler

the the

one. one.

environment environment

parse parse

expect expect

the the

and and

We We

table table

parser parser

shows shows

difficult difficult

parser parser

code code

lines. lines.

of of

3/60 3/60

parsers parsers

length length

the the

added added

which which

generated generated

of of

parsers parsers

is is

construction construction

for for

parsing parsing

than than

generate generate

the the

kept kept

for for

This This

is is

times times following: following:

Yacc Yacc

lazy lazy

negligible. negligible.

by by

under under

grammars grammars

to to

that that

interactive interactive

The The

will will

parse parse

almost almost

an an

'Complice' 'Complice'

and and

rather rather

Yacc, Yacc,

the the

generated generated

problem, problem,

the the

in in

the the

the the

difference difference

Other Other

and and

constructed constructed

interactive interactive

some some

uses uses

of of

the the

syntax syntax

tum tum

mainly mainly

measurements, measurements,

parsers parsers

in in

low low

complexity, complexity,

rest rest

complexity complexity

parsers parsers

C-compiler. C-compiler.

table table

both both

than than

zero. zero.

is is

incremental incremental

which which

input, input,

and and

LALR(l) LALR(l)

is is

of of

that that

time time workload workload

experiments experiments

that that

Only Only

language language

we we

of of

had had

an an

by by

PG PG

be be

the the

deleted deleted

in in

[Cha86]. [Cha86].

to: to:

modification modification

is is

The The

are are

SDF SDF

were were

which which

PG PG

the the

easy easy

PG PG

was was

Lisp. Lisp.

language language

used used

a a

to to

code. code.

and and

not not

the the

of of

parse parse

namely namely

much much

be be

parsers parsers

lazy lazy

generates generates

and and

PG PG

was was

tables tables

used used

(no (no

derivative derivative

as as

able able

parser parser

the the

IPG IPG

the the

definition definition

a a

from from

first first

explains explains

generated generated

rule rule

and and

IPG, IPG,

showed showed

large large

LeLisp LeLisp

parser parser

tree tree

modified modified

larger larger

input input

swapping). swapping).

algorithms algorithms

for for

to to

are are

definition definition

four four

and and

parse parse

are are

within within

times times

in in

IPG IPG

genera­

present present

but but

but but

regen­

LR(O) LR(O)

as as

larger larger

order order

of of

used used

gen­ than than

gen­

why why

SDF SDF

of of

sys­

gar­

that that

the the

ran ran

did did

the the

for for

all all

of of

of of

in in

a a a a

dix dix

be be

SDF SDF

Lang, Lang,

viewpoint, viewpoint,

we we

incremental incremental *We *We

have have 7. 7.

solution solution

test test

used used

ing ing

for for

tion tion

parser parser

context-free context-free the the

We We

centage centage

implementation implementation

Algorithm: Algorithm:

Input: Input: Description: Description:

DECR-REFCOUNI': DECR-REFCOUNI':

RE-EXPAND: RE-EXPAND:

Algorithm: Algorithm:

Input: Input:

Description: Description:

expressed expressed

PERFORMANCE PERFORMANCE

PG PG

did did

B, B,

LALR(l) LALR(l)

performance. performance.

ltemsets. ltemsets.

itemset' itemset'

grammar grammar

have have

made made

used used

we we

is is

is is

been been

Both Both

The The

to to

A A

A A

generator, generator,

and and

not not

an an

its its

improved improved

of of

for for

a a

set set

give give

set set

compared compared

more more

LR(l) LR(l)

reasonably reasonably

introduction introduction

by by

appropriate here, here, appropriate

dirty dirty

we we

have have

IPG IPG

PG PG

to to

this this

version version

of of

of of

parsing parsing DECR-REFCOUNI'(itemset): DECR-REFCOUNI'(itemset):

RE-EXPAND(itemset): RE-EXPAND(itemset):

The The

itemset itemset

in in

parser parser

had had

All All

both both

us us

which which

expect expect

Expand Expand

efficient efficient

items items

items items

the the

have have

sets sets

SDF SDF

and and

if if

EXPAND(itemset) EXPAND(itemset)

for for itemset.refcount itemset.refcount

od od

of of

old-transitions old-transitions

problem problem

access access

would would

version version

reference reference

reference reference

to to

but but

itemset.refcount itemset.refcount

sharing sharing

an an

garbage garbage

ltemsets ltemsets

if if

DECR-REFCOUNI'(itemset') DECR-REFCOUNI'(itemset')

Vitemset' Vitemset'

described described

the the

generator generator

of of

is is

be be

algorithm*. algorithm*.

IPG IPG

itemset itemset

itemset itemset

itemset itemset

itself itself

to to

example example

style style

sized sized

itemset.type itemset.type

Earley's Earley's

as as

expanded expanded

a a

items items

AND AND

of of

LR(l), LR(l),

od od

for for

to to

be be

not not

efficiency efficiency

dirty dirty

of of

of of

the the

would would

generate generate

dirty dirty

of of

a a

expressed expressed

parse parse

to to

DECR-REFCOUNI'(itemset') DECR-REFCOUNI'(itemset')

counts counts

Vitemset' Vitemset'

count count

grammar. grammar.

be be

the the

good good

collection collection

Lisp Lisp

because because

whose whose

with with

is is referred

:= :=

becomes becomes

EFFICIENCY EFFICIENCY

modifications modifications

be be

set set

[(symbol [(symbol

Yacc Yacc

in in

of of

since since

algorithm algorithm

a a

ltemsets ltemsets

:= :=

sets sets

grammar grammar

trees. trees.

be be

programming programming

acceptable acceptable

:= :=

in in

fair fair

section section

of of

an an

As As

type type

implementation implementation

of of

itemset.transitions itemset.transitions

* *

of of

of of

= =

parse parse

to to

ref ref

the the

itemset.refcount itemset.refcount

items items

of of

[Joh79]. [Joh79].

SDF SDF

itemset itemset

initial initial

these these

we we

0 0

the the

both both

match, match,

is is

[(symbol [(symbol

the the

use use

count count

to to

'dirty'. 'dirty'.

The The

items items

cannot cannot

then then

same same

itemsel) itemsel)

decreased decreased

purely purely

- { { -

wanted wanted

4 4

high. high.

set set

of of

to to

definition definition

lazy lazy

tables tables

a a

systems systems

are are

(which (which

then then

fact fact

itemset} itemset}

in in

than than

to to

the the

conventional conventional

have have

of of

field field

is is

we we

and and

way way

A A

the the

yet yet

PG PG

and and

coincidental. coincidental.

the the

decreased. decreased.

itemsel) itemsel)

items items

syntax syntax

that that

Tomita Tomita

comparison comparison

have have

(or (or

to to

e e

reference reference

has has

better better

as as

of of

by by

only only

we we

and and

handle handle

routines routines

recognize recognize

incremental incremental

old-transitions] old-transitions]

test test

and and

-

it it

graphs graphs

an an

the the

it it

one one

to to

will will

not not

l l

did did

also happens also happens

refers refers

IPG. IPG.

formalism formalism definition

grammars grammars

all all

initial initial

an an

be be

generation generation

e e

algorithm, algorithm,

after after

in in

mark-and-sweep mark-and-sweep

included included

circular circular

call call

When When

itemset.transitions] itemset.transitions]

decreased decreased

algorithms algorithms

idea idea

counting counting

his his

are are

of of

of of

The The

It It

to, to,

the the

set set

book book

the the

'PG'). 'PG').

only only

IPG IPG

itemsets) itemsets)

parser parser

of of

quite quite

have have

same same

it it

SDF SDF

of of

accepted accepted

expansion. expansion.

the the

references references

[fom85], [fom85],

such such

do do

performance, performance,

becomes becomes

and and

means means

to to

with with

items, items,

more more

by by

to to

small, small,

size size

on on

We We

definition definition

generator generator

be be

class class

a a

be be

one. one.

a a

Earley's Earley's

the the

that that

quick quick

the the

of of

or or

also also

comparison. comparison.

only only

garbage garbage

decreased decreased

that that

and, and,

by by

SDF. SDF.

of of

we we

zero, zero,

the the

less less

same same

language language

properly. properly.

are are

Yacc. Yacc.

do do

compared compared

after after

context-free context-free

the the

the the

and and

will will

IPG IPG

of of

test test

but but

affects affects

parsing parsing

The The

inteipreted inteipreted

itemset itemset

grammar grammar

grammar grammar

a a

SDF SDF

collector collector

reference reference

mediocre mediocre

suggestion suggestion

grammar. grammar.

a a

not not

The The

with with

as as

reason reason

much much

in in

From From

well. well.

is is

A A

all all

show show

IPG IPG

which which

test test

algorithm algorithm

is is

that that

given given

straightforward straightforward

grammars. grammars.

routines routines

removed removed

of of

when when

and and

inferior inferior

count count

of of

grammar grammar

implementa­

for for

by by

a a

and and

of of

SDF SDF

them. them.

B. B.

theoretical theoretical

grammars grammars

input, input,

in in

Tomita's Tomita's

choosing choosing

the the

PG PG

the the

of of

appen­

would would

has has

of of

pars­

from from

non­

each each

with with

per­

Our Our

the the

the the

we we

As As

17 17 to to 19 YACC: :::t1 : ll: ll~ ll~

CPU time (in seconds)

PG:

CPU time (in seconds)

IPG:

CPU time (in seconds)

exp.sdf Exam.sdf SDF.sdf ASF.sdf (37 tokens) (166 tokens) (342 tokens) (475 tokens) Fig. 7.1: Results of efficiency measurements on Yacc, PG and IPG. conventional LR(O) parser generator. As is shown by the measurements in section 7, IPG is an efficient parser generator suitable for use in interactive language definition systems. Future work: related to IPG will include: • Simultaneous editing of language definitions and programs As has been explained in the introduction, we currently have an operational prototype of a univer­ sal syntax-directed editor parametrized with a syntax definition written in SDF. It is our aim to allow simultaneous editing of both this syntax definition as well as the program/specification writ­ ten in the language defined by it. • Syntax-directed editing of programs/specifications defining their own syntax. An extreme case of simultaneously editing and using syntax definitions occurs when a language can modify its own syntax. In this case, modification and use of the syntax occur in the same tex­ tual object to be edited. Limited forms of user-defined syntax appear in various disguises such as operator declarations, macro's and user-defined function notation. Oearly, the modification capa­ bility of IPG can be used to implement these changes in syntax. • MO

[Hor88] [Hor88]

[Cha86] [Cha86] [Lan74] [Lan74]

[Joh79] [Joh79]

[Tom85] [Tom85]

[Ear70] [Ear70]

[Rek87] [Rek87] [ASU86] [ASU86]

[HKR87b] [HKR87b]

[Log88] [Log88]

[HKR87a] [HKR87a]

[Hen87] [Hen87]

[HK88] [HK88]

[FGJM85] [FGJM85]

[Voi86] [Voi86]

more more

[San82] [San82] REFERENCES REFERENCES

grammars grammars

of of modification modification LR(O) LR(O)

LR LR

While While

incremental incremental

POSTSCRIPT POSTSCRIPT

20 20

some some

parsers parsers

efficient efficient

As As Horspool's Horspool's

we we

tables tables

believe, believe,

loss loss

'' ''

a a

in in

were were

[Hor88]. [Hor88].

consequence, consequence,

generation generation

any any

(1986). (1986).

J. J.

turns turns

S.C. S.C.

B. B.

2B, 2B,

pp. pp.

Proceedings Proceedings

Victoria, Victoria,

J. J. Loeckx, Loeckx, R.N. R.N.

Conference Conference

M. M.

(1987). (1987).

J. J.

Netherlands, Netherlands,

in in

Report Report

Mathematics Mathematics J. J.

M.H. M.H.

definition definition

Addison-Wesley Addison-Wesley J. J.

A.V. A.V.

Computer Computer

CS-R8761, CS-R8761,

P.R.H. P.R.H.

K. K.

Science Science

J. J.

F. F.

66 66

algebraic algebraic ming ming

Languages, Languages,

D. D.

Conference Conference

LALR(l) LALR(l)

as as

Earley, Earley,

Heering, Heering,

Rekers, Rekers,

Chailloux, Chailloux,

Heering, Heering,

parsing parsing

finishing finishing

Lang, Lang,

Voisin, Voisin,

Heering Heering

Futatsugi, Futatsugi, Sandberg, Sandberg,

Tomita, Tomita,

in in

94-102 94-102

is is

Bell Bell

way). way).

point point

look-ahead look-ahead

Johnson, Johnson,

Horspool, Horspool,

out out

Languages, Languages,

Logger, Logger,

Conference Conference

not not

Aho, Aho,

CS-R8749, CS-R8749,

Hendriks, Hendriks,

As As

of of

Laboratories Laboratories

Lecture Lecture

to to

Victoria, Victoria,

"Deterministic "Deterministic

satisfactory. satisfactory.

specification," specification,"

"An "An

of of

Science, Science,

efficiency efficiency

and and

''A ''A

Computer Computer

of of

parsers. parsers.

''CIGALE: ''CIGALE:

(1970). (1970).

his his

P. P.

be be

Efficient Efficient

Centre Centre

P. P.

ACM, ACM,

this this and and

his his

R. R.

Record Record

Proceedings Proceedings

departure departure

of of

SION, SION,

and and

et et

J.A. J.A.

"LITHE: "LITHE:

Klint, Klint,

LALR(l) LALR(l)

"An "An

problematic. problematic.

parser parser

overall overall

"YACC: "YACC:

its its

Klint, Klint,

efficient efficient

system system

the the

Sethi, Sethi,

"Incremental "Incremental

al., al.,

paper, paper,

Notes Notes

sets sets

(1986). (1986).

P. P.

Computer Computer

ACM, ACM,

Record Record

Goguen, Goguen,

''Type-checking ''Type-checking

B.C., B.C.,

compilation compilation

for for

Centre Centre

Albuquerque, Albuquerque,

Amsterdam Amsterdam

integrated integrated

We We

Second Second

Le_Lisp, Le_Lisp,

Amsterdam Amsterdam

parsing parsing

and and

Klint, Klint,

of of

for for

Programming Programming

generator generator

(1979). (1979).

and and

have have

goals goals

Mathematics Mathematics

is is

in in

a a

and and

R.N. R.N.

has has

opted opted

the the

A A

context-free context-free

Canada Canada

parse parse

New New

J. J.

non-LR(O) non-LR(O)

yet yet

tool tool

a a

techniques techniques

pp. pp.

Computer Computer

for for

of of

J. J.

language language

of of

Rekers, Rekers,

conventional conventional

J.-P. J.-P.

"Towards "Towards

a a

Science, Science,

Ninth Ninth

are are

Colloquium Colloquium

J.D. J.D.

to to

for for

the the

another another

Rekers, Rekers,

Horspool Horspool

21-38 21-38

Mathematics Mathematics

Version Version

for for

less less

Orleans Orleans

Computing Computing

for for

text text

generation generation

tables. tables.

to to

be be

(1988). (1988).

very very

for for

natural natural

(1988). (1988).

Twelfth Twelfth

(1987). (1987).

Jouannaud, Jouannaud,

New New

interactive interactive

a a

lillman, lillman,

efficient efficient

Prolog," Prolog,"

Annual Annual

arid arid

"Principles "Principles

more more

taken taken

finitely finitely

1, 1,

and and

mini-ML: mini-ML:

languages languages

combining combining

in in

Science Science

parsing parsing

similar similar

for for

Amsterdam Amsterdam

compiler-compiler," compiler-compiler,"

"Incremental "Incremental

(1985). (1985).

pp. pp.

15.2, 15.2,

Mexico Mexico

This This

Conference Conference

syntax-directed syntax-directed

sent sent

shorter shorter

Computer Computer

languages languages

LR LR efficient efficient

Annual Annual

on on

efficient efficient

of of

61-86, 61-86,

into into

Science Science

ACM ACM

Compilers. Compilers.

ambiguous ambiguous

and and

incremental incremental

us us

Automata, Automata,

Report Report

is is

parser parser

to to

le le

14, 14,

and and

LR LR

algorithm," algorithm,"

grammar grammar

an an

(1982). (1982).

(but (but

his his

more more

ours, ours,

manuel manuel

oflazy oflazy

account, account,

Computer Computer

a a

algebraic algebraic

Springer-Verlag, Springer-Verlag,

ACM ACM

parsers," parsers,"

J. J.

North-Holland North-Holland

flexible flexible

(1988). (1988).

Symposium Symposium

non-detenninistic non-detenninistic

LR(O) LR(O)

experience experience

Science, Science,

recent recent

, ,

rather rather

Proceedings Proceedings

Meseguer, Meseguer,

without without

in in

Kluwer Kluwer

CS-R88 CS-R88

generation generation

we we

difficult difficult

Symposium Symposium

construction construction

and and

editor," editor,"

context-free context-free

the the

de de

Principles, Principles,

Languages Languages

briefly briefly

table table

table table

whose whose

syntax syntax

Communications Communications

report report

than than

Science, Science,

specifications: specifications:

reference reference

incremental incremental

Report Report

Netherlands, Netherlands,

Amsterdam Amsterdam

restricting restricting

in in

Academic Academic

l l

4, 4,

generation generation

on on

with with

generation generation

than than

"Principles "Principles

a a

summarize summarize

UNIX UNIX

Report Report

of of

Centre Centre

(1986). (1986).

on on

parallel parallel

of of

and and

incremental incremental

Principles Principles

Saarbriicken Saarbriicken

DCS-

on on

lexical lexical

Amsterdam Amsterdam

Computing Computing

user-defined user-defined

incremental incremental

and and

incremental incremental

Techniques Techniques

grammars,'' grammars,''

and and

parsers," parsers,"

classes," classes,"

, ,

Programmer's Programmer's

Principles Principles

INRIA, INRIA,

the the

Publishers Publishers

CS-R8820, CS-R8820,

program program

(1987). (1987).

for for

Programming, Programming,

expression expression

79-IR, 79-IR,

one one

phase, phase,

phase phase

SION, SION,

a a

class class

scanners," scanners,"

his his

of of

of of

Mathematics Mathematics

simple simple

of of

and and

the the

OBJ2," OBJ2,"

pp. pp.

generation generation

approach. approach.

pp. pp.

(1974). (1974).

Science Science

(1987). (1987).

Rocquencourt Rocquencourt

at at

Programming Programming

University University

generation generation

but but

of of

generation generation

and and

syntax syntax

generation," generation,"

pp. pp.

of of

ACM ACM

he he

Amsterdam Amsterdam

( (

the the

255-269 255-269

142-145 142-145

1985). 1985).

Centre Centre

acceptable acceptable

Program­

parsing,'' parsing,''

generates generates

considers considers

language language

69-86 69-86

Manual Manual

expense expense

Tools Tools

pp. pp.

Report Report

13(2), 13(2),

ed. ed.

in in

in in

and and

52-

and and

the the

for for

an an

of of

in in

in in

J. J.

of of

of of

in in , , 21

APPENDIX A. GOTO is always called with a complete set of items LR-PARSE calls GOTO for a certain set of items which we will prove to be complete. We do this for LR­ p ARSE as given below, which is just another representation of the algorithm given in section 3.1. (1) LR-PARSE(start-state, sentence): (2) parser := new(LRparser) (3) push(start-state, parser.stack) (4) symbol, sentence:= head(sentence), tail(sentence) (5) while true do (6) actions :=ACTION(top(parser.stack), symbol) (7) if 3action e actions then (8) if action =(shift state') then (9) push(state', parser.stack) (10) symbol, sentence:= head(sentence), tail(sentence) (11) elseif action= (reduce A ::= fi) then (12) for 1 · · · length((i) do pop(parser.stack) od (13) state" := GOTO(top(parser.stack), A) (14) push(state", parser.stack) (15) elseif action =(accept) then (16) return true (17) fi (18) else (19) return false (20) fi (21) od We first need to prove a loop invariant for the algorithm that states that only the set of items on top of the stack can be initial, all others must be complete:

INV: "ifi[l ~i~ length(parser.stack)- l ~ parser.stack [i ].type =complete] INV is true before entering the main loop of the algorithm, because the stack then only consists of one ele­ ment. The execution of the loop itself does not alter INV. After executing line (6), where ACTION was called for the state on top of the stack, an even stronger condition holds: if ACTION is called on an initial set of items, it expands it to a complete set of items. So after the call of ACTION we have:

S-INV: "ifi[l ~i~length(parser.stack) ~ parser.stack[i].type =complete] ACTION returns zero or more actions, of which one is chosen. The following cases are possible: • actions =0 The execution of line (19) does not alter the stack. We bad S-INV, so we certainly have INV. • action = (shift state') In line (9) state' is pushed on the stack. The length of the stack is thus increased by one, but because we had S-INV, we still have INV. • action =(reduce A ::= fi) In line (12) the stack is reduced by length((i) elements. Because we bad S-INV and the length of the stack gets smaller or remains equal by this operation, S-INV still holds after line (12). In line (14) state" is pushed on the stack, which increases the length of the stack by one. Because we bad S-INV, we still have INV. • action =(accept) The execution of line (16) does not alter the stack We had S-INV, so we certainly have INV. INV is indeed a loop invariant of LR-PARSE. GOTO is called for the state on top of the parse stack in line (13); after line (12) we had S-INV, so we have also top(parser.stack).type =complete. " This proof can be extended to the case of PAR-PARSE in a straightforward way. Here, the invariant to prove would be that INV holds for all parsers in this-sweep u next-sweep at each moment during the parsing. We will not elaborate this proof.

begin begin

module module

13 13

part, part,

tance, tance,

context-free context-free

SDF SDF

Formalism' Formalism'

APPENDIX APPENDIX

22 22

~A ~A

context-free context-free

followed followed

is is

lexical lexical

In In

because because

is is

the the

the the

equivalent equivalent

SDF SDF

functions functions

sorts sorts

functions functions

layout layout

sorts sorts

language language

and and

context-free context-free

syntax. syntax.

B. B.

by by

we we

LEXICAL-FUNCTIONS, LEXICAL-FUNCTIONS,

FUNCTIONS, FUNCTIONS,

"end" "end"

CONTEXT-FREE-SYNTAX, CONTEXT-FREE-SYNTAX,

SDF-DEFINITION, SDF-DEFINITION,

"begin" "begin"

"module" "module"

"--" "--"

ORD-CHAR ORD-CHAR

"-" "-"

ORD-CHAR ORD-CHAR

C-CHAR C-CHAR

"--" "--" "\"" "\""

C-CHAR C-CHAR

"\n" "\n"

"-\n" "-\n"

-

"\"" "\"" LETTER LETTER

"[" "["

WHITE-SPACE, WHITE-SPACE,

"\ "\

L-CHAR, L-CHAR, ORD-CHAR, ORD-CHAR,

"+" "+"

"*" "*"

LETTER, LETTER,

[ [

[\-\[\]] [\-\[\]]

syntax syntax

[0-9A-Za-z [0-9A-Za-z

[a-zA-Z0-9\-_] [a-zA-Z0-9\-_]

[a-zA-Z] [a-zA-Z]

is is

The The

the the

\t\n\r\f] \t\n\r\f]

\" \"

did did

[\n\-] [\n\-]

described described

to to

CONTEXT-FREE-SYNTAX CONTEXT-FREE-SYNTAX

LEXICAL-SYNTAX LEXICAL-SYNTAX

CHAR-RANGE* CHAR-RANGE*

-

in in

For For

SDF SDF

declaration declaration

COM-CHAR* COM-CHAR*

syntax syntax

L-CHAR* L-CHAR*

not not

-

a a

ID ID

which which

[\n\-] [\n\-]

BNF BNF

"-" "-"

ID-TAIL* ID-TAIL*

The The

[] []

syntax syntax

the the

use use

LITERAL, LITERAL,

ID-TAIL, ID-TAIL,

definition definition

ID ID

in in

C-CHAR, C-CHAR,

measurements measurements

FUNCTION-DEF, FUNCTION-DEF,

syntax syntax

the the

C-CHAR C-CHAR

!J$%&' !J$%&'

grammar grammar

SDF SDF

[HK.86]. [HK.86].

--

section section

"\"" "\""

COMMENT COMMENT

lexical lexical

of of

COM-END COM-END

all all

definition definition

LEXICAL-SYNTAX, LEXICAL-SYNTAX,

the the

"]" "]"

rule rule

of of

COM-CHAR, COM-CHAR,

ID, ID,

()*+,./:;<=>?@\~ ()*+,./:;<=>?@\~

CHAR-RANGE, CHAR-RANGE,

SDF SDF definitions definitions

syntax syntax

LEXICAL-FUNCTION-DEF, LEXICAL-FUNCTION-DEF,

the the

chars chars

scanner scanner

An An

A A

PRIORITIES, PRIORITIES,

ITERATOR, ITERATOR,

::= ::=

non-terminals non-terminals

described described

SDF SDF

13. 13.

rules rules

CF-ELEM, CF-ELEM,

except except

in in

definition definition

of of

for for

COM-END COM-END

the the

in in

SDF SDF

in in

IPG IPG

measurements. measurements.

CHAR-CLASS, CHAR-CLASS,

the the

section section

SORTS-DECL, SORTS-DECL,

control control

used used

PRIO-DEF, PRIO-DEF,

are are

consists consists

'functions' 'functions'

ATTRIBUTES, ATTRIBUTES,

'{I}-] '{I}-]

written. written.

are are

7 7

the the

declared declared

of of

characters, characters,

LEX-ELEM, LEX-ELEM,

two two

declaration declaration

lexical lexical

ABBREV-F-LIST, ABBREV-F-LIST,

SDF SDF

-> ->

SORT, SORT,

-> ->

-> ->

-> ->

-> ->

-> ->

-> -> -> ->

-> ->

-> ->

-> ->

-> ->

-> ->

-> ->

-> ->

-> ->

-> ->

-> ->

->ORD-CHAR ->ORD-CHAR

-> ->

-> ->

-> ->

-> ->

parts, parts,

ATTRIBUTE ATTRIBUTE

first first

SDF-DEFINITION SDF-DEFINITION

COM-END COM-END

COMMENT COMMENT

COM-END COM-END

COM-END COM-END

WHITE-SPACE WHITE-SPACE

COM-CHAR COM-CHAR

LITERAL LITERAL

COM-CHAR COM-CHAR stands stands

L-CHAR L-CHAR

L-CHAR L-CHAR

C-CHAR C-CHAR

CHAR-CLASS CHAR-CLASS

C-CHAR C-CHAR

CHAR-RANGE CHAR-RANGE

ORD-CHAR ORD-CHAR CHAR-RANGE CHAR-RANGE

LETTER LETTER

ID ID

ID-TAIL ID-TAIL

ITERATOR ITERATOR

ITERATOR ITERATOR

syntax syntax

the the

LAYOUT, LAYOUT,

in in

part. part.

the the

", ",

for for

lexical lexical

part part

'Syntax 'Syntax

'sorts' 'sorts'

-, -,

An An

is is

ABBREV-F-DEF, ABBREV-F-DEF,

syntax syntax

[, [,

SDF SDF

of of

declaration declaration

no no

] ] Definition Definition

function function

and\ and\

and and

impor­ the the 23

"lexical" "syntax" SORTS-DECL LAYOUT LEXICAL-FUNCTIONS -> LEXICAL-SYNTAX empty -- -> LEXICAL-SYNTAX

"sorts" {SORT","}+ -> SORTS-DECL empty -- -> SORTS-DECL ID -> SORT

"layout" {SORT","}+ -> LAYOUT -- empty -> LAYOUT

"functions" LEXICAL-FUNCTION-DEF+ -> LEXICAL-FUNCTIONS

LEX-ELEM+ "->" SORT -> LEXICAL-FUNCTION-DEF

SORT -> LEX-ELEM SORT ITERATOR -> LEX-ELEM LITERAL -> LEX-ELEM CHAR-CLASS -> LEX-ELEM 11 11 - CHAR-CLASS -> LEX-ELEM "context-free" "syntax" SORTS-DECL PRIORITIES FUNCTIONS -> CONTEXT-FREE-SYNTAX

"priorities" {PRIO-DEF ","}+ -> PRIORITIES -- empty -- -> PRIORITIES {ABBREV-F-LIST ">"}+ -> PRIO-DEF {ABBREV-F-LIST "<"}+ -> PRIO-DEF ABBREV-F-DEF -> ABBREV-F-LIST "(" {ABBREV-F-DEF ","}+ ")" -> ABBREV-F-LIST CF-ELEM+ -> ABBREV-F-DEF CF-ELEM* "->" SORT -> ABBREV-F-DEF

"functions" FUNCTION-DEF+ -> FUNCTIONS

CF-ELEM* "->" SORT ATTRIBUTES -> FUNCTION-DEF

SORT -> CF-ELEM LITERAL -> CF-ELEM SORT ITERATOR -> CF-ELEM "{" SORT LITERAL "}" ITERATOR -> CF-ELEM

"{"{ATTRIBUTE","}+"}" -> ATTRIBUTES -- empty -> ATTRIBUTES "par" -> ATTRIBUTE "assoc" -> ATTRIBUTE "left-assoc" -> ATTRIBUTE "right-assoc" -> ATTRIBUTE end SDF