Centrum voor Wiskunde en Informatica
Centre for Mathematics and Computer Science

J. Heering, P. Klint, J.G. Rekers

Incremental generation of parsers

Computer Science/Department of Software Technology
Report CS-R8822
May

The Centre for Mathematics and Computer Science is a research institute of the Stichting Mathematisch Centrum, which was founded on February 11, 1946, as a nonprofit institution aiming at the promotion of mathematics, computer science, and their applications. It is sponsored by the Dutch Government through the Netherlands Organization for the Advancement of Pure Research (Z.W.O.).
Copyright © Stichting Mathematisch Centrum, Amsterdam
Incremental Generation of Parsers
J. Heering
Department of Software Technology, Centre for Mathematics and Computer Science
P.O. Box 4079, 1009 AB Amsterdam, The Netherlands

P. Klint
Department of Software Technology, Centre for Mathematics and Computer Science
P.O. Box 4079, 1009 AB Amsterdam, The Netherlands
and
Programming Research Group, University of Amsterdam
P.O. Box 41882, 1009 DB Amsterdam, The Netherlands

J. Rekers
Department of Software Technology, Centre for Mathematics and Computer Science
P.O. Box 4079, 1009 AB Amsterdam, The Netherlands
A parser checks whether a text is a sentence in a language. To do so, the parser is provided with the grammar of the language, and it usually generates a structure (parse tree) that represents the text according to that grammar. Most present-day parsers are not driven directly by the grammar but by a 'parse table', which is generated by a parse table generator. A table-based parser works more efficiently than a grammar-based parser, and provided that the parser is used often enough, the cost of generating the parse table is outweighed by the gain in parsing efficiency. However, we need a parsing system for an environment in which language definitions are developed (and modified) interactively; here we do not have enough time to generate the entire parse table before parsing starts, and because of the frequent modifications of the grammar, it is not obvious that the extra efficiency of a table-based parser makes up for the cost of generating the parse table. We propose a lazy and incremental parser generation technique: (1) The parse table is generated during parsing, but only those parts of it are generated that are really needed to parse the input sentences at hand. (2) If the grammar is modified, the existing parse table is modified incrementally, rather than regenerated completely. Using this technique we have eliminated the disadvantages of a parse table generator while retaining the advantages of a table-based parser. We use a parsing algorithm based on parallel LR parsing, which is capable of handling arbitrary context-free grammars. We present all required algorithms and give measurements comparing their performance with that of conventional techniques.
Keywords & Phrases: lazy and incremental parser generation, program generation, incomplete program, LR parser generator, parallel LR parsing, parsing of context-free languages.
1987 CR Categories: D.1.2, D.3.1, D.3.4, F.4.2.
1985 Mathematics Subject Classification: 68N20.
Note: Partial support received from the European Communities under ESPRIT project 348 (Generation of Interactive Programming Environments - GIPE).
Report CS-R8822
Centre for Mathematics and Computer Science
P.O. Box 4079, 1009 AB Amsterdam, The Netherlands
1. INTRODUCTION
2. THE CHOICE OF A PARSING ALGORITHM
2.1. A comparison of algorithms
We compare some existing parsing algorithms with our own algorithm from the perspective of highly dynamic applications like the ones discussed in the previous section:

• LR(k) and LALR(k) algorithms [ASU86, ch. 4.7]
These algorithms are controlled by a parse table that is constructed beforehand by a table generator. The table is constructed top-down, while the parser itself works bottom-up. The parser works in linear time. When the look-ahead k is increased, the class of recognizable languages becomes larger (but will always be limited to non-ambiguous grammars), and the table generation time increases exponentially. With conventional LR or LALR table generation algorithms it is impossible to update an already generated parse table incrementally if the grammar is modified.

• Recursive descent and LL(k) algorithms [ASU86, ch. 4.4]
A recursive descent parser generator constructs a parsing program, whereas an LL generator constructs a parse table that is interpreted by a fixed parser. In both algorithms the parsers work top-down. The class of accepted languages depends on the look-ahead k, but is always limited to non-left-recursive, non-ambiguous grammars.

• Earley's general context-free parsing algorithm [Ear70]
Earley's algorithm can handle all context-free grammars. It works by attaching to each symbol in the input a set of 'dotted rules'. A dotted rule consists of a syntax rule with a cursor (dot) in it and the position in the input where the recognition of the rule started. The set of dotted rules for symbol n+1 is computed at parse time from the set for symbol n. Earley's algorithm does not have a separate generation phase, so it adapts easily to modifications in the grammar. It is this same lack of a generation phase that makes the algorithm too inefficient for interactive purposes.

• Cigale [Voi86]
Cigale uses a parsing algorithm that is specially tailored to expression parsing. It builds a trie for the grammar in which production rules with the same prefix share a path. During parsing this trie is recursively traversed. A trie can easily be extended with new syntax rules, and tries for different grammars can be combined just like modules. The class of grammars is only somewhat larger than the class of LR(0) grammars, because the parser does not use look-ahead in a general manner and cannot backtrack.

• OBJ [FGJM85]
OBJ uses a recursive descent parsing technique with backtracking. OBJ itself does not allow ambiguous grammars, but the backtrack parser does detect all ambiguous parses. This makes the parsing system suitable for finitely ambiguous grammars, but as mentioned in [FGJM85, p. 60] "parsing can be expensive for complex expressions", which makes the algorithm less suitable for large input sentences.

• Tomita's pseudo-parallel parsing algorithm [Tom85,Rek87,Lan74]
This is an extended LR parsing algorithm that requires a conventional (but possibly ambiguous) LR(0) or LR(1) parse table. The parser starts as an LR parser, but when it hits a conflict in the parse table, it splits up into several LR parsers that work in parallel. Grammars are restricted to the class of finitely ambiguous context-free grammars. As it uses the same table generation phase as conventional LR algorithms, modifying the grammar is an expensive operation with Tomita's algorithm.

• Incremental parser generator IPG
We developed this method on the basis of Tomita's parsing algorithm, but provided the algorithm with an incremental LR(0) parse table generator. Parsing starts with an empty parse table, which is expanded by need during parsing. A change in the grammar is handled incrementally by removing the parts of the parse table that are affected by the change; these parts are recomputed for the modified grammar when the parser needs them again. The parse table is constructed during parsing, so after a certain time, depending on the input given to it, the system will become as fast as a conventionally controlled Tomita parser.

Fig. 2.1 gives an overview of the following characteristics of these algorithms: capable of handling arbitrary context-free grammars (powerful), efficient (fast), easily adapted to modifications of the grammar (flexible), and suitable for modular composition of grammars (modular).
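To make the notion of a dotted rule in Earley's algorithm concrete, the following is a minimal recognizer sketch in Python. It is an illustration only, not code from this report: the function name `earley_recognize`, the dictionary encoding of the grammar, and the restriction to grammars without empty rules are all assumptions made for this example. Each dotted rule is a tuple (left-hand side, right-hand side, dot position, start position), and the set for position i+1 is computed from the set for position i during parsing.

```python
def earley_recognize(grammar, start, tokens):
    """Earley recognizer sketch; assumes the grammar has no empty rules.

    grammar maps each non-terminal to a list of right-hand sides (tuples).
    A dotted rule is (lhs, rhs, dot, origin): the rule, the cursor position,
    and the input position where recognition of the rule started.
    """
    chart = [set() for _ in range(len(tokens) + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))
    for i in range(len(tokens) + 1):
        worklist = list(chart[i])
        while worklist:
            lhs, rhs, dot, origin = worklist.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:
                    # Predict: start every rule of the expected non-terminal here.
                    for alt in grammar[sym]:
                        new = (sym, alt, 0, i)
                        if new not in chart[i]:
                            chart[i].add(new)
                            worklist.append(new)
                elif i < len(tokens) and tokens[i] == sym:
                    # Scan: move the dot over the matching terminal.
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # Complete: advance every rule that was waiting for lhs.
                for plhs, prhs, pdot, porigin in list(chart[origin]):
                    if pdot < len(prhs) and prhs[pdot] == lhs:
                        new = (plhs, prhs, pdot + 1, porigin)
                        if new not in chart[i]:
                            chart[i].add(new)
                            worklist.append(new)
    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[len(tokens)])


# An ambiguous, left-recursive grammar of Boolean expressions.
BOOLEANS = {"B": [("true",), ("false",), ("B", "or", "B"), ("B", "and", "B")]}
print(earley_recognize(BOOLEANS, "B", ["true", "or", "false", "and", "true"]))
```

Because the grammar is consulted directly at parse time, changing `BOOLEANS` between two calls needs no regeneration step, which is the flexibility noted above; the price is that the prediction work is repeated for every input.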
[Fig. 2.1: Comparison of parsing algorithms (LR(k), LALR(k); recursive descent, LL(k); Earley; Cigale; OBJ; Tomita; IPG) with respect to the characteristics powerful, fast, flexible, and modular.]
3. PARSING ALGORITHMS
3.1. LR parsing
LR-PARSE(start-state, sentence):
  parser := new(LRparser)
  push(start-state, parser.stack)
  symbol, sentence := head(sentence), tail(sentence)
  while true do
    state := top(parser.stack)
    if ∃action ∈ ACTION(state, symbol) then
      if action = (shift state') then
        push(state', parser.stack)
        symbol, sentence := head(sentence), tail(sentence)
      elseif action = (reduce A ::= β) then
        for 1 ··· length(β) do pop(parser.stack) od
        state' := top(parser.stack)
        push(GOTO(state', A), parser.stack)
      elseif action = (accept) then
        return true
      fi
    else
      return false
    fi
  od
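The LR-PARSE loop can be tried out with a small, conflict-free parse table. The sketch below is a Python transcription under stated assumptions: the grammar (S ::= x and S ::= ( S )), its hand-built LR(0) table, and the names `RULES`, `ACTION`, `GOTO`, and `lr_parse` are hypothetical and do not come from this report. Only the control flow follows the algorithm above.

```python
# Hypothetical grammar used for illustration only:
#   rule 0: S ::= x      rule 1: S ::= ( S )      START ::= S
# rule number -> (left-hand side, length of right-hand side)
RULES = {0: ("S", 1), 1: ("S", 3)}

# Hand-built LR(0) table: at most one action per (state, symbol).
ACTION = {
    (0, "x"): ("shift", 2), (0, "("): ("shift", 3),
    (1, "$"): ("accept",),
    (3, "x"): ("shift", 2), (3, "("): ("shift", 3),
    (4, ")"): ("shift", 5),
}
for terminal in ("x", "(", ")", "$"):
    ACTION[(2, terminal)] = ("reduce", 0)   # state 2 holds S ::= x .
    ACTION[(5, terminal)] = ("reduce", 1)   # state 5 holds S ::= ( S ) .

GOTO = {(0, "S"): 1, (3, "S"): 4}


def lr_parse(start_state, sentence):
    """Return True if sentence (a list of terminals) is accepted."""
    tokens = iter(list(sentence) + ["$"])   # append the end-marker
    stack = [start_state]
    symbol = next(tokens)
    while True:
        state = stack[-1]
        action = ACTION.get((state, symbol))
        if action is None:                  # no action: syntax error
            return False
        if action[0] == "shift":
            stack.append(action[1])
            symbol = next(tokens)
        elif action[0] == "reduce":
            lhs, length = RULES[action[1]]
            del stack[-length:]             # pop one state per symbol of the rule
            stack.append(GOTO[(stack[-1], lhs)])
        else:                               # accept
            return True


print(lr_parse(0, ["(", "(", "x", ")", ")"]))
```

A dictionary lookup that returns a single action is enough here because the table has no conflicts; the parallel algorithm of the next section replaces that single action by a set of actions.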
3.2. (Pseudo-)parallel LR parsing
The parallel LR parsing algorithm starts as a simple (conventional) LR parser, but splits up into multiple parsers when ACTION(state, symbol) returns multiple actions. All simple LR parsers are synchronized on their shift actions in such a way that the next symbol is processed only when all parsers have shifted on the current input symbol.

PAR-PARSE: Parse a sentence with (pseudo-)parallel running LR parsers.

Input: A start state start-state and a sentence sentence. Start-state is the state in which the first simple LR parser starts, and sentence is a list of terminals terminated by the end-marker $.

Output: 'true' if sentence is grammatically correct, 'false' otherwise.

Description: The synchronization on shift actions is expressed in the algorithm by using two pools of parsers, this-sweep and next-sweep. The parsers in this-sweep still have to shift on the current input symbol; the parsers in next-sweep are waiting for the next symbol. Only when this-sweep is empty (and if there are parsers left in next-sweep) is the next symbol read from the input sentence. When both pools of parsers are empty, all parsers have died in an accepting or rejecting configuration. PAR-PARSE accepts its input if at least one simple parser accepts it.
For each input symbol, parsers are taken from this-sweep until this-sweep is empty. The parser that is taken from this-sweep is removed from it, because for each action a copy of the parser is made and the action is performed on this copy. So, when there are no actions to be performed, the parser just disappears. Shift actions place the copy in the next-sweep pool; reduce actions put the copy back in this-sweep. Accept actions set the variable accepted to indicate that a simple parser has accepted the input.
It is important for the lazy parser generator that the implementation of the copy operation for parsers is such that the parse stacks become different objects which share the states on them.

Algorithm:
PAR-PARSE(start-state, sentence):
  accepted := false
  start-parser := new(LRparser)
  push(start-state, start-parser.stack)
  next-sweep := {start-parser}
  while next-sweep ≠ ∅ do
    symbol, sentence := head(sentence), tail(sentence)
    this-sweep, next-sweep := next-sweep, ∅
    while ∃parser ∈ this-sweep do
      this-sweep := this-sweep - {parser}
      state := top(parser.stack)
      for ∀action ∈ ACTION(state, symbol) do
        parser' := copy(parser)
        if action = (shift state') then
          push(state', parser'.stack)
          next-sweep := next-sweep ∪ {parser'}
        elseif action = (reduce A ::= β) then
          for 1 ··· length(β) do pop(parser'.stack) od
          state' := top(parser'.stack)
          push(GOTO(state', A), parser'.stack)
          this-sweep := this-sweep ∪ {parser'}
        elseif action = (accept) then
          accepted := true
        fi
      od
    od
  od
  return accepted
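The two-pool scheme of PAR-PARSE can be sketched in Python as follows. The table entries are transcribed from the parse table of the Booleans in Fig. 4.1(b), including its shift/reduce conflicts in states 6 and 7; everything else is an assumption made for this illustration, in particular the names `RULES`, `ACTION`, `GOTO`, and `par_parse` and the representation of a parser as an immutable tuple of states. With tuples, a copy of a parser is a different stack object that refers to the same state values, which loosely mirrors the sharing requirement stated above.

```python
TERMINALS = ("true", "false", "or", "and", "$")

# rule number -> (left-hand side, length of right-hand side), as in Fig. 4.1(a)
RULES = {0: ("B", 1), 1: ("B", 1), 2: ("B", 3), 3: ("B", 3)}

# (state, symbol) -> list of actions; conflicts appear as lists of length 2.
ACTION = {}


def add(state, symbol, action):
    ACTION.setdefault((state, symbol), []).append(action)


for s in (0, 4, 5):
    add(s, "true", ("shift", 2))
    add(s, "false", ("shift", 3))
for s in (1, 6, 7):
    add(s, "or", ("shift", 5))
    add(s, "and", ("shift", 4))
add(1, "$", ("accept",))
for t in TERMINALS:
    add(2, t, ("reduce", 0))
    add(3, t, ("reduce", 1))
    add(6, t, ("reduce", 3))
    add(7, t, ("reduce", 2))

GOTO = {(0, "B"): 1, (4, "B"): 6, (5, "B"): 7}


def par_parse(start_state, sentence):
    """Return True if at least one simple LR parser accepts the sentence."""
    accepted = False
    tokens = list(sentence) + ["$"]          # append the end-marker
    position = 0
    next_sweep = [(start_state,)]            # each parser is a stack of states
    while next_sweep and position < len(tokens):
        symbol = tokens[position]
        position += 1
        this_sweep, next_sweep = next_sweep, []
        while this_sweep:
            stack = this_sweep.pop()         # take the parser out of the pool
            state = stack[-1]
            for action in ACTION.get((state, symbol), []):
                if action[0] == "shift":     # the copy waits for the next symbol
                    next_sweep.append(stack + (action[1],))
                elif action[0] == "reduce":  # the copy stays in this sweep
                    lhs, length = RULES[action[1]]
                    popped = stack[:-length]
                    this_sweep.append(popped + (GOTO[(popped[-1], lhs)],))
                else:                        # accept
                    accepted = True
    return accepted


print(par_parse(0, ["true", "or", "false", "and", "true"]))
```

For the sentence shown, two simple parsers reach the accept action, one for each way of grouping `or` and `and`; the sketch only records that acceptance happened, as the algorithm above does.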
4. CONVENTIONAL PARSER GENERATION
no. rule
(a)  no.  rule
     0    B ::= true
     1    B ::= false
     2    B ::= B or B
     3    B ::= B and B
     4    START ::= B

(b)         ACTION                                     GOTO
     state  true   false  or       and      $          B
     0      s2     s3                                  1
     1                    s5       s4       acc
     2      r0     r0     r0       r0       r0
     3      r1     r1     r1       r1       r1
     4      s2     s3                                  6
     5      s2     s3                                  7
     6      r3     r3     s5/r3    s4/r3    r3
     7      r2     r2     s5/r2    s4/r2    r2

Fig. 4.1: (a) Grammar of the Booleans, (b) parse table, (c) graph of item sets.
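The following Python fragment is a minimal sketch of how PAR-PARSE could be driven by the parse table of Fig. 4.1(b). It is an illustration only, not the implementation used in the report. The names RULES, ACTION_TABLE, GOTO_TABLE and par_parse are assumptions introduced here, and a parser is represented by nothing more than its stack, stored as an immutable tuple so that copies share the states on them.

```python
# Sketch only: PAR-PARSE driven by the table of Fig. 4.1(b).
# All names below are hypothetical; they do not come from the report.

# rule number -> (left-hand side, length of right-hand side)
RULES = {0: ("B", 1), 1: ("B", 1), 2: ("B", 3), 3: ("B", 3)}
TERMINALS = ["true", "false", "or", "and", "$"]


def s(n):
    return ("shift", n)


def r(n):
    return ("reduce", n)


ACC = ("accept",)

# ACTION part of the table; conflict entries such as s5/r3 hold two actions.
ACTION_TABLE = {
    0: {"true": [s(2)], "false": [s(3)]},
    1: {"or": [s(5)], "and": [s(4)], "$": [ACC]},
    2: {t: [r(0)] for t in TERMINALS},
    3: {t: [r(1)] for t in TERMINALS},
    4: {"true": [s(2)], "false": [s(3)]},
    5: {"true": [s(2)], "false": [s(3)]},
    6: {**{t: [r(3)] for t in TERMINALS}, "or": [s(5), r(3)], "and": [s(4), r(3)]},
    7: {**{t: [r(2)] for t in TERMINALS}, "or": [s(5), r(2)], "and": [s(4), r(2)]},
}
# GOTO part of the table.
GOTO_TABLE = {(0, "B"): 1, (4, "B"): 6, (5, "B"): 7}


def par_parse(start_state, sentence):
    """Return True if at least one simple parser accepts the sentence."""
    accepted = False
    next_sweep = [(start_state,)]
    position = 0
    while next_sweep and position < len(sentence):
        symbol = sentence[position]
        position += 1
        this_sweep, next_sweep = next_sweep, []
        while this_sweep:
            stack = this_sweep.pop()  # the parser is removed from this-sweep
            for action in ACTION_TABLE[stack[-1]].get(symbol, []):
                if action[0] == "shift":
                    # the copy waits for the next symbol
                    next_sweep.append(stack + (action[1],))
                elif action[0] == "reduce":
                    lhs, length = RULES[action[1]]
                    rest = stack[:-length]
                    # the copy still has to shift on the current symbol
                    this_sweep.append(rest + (GOTO_TABLE[(rest[-1], lhs)],))
                else:
                    accepted = True
    return accepted
```

A call such as par_parse(0, ["true", "or", "false", "$"]) exercises the table, and a sentence such as 'true and false or true' makes the conflict entries of states 6 and 7 split a parser in two.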
Fig. 4.2: The parsing of 'true or false'.

A set of items has the following fields:

• kernel
The kernel field of a set of items contains a set of dotted rules. The dot in a rule indicates how far the parser has progressed in recognizing that rule.

• transitions
The transitions field of a set of items contains the transitions (symbol itemset') from this set of items to other sets of items. Each transition is labeled with a terminal or non-terminal symbol. The special transition ($ accept) indicates that the parser may accept.

• reductions
The reductions field of a set of items contains the syntax rules that have been recognized completely in this state/set of items. These rules may be reduced by the parser. In diagrams we indicate reductions by underlining a rule in the corresponding kernel field (reductions of rules which are not also in the kernel field cannot be represented in these diagrams).

• type
The value of the type field of a set of items may be 'initial' or 'complete'. If it is 'initial', the fields transitions and reductions have not yet been computed. In diagrams we indicate 'complete' sets of items with a black circle and 'initial' sets of items with an open circle. The number in the circle serves as a unique reference to each set of items.

The parse table in Fig. 4.1(b) is a tabular representation of the graph of item sets of Fig. 4.1(c). This representation is normally used for an LR parse table. It contains the results of the functions ACTION(state, symbol) and GOTO(state, symbol), with a row for each state and a column for each symbol. We shall not use these parse tables further, because the lazy parser generator also needs the kernel field of each set of items during parsing. The correspondence between the behaviour of ACTION and GOTO for a given state/set of items and the transitions and reductions fields is described by the following algorithms.
ACTION:
Input: A state state and a terminal symbol.
Output: The actions the parser can perform in state.
Description: The return value is deduced from the transitions and reductions fields of state.
Algorithm:
ACTION(state, symbol):
  result := {(reduce A ::= β) | A ::= β ∈ state.reductions} ∪
            {(shift state') | (symbol state') ∈ state.transitions} ∪
            {(accept) | (symbol accept) ∈ state.transitions}
  return result

GOTO:
Input: A state state and a non-terminal symbol.
Output: The state state' for which (symbol state') is a transition of state.
Algorithm:
GOTO(state, symbol):
  return state' [(symbol state') ∈ state.transitions]

GENERATE-PARSER: Build a graph of item sets from a grammar.
Input: A grammar Grammar, which is a set of syntax rules A ::= α.
Output: The state in which parsing must start.
Description: This routine generates the graph of item sets for the given grammar. The kernel field of start-itemset is composed of all rules in Grammar with START as left-hand side, with the dot placed before the first symbol of the right-hand side. Start-itemset is created as an initial set of items. While there are initial sets of items in Itemsets, they are expanded by EXPAND, which may in turn add new initial sets of items to Itemsets.
Algorithm:
GENERATE-PARSER(Grammar):
  start-itemset := new(itemset)
  start-itemset.kernel := {START ::= •β | START ::= β ∈ Grammar}
  start-itemset.type := initial
  Itemsets := {start-itemset}
  while ∃itemset ∈ Itemsets[itemset.type = initial] do
    EXPAND(itemset)
  od
  return start-itemset

EXPAND: Transform an initial set of items into a complete one.
Input: An initial set of items itemset.
Description: Routine EXPAND computes the transitions and reductions fields of itemset. The kernel of itemset is first extended to its closure by CLOSURE. The items in the closure that have a symbol after the dot are partitioned into subsets of items having the same symbol S after the dot; each such subset is associated with S. For each S the associated subset is transformed into a new kernel kernel' by moving the dot over the S. When a set of items itemset' with kernel kernel' does not yet exist, it is generated as an initial set of items. The transition (S itemset') is then added to itemset.transitions.
A rule in the extended kernel having its dot at the end has been recognized completely. It depends on the left-hand side of the rule whether this means an accept or a reduce action.
Algorithm:
EXPAND(itemset):
  cl-items := CLOSURE(itemset.kernel)
  itemset.transitions, itemset.reductions := ∅, ∅
  for ∀S ∈ {S | A ::= α•Sβ ∈ cl-items} do
    kernel' := {A ::= αS•β | A ::= α•Sβ ∈ cl-items}
    if ∃itemset' ∈ Itemsets[itemset'.kernel = kernel'] then
      itemset.transitions := itemset.transitions ∪ {(S itemset')}
    else
      itemset' := new(itemset)
      itemset'.type, itemset'.kernel := initial, kernel'
      Itemsets := Itemsets ∪ {itemset'}
      itemset.transitions := itemset.transitions ∪ {(S itemset')}
    fi
  od
  for ∀A ::= β• ∈ cl-items do
    if A = START then
      itemset.transitions := itemset.transitions ∪ {($ accept)}
    else
      itemset.reductions := itemset.reductions ∪ {A ::= β}
    fi
  od
  itemset.type := complete
CLOSURE:
Input: A set of dotted rules kernel.
Output: The same set of dotted rules, extended with all rules that can also become applicable.
Description: CLOSURE extends a given kernel with all rules that may become applicable. If there is a rule A ::= α•Bβ in the kernel, non-terminal B may become applicable. Hence, the kernel can be extended with all rules B ::= •γ, because when a B can be recognized, all rules that derive B can also be recognized.
Algorithm:
CLOSURE(kernel):
  closure := kernel
  while ∃A, B, α, β, γ [A ::= α•Bβ ∈ closure ∧ B ::= γ ∈ Grammar ∧ B ::= •γ ∉ closure] do
    closure := closure ∪ {B ::= •γ}
  od
  return closure
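The routines CLOSURE, EXPAND and the conventional (non-lazy) GENERATE-PARSER can be sketched in Python as follows. This is an illustration under stated assumptions rather than the report's implementation: a dotted rule is encoded as a tuple (lhs, rhs, dot), Itemsets is a dictionary keyed by kernel, and the names ItemSet, closure, expand and generate_parser as well as the encoding of the Boolean grammar are hypothetical.

```python
# Sketch only: LR(0) item-set construction for the grammar of Fig. 4.1(a).
# The data encoding and all names are assumptions made for illustration.
from dataclasses import dataclass, field

GRAMMAR = [
    ("B", ("true",)),
    ("B", ("false",)),
    ("B", ("B", "or", "B")),
    ("B", ("B", "and", "B")),
    ("START", ("B",)),
]


@dataclass
class ItemSet:
    kernel: frozenset                       # dotted rules (lhs, rhs, dot)
    type: str = "initial"                   # 'initial' or 'complete'
    transitions: dict = field(default_factory=dict)
    reductions: set = field(default_factory=set)


def closure(kernel, grammar):
    """Extend a kernel with all rules B ::= .gamma that may become applicable."""
    result, todo = set(kernel), list(kernel)
    while todo:
        _, rhs, dot = todo.pop()
        if dot < len(rhs):
            for lhs, body in grammar:
                item = (lhs, body, 0)
                if lhs == rhs[dot] and item not in result:
                    result.add(item)
                    todo.append(item)
    return result


def expand(itemset, itemsets, grammar):
    """Compute the transitions and reductions fields of an initial item set."""
    itemset.transitions, itemset.reductions = {}, set()
    moved = {}                              # symbol S -> kernel' (dot moved over S)
    for lhs, rhs, dot in closure(itemset.kernel, grammar):
        if dot < len(rhs):
            moved.setdefault(rhs[dot], set()).add((lhs, rhs, dot + 1))
        elif lhs == "START":
            itemset.transitions["$"] = "accept"
        else:
            itemset.reductions.add((lhs, rhs))
    for symbol, new_kernel in moved.items():
        key = frozenset(new_kernel)
        if key not in itemsets:             # create it as an initial item set
            itemsets[key] = ItemSet(key)
        itemset.transitions[symbol] = itemsets[key]
    itemset.type = "complete"


def generate_parser(grammar):
    """Conventional generation: expand until no initial item set is left."""
    start = ItemSet(frozenset((l, r, 0) for l, r in grammar if l == "START"))
    itemsets = {start.kernel: start}
    while True:
        pending = [i for i in itemsets.values() if i.type == "initial"]
        if not pending:
            return start, itemsets
        expand(pending[0], itemsets, grammar)
```

Keying the dictionary by kernel is one way to realize the test "a set of items with kernel kernel' does not yet exist" from EXPAND; other representations would serve equally well.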
5. LAZY PARSER GENERATION

The parser generation algorithm described in the previous section generates the parser completely before it is used.

5.1. An algorithm for lazy parser generation

We only have to adjust the LR(0) parser generator of the previous section a little to transform it into a lazy parser generator. We move the parser generation phase into the parsing phase by moving the expansion of initial sets of items from GENERATE-PARSER to ACTION. This means that the state with which ACTION is called can be not only a complete set of items but also an initial one. When it is still initial, it is first expanded by EXPAND. GENERATE-PARSER now only generates start-itemset as an initial set of items; the rest of the parser generation is taken care of by ACTION.

GENERATE-PARSER: Build the first part of a graph of item sets from a grammar.
Input: A grammar Grammar, which is a set of syntax rules A ::= α.
Output: The state in which parsing must start.
Description: This routine constructs the set of items start-itemset: the root of the graph of item sets for the given grammar. ACTION and GOTO can access other sets of items in this graph. The kernel field of start-itemset is composed of all rules in Grammar with START as left-hand side, with the dot placed before the first symbol of the right-hand side.
Algorithm:
GENERATE-PARSER(Grammar):
  start-itemset := new(itemset)
  start-itemset.type := initial
  start-itemset.kernel := {START ::= •β | START ::= β ∈ Grammar}
  Itemsets := {start-itemset}
  return start-itemset

ACTION:
Input: A state state and a terminal symbol.
Output: The actions the parser can perform in state.
Description: When state is an initial set of items it must be expanded first. The complete set of items is then used to deduce the return value from the transitions and reductions fields.
Algorithm:
ACTION(state, symbol):
  if state.type = initial then EXPAND(state) fi
  result := {(reduce A ::= β) | A ::= β ∈ state.reductions} ∪
            {(shift state') | (symbol state') ∈ state.transitions} ∪
            {(accept) | (symbol accept) ∈ state.transitions}
  return result

Like ACTION, GOTO uses information that is only available in complete sets of items, so one might be inclined to think that the same test for initial sets of items has to be added to GOTO as well. However, due to the particular way in which the parsing algorithm works, GOTO will only be called with sets of items that have already been completed. This is proved in Appendix A.

The implementation of the lazy parser generator has to treat the variables Itemsets and Grammar of GENERATE-PARSER as global variables, because they are needed during the expansion of sets of items. Furthermore, several complete sets of items can point to the same initial set of items. When expanding an initial set of items, the implementation has to take care that all sets of items that originally pointed to the initial set of items now point to the completed one.
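The lazy scheme described above can be sketched in Python as follows. The sketch rests on several assumptions that are not taken from the report: dotted rules are tuples (lhs, rhs, dot), Itemsets is a global dictionary ITEMSETS keyed by kernel, a parser is represented by its stack only, and all function and class names are hypothetical. Because an item set is expanded in place, every transition that pointed to the initial object automatically points to the completed one.

```python
# Sketch only: lazy LR(0) parser generation, where ACTION expands on demand.
# Encoding and names are assumptions made for illustration.
GRAMMAR = [("B", ("true",)), ("B", ("false",)), ("B", ("B", "or", "B")),
           ("B", ("B", "and", "B")), ("START", ("B",))]
ITEMSETS = {}  # global variable: kernel -> ItemSet


class ItemSet:
    def __init__(self, kernel):
        self.kernel, self.type = kernel, "initial"
        self.transitions, self.reductions = {}, set()


def closure(kernel):
    result, todo = set(kernel), list(kernel)
    while todo:
        _, rhs, dot = todo.pop()
        for lhs, body in GRAMMAR:
            item = (lhs, body, 0)
            if dot < len(rhs) and lhs == rhs[dot] and item not in result:
                result.add(item)
                todo.append(item)
    return result


def expand(itemset):
    moved = {}
    for lhs, rhs, dot in closure(itemset.kernel):
        if dot < len(rhs):
            moved.setdefault(rhs[dot], set()).add((lhs, rhs, dot + 1))
        elif lhs == "START":
            itemset.transitions["$"] = "accept"
        else:
            itemset.reductions.add((lhs, rhs))
    for symbol, kernel in moved.items():
        key = frozenset(kernel)
        if key not in ITEMSETS:
            ITEMSETS[key] = ItemSet(key)     # new item sets start as 'initial'
        itemset.transitions[symbol] = ITEMSETS[key]
    itemset.type = "complete"


def generate_parser():
    """Only the start item set is created; it is left unexpanded."""
    ITEMSETS.clear()
    kernel = frozenset((l, r, 0) for l, r in GRAMMAR if l == "START")
    ITEMSETS[kernel] = ItemSet(kernel)
    return ITEMSETS[kernel]


def action(state, symbol):
    if state.type == "initial":              # the only change needed for laziness
        expand(state)
    result = [("reduce", rule) for rule in state.reductions]
    target = state.transitions.get(symbol)
    if target == "accept":
        result.append(("accept",))
    elif target is not None:
        result.append(("shift", target))
    return result


def goto(state, symbol):
    return state.transitions[symbol]         # state is complete when called


def par_parse(start_state, sentence):
    accepted, next_sweep, position = False, [(start_state,)], 0
    while next_sweep and position < len(sentence):
        symbol, position = sentence[position], position + 1
        this_sweep, next_sweep = next_sweep, []
        while this_sweep:
            stack = this_sweep.pop()
            for act in action(stack[-1], symbol):
                if act[0] == "shift":
                    next_sweep.append(stack + (act[1],))
                elif act[0] == "reduce":
                    lhs, rhs = act[1]
                    rest = stack[:-len(rhs)]
                    this_sweep.append(rest + (goto(rest[-1], lhs),))
                else:
                    accepted = True
    return accepted
```

After start = generate_parser(), ITEMSETS holds a single initial set of items. Parsing a sentence with par_parse(start, ...) then expands only the sets of items that this sentence actually reaches, which can be inspected through the type fields in ITEMSETS.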
5.2. An example

Consider again the grammar of the Booleans of Fig. 4.1(a).

[Fig. 5.1 and Fig. 5.2: graphs of item sets generated lazily for the grammar of the Booleans.]
6. INCREMENTAL PARSER GENERATION

ADD-RULE: Add a rule to the grammar and update the graph of item sets.
Input: A syntax rule rule.
Algorithm:
ADD-RULE(rule):
  MODIFY(rule, ∪)

DELETE-RULE: Delete a rule from the grammar and update the graph of item sets.
Input: A syntax rule rule.
Algorithm:
DELETE-RULE(rule):
  MODIFY(rule, −)

MODIFY: Modify the grammar and the graph of item sets.
Input: A rule A ::= β and a modification operator operator ('∪' or '−').
Description: MODIFY uses the global variables Grammar, Itemsets and start-itemset. The modification is first performed on Grammar. When A is the start symbol, the kernel of start-itemset is updated and start-itemset is made initial. When it is not, we search for all complete sets of items with a transition (A itemset') in their transitions field. These sets of items are made initial again, so that the lazy parser generator re-expands them when they are needed.
Algorithm:
MODIFY(A ::= β, operator):
  Grammar := Grammar operator {A ::= β}
  if A = START then
    start-itemset.kernel := start-itemset.kernel operator {A ::= •β}
    start-itemset.type := initial
  else
    for ∀itemset ∈ Itemsets[itemset.type = complete ∧ (A itemset') ∈ itemset.transitions] do
      itemset.type := initial
    od
  fi

When, for example, the rule 'B ::= unknown' is added to the grammar of the Booleans, the graph of item sets of Fig. 6.1(a) is transformed into the graph of Fig. 6.4.

[Figs. 6.1 to 6.4: grammars and graphs of item sets before and after modification.]
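The effect of MODIFY on the type fields can be illustrated with a small Python sketch. It is not the report's implementation: the ItemSet class, the string values "add" and "delete" for the operator, and the explicit passing of grammar, itemsets and start_itemset as parameters (instead of global variables) are all assumptions made to keep the example self-contained.

```python
# Sketch only: MODIFY marks the affected item sets as 'initial' again.
# Names, encoding and parameter passing are assumptions for illustration.
class ItemSet:
    def __init__(self, kernel, type="initial"):
        self.kernel = set(kernel)            # dotted rules (lhs, rhs, dot)
        self.type = type                     # 'initial' or 'complete'
        self.transitions = {}                # symbol -> ItemSet or 'accept'
        self.reductions = set()


def modify(rule, operator, grammar, itemsets, start_itemset):
    lhs, rhs = rule
    # Grammar := Grammar operator {A ::= beta}
    if operator == "add":
        grammar.add(rule)
    else:
        grammar.discard(rule)
    if lhs == "START":
        # the start kernel itself changes, so it has to be re-expanded
        item = (lhs, rhs, 0)
        if operator == "add":
            start_itemset.kernel.add(item)
        else:
            start_itemset.kernel.discard(item)
        start_itemset.type = "initial"
    else:
        # every complete item set with a transition on A has A ::= .beta
        # (or lacks it) in its closure, so it is made initial again
        for itemset in itemsets:
            if itemset.type == "complete" and lhs in itemset.transitions:
                itemset.type = "initial"


def add_rule(rule, grammar, itemsets, start_itemset):
    modify(rule, "add", grammar, itemsets, start_itemset)


def delete_rule(rule, grammar, itemsets, start_itemset):
    modify(rule, "delete", grammar, itemsets, start_itemset)
```

Note that the old transitions and reductions of a re-initialized set of items are simply left in place here; in the lazy scheme they would be recomputed by EXPAND the next time ACTION is called with that set of items.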
sets sets is is away, away, problem problem the the of of again). again). of of a a Fig. Fig. thus: thus: of of unused unused in in (this (this 6.l(a), 6.l(a), should should a a is is items items with with graph graph new new of of 6.5: 6.5: sets sets ref ref the the to to set set is is item item contradiction contradiction best best certainly certainly to to (its (its but but 6.2 6.2 items items count count 8 8 no no B B B B B B would would when when attach attach count count of of booleans, booleans, kernel kernel The The and and references references and and to to Frequent Frequent an an in in ::=B ::=B ::= ::= set set ::=B ::=B of of old old possible possible make make sets sets sets. sets. when when and and guarantee guarantee items items ltemsets ltemsets be be item item B B do do graph graph increments increments field field of of 9 9 transitions transitions all all 6 6 of of to to occur occur 6.3 6.3 of of • • andB andB • • never never solved solved will will { { In In items items not not sets sets or or andB andB everything everything B B initial, initial, each each unreachable unreachable a a sets sets items items will will sets sets processed processed is B B B B particular, particular, algorithm algorithm ::=b ::=b modification modification with with of of set set have have be be to to of of the the in is is • • that that be be are are Fig. Fig. 3 3 set set in in have have reinstated, reinstated, of of of of essential, essential, items items these these is is the the is is •. •. B B the the lb@. lb@. ~ ~ the the used used field). field). the the a a the the items items of of clearly clearly items items postponed postponed made made 6.4 6.4 A A example example transitions transitions transitions transitions ::= ::= is is lazy lazy been been items items ref ref incremental incremental in in re-expanded re-expanded ::=b. ::=b. 'dirty' 'dirty' sets sets partial partial sets sets retained, retained, again. again. 
B B after after true true correctly correctly ount ount the the l, l, of of It It becomes becomes initial, initial, philosophy. philosophy. • • otherwise otherwise decreased decreased separated. separated. but but 6, 6, may may or or a a is is of of a a sense sense } } ref ref re-expansion re-expansion of of rather rather fields fields . . and and B B expanded expanded grammar grammar re-expansion re-expansion until until the the items items For For field). field). Set Set false false of of count count Fig. Fig. because because (but (but we we I I by by 7 7 that that transition transition parser parser that that ~,,~,,~~~j ~,,~,,~~~j zero, zero, of of example, example, sets sets the the of of than than will will up up end major major 6.4). 6.4). of of are are MODIFY. MODIFY. need need This This field, field, items items unknown unknown When When set set the the Also, Also, it it chance chance those those can can of of in in never never initial. initial. it it removed removed it it does does generator, generator, of of sets sets On On of of items items makes makes parts parts the the is is has has not) not) telling telling cause cause of of 7 7 with with to to items items in in the the set set sets sets when when removed removed and and the the not not same same be be sets sets 7 7 of of a a is is MODIFY MODIFY A A be be of of When When of of will will on on transition transition set set too too re-used re-used better better items items the the of of other other the the many many always always immediately, immediately, how how dirty dirty to to the the of of items items the the created. created. b b namely namely of of way way items items much much others others ever ever algorithm algorithm transition transition is is items items graph graph items items the the from from many many hand, hand, rule rule it it set set replaced replaced that that useless useless as as 0. 0. 
(unless, (unless, the the retain retain creates creates be be to to rule rule for for grubage grubage of of an an disappear disappear is is grubage grubage 'B 'B they they ltemsets. ltemsets. of of which which is is On On functions functions sets sets there there used used items items A. A. avoided, avoided, initial initial A A re-expanded re-expanded of of easier easier item item ::= ::= sets sets the the it it by by the the transitions transitions will will of of When When ::=bis ::=bis of of 2 2 is is again again are are B B it it largest largest in in to to is is a a of of course, course, items items sets sets collection collection likely likely set set one one no no xor xor transition transition (because, (because, to to never never ltemsets. ltemsets. an an 7 7 also also In In items items for for because because 3 3 are are longer longer under by by would would added added initial initial hand, hand, After After other other B' B' is is refer refer gen pos that that sets sets not not the the the the for for re to. to. be be to to is is 8. 8. Although Although tion. tion. low low the the In In syntax-directed syntax-directed all all tems tems The The parsers parsers in in bage bage The The not not which which each each addition addition definitions definitions Yacc Yacc to to We We 18 18 • • •PG: •PG: • • • • • • • • • • our our the the CONCLUSIONS CONCLUSIONS be be algorithms algorithms grammar grammar IPG: IPG: parse parse parse parse modify modify construct construct print print Yacc: Yacc: measured measured by by results results measurements measurements and and collections collections case case IPG IPG generates generates To To able able LeLisp LeLisp adds adds opinion, opinion, building building was was parsing parsing erating erating erates erates The The erator erator the the In In generation generation has has than than tables, tables, PG. PG. 
environment environment 'Exam.self' 'Exam.self' generation generation Yacc Yacc or or (orinSDF: (orinSDF: other other it. it. prevent prevent the the an an incremental incremental by by uses uses of of to to this this deletion deletion of of the the an an second second a a input input great great All All fact fact same same adding adding of of a a that that The The which which use use the the stream stream the the generates generates environment environment for for parsers parsers highly highly generates generates element element twice twice parse parse grammar grammar case case whereas whereas editor), editor), the the affected affected C-code, C-code, SDF SDF the the were were measurements measurements the the 0. 0. 7.6 7.6 1.3 1.3 the the that that measurements measurements times times of of incremental incremental sentence sentence the the second second advantages, advantages, sentence sentence 7 7 have have time time measurements measurements parse parse of of incremental incremental time time after after the the the the the the "(" "(" Yacc. Yacc. input. input. sec sec sec sec sec sec table table lexical lexical as as and and AND AND of of dynamic dynamic only only generation generation This This PG PG same same in in ::="(" a a in in we we used used much much parsers parsers smallest smallest grammar grammar rule rule lexical lexical parts parts time time which which and and taken taken Yacc Yacc been been for for for for for for C, C, the the the the for for input input reason reason CF-ELEM+ CF-ELEM+ always always generates generates priority priority for for allowed allowed generation generation FUTURE FUTURE (SDF (SDF consider consider and and while while twice. twice. There There input input scanner scanner The The the the the the Y Y roughly roughly reconstruct reconstruct by by needed needed modification modification needed needed a a SDF; SDF; of of parse parse ace ace repeated repeated generates generates parser parser by by applications. 
applications. as as was was have have tokens tokens sentences sentences Yacc Yacc that that C C C C were were the the are are generator generator the the has has of of that that definition) definition) modification modification rule rule takes takes is is PG PG PG, PG, to to sentences sentences compiler compiler compiler compiler and and convincingly convincingly are are between between three three LR LR compiled compiled time time given given parsers parsers shown shown parse parse are are generate generate and and IPG IPG 15 15 parts parts takes takes been been for for generation generation PG PG and and compiled compiled ") ") parser parser WORK WORK time time indicating indicating to to function function already already on on parse parse less less about about lines lines the the LALR(l) LALR(l) ?" ?" the the constructing constructing parser parser as as to to reasons reasons uses uses IPG IPG to to table. table. in in input input of of carried carried on on the the of of consists consists by by to to to to Yacc, Yacc, in in parse parse twice; twice; be be measurements. measurements. time time file file again again -> -> Fig. Fig. be be the the is is and and tables tables the the the the in in top top twice twice link link compile compile same same use use less less the the in in the the time time an an declarations. declarations. by by generators generators quite quite unacceptably unacceptably system system texts texts show show CF-ELEM CF-ELEM 68020 68020 in in SDF SDF that that parser parser 7 7 than than the the parse parse memory, memory, for for of of table; table; out out but but excellent excellent LR(O) LR(O) same same .1. .1. the the after after relatively relatively tables. tables. the the generation generation of of this this as as may may time. time. 
used used the the largest largest this this only only of of small small the the since since They They compiled compiled on on grammar grammar the the the the fast fast the the LeLisp LeLisp machine machine table table from from in in paper. paper. different different the the lazy lazy (Lisp) (Lisp) seem seem tables tables a a parse parse difference: difference: parser; parser; by by benefits benefits first first C; C; The The a a as as and and and and ), ), SUN SUN (the (the show show we we choice choice modification. modification. part part 142 142 small small influencing influencing while while one, one, the the We We IPG IPG high high time time a a compiler compiler the the one. one. environment environment parse parse expect expect the the and and We We table table parser parser shows shows difficult difficult parser parser code code lines. lines. of of 3/60 3/60 parsers parsers length length the the added added which which generated generated of of parsers parsers is is construction construction for for parsing parsing than than generate generate the the kept kept for for This This is is times times following: following: Yacc Yacc lazy lazy negligible. negligible. by by under under grammars grammars to to that that interactive interactive The The will will parse parse almost almost an an 'Complice' 'Complice' and and rather rather Yacc, Yacc, the the generated generated problem, problem, the the in in the the the the difference difference Other Other and and constructed constructed interactive interactive some some uses uses of of the the syntax syntax tum tum mainly mainly measurements, measurements, parsers parsers in in low low complexity, complexity, rest rest complexity complexity parsers parsers C-compiler. C-compiler. table table both both than than zero. zero. 
is is incremental incremental which which input, input, and and LALR(l) LALR(l) is is of of that that time time workload workload experiments experiments that that Only Only language language we we of of had had an an by by PG PG be be the the deleted deleted in in [Cha86]. [Cha86]. to: to: modification modification is is The The are are SDF SDF were were which which PG PG the the easy easy PG PG was was Lisp. Lisp. language language used used a a to to code. code. and and not not the the of of parse parse namely namely much much be be parsers parsers lazy lazy generates generates and and PG PG was was tables tables used used (no (no derivative derivative as as able able parser parser the the IPG IPG the the definition definition a a from from first first explains explains generated generated rule rule and and IPG, IPG, showed showed large large LeLisp LeLisp parser parser tree tree modified modified larger larger input input swapping). swapping). algorithms algorithms for for to to are are definition definition four four and and parse parse are are within within times times in in IPG IPG genera present present but but but but regen LR(O) LR(O) as as larger larger order order of of used used gen than than gen why why SDF SDF of of sys gar that that the the ran ran did did the the for for all all of of of of in in a a a a dix dix be be SDF SDF Lang, Lang, viewpoint, viewpoint, we we incremental incremental *We *We have have 7. 7. solution solution test test used used ing ing for for tion tion parser parser context-free context-free the the We We centage centage implementation implementation Algorithm: Algorithm: Input: Input: Description: Description: DECR-REFCOUNI': DECR-REFCOUNI': RE-EXPAND: RE-EXPAND: Algorithm: Algorithm: Input: Input: Description: Description: expressed expressed PERFORMANCE PERFORMANCE PG PG did did B, B, LALR(l) LALR(l) performance. performance. ltemsets. ltemsets. 
itemset' itemset' grammar grammar have have made made used used we we is is is is been been Both Both The The to to A A A A generator, generator, and and not not an an its its improved improved of of for for a a set set give give set set compared compared more more LR(l) LR(l) reasonably reasonably introduction introduction by by appropriate here, here, appropriate dirty dirty we we have have IPG IPG PG PG to to this this version version of of of of parsing parsing DECR-REFCOUNI'(itemset): DECR-REFCOUNI'(itemset): RE-EXPAND(itemset): RE-EXPAND(itemset): The The itemset itemset in in parser parser had had All All both both us us which which expect expect Expand Expand efficient efficient items items items items the the have have sets sets SDF SDF and and if if EXPAND(itemset) EXPAND(itemset) for for itemset.refcount itemset.refcount od od of of old-transitions old-transitions problem problem access access would would version version reference reference reference reference to to but but itemset.refcount itemset.refcount sharing sharing an an garbage garbage ltemsets ltemsets if if DECR-REFCOUNI'(itemset') DECR-REFCOUNI'(itemset') Vitemset' Vitemset' described described the the generator generator of of is is be be algorithm*. algorithm*. IPG IPG itemset itemset itemset itemset itemset itemset itself itself to to example example style style sized sized itemset.type itemset.type Earley's Earley's as as expanded expanded a a items items AND AND of of LR(l), LR(l), od od for for to to be be not not efficiency efficiency dirty dirty of of of of the the would would generate generate dirty dirty of of a a expressed expressed parse parse to to DECR-REFCOUNI'(itemset') DECR-REFCOUNI'(itemset') counts counts Vitemset' Vitemset' count count grammar. grammar. 
be be the the good good collection collection Lisp Lisp because because whose whose with with is is referred := := becomes becomes EFFICIENCY EFFICIENCY modifications modifications be be set set [(symbol [(symbol Yacc Yacc in in of of since since algorithm algorithm a a ltemsets ltemsets := := sets sets grammar grammar trees. trees. be be programming programming acceptable acceptable := := in in fair fair section section of of an an As As type type implementation implementation of of itemset.transitions itemset.transitions * * of of of of = = parse parse to to ref ref the the itemset.refcount itemset.refcount items items of of [Joh79]. [Joh79]. SDF SDF itemset itemset initial initial these these we we 0 0 the the both both match, match, is is [(symbol [(symbol the the use use count count to to 'dirty'. 'dirty'. The The items items cannot cannot then then same same itemsel) itemsel) decreased decreased purely purely - { { - wanted wanted 4 4 high. high. set set of of to to definition definition lazy lazy tables tables a a systems systems are are (which (which then then fact fact itemset} itemset} in in than than to to the the conventional conventional have have of of field field is is we we and and way way A A the the yet yet PG PG and and coincidental. coincidental. the the decreased. decreased. itemsel) itemsel) items items syntax syntax that that Tomita Tomita comparison comparison have have (or (or to to e e reference reference has has better better as as of of by by only only we we and and handle handle routines routines recognize recognize incremental incremental old-transitions] old-transitions] test test and and - it it graphs graphs an an the the it it one one to to will will not not l l did did also happens also happens refers refers IPG. IPG. 
formalism formalism definition grammars grammars all all initial initial an an be be generation generation e e algorithm, algorithm, after after in in mark-and-sweep mark-and-sweep included included circular circular call call When When itemset.transitions] itemset.transitions] decreased decreased algorithms algorithms idea idea counting counting his his are are of of of of The The It It to, to, the the set set book book the the 'PG'). 'PG'). only only IPG IPG itemsets) itemsets) parser parser of of quite quite have have same same it it SDF SDF of of accepted accepted expansion. expansion. the the references references [fom85], [fom85], such such do do performance, performance, becomes becomes and and means means to to with with items, items, more more by by to to small, small, size size on on We We definition definition generator generator be be class class a a be be one. one. a a Earley's Earley's the the that that quick quick the the of of or or also also comparison. comparison. only only garbage garbage decreased decreased that that and, and, by by SDF. SDF. of of we we zero, zero, the the less less same same language language properly. properly. are are Yacc. Yacc. do do compared compared after after context-free context-free the the the the and and will will IPG IPG of of test test but but affects affects parsing parsing The The inteipreted inteipreted itemset itemset grammar grammar grammar grammar a a SDF SDF collector collector reference reference mediocre mediocre suggestion suggestion grammar. grammar. a a not not The The with with as as reason reason much much in in From From well. well. is is A A all all show show IPG IPG which which test test algorithm algorithm is is that that given given straightforward straightforward grammars. grammars. routines routines removed removed of of when when and and inferior inferior count count of of grammar grammar implementa for for by by a a and and of of SDF SDF them. them. B. B. 
theoretical theoretical grammars grammars input, input, in in Tomita's Tomita's choosing choosing the the PG PG the the of of appen would would has has of of pars from from non each each with with per Our Our the the the the we we As As 17 17 to to

[Figure: three panels (Yacc, PG, IPG), each plotting CPU time (in seconds) for the inputs exp.sdf (37 tokens), Exam.sdf (166 tokens), SDF.sdf (342 tokens) and ASF.sdf (475 tokens).]
Fig. 7.1: Results of efficiency measurements on Yacc, PG and IPG.

conventional LR(0) parser generator. As is shown by the measurements in section 7, IPG is an efficient parser generator suitable for use in interactive language definition systems.

Future work related to IPG will include:

• Simultaneous editing of language definitions and programs.
  As has been explained in the introduction, we currently have an operational prototype of a universal syntax-directed editor parametrized with a syntax definition written in SDF. It is our aim to allow simultaneous editing of both this syntax definition and the program/specification written in the language defined by it.

• Syntax-directed editing of programs/specifications defining their own syntax.
  An extreme case of simultaneously editing and using syntax definitions occurs when a language can modify its own syntax. In this case, modification and use of the syntax occur in the same textual object to be edited. Limited forms of user-defined syntax appear in various disguises such as operator declarations, macros and user-defined function notation. Clearly, the modification capability of IPG can be used to implement these changes in syntax.
• MO [Hor88] [Hor88] [Cha86] [Cha86] [Lan74] [Lan74] [Joh79] [Joh79] [Tom85] [Tom85] [Ear70] [Ear70] [Rek87] [Rek87] [ASU86] [ASU86] [HKR87b] [HKR87b] [Log88] [Log88] [HKR87a] [HKR87a] [Hen87] [Hen87] [HK88] [HK88] [FGJM85] [FGJM85] [Voi86] [Voi86] more more [San82] [San82] REFERENCES REFERENCES grammars grammars of of modification modification LR(O) LR(O) LR LR While While incremental incremental POSTSCRIPT POSTSCRIPT 20 20 some some parsers parsers efficient efficient As As Horspool's Horspool's we we tables tables believe, believe, loss loss '' '' a a in in were were [Hor88]. [Hor88]. consequence, consequence, generation generation any any (1986). (1986). J. J. turns turns S.C. S.C. B. B. 2B, 2B, pp. pp. Proceedings Proceedings Victoria, Victoria, J. J. Loeckx, Loeckx, R.N. R.N. Conference Conference M. M. (1987). (1987). J. J. Netherlands, Netherlands, in in Report Report Mathematics Mathematics J. J. M.H. M.H. definition definition Addison-Wesley Addison-Wesley J. J. A.V. A.V. Computer Computer CS-R8761, CS-R8761, P.R.H. P.R.H. K. K. Science Science J. J. F. F. 66 66 algebraic algebraic ming ming Languages, Languages, D. D. Conference Conference LALR(l) LALR(l) as as Earley, Earley, Heering, Heering, Rekers, Rekers, Chailloux, Chailloux, Heering, Heering, parsing parsing finishing finishing Lang, Lang, Voisin, Voisin, Heering Heering Futatsugi, Futatsugi, Sandberg, Sandberg, Tomita, Tomita, in in 94-102 94-102 is is Bell Bell way). way). point point look-ahead look-ahead Johnson, Johnson, Horspool, Horspool, out out Languages, Languages, Logger, Logger, Conference Conference not not Aho, Aho, CS-R8749, CS-R8749, Hendriks, Hendriks, As As of of Laboratories Laboratories Lecture Lecture to to Victoria, Victoria, "Deterministic "Deterministic satisfactory. satisfactory. specification," specification," "An "An of of Science, Science, efficiency efficiency and and ''A ''A Computer Computer of of parsers. parsers. ''CIGALE: ''CIGALE: (1970). (1970). his his P. P. 
be be Efficient Efficient Centre Centre P. P. ACM, ACM, this this and and his his R. R. Record Record Proceedings Proceedings departure departure of of SION, SION, and and et et J.A. J.A. "LITHE: "LITHE: Klint, Klint, LALR(l) LALR(l) "An "An problematic. problematic. parser parser overall overall "YACC: "YACC: its its Klint, Klint, efficient efficient system system the the Sethi, Sethi, "Incremental "Incremental al., al., paper, paper, Notes Notes sets sets (1986). (1986). P. P. Computer Computer ACM, ACM, Record Record Goguen, Goguen, ''Type-checking ''Type-checking B.C., B.C., compilation compilation for for Centre Centre Albuquerque, Albuquerque, Amsterdam Amsterdam integrated integrated We We Second Second Le_Lisp, Le_Lisp, Amsterdam Amsterdam parsing parsing and and Klint, Klint, of of for for Programming Programming generator generator (1979). (1979). and and have have goals goals Mathematics Mathematics is is in in a a and and R.N. R.N. has has opted opted the the A A context-free context-free Canada Canada parse parse New New J. J. non-LR(O) non-LR(O) yet yet tool tool a a techniques techniques pp. pp. Computer Computer for for of of J. J. language language of of Rekers, Rekers, conventional conventional J.-P. J.-P. "Towards "Towards a a Science, Science, Ninth Ninth are are Colloquium Colloquium J.D. J.D. to to for for the the another another Rekers, Rekers, Horspool Horspool 21-38 21-38 Mathematics Mathematics Version Version for for less less Orleans Orleans Computing Computing for for text text generation generation tables. tables. to to be be (1988). (1988). very very for for natural natural (1988). (1988). Twelfth Twelfth (1987). (1987). 
Jouannaud, Jouannaud, New New interactive interactive a a lillman, lillman, efficient efficient Prolog," Prolog," Annual Annual arid arid "Principles "Principles more more taken taken finitely finitely 1, 1, and and mini-ML: mini-ML: languages languages combining combining in in Science Science parsing parsing similar similar for for Amsterdam Amsterdam compiler-compiler," compiler-compiler," "Incremental "Incremental (1985). (1985). pp. pp. 15.2, 15.2, Mexico Mexico This This Conference Conference syntax-directed syntax-directed sent sent shorter shorter Computer Computer languages languages LR LR efficient efficient Annual Annual on on efficient efficient of of 61-86, 61-86, into into Science Science ACM ACM Compilers. Compilers. ambiguous ambiguous and and incremental incremental us us Automata, Automata, Report Report is is parser parser to to le le 14, 14, and and LR LR algorithm," algorithm," grammar grammar an an (1982). (1982). (but (but his his more more ours, ours, manuel manuel oflazy oflazy account, account, Computer Computer a a algebraic algebraic Springer-Verlag, Springer-Verlag, ACM ACM parsers," parsers," J. J. North-Holland North-Holland flexible flexible (1988). (1988). 
Symposium Symposium non-detenninistic non-detenninistic LR(O) LR(O) experience experience Science, Science, recent recent , , rather rather Proceedings Proceedings Meseguer, Meseguer, without without in in Kluwer Kluwer CS-R88 CS-R88 generation generation we we difficult difficult Symposium Symposium construction construction and and editor," editor," context-free context-free the the de de Principles, Principles, Languages Languages briefly briefly table table table table whose whose syntax syntax Communications Communications report report than than Science, Science, specifications: specifications: reference reference incremental incremental Report Report Netherlands, Netherlands, Amsterdam Amsterdam restricting restricting in in Academic Academic l l 4, 4, generation generation on on with with generation generation than than "Principles "Principles a a summarize summarize UNIX UNIX Report Report of of Centre Centre (1986). (1986). on on parallel parallel of of and and incremental incremental Principles Principles Saarbriicken Saarbriicken DCS- on on lexical lexical Amsterdam Amsterdam Computing Computing user-defined user-defined incremental incremental and and incremental incremental Techniques Techniques grammars,'' grammars,'' and and parsers," parsers," classes," classes," , , Programmer's Programmer's Principles Principles INRIA, INRIA, the the Publishers Publishers CS-R8820, CS-R8820, program program (1987). (1987). for for Programming, Programming, expression expression 79-IR, 79-IR, one one phase, phase, phase phase SION, SION, a a class class scanners," scanners," his his of of of of Mathematics Mathematics simple simple of of and and the the OBJ2," OBJ2," pp. pp. generation generation approach. approach. pp. pp. (1974). (1974). Science Science (1987). (1987). Rocquencourt Rocquencourt at at Programming Programming University University generation generation but but of of generation generation and and syntax syntax generation," generation," pp. pp. 
of of ACM ACM he he Amsterdam Amsterdam ( ( the the 255-269 255-269 142-145 142-145 1985). 1985). Centre Centre acceptable acceptable Program parsing,'' parsing,'' generates generates considers considers language language 69-86 69-86 Manual Manual expense expense Tools Tools pp. pp. Report Report 13(2), 13(2), ed. ed. in in in in and and 52- and and the the for for an an of of in in in in J. J. of of of of in in , ,

APPENDIX A. GOTO is always called with a complete set of items

LR-PARSE calls GOTO for a certain set of items which we will prove to be complete. We do this for LR-PARSE as given below, which is just another representation of the algorithm given in section 3.1.

(1)  LR-PARSE(start-state, sentence):
(2)    parser := new(LRparser)
(3)    push(start-state, parser.stack)
(4)    symbol, sentence := head(sentence), tail(sentence)
(5)    while true do
(6)      actions := ACTION(top(parser.stack), symbol)
(7)      if ∃action ∈ actions then
(8)        if action = (shift state') then
(9)          push(state', parser.stack)
(10)         symbol, sentence := head(sentence), tail(sentence)
(11)       elseif action = (reduce A ::= β) then
(12)         for 1 ... length(β) do pop(parser.stack) od
(13)         state" := GOTO(top(parser.stack), A)
(14)         push(state", parser.stack)
(15)       elseif action = (accept) then
(16)         return true
(17)       fi
(18)     else
(19)       return false
(20)     fi
(21)   od

We first need to prove a loop invariant for the algorithm that states that only the set of items on top of the stack can be initial; all others must be complete:

INV: ∀i [1 ≤ i ≤ length(parser.stack) - 1 ⇒ parser.stack[i].type = complete]

INV is true before entering the main loop of the algorithm, because the stack then only consists of one element. The execution of the loop itself does not alter INV. After executing line (6), where ACTION was called for the state on top of the stack, an even stronger condition holds: if ACTION is called on an initial set of items, it expands it to a complete set of items.
So after the call of ACTION we have:

S-INV: ∀i [1 ≤ i ≤ length(parser.stack) ⇒ parser.stack[i].type = complete]

ACTION returns zero or more actions, of which one is chosen. The following cases are possible:

• actions = ∅
  The execution of line (19) does not alter the stack. We had S-INV, so we certainly have INV.

• action = (shift state')
  In line (9) state' is pushed on the stack. The length of the stack is thus increased by one, but because we had S-INV, we still have INV.

• action = (reduce A ::= β)
  In line (12) the stack is reduced by length(β) elements. Because we had S-INV and the length of the stack gets smaller or remains equal by this operation, S-INV still holds after line (12). In line (14) state" is pushed on the stack, which increases the length of the stack by one. Because we had S-INV, we still have INV.

• action = (accept)
  The execution of line (16) does not alter the stack. We had S-INV, so we certainly have INV.

INV is indeed a loop invariant of LR-PARSE. GOTO is called for the state on top of the parse stack in line (13); after line (12) we had S-INV, so we also have top(parser.stack).type = complete.

This proof can be extended to the case of PAR-PARSE in a straightforward way. Here, the invariant to prove would be that INV holds for all parsers in this-sweep ∪ next-sweep at each moment during the parsing. We will not elaborate this proof.

begin begin module module 13 13 part, part, tance, tance, context-free context-free SDF SDF Formalism' Formalism' APPENDIX APPENDIX 22 22 ~A ~A context-free context-free followed followed is is lexical lexical In In because because is is the the the the equivalent equivalent SDF SDF functions functions sorts sorts functions functions layout layout sorts sorts language language and and context-free context-free syntax. syntax. B. B.
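The argument of Appendix A can also be checked on a small example. The following Python sketch is an illustration only, not the implementation described in this report: it assumes a toy LR(0) grammar (START ::= S $, S ::= ( S ) | a), always picks the first action, and uses hypothetical names (ItemSet, get_itemset, expand, action, goto, lr_parse). ACTION expands an initial set of items on demand, GOTO asserts that its argument is complete, and the main loop asserts INV before every call of ACTION.

```python
from dataclasses import dataclass, field

# Toy LR(0) grammar, assumed for illustration: START ::= S $,  S ::= ( S ) | a
GRAMMAR = {"START": [("S", "$")], "S": [("(", "S", ")"), ("a",)]}
START_KERNEL = frozenset({("START", ("S", "$"), 0)})

@dataclass
class ItemSet:
    kernel: frozenset                 # items are (lhs, rhs, dot) triples
    type: str = "initial"             # becomes "complete" once expanded
    transitions: dict = field(default_factory=dict)
    reductions: list = field(default_factory=list)

ITEMSETS = {}                         # kernel -> ItemSet, kept between parses

def get_itemset(kernel):
    """Return the set of items with this kernel; create it as initial if new."""
    if kernel not in ITEMSETS:
        ITEMSETS[kernel] = ItemSet(kernel)
    return ITEMSETS[kernel]

def expand(itemset):
    """Compute closure, transitions and reductions; mark the set complete."""
    items, todo = set(itemset.kernel), list(itemset.kernel)
    while todo:
        _, rhs, dot = todo.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for alternative in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], alternative, 0)
                if item not in items:
                    items.add(item)
                    todo.append(item)
    moves = {}
    for lhs, rhs, dot in items:
        if dot == len(rhs):
            itemset.reductions.append((lhs, rhs))
        else:
            moves.setdefault(rhs[dot], set()).add((lhs, rhs, dot + 1))
    # Successor sets are only created here, not expanded (lazy generation).
    itemset.transitions = {s: get_itemset(frozenset(k)) for s, k in moves.items()}
    itemset.type = "complete"

def action(itemset, symbol):
    """ACTION: expand an initial set of items first, then list the actions."""
    if itemset.type == "initial":
        expand(itemset)
    actions = [("reduce", lhs, rhs) for lhs, rhs in itemset.reductions]
    if symbol in itemset.transitions:
        target = itemset.transitions[symbol]
        actions.append(("accept",) if symbol == "$" else ("shift", target))
    return actions

def goto(itemset, nonterminal):
    """GOTO: by the proof above, only ever called on a complete set of items."""
    assert itemset.type == "complete", "GOTO called on an initial set of items"
    return itemset.transitions[nonterminal]

def lr_parse(sentence):
    tokens = iter(list(sentence) + ["$"])
    stack = [get_itemset(START_KERNEL)]
    symbol = next(tokens)
    while True:
        # INV: every set of items below the top of the stack is complete.
        assert all(s.type == "complete" for s in stack[:-1])
        actions = action(stack[-1], symbol)
        if not actions:
            return False
        chosen = actions[0]
        if chosen[0] == "shift":
            stack.append(chosen[1])
            symbol = next(tokens)
        elif chosen[0] == "reduce":
            _, lhs, rhs = chosen
            del stack[-len(rhs):]     # pop length(rhs) states
            stack.append(goto(stack[-1], lhs))
        else:
            return True               # accept
```

Calling lr_parse(["a"]) accepts while leaving the set of items reached via "(" unexpanded; a later call such as lr_parse(["(", "a", ")"]) expands it on demand. The shared ITEMSETS dictionary is a design choice of this sketch so that the work done by one parse is reused by the next.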
by by we we LEXICAL-FUNCTIONS, LEXICAL-FUNCTIONS, FUNCTIONS, FUNCTIONS, "end" "end" CONTEXT-FREE-SYNTAX, CONTEXT-FREE-SYNTAX, SDF-DEFINITION, SDF-DEFINITION, "begin" "begin" "module" "module" "--" "--" ORD-CHAR ORD-CHAR "-" "-" ORD-CHAR ORD-CHAR C-CHAR C-CHAR "--" "--" "\"" "\"" C-CHAR C-CHAR "\n" "\n" "-\n" "-\n" - "\"" "\"" LETTER LETTER "[" "[" WHITE-SPACE, WHITE-SPACE, "\ "\ L-CHAR, L-CHAR, ORD-CHAR, ORD-CHAR, "+" "+" "*" "*" LETTER, LETTER, [ [ [\-\[\]] [\-\[\]] syntax syntax [0-9A-Za-z [0-9A-Za-z [a-zA-Z0-9\-_] [a-zA-Z0-9\-_] [a-zA-Z] [a-zA-Z] is is The The the the \t\n\r\f] \t\n\r\f] \" \" did did [\n\-] [\n\-] described described to to CONTEXT-FREE-SYNTAX CONTEXT-FREE-SYNTAX LEXICAL-SYNTAX LEXICAL-SYNTAX CHAR-RANGE* CHAR-RANGE* - in in For For SDF SDF declaration declaration COM-CHAR* COM-CHAR* syntax syntax L-CHAR* L-CHAR* not not - a a ID ID which which [\n\-] [\n\-] BNF BNF "-" "-" ID-TAIL* ID-TAIL* The The [] [] syntax syntax the the use use LITERAL, LITERAL, ID-TAIL, ID-TAIL, definition definition ID ID in in C-CHAR, C-CHAR, measurements measurements FUNCTION-DEF, FUNCTION-DEF, syntax syntax the the C-CHAR C-CHAR !J$%&' !J$%&' grammar grammar SDF SDF [HK.86]. [HK.86]. -- section section "\"" "\"" COMMENT COMMENT lexical lexical of of COM-END COM-END all all definition definition LEXICAL-SYNTAX, LEXICAL-SYNTAX, the the "]" "]" rule rule of of COM-CHAR, COM-CHAR, ID, ID, ()*+,./:;<=>?@\~ ()*+,./:;<=>?@\~ CHAR-RANGE, CHAR-RANGE, SDF SDF definitions definitions syntax syntax LEXICAL-FUNCTION-DEF, LEXICAL-FUNCTION-DEF, the the chars chars scanner scanner An An A A PRIORITIES, PRIORITIES, ITERATOR, ITERATOR, ::= ::= non-terminals non-terminals described described SDF SDF 13. 13. rules rules CF-ELEM, CF-ELEM, except except in in definition definition of of for for COM-END COM-END the the in in SDF SDF in in IPG IPG measurements. measurements. 
CHAR-CLASS, CHAR-CLASS, the the section section SORTS-DECL, SORTS-DECL, control control used used PRIO-DEF, PRIO-DEF, are are consists consists 'functions' 'functions' ATTRIBUTES, ATTRIBUTES, '{I}-] '{I}-] written. written. are are 7 7 the the declared declared of of characters, characters, LEX-ELEM, LEX-ELEM, two two declaration declaration lexical lexical ABBREV-F-LIST, ABBREV-F-LIST, SDF SDF -> -> SORT, SORT, -> -> -> -> -> -> -> -> -> -> -> -> -> -> -> -> -> -> -> -> -> -> -> -> -> -> -> -> -> -> -> -> -> -> ->ORD-CHAR ->ORD-CHAR -> -> -> -> -> -> -> -> parts, parts, ATTRIBUTE ATTRIBUTE first first SDF-DEFINITION SDF-DEFINITION COM-END COM-END COMMENT COMMENT COM-END COM-END COM-END COM-END WHITE-SPACE WHITE-SPACE COM-CHAR COM-CHAR LITERAL LITERAL COM-CHAR COM-CHAR stands stands L-CHAR L-CHAR L-CHAR L-CHAR C-CHAR C-CHAR CHAR-CLASS CHAR-CLASS C-CHAR C-CHAR CHAR-RANGE CHAR-RANGE ORD-CHAR ORD-CHAR CHAR-RANGE CHAR-RANGE LETTER LETTER ID ID ID-TAIL ID-TAIL ITERATOR ITERATOR ITERATOR ITERATOR syntax syntax the the LAYOUT, LAYOUT, in in part. part. 
the the ", ", for for lexical lexical part part 'Syntax 'Syntax 'sorts' 'sorts' -, -, An An is is ABBREV-F-DEF, ABBREV-F-DEF, syntax syntax [, [, SDF SDF of of declaration declaration no no ] ] Definition Definition function function and\ and\ and and impor the the

    "lexical" "syntax" SORTS-DECL LAYOUT LEXICAL-FUNCTIONS  -> LEXICAL-SYNTAX
    -- empty --                                             -> LEXICAL-SYNTAX
    "sorts" {SORT ","}+                                     -> SORTS-DECL
    -- empty --                                             -> SORTS-DECL
    ID                                                      -> SORT
    "layout" {SORT ","}+                                    -> LAYOUT
    -- empty --                                             -> LAYOUT
    "functions" LEXICAL-FUNCTION-DEF+                       -> LEXICAL-FUNCTIONS
    LEX-ELEM+ "->" SORT                                     -> LEXICAL-FUNCTION-DEF
    SORT                                                    -> LEX-ELEM
    SORT ITERATOR                                           -> LEX-ELEM
    LITERAL                                                 -> LEX-ELEM
    CHAR-CLASS                                              -> LEX-ELEM
    "~" CHAR-CLASS                                          -> LEX-ELEM
    "context-free" "syntax" SORTS-DECL PRIORITIES FUNCTIONS -> CONTEXT-FREE-SYNTAX
    "priorities" {PRIO-DEF ","}+                            -> PRIORITIES
    -- empty --                                             -> PRIORITIES
    {ABBREV-F-LIST ">"}+                                    -> PRIO-DEF
    {ABBREV-F-LIST "<"}+                                    -> PRIO-DEF
    ABBREV-F-DEF                                            -> ABBREV-F-LIST
    "(" {ABBREV-F-DEF ","}+ ")"                             -> ABBREV-F-LIST
    CF-ELEM+                                                -> ABBREV-F-DEF
    CF-ELEM* "->" SORT                                      -> ABBREV-F-DEF
    "functions" FUNCTION-DEF+                               -> FUNCTIONS
    CF-ELEM* "->" SORT ATTRIBUTES                           -> FUNCTION-DEF
    SORT                                                    -> CF-ELEM
    LITERAL                                                 -> CF-ELEM
    SORT ITERATOR                                           -> CF-ELEM
    "{" SORT LITERAL "}" ITERATOR                           -> CF-ELEM
    "{" {ATTRIBUTE ","}+ "}"                                -> ATTRIBUTES
    -- empty --                                             -> ATTRIBUTES
    "par"                                                   -> ATTRIBUTE
    "assoc"                                                 -> ATTRIBUTE
    "left-assoc"                                            -> ATTRIBUTE
    "right-assoc"                                           -> ATTRIBUTE
end SDF