Deterministic of Languages



with Dynamic Op erators

Kjell Post Allen Van Gelder James Kerr

Baskin Computer Science Center

University of California

Santa Cruz, CA 95064

email: [email protected]

UCSC-CRL-93-15

July 27, 1993

Abstract

Allowing the programmer to de ne op erators in a language makes for more readable co de but also complicates

the job of parsing; standard parsing techniques cannot accommo date dynamic grammars. We presentan

LR parsing metho dology, called deferreddecision parsing , that handles dynamic op erator declarations, that

is, op erators that are declared at run time, are applicable only within a program or context, and are not in

the underlying language or grammar. It uses a parser generator that takes pro duction rules as input, and

generates a table-driven LR parser, much like yacc. Shift/reduce con icts that involve dynamic op erators

are resolved at parse time rather than at table construction time.

For an op erator-rich language, this technique reduces the size of the grammar needed and parse table

pro duced. The added cost to the parser is minimal. Ambiguous op erator constructs can either b e detected

by the parser as input is b eing read or avoided altogether by enforcing reasonable restrictions on op erator

declarations. Wehave b een able to describ e the syntax of Prolog, a language known for its lib eral use of

op erators, and Standard ML, which supp orts lo cal declarations of op erators.

De nite clause grammars DCGs, a novel parsing feature of Prolog, can b e translated into ecient

co de by our parser generator. The implementation has the advantage that the token stream need not

b e completely acquired b eforehand, and the parsing, b eing deterministic, is approximately linear time.

Conventional implementations based on parsers can require exp onential time.

Categories and Sub ject Descriptors: D.3.1 [Programming Languages]: Formal De nitions and Theory|

syntax ; D.3.2 [Programming Languages]: Language Classi cations|applicative languages; extensible

languages ; D.3.4 [Programming Languages]: Pro cessors|translator writing systems and

generators; parsing

General Terms: Grammars, Parsers, Op erators

Additional Key Words and Phrases: Parser Generators, Ambiguous Grammars, LR Grammars, Attribute

Grammars, Context-Free Languages, Logic Programming



Originally app eared in \Logic Programming | Pro ceedings of the 1993 International Symp osium ILPS93" 1

1 Intro duction and Background

Syntax often has a profound e ect on the usability of a language. Although b oth Prolog and Lisp are

conceptually ro oted in recursive functions, Prolog's op erator-based syntax is generally p erceived as b eing

easier to read than Lisp's pre x notation. Prolog intro duced two innovations in the use of op erators:

1

1. Users could de ne new op erators at run time with great exibility .

2. Expressions using op erators were semantically equivalent to pre x form expressions; they were a

convenience rather than a necessity.

Notably, new functional programming languages, suchasML[21] and Haskell [14], are following Prolog's

lead in the use of op erators. Prop er use of op erators can make a program easier to read but also complicates

the job of parsing the language.

Example 1.1: The following Prolog rules make extensive use of op erators to improve readability.

X requires Y :- X calls Y.

X requires Y :- X calls Z, Z requires Y.

Here \:-" and \," are supplied as standard op erators, while \requires" and \calls"have programmer

de nitions not shown.

Languages that supp ort dynamic op erators have some or all of the following syntactic features:

1. The name of an op erator can b e any legal identi er or symb ol. Perhaps, even \built-in" op erators like

\+", \", and \" can b e rede ned.

2. The syntactic prop erties of an op erator can b e changed by the user during parsing of input.

3. Op erators can act as arguments to other op erators. For example, if \_", \^", and \" are in x

op erators, the sentence \_ ^"may b e prop er.

4. Op erators can b e overloaded , in that a single identi er can b e used as the name of two or more

syntactically di erent op erators. A familiar example is \", which can b e pre x or in x.

This pap er addresses problems in parsing such languages, presents a solution based on a generalization of

LR parsing and describ es a parser generator for this family of languages. After reviewing some de nitions,

we give some background on the problem, and then summarize this pap er's contributions. Later sections

discuss several of the issues in detail.

1.1 De nitions

We brie y review some standard de nitions concerning op erators, and sp ecify particular terminology used

in the pap er. An operator is normally a unary or binary function whose identi er can app ear b efore, after,

or b etween its arguments, dep ending on whether its xity is pre x , post x ,orin x . An identi er that has

b een declared to b e op erators of di erent xities is said to b e an overloadedoperator .We shall also have

o ccasion to consider nul lary op erators, which take no arguments. More general op erator notations, some

of which take more than two arguments, are not considered here. Besides xity , op erators havetwo other

prop erties, precedence and associativity , whichgovern their use in the language.

An op erator's precedence ,orscope , is represented by a p ositiveinteger. This rep ort uses the Prolog

convention, which is that the larger precedence numb er means the wider \scop e" and the weaker binding

strength. This is the reverse of many languages, hence the synonym scope serves as a reminder. Thus \+"

normally has a larger precedence numb er than \*"by this convention.

1

Some earlier languages p ermitted very limited de nitions of new op erators; see section 1.2 2

An op erator's associativity is one of left , right ,ornon .For our purp oses, an expression is a term whose

principal function is an op erator. The precedence of an expression is that of its principal op erator. A

left right asso ciative op erator is p ermitted to have an expression of equal precedence as its left right

argument. Otherwise, arguments of op erators must havelower precedence rememb ering Prolog's order.

Non-expression terms have precedence 0; this includes parenthesized expressions, if they are de ned in the

grammar.

We assume in the rest of this pap er some acquaintance with Prolog [5, 6]. A basic knowledge of LR

parsing [1, 3] and the parser generator yacc [15] is also helpful.

1.2 Background and Prior Work

Most parsing techniques assume a static grammar and xed op erator priorities. Excellent metho ds have b een

develop ed for generating ecient LL parsers and LR parsers from sp eci cations in the form of pro duction

rules, sometimes augmented with asso ciativity and precedence declarations for in x op erators [1, 3, 4, 9, 15].

Parser generation metho ds enjoy several signi cant advantages over \hand co ding":

1. The language syntax can b e presented in a readable, nonpro cedural form, similar to pro duction rules.

2. Emb edded semantic actions can b e triggered by parsing situations.

3. For most programming languages, the tokenizer may b e separated from the grammar, b oth in co de

and in sp eci cation.

4. Parsing runs in linear time and normally uses sublinear stack space.

The price that is paid is that only the class of LR1 languages can b e treated, but this is normally a

suciently large class in practice.

Earley's [8] is a general context-free parser. It can handle any grammar but is more exp ensive

than the LR-style techniques b ecause the con guration states of the LR0 automaton are computed and

3

manipulated as the parse pro ceeds. Parsing an input string of length n may require O n  time although

2

LRk grammars only take linear time, O n  space, and an input-bu er of size n.Tomita's algorithm

[27] improves on Earley's algorithm by precompiling the grammar into a parse table, p ossibly with multiple

entries. Still, the language is xed during the parse and it would not b e p ossible to intro duce or change

prop erties of op erators on the y.

Incremental parser generators [11, 13] can b e viewed as an application of Tomita's parsing metho d. They

can handle mo di cations to the input grammar at the exp ense of recomputing parts of the parse table and

having the LR0 automaton available at run time. Garbage collection also b ecomes an issue.

To our knowledge these metho ds have never b een applied to parse languages with dynamically declared

op erators.

Op erator precedence parsing is another metho d with applications to dynamic op erators [18] but it can

not handle overloaded op erators.

Permitting user-de ned op erators was part of the design of several early pro cedural languages, suchas

Algol 68 [28] and EL1 [12], but these designs avoided most of the technical diculties by placing severe

restrictions on the de nable op erators. First, in x op erators were limited to 7 to 10 precedence levels. By

comparison, C has 15 built-in precedence levels, and Prolog p ermits 1200. More signi cantly, pre x op erators

always to ok precedence over in x, preventing certain combinations from b eing interpreted naturally.C

de nes some in x to take precedence over some pre x, and Prolog p ermits this in user de nitions. For

example, no declarations in the EL1 or Algol 68 framework p ermit not X=Y to b e parsed as not X=Y.

Scant details of the parsing metho ds can b e found in the literature, but it app ears that one implementation

of EL1 used LR parsing with a mechanism for changing the token of an identi er that had b een dynamically

declared as an op erator \based on lo cal context" b efore the token reached the parser [12]. Our approach

generalizes this technique, p ostp oning the decision until after the parser has seen the token, and even until

after additional tokens have b een read. 3

op decls

sentences

sentences

standard deferred

op decls

parser parser rules parser parser

rules

generator generator

parse tree

Figure 1: Standard Parser Generator and Deferred Decision Parser Generator.

Languages have b een designed to allow other forms of user-de ned syntax b esides unary and binary

op erators. Among them, EL1 included \bracket- x" op erators and other dynamic syntax; in some cases

a new parser would b e generated [12]. More recently,Peyton Jones [16] describ es a technique for parsing

programs that involves user-de ned dist x op erators e.g. if-then-else-fi, but without supp ort for

precedence and asso ciativity declarations.

With the advent of languages that p ermit dynamic op erators, numerous ad hoc parsers have b een

develop ed. The tokenizer is often emb edded in these parsers with attendant complications. It is frequently

dicult to tell precisely what language they parse; the parser co de literally de nes the language.

Example 1.2: Using standard op erator precedence s, pre x \" binds less tightly than in x \" in p opular

versions of Prolog. However, - 3 * 4 was found to parse as -3 * 4 whereas -X*4was found

to parse as - X * 4. Numerous other anomalies can b e demonstrated.

Indeed, written descriptions of the \Edinburgh syntax" for Prolog are acknowledged to b e approximations,

and the \ultimate de nition" seems to b e the program co de, read.pl.

1.3 Summary of Contributions

A metho d called deferreddecision parsing has b een develop ed with the ob jective of building up on the

techniques of LR parsing and parser generation, and enjoying their advantages mentioned ab ove, while

extending the metho dology to incorp orate dynamic op erators. The metho d is an extension of earlier work

by Kerr [17]. It supp orts all four features that were listed ab ove as b eing needed by languages with dynamic

op erators. The resulting parsers are deterministic, and su er only a small time p enalty when compared to

LR parsing without dynamic op erators. They use substantially less time and space than Earley's algorithm.

In \standard" LR parser generation, as done by yacc and similar to ols, shift/reduce con icts, as

evidenced bymultiple entries in the initial parse table, are resolved by consulting declarations concerning

static op erator precedence and asso ciativity [15]. If the pro cess is successful, the nal parse table b ecomes

deterministic.

The main idea of deferred decision parsing is that shift/reduce con icts involving dynamic op erators

are left unresolved when the parser is generated. At run time the current declarations concerning op erator

precedence and asso ciativity are consulted to resolveanambiguity when it actually arises  g 1. Another

imp ortant extension is the ability to handle a wider range of xities, as well as overloading, compared to

standard LR parser generators. These extensions are needed to parse several newer languages.

The parser generator, implemented in a standard dialect of Prolog, pro cesses DCG-style input consisting

of pro duction rules, normally with emb edded semantic actions. The output is Prolog source co de for the

parser, with appropriate calls to a scanner and an op erator mo dule.

The op erator mo dule provides a standard interface to the dynamic op erator table used by the parser.

Thus, the language designer decides what language constructs denote op erator declarations; when such 4

constructs are recognized during parsing, the asso ciated semantic actions mayinteract with the op erator

mo dule. Pro cedures are provided to query the state of the op erator table and to up date it. Interaction with

the op erator mo dule is shown in the example in g 4.

We assume that the grammar is designed in suchaway that semantic actions that up date dynamic

op erators cannot o ccur while a dynamic op erator is in the parser stack; otherwise a grammatical monstrosity

might b e created. In other words, it should b e imp ossible for a dynamic op erator to app ear in a parse

tree as an ancestor of a symb ol whose semantic action can change the op erator table. Such designs are

straightforward and natural, so we see no need for a sp ecial mechanism to enforce this constraint. For

example, if dynamic op erator prop erties can b e changed up on reducing a \sentence", then the grammar

should not p ermit dynamic op erators b etween \sentences", for that would place a dynamic op erator ab ove

\sentence" in some parse trees.

Standard op erator-con ict situations are easily handled by a run-time resolution mo dule. However, Prolog

o ers very general op erator-declaration capabilities, which in theory can intro duce signi cant diculties with

overloaded op erators. Actually, these complicated situations rarely arise in practice, as users refrain from

de ning an op erator structure that they cannot understand themselves. In Edinburgh Prolog [5] for instance,

the user can de ne a symb ol as an op erator of all three xities and then use the symb ol as an op erator,

or as an op erand nullary op erator, or as the functor of a comp ound term. As the Prolog standardization

committee recognized in a 1990 rep ort [25],

\These p ossibilities provide multiple opp ortunities for ambiguity and the draft standard has

therefore de ned restrictions on the syntax so that a expressions can still b e parsed from left to

right without needing signi cant backtracking or lo ok-ahead:::"

A preliminary rep ort on this work [24] showed that many of the prop osed restrictions were unnecessary. The

ISO committee has subsequently relaxed most of the restrictions, but at the exp ense of more complex rules

for terms [26].

The mo dular design of our system p ermits di erent con ict resolution strategies and op erator-restriction

p olicies to b e \plugged in", and thus may serveasavaluable to ol for investigating languages that are

well-de ned, yet havevery exible op erator capabilities.

So-called de nite clause grammars DCGs are a syntactic variant of Prolog, in which statements are

written in a pro duction-rule style, with emb edded semantic actions [5]. This style p ermits input-driven

programs to b e develop ed quickly and concisely. Our parser generator provides a foundation for DCG

compilation that overcomes some of the de ciencies of existing implementations. These de ciencies include

the need to have acquired the entire input string b efore parsing b egins, and the fact that backtracking o ccurs,

even in deterministic grammars.

Our p oint of view is to regard the DCG as a translation scheme in which the arguments of predicates

app earing as nonterminals in the DCG are attributes; semantic actions may manipulate these attributes.

Essentially, a parser is a DCG in which all attributes are synthesized, and each nonterminal has a parse tree

as its only attribute. Synthesis of attributes is accomplished naturally in LR parsing, as semantic actions are

executed after the asso ciated pro duction has b een reduced. The thrust of our research has b een to correctly

handle inherited attributes. This work is discussed in Section 5.

2 Deferred Decision Parsing

The parser generator yacc disambiguates con icts in grammars by consulting programmer-supplied

information ab out precedence and asso ciativity of certain tokens, which normally function as in x op erators.

Deferred decision parsing p ostp ones the resolution of con icts involving dynamic op erators until run time.

As a running example we will use a grammar for a subset of Prolog terms with op erators of all three

xities. At run time the name of each op erator, together with its precedence, xity, and asso ciativity,is

stored in the current op erator table e.g. g. 2. The parser has access to the current op erator table and is

resp onsible for converting tokens from the scanner so that an identi er with an entry in the current op erator

table is translated to the appropriate dynamic op erator token. 5

Pre x In x Post x

Name Prec Assoc Prec Assoc Prec Assoc

+ 300 right 500 left

300 right 500 left

 400 left

= 400 left

! 300 left

Figure 2: An Example Run-Time Op erator Table.

termT ::= atomT.

termT ::= varT.

termT ::= '', termT, ''.

termT ::= opName, termT1, { T =.. [Name,T1] }.  prefix

termT ::= termT1, opName, termT2, { T =.. [Name,T1,T2]}.  infix

termT ::= termT1, opName, { T =.. [Name,T1] }.  postfix

termT ::= opT.  nullary

Figure 3: Subset Grammar for Prolog Terms.

The token names for dynamic op erators, which should not b e confused with the op erator names, are

declared to the parser generator cf. example in section 3, and app ear in the pro duction rules of the

grammar. When a pro duction rule contains a dynamic op erator token there can b e at most one grammar

symb ol on either side of the dynamic op erator, and its p osition determines the intended xity. Apart from

this, there are no other restrictions on the pro ductions. Normally a single token is sucient for all dynamic

op erators. This example uses the single token op for pre x, in x, and p ost x op erators.

Figure 3 shows the subset grammar for Prolog terms. The LR0 collection for this grammar has 11

states of which4have shift/reduce con icts.

Rather than trying to resolve each shift/reduce con ict at table construction time we will delay the

decisions and turn them into a new kind of action, called resolve , which takes two arguments: 1 the state

to enter if the con ict is resolved in favor of a shift, and 2 the rule by which to reduce if a reduction is

selected. Recall that the user language designer declares what tokens constitute dynamic op erators op

in this example. Only the con icts involving two dynamic op erators are exp ected to b e resolved at run

time. All other con icts are rep orted as usual. Con icts b etween an op erator and a non-op erator symb ol can

always b e resolved at table construction time due to the requirement that op erators have p ositive precedence

and non-expression terms have precedence 0.

The parser driver for deferred decision parsing is similar to a standard LR parser, except for one di erence:

instead of directly accessing entries in the parse table, references to parse table entries are mediated through

a pro cedure parse action S ,X , where S is the current state of the parser and X is the lo ok-ahead token. It

returns an action whichmay b e one of shift , reduce , accept ,orerror . The rule for parse action is

0

If parse table S , X =resolve S , A ! op 

A

then return do resolve A ! op , X 

A

table S , X  else return parse

The pro cedure do resolve , which is called to resolve the shift/reduce con ict, has access to the rule that

is candidate for reduction, and the lo ok-ahead token. The resolution is done by the p olicy that is used 6

at table-construction time by yacc [2, 3], with extensions to cover cases that cannot b e declared in yacc

cf. section 4. The details, for those conversant with the op eration of the LR parser, are as follows. The

shift/reduce con ict corresp onding to the resolve action requires a decision when op is on top of the

A

stack top rightmost, and the lo ok-ahead token is op , where and each consist of zero or one grammar

B

symb ols. Recall that one of the requirements for turning the con ict into a resolve action was that op and

A

op had to b e declared as dynamic op erators. The choices are to reduce , using pro duction A ! op ,

B A

or shift the lo ok-ahead token op .

B

When op erators are overloaded, there may b e several choices to consider. Even if op erators are not

overloaded by run-time declarations, there may b e the implicit overloading of the declared op erator and the

nullary op erator. The ambiguities that arise from overloading p ose serious diculties in parser design. Our

parser treats the nullary op erator as having scop e equal to the maximum of its declared scop es plus \delta".

This treatment guarantees a deterministic grammar if there is no declared overloading cf. section 4 and

retains the exibility of allowing op erators to app ear as terms.

The overloading of the op erators op and op could lead to multiple interpretations of the input string

A B

as there are several declarations to consider. In practice one do esn't have to consider all combinations

of the declarations; a Prolog grammar, for instance, do es not generate sentential forms with two adjacent

expressions, so there are only certain xity combinations worth considering:

Form of op Possible xity combinations op ;op 

A A B

6= ; 6=  in x, in x, in x, p ost x

6= ; =  in x, pre x, in x, null, p ost x, in x, p ost x, p ost x

= ; 6=  pre x, in x, pre x, p ost x

= ; =  pre x, pre x, pre x, null, null, in x, null, p ost x

Hence, for a Prolog grammar there are at most four overloading combinations. If the grammar allows

adjacent expressions there are at most 16 combinations to consider; this happ ens when b oth op erators have

declarations for all xities.

The rules b elow are evaluated for each xity combination; the resulting actions are collected into a set.

The parser will enter an error state if the set of p ossible actions is either empty { signifying a precedence

error { or contains b oth shift and reduce actions { indicating ambiguous input, in which case the two p ossible

interpretations of the input string are rep orted to the user. If there is a unique action, wehave successfully

resolved the con ict.

1. If op and op have equal scop e, then

A B

2

a If op is right-asso ciative, shift .

A

b If op is left-asso ciative, reduce.

B

2. If op is either in x or pre x with wider scop e than op , shift.

A B

3. If op is either in x or p ost x with wider scop e than op , reduce.

B A

Example 2.1: We examine the deferred decision parser as it reads -X+Y*Z!, using the op erators in g 2.

Figure 8 shows the steps involved. At step 4, state 5 contains the shift/reduce con ict fterm ! term  op+

term ; term ! op- term g for terminal op. The op erator in the redex, \-", and the lo ok-ahead op erator

\+" are overloaded as level 300 pre x right-asso ciative, level 500 in x left-asso ciative and, implicitly, as level

500 +  nullary. Due to the form of the redex, only the pre x form of \-" and in x form of \+" are considered.

Since \-" has narrower scop e than \+" and = term 6=  we satisfy the requirements for the third rule

ab ove and reduce.

2

Shift is chosen over reduce in situations where the input is 1R2 L3.R and L have the same precedence, but are declared

right-asso ciative and left-asso ciative, resp ectively. This is parsed as 1R2L3, which conform with all Prolog systems that

we are aware of, as well as the most recent ISO draft[26]. 7

At step 8, state 7 contains the shift/reduce con ict fterm ! term  op* term ; term ! term op+

term g. The op erator in the redex, \+", is overloaded as b efore while the lo ok-ahead op erator \*" is level

400 in x left-asso ciative and level 400 +  nullary. This time, the form of the redex tells us to consider the

in x versions of b oth op erators, and since in x \+" has wider scop e than in x \*"wehave a match for the

second rule and shift.

At step 11, state 7 contains the shift/reduce con ict fterm ! term  op!; term ! term op+ term g.

The op erator in the redex, \*", is still level 400 in x left-asso ciative and level 400 +  nullary while the

lo ok-ahead op erator \!" is level 300 p ost x left-asso ciative and level 300 +  nullary. The nullary versions

are not considered due to the form of the redex. Again the op erator in the redex is in x with wider scop e

than the lo ok-ahead op erator, matching the third rule, so we shift again.

3 Lo cal Op erator Declarations

The advantage of resolving op erator con icts at run time is that the programmer can change the syntactic

prop erties of op erators on the y. When parsing a language with dynamic op erators it is the resp onsibility

of the language designer to initialize the op erator table with the prede ned op erators in the language,

b efore parsing commences, and to provide semantic actions to up date the table as op erator declarations are

recognized.

Standard ML is a language with in x op erators only, but these can b e declared lo cally in blo cks, with

accompanying scop e rules [21]. Thus in the following example

let infix 5 * ; infix 4 + in

1+2*3 + let infix 3 * in 1+2*3 end + 1+2*3

end

only the middle use of * would haveanunusually low precedence, yielding the answer 23.

To understand how the op erator scop e rules of ML can b e implemented, we study the grammar subset

in gure 4.

tokenatomName, opName declares op as a dynamic op erator; if atomName is The line dynop

returned from the scanner and Name is present in the current op erator table, the token is converted to

opName.

When the parser encounters a let-expression the op erators whose names are shadowed by lo cal

declarations are saved in OldList. Each op erator declaration returns information ab out the op erator it

shadows, or null if there was no declaration with the same name in an outer scop e. The old op erators are

reinstantiated when we reach the end of the let-expressions.

Calls to the op erator table mo dule can b e seen in the Prolog rules at the b ottom: the rst two rules,

ml dcl op and ml rem op, declares and removes an op erator in the current op erator table, resp ectively, and

returns the old prop erties. The last rule, ml rest op is called to reinstantiate old op erators.

4 Ambiguities at Run Time and Induced Grammars

Because the language changes dynamically as new op erators are b eing declared, it is imp ortant to understand

exactly what language is b eing parsed. The algorithm in g. 5 constructs the induced grammar, given the

input grammar and an op erator table. As mentioned earlier, we assume that op erator declarations remain

constant while a construct involving dynamic op erators is b eing reduced. Thus it makes sense to talk ab out

the induced grammar with which that construct is parsed, even though a later part of the input maybe

parsed by a di erent induced grammar.

The induced grammar is obtained by replacing pro ductions containing dynamic op erators with a set of

pro ductions constituting a precedence grammar. It should b e p ointed out that the induced grammar is not

actually constructed by the parser. The goal of the deferred decision parser is to recognize exactly the strings

in the language generated by the induced grammar. 8

parseFile, Exp :-

openFile, read, Stream,

clear_op,  clear the operator table

init_scannerStream, InitScanState,

parseInitScanState, Exp,

closeStream.

dynop_tokenatomName, opName.

expE ::= atexplistE.

expE ::= expE1, opI, expE2, { E = appI,E1,E2 }.

atexplistE ::= atexpE.

atexplistE ::= atexplistE1, atexpE2, { E = appE1,E2 }.

atexpC ::= numberC.

atexpV ::= atomV.

atexpE ::= let, { begin_block }, dec, in, expE, end, { end_block }.

dec ::= dec, ';', dec.

dec ::= infix, numberD, idI,

{ D9 is 9-D, declare_opI, infix, left, D9 }.

dec ::= infixr, numberD, idI,

{ D9 is 9-D, declare_opI, infix, right, D9 }.

dec ::= nonfix, idI,

{ remove_opI }.

idN ::= atomN.

idN ::= opN.

 Interface to the Operator Module: declare, remove and restore operators.

ml_dcl_opName, Fix, Assoc, Prec, Old :-

query_opName, F, A, P -> Old = oldName, F, A, P ; Old = null,

declare_opName, Fix, Assoc, Prec.

ml_rem_opName, Old :-

query_opName, F, A, P -> Old = oldName, F, A, P ; Old = null,

remove_opName, infix.

ml_rest_op[].

ml_rest_op[null|T] :- ml_rest_opT.

ml_rest_op[oldName,F,A,P|T] :- declare_opName,F,A,P, ml_rest_opT.

Figure 4: Grammar Subset for ML Op erator Expressions. 9

Input: A dynamic op erator grammar and the current op erator table.

Output: The induced grammar, de ned b elow.

Metho d: Sort the op erator table by precedence so it can b e divided into k slots, where all op erators in slot

i 2 1 :::k have the same precedence p , and so that slot k holds the op erators of widest scop e. For

i

simplicity, assume that there is only one dynamic op erator, op.Now apply the following steps:

1. For each nonterminal A in the dynamic op erator grammar:

a Partition the pro ductions de ning A into ve sets corresp onding to the use of an op erator in

the right hand side pre x, in x, p ost x, op erand nullary, or no op erator at all:

 = fA ! op C g  = fA ! B op C g

p re i n

 = fA ! B opg  = fA ! opg

p ost n ull

 = fA ! ! j op 62 ! g

r em

b Build a precedence grammar consisting of k +1 layers  g.6. This will serve asaskeleton for

the induced grammar. The ith layer 0

i

corresp onds to op erators with precedence p . The 0th layer de nes the sentential forms with

i

0 precedence, provided that there are any in the dynamic op erator grammar.

c For i 2 1 :::k:

R L N

Take each op erator into accountby adding pro ductions to either A , A or A , dep ending

i i i

on the asso ciativity of the op erator b eing right, left, or none.

N

i. First the pre x op erators: If  6= ;, add pro ductions A ! oA , for all pre x,

p re i1

i

R R

non-asso ciative op erators o with precedence p , and A ! oA , for all pre x, right-

i

i i

asso ciative op erators o with precedence p .

i

L L

ii. Next the in x op erators: If  6= ;, add pro ductions A ! A oA , for all in x,

i n i1

i i

N

left-asso ciative op erators o with precedence p , and A ! A oA , for all in x,

i i1 i1

i

R R

non-asso ciative op erators o with precedence p , and A ! A oA , for all in x, right-

i i1

i i

asso ciative op erators o with precedence p .

i

L L

iii. Then the p ost x: If  6= ;, add pro ductions A ! A o, for all p ost x, left-asso ciative

p ost

i i

N

op erators o with precedence p , and A ! A o, for all p ost x, non-asso ciative

i i1

i

op erators o with precedence p .

i

iv. Finally, let nullary op erators have the maximum of its declared precedences: If  6= ;,

n ull

then for all op erators o in the op erator table, add the pro duction A ! o where p is the

p

highest indexed slot in the op erator table containing o.

2. The nonterminals and terminals of the induced grammar are given in the standard way: a grammar

symb ol which app ears on some left hand side is a nonterminal; all other symb ols are terminals.

Additionally, the dynamic op erator grammar and the induced grammar share their start symb ols.

Figure 5: Algorithm for Induced Grammar. 10

A ! A

k

R R L L N N

k : A ! A , A ! A , A ! A , A ! A

k k 1

k k k k k k

.

.

.

R R L L N N

i : A ! A , A ! A , A ! A , A ! A

i i1

i i i i i i

.

.

.

R R L L N N

1: A ! A , A ! A , A ! A , A ! A

1 0

1 1 1 1 1 1

0: A ! ! , for all A ! ! in 

0 r em

N

Figure 6: Skeleton for induced grammar. The pro duction A ! A is included only if  6= ;.

0 r em

1

Example 4.1: Consider the Prolog term grammar and the op erator table in g. 2. Sorting the table gives

N

us three precedence levels  g. 7. The induced grammar is shown b elowterm omitted for brevity. The

i

induced grammar is deterministic as veri ed by yacc.

L

term ! '+' j '-' j term

3

3

L L L

term ! term '+' term j term '-' term j term

2 2 2

3 3 3

L

term ! '*' j '/' j term

2

2

L L L

term ! term '*' term j term '/' term j term

1 1 1

2 2 2

R

term ! '!' j term

1

1

R R R L

term ! '+' term j '-' term j term

1 1 1 1

L L

term ! term '!' j term

0

1 1

term ! atom j var j '' term ''

0 3

Although the ab ove example pro duced a deterministic grammar, unrestricted overloading can pro duce

an ambiguous grammar. However, certain reasonable restrictions on op erator declarations and overloading

guarantee determinacy with one lo ok-ahead token in the deferred decision parser:

1. All declarations for the same symbol have the same precedence.

2. If a symb ol's in x asso ciativityisnon , then if declared its pre x and p ost x asso ciativities must b e

non .

3. If a symb ol's in x asso ciativityis left , then if declared its pre x asso ciativitymust b e non and its

p ost x asso ciativitymust b e left .

4. If a symb ol's in x asso ciativityisright , then if declared its pre x asso ciativitymust b e right and

there must b e no p ost x declaration for this symb ol.

The recent ISO draft prop oses a di erent set of restrictions to avoid ambiguities [26]. Probably neither

solution is the nal word. We hop e our parser generator can b e a useful to ol to explore design alternatives.

5 Application to De nite Clause Grammars

The de nite clause grammars found in Prolog were originally designed for parsing highly ambiguous

grammars with short sentences, natural languages b eing the primary example. Since Prolog employs a

top-down backtracking execution style, the evaluation of DCGs will resemble the b ehavior of a top-down

parser with backtracking.

In compiler theory,interest is commonly fo cused on deterministic languages. The b ene t of Prolog as a

compiler to ol has b een observed by Cohen and Hickey [7]. However, there are several reasons why a top-down

backtracking parser is unsuitable for recognizing sentences such as programming language constructs. 11

Slot Prec o, Fix , Assoc 

1 300 , pre, right, +, pre, right, !, post, left

2 400 , in, left, =, in, left

3 500 , in, left, +, in, left

Figure 7: Sorted Op erator Table.

 A left-recursive pro duction, suchasE ! E T , will send the parser into an in nite lo op, even though

the grammar is not ambiguous. There are techniques for eliminating left recursion, but they enlarge

the grammar, sometimes signi cantly [3]. Also, there is no clearcut way to transform the semantic

actions correctly.

 Unless the parser is predictive [3]itmay sp end considerable time on backtracking.

 A backtracking parser requires the whole input stream of tokens to b e present during parsing.

 Backtracking may b e undesirable in a compiler where semantic actions are not idemp otent, e.g.

generation of ob ject co de.

Deterministic b ottom-up parsers, on the other hand, run in linear time, require no input bu ering

and handle left-recursive pro ductions as well as right-recursive. They do not normally supp ort ambiguous

grammars. Nilsson [22] has implemented a nondeterministic b ottom-up evaluator for DCGs by letting the

parser backtrackover con icting entries in the parse table.

Our approach is to view De nite Clause Grammars DCGs as attribute grammars, whichhave b een

studied extensively in connection with deterministic translation schemes [3]. In terms of argument modes of

DCG goals, synthesized attributes corresp ond to \output" arguments, while inherited attributes corresp ond

to \input" arguments. S-attributed de nitions, that is, syntax-directed de nitions with only synthesized

attributes, can b e evaluated by a b ottom-up parser as the input is b eing parsed. The L-attributed de nitions

allow certain inherited attributes as well, thus forming a prop er sup erset of the S-attributed de nitions.

Implementation techniques for L-attributed de nitions based on grammar mo di cations or p ost-traversals

of the parse tree are known [3]. Also, it is p ossible to asso ciate a nonterminal with a function from inherited

to synthesized attributes [20].

In the following example we demonstrate how inherited attributes in a b ottom-up parser can b e

implemented in systems with coroutining facilities.

Example 5.1: The following non-L-attributed grammar [3] recognizes C declarations of the form int a,

b[2][5].

decl ::= typeT, listT.

typeT ::= int, { T = integer }.

typeT ::= float, { T = float }.

listT ::= listT, ',', nameT.

listT ::= nameT.

nameT ::= nameAT, '[', intN, ']', { AT = arrayN,T }.

nameT ::= identN, { installN,T }.

Ignoring for the moment the problem with left-recursion in the grammar, we notice that a top-down parser

would in the rst pro duction have the typ e information T available before it tried to recognize list.Ina

b ottom-up parser, however, T will not b e available until the entire right-hand side has b een reduced. As a

result of this, the b ottom-up parser will not have the typ e available as it tries to install the identi ers in the 12

symb ol table. The grammar is non-L-attributed b ecause attribute N is propagated right-to-left in the fth

pro duction, making it dicult even for a top-down parser to compute all the attributes in one pass.

If the host language allows pro cedures goals, in Prolog to b e blo cked until certain conditions are true it

is a straightforward task to p ostp one the execution of a semantic action until its attribute values have b een

3

b ound. Using the coroutining predicate whenCondition ,Goal  in SICStus Prolog [19]we can rewrite the

ab ove sp eci cation as follows.

decl ::= typeT, listT.

typeT ::= int, { T = integer }.

typeT ::= float, { T = float }.

listT ::= listT, ',', nameT.

listT ::= nameT.

nameT ::= nameAT, '[', intN, ']', { AT = arrayN,T }.

nameT ::= identN, { whengroundN,groundT, installN,T }.

An algorithm to do such a transformation automatically would dep end on supplied mo de declarations, e.g.

:- mode install++,++. Pro cedure calls within actions must then b e blo cked until ++-arguments are

ground and +-arguments are b ound.

6 Implementation

Wehave implemented the deferred decision parser generator in a standard dialect of Prolog. However, there

is nothing ab out the metho d that prevents it from b eing incorp orated into any standard LR parser generator.

There are three issues that need to b e considered:

1. The user has to b e able to declare dynamic op erators, like tokens are declared in yacc using token.

2. Shift/reduce con icts have to b e turned into resolve actions. This can b e done as a p ost-pro cessing pass

on the parse table using the itemsets e.g. y.output for yacc. Recall that only con icts involving

dynamic op erators are candidates | all other con icts have to b e rep orted as usual.

3. The parser accesses parse table entries through the pro cedure parse action .

The source co de and manual for the parser generator is available from the authors | send mail to

[email protected] and request a copy.

7 Concluding Remarks

Wehave addressed some of the problems in parsing languages with dynamic op erators, identi ed the

shortcomings of the current parsing metho ds, and nally prop osed a new parsing technique, deferred decision

parsing, that p ostp ones resolving shift/reduce con icts involving op erators to run time.

The technique has b een built into an LR style parser-generator that pro duces deterministic, ecient, and

table-driven parsers. Prototyp e parsers for Prolog [23] and Standard ML have successfully b een generated.

Reasonably lib eral op erator overloading is supp orted.

Wehave also p ointed out some of the drawbacks of using a top-down parser for traditional parsing tasks,

such as in compiler construction, and argued that a b ottom-up parser is a much b etter replacement.

More work needs to b e done to categorize exactly what typ es of languages can b e parsed with the deferred

decision metho d. Another area that hasn't b een addressed is error handling and recovery from errors. We

are also interested in handling inherited attributes in the b ottom-up evaluation of attribute grammars.

3

The suggestion of using freeze, of which when is a variant, was p ointed out by one of the ILPS reviewers. Other versions

of Prolog have similar features. See also [10] for a similar technique based on lazy evaluation in functional programming. 13

8 Acknowledgements

This researchwas supp orted in part by NSF grants CCR-8958590, IRI-8902287, and IRI-9102513, by

equipment donations from Sun Microsystems, Inc., and software donations from Quintus Computer Systems,

Inc.

The authors are grateful to Richard O'Keefe and Roger Scowen for valuable input on the syntax of Prolog,

and Manuel Bermudez for helpful discussions on parser generation. We are also indebted to Lawrence Byrd

for suggesting that LR parsing technology b e applied to De nite Clause Grammars. Michel Mauny and

Pierre Weiss of inria supplied pro duction rules for caml light. Finally we wish to thank the reviewers on

the ILPS'93 committee for their input.

References

[1] A. V. Aho and S. C. Johnson. LR parsing. Computing Surveys, June 1974.

[2] A. V. Aho, S. C. Johnson, and J. D. Ullman. Deterministic Parsing of Ambiguous Grammars. Communications

of the ACM, 188:441{52, 1975.

[3] A. V. Aho, R. Sethi, and J. D. Ullman. , Principles, Techniques, and Tools. Addison Wesley, ISBN

0-201-10088-6, 1985.

[4] Manuel E. Bermudez and George Logothetis. Simple Computation of LALR1 Lo okahead Sets. Information

Processing Letters, 31:233{238, 1989.

[5] D. L. Bowen, L. Byrd, F. C. N. Pereira, L. M. Pereira, and D. H. D. Warren. DECsystem-10 Prolog User's

Manual, 1982.

[6] W. F. Clo cksin and C. S. Mellish. Programming in Prolog. Springer Verlag, 1981.

[7] Jacques Cohen and Timothy J. Hickey.Parsing and Compiling Using Prolog. ACM Transactions on Programming

Languages and Systems, 92, 1987.

[8] Jay Earley. An Ecient Context-Free Parsing Algorithm. Communications of the ACM, 132, 1970.

[9] Charles N. Fischer and Richard J. LeBlanc Jr. Crafting a Compiler. Benjamin/Cummings Publishing Company,

Inc, 1988.

[10] Goran Uddeb org. A Functional Parser Generator. Technical Rep ort 43, Dept. of Computer Sciences, Chalmers

UniversityofTechnology, Gothenburg, 1988.

[11] Jan Heering, Paul Klint, and Jan Rekers. Incremental Generation of Parsers. IEEE Transactions on Software

Engineering, 1612:1344{1351, Dec 1990.

[12] Glenn Holloway, Judy Townley,Jay Spitzen, and Ben Wegbreit. ECL Programmer's Manual, 1974.

[13] R. Nigel Horsp o ol. Incremental Generation of LR Parsers. Computer Languages, 154:205{223, 1990.

[14] Paul Hudak and Philip Wadler, editors. Report on the Programming Language Haskel l.Yale University, 1990.

[15] S. C. Johnsson. Yacc - Yet another compiler compiler. Technical Rep ort CSTR 32, AT&T Bell Lab oratories,

Murray Hill, NJ, 1975.

[16] Simon L. Peyton Jones. Parsing Dist x Op erators. Communications of the ACM, 292, Feb 1986.

[17] James Kerr. On LR Parsing of Languages with Dynamic Op erators. Technical Rep ort UCSC-CRL-89-13, UC

Santa Cruz, 1989.

[18] W. R. LaLonde and J. des Rivieres. Handling Op erator Precedence in Arithmetic Expressions with Tree

Transformations. ACM TOPLAS, 31, 1981.

[19] M. Carlsson and J. Wid en and J. Andersson and S. Andersson and K. Bo ortz and H. Nilsson and T. Sjoland.

SICStus Prolog User's Manual. Technical rep ort, Swedish Institute of Computer Science, Oct 1991.

[20] Brian H. Mayoh. Attribute Grammars and Mathematical Semantics. SIAM J. Comput., 103, 1981.

[21] Robin Milner, Mads Tofte, and Rob ert Harp er. The de nition of StandardML. MIT Press, 1990.

[22] Ulf Nilsson. AID: An Alternative Implementation of DCGs. New Generation Computing, 4:383{399, 1986. 14

[23] Kjell Post and Allen Van Gelder. Parsing Prolog. Technical Rep ort UCSC-CRL-93-22, University of California,

Santa Cruz, 1993.

[24] Kjell Post, Allen Van Gelder, and James Kerr. Deterministic Parsing of Languages with Dynamic Op erators.

Technical Rep ort UCSC-CRL-91-31, University of California, Santa Cruz, 1991.

[25] R. S. Scowen. Prolog { Budap est pap ers { 2 { Input/Output, Arithmetic, Mo dules, etc. Technical Rep ort

ISO/IEC JTC1 SC22 WG17 N69, International Organization for Standardization , 1990.

[26] R. S. Scowen. Draft Prolog Standard. Technical Rep ort ISO/IEC JTC1 SC22 WG17 N92, International

Organization for Standardization, 1992.

[27] Masaru Tomita. Ecient Parsing for Natural Language. Kluwer Academic Publishers, Boston, 1986.

[28] A. van Wijngaarden et al., editor. RevisedReport on the Algorithmic Language Algol 68. Springer-Verlag, 1976. 15

Step Stack Input Action

1. 0 op- varX op+ varY op* varZ op! $ shift 3

2. 0 op- 3 varX op+ varY op* varZ op! $ shift 4

3. 0 op- 3 varX 4 op+ varY op* varZ op! $ reduce term ! varX

4. 0 op- 3 term 5 op+ varY op* varZ op! $ resolve6,term ! op- term 

5. 0 term 10 op+ varY op* varZ op! $ shift 6

6. 0 term 10 op+ 6 varY op* varZ op! $ shift 4

7. 0 term 10 op+ 6 varY 4 op* varZ op! $ reduce term ! varY

8. 0 term 10 op+ 6 term 7 op* varZ op! $ resolve6,term ! term op+ term 

9. 0 term 10 op+ 6 term 7 op* 6 varZ op! $ shift 4

10. 0 term 10 op+ 6 term 7 op* 6 varZ 4 op! $ reduce term ! varZ

11. 0 term 10 op+ 6 term 7 op* 6 term 7 op! $ resolve6,term ! term op* term 

12. 0 term 10 op+ 6 term 7 op* 6 term 7 op! 6 $ reduce term ! term op!

13. 0 term 10 op+ 6 term 7 op* 6 term 7 $ reduce term ! term op* term

14. 0 term 10 op+ 6 term 7 $ reduce term ! term op+ term

15. 0 term 10 $ accept

Figure 8: Deferred Decision Parsing Example. 16