Plan for Today

Ambiguous Grammars

Disambiguating ambiguous grammars

Predictive parsing

FIRST and FOLLOW sets

Predictive Parsing table

CS453 Lecture Top-Down Predictive Parsers

Ambiguous Grammars

Ambiguous grammar: more than one parse tree for a single sentence.

Expression grammar

E → E * E
E → E + E
E → E - E
E → ( E )
E → ID
E → NUM

String

42 + 7 * 6

Parse tree 1

E
  E
    Num (42)
  +
  E
    E
      Num (7)
    *
    E
      Num (6)

Parse tree 2

E
  E
    E
      Num (42)
    +
    E
      Num (7)
  *
  E
    Num (6)

What about 42-7-6?
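To see why the choice of parse tree matters, the two trees for 42 + 7 * 6 can be evaluated bottom-up. The sketch below is only an illustration: it assumes parse trees are written as nested tuples `(operator, left, right)` with integer leaves, and the name `evaluate` is hypothetical.

```python
def evaluate(tree):
    """Evaluate a parse tree given as an int leaf or (operator, left, right)."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    a, b = evaluate(left), evaluate(right)
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    raise ValueError(f"unknown operator {op!r}")


tree1 = ("+", 42, ("*", 7, 6))   # parse tree 1: 42 + (7 * 6)
tree2 = ("*", ("+", 42, 7), 6)   # parse tree 2: (42 + 7) * 6
print(evaluate(tree1), evaluate(tree2))  # 84 294
```

The same happens for 42-7-6: grouping to the left gives 29, grouping to the right gives 41, so associativity has to be fixed as well as precedence.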

Goal: disambiguate the grammar

Cause: the grammar specifies neither the precedence nor the associativity of the operators +, -, and *.

Two options:

1. Keep the ambiguous grammar, but add extra directives to the parser, so that only one tree is formed (see PA0.cup for the Simple Expression Language).

2. Rewrite the grammar, making the precedence and associativity explicit in the grammar.

Unambiguous grammar for simple expressions

Grammar

E → E + T | E - T | T
T → T * F | F
F → ( E ) | ID | NUM

String

42+7*6

Parse tree

E
  E
    T
      F
        Num (42)
  +
  T
    T
      F
        Num (7)
    *
    F
      Num (6)

How is the precedence encoded?

How is the associativity encoded?

Side Note: Augmenting the grammar with End of File

Grammar defines the syntactically valid strings

The parser recognizes them (just as a scanner recognizes the strings defined by regular expressions).

To deal with end-of-file we augment the grammar with an end-of-file symbol ($), and create a new start symbol:

S' → S $

This is implicit in .cup files, but is explicit when we talk about how parsing works.

Predictive Parsing

Predictive parsing, such as recursive descent parsing, creates the parse tree TOP DOWN, starting at the start symbol.

For each non-terminal N there is a method recognizing the strings that can be produced by N, with one (case) clause for each production.

This worked great for a slightly changed version of our example from last lecture:

start -> stmts EOF
stmts -> ε | stmt stmts
stmt -> ifStmt | whileStmt | ID = NUM
ifStmt -> IF id { stmts }
whileStmt -> WHILE id { stmts }

because each clause could be uniquely identified by looking ahead one token. Let's predictively build the parse tree for

if t { while b { x = 6 }} $
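A recursive descent parser for this statement grammar might look like the sketch below: one method per non-terminal, each choosing its clause from a single token of lookahead. The tokenizer, the class name `Parser`, the token kind names, and the nested-tuple parse tree are all assumptions made for illustration; they are not taken from the course code.

```python
def tokenize(text):
    """Split the input into token kinds (hypothetical scanner for this sketch)."""
    for ch in "{}=":
        text = text.replace(ch, f" {ch} ")
    fixed = {"if": "IF", "while": "WHILE", "{": "{", "}": "}", "=": "=", "$": "EOF"}
    kinds = []
    for word in text.split():
        if word in fixed:
            kinds.append(fixed[word])
        elif word.isdigit():
            kinds.append("NUM")
        else:
            kinds.append("ID")
    return kinds


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else "EOF"

    def expect(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, found {self.peek()} at token {self.pos}")
        self.pos += 1

    def start(self):                      # start -> stmts EOF
        body = self.stmts()
        self.expect("EOF")
        return ("start", body)

    def stmts(self):                      # stmts -> ε | stmt stmts
        if self.peek() in ("IF", "WHILE", "ID"):
            head = self.stmt()
            return ("stmts", head, self.stmts())
        return ("stmts",)                 # predict ε on any other token

    def stmt(self):                       # stmt -> ifStmt | whileStmt | ID = NUM
        if self.peek() == "IF":
            return ("stmt", self.if_stmt())
        if self.peek() == "WHILE":
            return ("stmt", self.while_stmt())
        self.expect("ID")
        self.expect("=")
        self.expect("NUM")
        return ("stmt", "ID = NUM")

    def if_stmt(self):                    # ifStmt -> IF id { stmts }
        return ("ifStmt", self.block("IF"))

    def while_stmt(self):                 # whileStmt -> WHILE id { stmts }
        return ("whileStmt", self.block("WHILE"))

    def block(self, keyword):
        """Shared body of the two clauses: keyword id { stmts }."""
        self.expect(keyword)
        self.expect("ID")
        self.expect("{")
        body = self.stmts()
        self.expect("}")
        return body


tree = Parser(tokenize("if t { while b { x = 6 }} $")).start()
print(tree[0])  # start
```

Every decision inspects only `peek()`, which is what makes the parser predictive; a malformed input surfaces as a `SyntaxError` from `expect`.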

When Predictive Parsing works, when it does not

What about our expression grammar:

E → E + T | E - T | T
T → T * F | F
F → ( E ) | ID | NUM

The E method cannot decide, looking one token ahead, whether to predict E + T, E - T, or T: all three alternatives can start with the same tokens. The same problem arises for T.

Predictive parsing works for grammars where the first terminal symbol of each subexpression provides enough information to decide which production to use.

First

Given a phrase γ of terminals and non-terminals (a rhs of a production), FIRST(γ) is the set of all terminals that can begin a string derived from γ.

FIRST(T*F) = ?

FIRST(F) = ?

FIRST(XYZ) = FIRST(X) ?

No! X could produce ε, and then FIRST(Y) comes into play, so we must keep track of which non-terminals are NULLABLE.

Follow

It also turns out to be useful to determine which terminals can directly follow a non-terminal X (to decide whether parsing of X is finished).

A terminal t is in FOLLOW(X) if there is any derivation containing Xt.

This can also occur if the derivation contains XYZt and Y and Z are nullable.

FIRST and FOLLOW sets

NULLABLE
– X is a nonterminal
– nullable(X) is true if X can derive the empty string

FIRST
– FIRST(z) = {z}, where z is a terminal
– FIRST(X) = union of all FIRST(rhs_i), where X is a nonterminal and X -> rhs_i is a production
– FIRST(rhs_i) = union of all FIRST(sym) for the symbols on the rhs up to and including the first non-nullable symbol

FOLLOW(Y), only relevant when Y is a nonterminal
– look for Y in the rhs of rules (lhs -> rhs) and union all FIRST sets for symbols after Y up to and including the first non-nullable symbol
– if all symbols after Y are nullable then also union in FOLLOW(lhs)

Constructive Definition of nullable, first and follow

for each terminal t FIRST(t)={t}

Another Transitive Closure algorithm:

keep doing STEP until nothing changes

STEP:

for each production X → Y1 Y2 … Yk

    0: if Y1 to Yk are all nullable (or k = 0): nullable(X) = true

    for each i from 1 to k, each j from i+1 to k

        1: if Y1 … Yi-1 are all nullable (or i = 1): FIRST(X) += FIRST(Yi)    // +=: union

        2: if Yi+1 … Yk are all nullable (or i = k): FOLLOW(Yi) += FOLLOW(X)

        3: if Yi+1 … Yj-1 are all nullable (or i+1 = j): FOLLOW(Yi) += FIRST(Yj)

We can compute nullable, then FIRST, and then FOLLOW
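The STEP loop translates fairly directly into code. The following is a minimal sketch, assuming a grammar is given as a dictionary from each non-terminal to a list of right-hand sides (lists of symbols, with ε as the empty list); the function name `compute_sets` and this representation are illustrative choices. It is run on the statement grammar from the predictive parsing slide.

```python
def compute_sets(grammar):
    """Iterate steps 0-3 until nothing changes; returns (nullable, first, follow)."""
    nonterminals = set(grammar)
    symbols = nonterminals | {s for rhss in grammar.values() for rhs in rhss for s in rhs}
    nullable = set()
    # FIRST(t) = {t} for every terminal t; non-terminals start out empty.
    first = {s: (set() if s in nonterminals else {s}) for s in symbols}
    follow = {x: set() for x in nonterminals}

    def all_nullable(seq):
        return all(s in nullable for s in seq)

    def add(target, source):
        """Union source into target; report whether target grew."""
        if source <= target:
            return False
        target |= source
        return True

    changed = True
    while changed:
        changed = False
        for x, rhss in grammar.items():
            for rhs in rhss:
                if x not in nullable and all_nullable(rhs):        # step 0
                    nullable.add(x)
                    changed = True
                for i, yi in enumerate(rhs):
                    if all_nullable(rhs[:i]):                        # step 1
                        changed |= add(first[x], first[yi])
                    if yi not in nonterminals:
                        continue              # FOLLOW only matters for non-terminals
                    if all_nullable(rhs[i + 1:]):                    # step 2
                        changed |= add(follow[yi], follow[x])
                    for j in range(i + 1, len(rhs)):                 # step 3
                        if all_nullable(rhs[i + 1:j]):
                            changed |= add(follow[yi], first[rhs[j]])
    return nullable, first, follow


grammar = {
    "start": [["stmts", "EOF"]],
    "stmts": [[], ["stmt", "stmts"]],
    "stmt": [["ifStmt"], ["whileStmt"], ["ID", "=", "NUM"]],
    "ifStmt": [["IF", "ID", "{", "stmts", "}"]],
    "whileStmt": [["WHILE", "ID", "{", "stmts", "}"]],
}
nullable, first, follow = compute_sets(grammar)
print(sorted(nullable))          # ['stmts']
print(sorted(first["stmts"]))    # ['ID', 'IF', 'WHILE']
print(sorted(follow["stmts"]))   # ['EOF', '}']
```

This sketch computes all three sets in a single loop, as STEP does; computing nullable first, then FIRST, then FOLLOW gives the same result with fewer passes.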

Class Exercise

Compute nullable, FIRST and FOLLOW for

Z → d | X Y Z

X → a | Y

Y → c | ε

Constructing the Predictive Parser Table

A predictive parse table has a row for each non-terminal X, and a column for each input token t. Entries table[X,t] contain productions:

for each X -> gamma
    for each t in FIRST(gamma)
        table[X,t] = X->gamma
    if gamma is nullable
        for each t in FOLLOW(X)
            table[X,t] = X->gamma

Compute the predictive parse table for

Z → d | X Y Z
X → a | Y
Y → c | ε

        a             c             d
X       X→a, X→Y      X→Y           X→Y
Y       Y→ε           Y→ε, Y→c      Y→ε
Z       Z→XYZ         Z→XYZ         Z→XYZ, Z→d
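The table-filling rule can be sketched in a few lines. The code below assumes the nullable, FIRST and FOLLOW sets for the exercise grammar have already been computed and simply writes them in by hand; the names `build_table` and `first_of_sequence` and the dictionary representation are illustrative. Each table cell holds a list, so that multiple entries become visible.

```python
from collections import defaultdict

# Exercise grammar; ε is written as the empty list.
grammar = {
    "Z": [["d"], ["X", "Y", "Z"]],
    "X": [["a"], ["Y"]],
    "Y": [["c"], []],
}
# Sets for this grammar (no end-of-file augmentation), entered by hand.
nullable = {"X", "Y"}
first = {"a": {"a"}, "c": {"c"}, "d": {"d"},
         "X": {"a", "c"}, "Y": {"c"}, "Z": {"a", "c", "d"}}
follow = {"X": {"a", "c", "d"}, "Y": {"a", "c", "d"}, "Z": set()}


def first_of_sequence(seq):
    """Return (FIRST(seq), whether seq is nullable) for a right-hand side."""
    result = set()
    for sym in seq:
        result |= first[sym]
        if sym not in nullable:
            return result, False
    return result, True


def build_table(grammar):
    table = defaultdict(list)
    for x, rhss in grammar.items():
        for gamma in rhss:
            lookahead, gamma_nullable = first_of_sequence(gamma)
            if gamma_nullable:
                lookahead = lookahead | follow[x]
            for t in lookahead:
                table[x, t].append(gamma)   # a second append marks a conflict
    return table


table = build_table(grammar)
conflicts = sorted(cell for cell, entries in table.items() if len(entries) > 1)
print(conflicts)  # [('X', 'a'), ('Y', 'c'), ('Z', 'd')]
```

Three cells end up with two productions each, so this grammar cannot be parsed predictively with one token of lookahead.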

Multiple entries in the Predictive parse table: Ambiguity

An ambiguous grammar will lead to multiple entries in the parse table.

Our grammar IS ambiguous, e.g. Z → d, but also Z ⇒ X Y Z ⇒* Y Z ⇒* d (X and Y can both derive ε).

For grammars with no multiple entries in the table, we can use the table to produce one parse tree for each valid sentence. We call these grammars LL(1): Left to right parse, Left-most derivation, 1 symbol lookahead.

An LL(1) parser examines the input left to right. It expands non-terminals leftmost first, and it looks ahead 1 token.

Left recursion and Predictive parsing

What happens to the recursive descent parser if we have a left recursive production rule, e.g. E → E + T | T? E calls E calls E forever.

To eliminate left recursion we rewrite the grammar:

from:

E → E + T | E - T | T
T → T * F | F
F → ( E ) | ID | NUM

to:

E → T E'
E' → + T E' | - T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | ID | NUM

replacing left recursion X → X γ | α (where α does not start with X) by right recursion, as X produces α γ*, which can be produced right recursively.

Now we can augment the grammar (S → E $), compute nullable, FIRST and FOLLOW, and produce an LL(1) predictive parse table; see Tiger Section 3.2.
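The rewrite from X → X γ | α to X → α X', X' → γ X' | ε is mechanical. The sketch below handles immediate left recursion only (a rule whose right-hand side starts with its own left-hand side) and assumes every γ is non-empty. The function name `eliminate_left_recursion`, the dictionary representation and the primed naming of the new non-terminal are assumptions for illustration.

```python
def eliminate_left_recursion(grammar):
    """Replace X -> X γ | α by X -> α X' and X' -> γ X' | ε (immediate case only)."""
    result = {}
    for x, rhss in grammar.items():
        gammas = [rhs[1:] for rhs in rhss if rhs and rhs[0] == x]    # tails of X γ
        alphas = [rhs for rhs in rhss if not rhs or rhs[0] != x]     # the α alternatives
        if not gammas:
            result[x] = [list(rhs) for rhs in rhss]    # nothing to rewrite
            continue
        new = x + "'"
        result[x] = [alpha + [new] for alpha in alphas]
        result[new] = [gamma + [new] for gamma in gammas] + [[]]     # [] is ε
    return result


expr = {
    "E": [["E", "+", "T"], ["E", "-", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["ID"], ["NUM"]],
}
for lhs, rhss in eliminate_left_recursion(expr).items():
    print(lhs, "->", " | ".join(" ".join(rhs) or "ε" for rhs in rhss))
```

Applied to the expression grammar, this reproduces the rewritten grammar shown above; indirect left recursion (A → B …, B → A …) would need the more general algorithm.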

Left Factoring

Left recursion does not work for predictive parsing. Neither does a grammar that has a non-terminal with two productions that start with a common phrase, so we left factor the grammar:

S → α β1
S → α β2

is left factored into

S → α S'
S' → β1 | β2

E.g.: if statement:

S → IF t THEN S ELSE S | IF t THEN S | o

becomes

S → IF t THEN S X | o
X → ELSE S | ε

When building the predictive parse table, there will still be a slot with multiple entries. WHY?

Dangling else problem: ambiguity

Given

S → IF t THEN S X | o
X → ELSE S | ε

construct two parse trees for

IF t THEN IF t THEN o ELSE o

Parse tree 1 (ELSE belongs to the inner IF)

S
  IF t THEN
  S
    IF t THEN
    S
      o
    X
      ELSE
      S
        o
  X
    ε

Parse tree 2 (ELSE belongs to the outer IF)

S
  IF t THEN
  S
    IF t THEN
    S
      o
    X
      ε
  X
    ELSE
    S
      o

Which is the correct parse tree? (C, Java rules)

Dangling else disambiguation

The correct parse tree is:

S
  IF t THEN
  S
    IF t THEN
    S
      o
    X
      ELSE
      S
        o
  X
    ε

We can get this parse tree by removing the X → ε rule from the multiple entry slot in the parse table. See written homework 2.

One more time

Balanced parentheses grammar 1:

S → ( S ) | S S | ε

1. Augment the grammar

2. Construct Nullable, First and Follow

3. Build the predictive parse table, what happens?

One more time, but this time with feeling …

Balanced parentheses grammar 2:

S → ( S ) S | ε

1. Augment the grammar

2. Construct Nullable, First and Follow

3. Build the predictive parse table

4. Using the predictive parse table, construct the parse tree for

( ) ( ( ) ) $

and

( ) ( ) ( ) $
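Step 4 can be checked with a small table-driven parser. The sketch below assumes the augmented grammar S' → S $, S → ( S ) S | ε and writes its predictive parse table in by hand (FIRST(S) = { ( }, S nullable, FOLLOW(S) = { ), $ }). The names `TABLE` and `parse` are hypothetical, and the parser returns the sequence of predicted productions, that is the leftmost derivation, from which the parse tree can be read off.

```python
# Predictive parse table for S' -> S $ and S -> ( S ) S | ε; [] stands for ε.
TABLE = {
    ("S'", "("): ["S", "$"],
    ("S'", "$"): ["S", "$"],
    ("S", "("): ["(", "S", ")", "S"],
    ("S", ")"): [],
    ("S", "$"): [],
}
NONTERMINALS = {"S'", "S"}


def parse(tokens):
    """Return the productions predicted, in order, or raise SyntaxError."""
    stack = ["S'"]          # symbols still to be matched, top at the end
    pos = 0
    derivation = []
    while stack:
        top = stack.pop()
        look = tokens[pos] if pos < len(tokens) else None
        if top in NONTERMINALS:
            rhs = TABLE.get((top, look))
            if rhs is None:
                raise SyntaxError(f"no table entry for ({top}, {look})")
            derivation.append((top, rhs))
            stack.extend(reversed(rhs))   # leftmost symbol ends up on top
        elif top == look:
            pos += 1                      # matched a terminal
        else:
            raise SyntaxError(f"expected {top}, found {look}")
    if pos != len(tokens):
        raise SyntaxError("input continues after end-of-file symbol")
    return derivation


for text in ["( ) ( ( ) ) $", "( ) ( ) ( ) $"]:
    steps = parse(text.split())
    print(text, "->", len(steps), "productions")
```

Both inputs are accepted with 8 predicted productions each: one for S', three uses of S → ( S ) S, and four uses of S → ε.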
