Plan for Today
Ambiguous Grammars
Disambiguating ambiguous grammars
Predictive parsing
FIRST and FOLLOW sets
Predictive Parsing table
CS453 Lecture Top-Down Predictive Parsers
Ambiguous Grammars
Ambiguous grammar: more than one parse tree for one sentence
Expression grammar
E → E * E
E → E + E
E → E - E
E → ( E )
E → ID
E → NUM
String
42 + 7 * 6
parse tree 1 (children are indented under their parent)
E
  E → Num (42)
  +
  E
    E → Num (7)
    *
    E → Num (6)
parse tree 2
E
  E
    E → Num (42)
    +
    E → Num (7)
  *
  E → Num (6)
what about 42-7-6?
Goal: disambiguate the grammar
Cause: the grammar specifies neither the precedence nor the associativity of the operators +, -, *.
Two options:
1. Keep the ambiguous grammar, but add extra directives to the parser, so that only one tree is formed (see PA0.cup for the Simple Expression Language).
2. Rewrite the grammar, making the precedence and associativity explicit in the grammar.
Unambiguous grammar for simple expressions
Grammar
E → E + T | E - T | T
T → T * F | F
F → ( E ) | ID | NUM
String
42+7*6
parse tree (children are indented under their parent)
E
  E → T → F → Num (42)
  +
  T
    T → F → Num (7)
    *
    F → Num (6)
How is the precedence encoded?
How is the associativity encoded?
Side Note: Augmenting the grammar with End of File
The grammar defines the syntactically valid strings.
The parser recognizes them (the same relationship as between regular expressions and the scanner).
To deal with end-of-file we augment the grammar with an
end-of-file symbol ($), and create a new start symbol:
S' → S $
This is implicit in .cup files, but is explicit when we talk about how parsing works.
Predictive Parsing
Predictive parsing, such as recursive descent parsing, creates the parse tree TOP DOWN, starting at the start symbol.
For each non-terminal N there is a method recognizing the strings that can be produced by N, with one (case) clause for each production.
This worked great for a slightly changed version of our example from last lecture:
start -> stmts EOF
stmts -> ε | stmt stmts
stmt -> ifStmt | whileStmt | ID = NUM
ifStmt -> IF id { stmts }
whileStmt -> WHILE id { stmts }
because each clause could be uniquely identified by looking ahead one token. Let's predictively build the parse tree for
if t { while b { x = 6 }} $
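As a rough illustration of "one method per non-terminal, one clause per production", here is a minimal Python sketch of a recursive descent parser for the statement grammar above. The `tokenize` helper, the token names, and the nested-tuple tree shape are assumptions made for this sketch; they are not part of the course code. The lowercase `id` in the grammar is treated as the `ID` token.

```python
# Hypothetical scanner: token kinds are plain strings chosen for this sketch.
KEYWORDS = {"if": "IF", "while": "WHILE", "$": "EOF"}
PUNCTUATION = {"{", "}", "="}


def tokenize(text):
    """Turn source text into a list of token kinds (illustrative only)."""
    for ch in PUNCTUATION:
        text = text.replace(ch, f" {ch} ")
    tokens = []
    for word in text.split():
        if word in KEYWORDS:
            tokens.append(KEYWORDS[word])
        elif word in PUNCTUATION:
            tokens.append(word)
        elif word.isdigit():
            tokens.append("NUM")
        else:
            tokens.append("ID")
    return tokens


class Parser:
    """One method per non-terminal; each picks a production from one lookahead token."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else "EOF"

    def match(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, found {self.peek()}")
        self.pos += 1
        return kind

    def start(self):  # start -> stmts EOF
        return ("start", self.stmts(), self.match("EOF"))

    def stmts(self):
        if self.peek() in ("IF", "WHILE", "ID"):  # stmts -> stmt stmts
            return ("stmts", self.stmt(), self.stmts())
        return ("stmts",)  # stmts -> ε (lookahead is '}' or EOF)

    def stmt(self):
        if self.peek() == "IF":  # stmt -> ifStmt
            return ("stmt", self.if_stmt())
        if self.peek() == "WHILE":  # stmt -> whileStmt
            return ("stmt", self.while_stmt())
        # stmt -> ID = NUM
        return ("stmt", self.match("ID"), self.match("="), self.match("NUM"))

    def if_stmt(self):  # ifStmt -> IF id { stmts }
        return ("ifStmt", self.match("IF"), self.match("ID"),
                self.match("{"), self.stmts(), self.match("}"))

    def while_stmt(self):  # whileStmt -> WHILE id { stmts }
        return ("whileStmt", self.match("WHILE"), self.match("ID"),
                self.match("{"), self.stmts(), self.match("}"))


tree = Parser(tokenize("if t { while b { x = 6 }} $")).start()
print(tree)
```

Each method decides between its productions by inspecting only `peek()`, which is exactly the one-token lookahead the slide relies on.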
When Predictive Parsing works, when it does not
What about our expression grammar:
E → E + T | E - T | T
T → T * F | F
F → ( E ) | ID | NUM
Looking one token ahead, the E method cannot decide whether to predict E + T, E - T, or T, because all three alternatives can begin with the same tokens. The T method has the same problem.
Predictive parsing works for grammars where the first terminal symbol of each subexpression provides enough information to decide which production to use.
First
Given a phrase γ of terminals and non-terminals (a rhs of a production), FIRST(γ) is the set of all terminals that can begin a string derived from γ.
FIRST(T * F) = ?
FIRST(F) = ?
FIRST(X Y Z) = FIRST(X) ?
NO! X could produce ε, and then FIRST(Y) comes into play.
We must keep track of which non-terminals are NULLABLE.
Follow
It also turns out to be useful to determine which terminals can directly follow a non-terminal X (to decide that parsing X is finished).
A terminal t is in FOLLOW(X) if there is any derivation containing X t.
This can also occur if the derivation contains X Y Z t and Y and Z are both nullable.
FIRST and FOLLOW sets
NULLABLE
– X is a nonterminal
– nullable(X) is true if X can derive the empty string
FIRST
– FIRST(z) = {z}, where z is a terminal
– FIRST(X) = union of all FIRST(rhs_i), where X is a nonterminal and X -> rhs_i is a production
– FIRST(rhs_i) = union of FIRST(sym) for the symbols on the rhs, up to and including the first non-nullable symbol
FOLLOW(Y), only relevant when Y is a nonterminal
– look for Y in the rhs of rules (lhs -> rhs) and union in the FIRST sets of the symbols after Y, up to and including the first non-nullable symbol
– if all symbols after Y are nullable, then also union in FOLLOW(lhs)
Constructive Definition of nullable, first and follow
for each terminal t: FIRST(t) = {t}
Another Transitive Closure algorithm:
keep doing STEP until nothing changes
STEP:
for each production X → Y1 Y2 … Yk
  0: if Y1 … Yk are all nullable (or k = 0): nullable(X) = true
  for each i from 1 to k, each j from i+1 to k
    1: if Y1 … Y(i-1) are all nullable (or i = 1): FIRST(X) += FIRST(Yi)      // +=: union
    2: if Y(i+1) … Yk are all nullable (or i = k): FOLLOW(Yi) += FOLLOW(X)
    3: if Y(i+1) … Y(j-1) are all nullable (or i+1 = j): FOLLOW(Yi) += FIRST(Yj)
We can compute nullable first, then FIRST, and then FOLLOW.
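The STEP loop above can be sketched in Python roughly as follows. This is an illustrative sketch rather than the course's implementation. The function name `analyze`, the dictionary representation of the grammar, and the choice of the statement grammar from the Predictive Parsing slide as the example are all assumptions made here, with the lowercase `id` treated as the `ID` token.

```python
def add_all(target, items):
    """Union items into target; report whether target grew."""
    before = len(target)
    target |= items
    return len(target) != before


def analyze(grammar):
    """Compute (nullable, FIRST, FOLLOW) by iterating to a fixed point.

    grammar maps each non-terminal to a list of right-hand sides (tuples of
    symbols). Any symbol that is not a key of grammar is a terminal.
    """
    nullable = set()
    first = {x: set() for x in grammar}
    follow = {x: set() for x in grammar}

    def first_of(sym):
        return first[sym] if sym in grammar else {sym}  # FIRST(t) = {t}

    changed = True
    while changed:  # keep doing STEP until nothing changes
        changed = False
        for x, right_hand_sides in grammar.items():
            for ys in right_hand_sides:
                # Step 0: all of Y1..Yk nullable (or k = 0)
                if x not in nullable and all(y in nullable for y in ys):
                    nullable.add(x)
                    changed = True
                for i, y_i in enumerate(ys):
                    # Step 1: everything before Yi is nullable
                    if all(y in nullable for y in ys[:i]):
                        changed |= add_all(first[x], first_of(y_i))
                    if y_i not in grammar:
                        continue  # FOLLOW is only kept for non-terminals
                    # Step 2: everything after Yi is nullable
                    if all(y in nullable for y in ys[i + 1:]):
                        changed |= add_all(follow[y_i], follow[x])
                    # Step 3: everything between Yi and Yj is nullable
                    for j in range(i + 1, len(ys)):
                        if all(y in nullable for y in ys[i + 1:j]):
                            changed |= add_all(follow[y_i], first_of(ys[j]))
    return nullable, first, follow


# The statement grammar from the Predictive Parsing slide; () is the ε production.
GRAMMAR = {
    "start": [("stmts", "EOF")],
    "stmts": [(), ("stmt", "stmts")],
    "stmt": [("ifStmt",), ("whileStmt",), ("ID", "=", "NUM")],
    "ifStmt": [("IF", "ID", "{", "stmts", "}")],
    "whileStmt": [("WHILE", "ID", "{", "stmts", "}")],
}

nullable, first, follow = analyze(GRAMMAR)
print(nullable)
print(first["stmts"], follow["stmts"])
```

For this grammar only `stmts` is nullable, and FOLLOW(stmts) contains `}` and `EOF`, which are the lookahead tokens on which a predictive parser picks stmts -> ε.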
Class Exercise
Compute nullable, FIRST and FOLLOW for
Z → d | X Y Z
X → a | Y
Y → c | ε
Constructing the Predictive Parser Table
A predictive parse table has a row for each non-terminal X, and a column for each input token t. Entries table[X,t] contain productions:
for each X -> gamma
  for each t in FIRST(gamma)
    table[X,t] = X -> gamma
  if gamma is nullable
    for each t in FOLLOW(X)
      table[X,t] = X -> gamma
Compute the predictive parse table for
Z → d | X Y Z
X → a | Y
Y → c | ε

      a            c            d
X     X → a        X → Y        X → Y
      X → Y
Y     Y → ε        Y → ε        Y → ε
                   Y → c
Z     Z → X Y Z    Z → X Y Z    Z → X Y Z
                                Z → d
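The table-filling rule above can be sketched in Python as follows. To keep the block self-contained, the nullable, FIRST and FOLLOW sets for the exercise grammar are hard-coded from a hand computation (an assumption of this sketch, so check them against your own answer). The names `first_of_seq` and `build_table` are hypothetical helpers, and each table cell holds a list so that multiple entries are visible.

```python
# Exercise grammar; () is the ε production.
GRAMMAR = {
    "Z": [("d",), ("X", "Y", "Z")],
    "X": [("a",), ("Y",)],
    "Y": [("c",), ()],
}

# Hand-computed sets for this grammar (assumed here, not derived in this block).
NULLABLE = {"X", "Y"}
FIRST = {"X": {"a", "c"}, "Y": {"c"}, "Z": {"a", "c", "d"}}
FOLLOW = {"X": {"a", "c", "d"}, "Y": {"a", "c", "d"}, "Z": set()}


def first_of_seq(symbols):
    """Return (FIRST(symbols), True if the whole sequence is nullable)."""
    result = set()
    for sym in symbols:
        if sym not in GRAMMAR:  # terminal: it ends the scan
            result.add(sym)
            return result, False
        result |= FIRST[sym]
        if sym not in NULLABLE:
            return result, False
    return result, True


def build_table():
    """Map (non-terminal, token) to the list of productions predicted there."""
    table = {}
    for x, right_hand_sides in GRAMMAR.items():
        for gamma in right_hand_sides:
            first, gamma_nullable = first_of_seq(gamma)
            lookaheads = first | (FOLLOW[x] if gamma_nullable else set())
            for t in lookaheads:
                table.setdefault((x, t), []).append(gamma)
    return table


table = build_table()
conflicts = sorted(cell for cell, entries in table.items() if len(entries) > 1)
print(conflicts)
```

Cells with more than one production are the multiple entries; for this grammar they are table[X,a], table[Y,c] and table[Z,d].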
Multiple entries in the Predictive parse table: Ambiguity
An ambiguous grammar will lead to multiple entries in the parse table.
Our grammar IS ambiguous, e.g. Z → d, but also Z → X Y Z → Y Z → Z → d, since X and Y can both derive ε.
For grammars with no multiple entries in the table, we can use the table to produce one parse tree for each valid sentence. We call these grammars LL(1): Left to right parse, Left-most derivation, 1 symbol lookahead.
A recursive descent parser examines input left to right. The order it expands non-terminals is leftmost first, and it looks ahead 1 token.
Left recursion and Predictive parsing
What happens to the recursive descent parser if we have a left recursive production rule, e.g. E → E + T | T ?
E calls E calls E forever.
To eliminate left recursion we rewrite the grammar:
from:
E → E + T | E - T | T
T → T * F | F
F → ( E ) | ID | NUM
to:
E → T E'
E' → + T E' | - T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | ID | NUM
replacing left recursion X → X γ | α (where α does not start with X) by right recursion, as X produces α γ*, which can be produced right recursively.
Now we can augment the grammar (S → E $), compute nullable, FIRST and FOLLOW, and produce an LL(1) predictive parse table, see Tiger Section 3.2.
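The rewrite X → X γ | α into X → α X', X' → γ X' | ε can be sketched as a small Python function. This is a minimal sketch that handles only immediate left recursion for a single non-terminal and assumes every γ is non-empty. The function name and the tuple representation of productions are assumptions made for illustration.

```python
def eliminate_left_recursion(nonterminal, productions):
    """Rewrite immediate left recursion for one non-terminal.

    productions is a list of right-hand sides, each a tuple of symbols,
    with () standing for ε. Returns a dict from non-terminal name to its
    new list of right-hand sides.
    """
    # The γ parts: what follows the leading X in each X → X γ production.
    gammas = [rhs[1:] for rhs in productions if rhs and rhs[0] == nonterminal]
    # The α parts: productions that do not start with X.
    alphas = [rhs for rhs in productions if not rhs or rhs[0] != nonterminal]
    if not gammas:
        return {nonterminal: list(productions)}  # nothing to rewrite
    prime = nonterminal + "'"
    return {
        nonterminal: [alpha + (prime,) for alpha in alphas],     # X → α X'
        prime: [gamma + (prime,) for gamma in gammas] + [()],    # X' → γ X' | ε
    }


print(eliminate_left_recursion("E", [("E", "+", "T"), ("E", "-", "T"), ("T",)]))
print(eliminate_left_recursion("T", [("T", "*", "F"), ("F",)]))
```

Applied to the E and T productions, this yields the E, E', T and T' rules of the rewritten grammar; F has no left recursion and is returned unchanged.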
Left Factoring
Left recursion does not work for predictive parsing. Neither does a grammar that has a non-terminal with two productions that start with a common phrase, so we left factor the grammar:
S → α β1
S → α β2
is left factored into
S → α S'
S' → β1 | β2
E.g.: if statement:
S → IF t THEN S ELSE S | IF t THEN S | o
becomes
S → IF t THEN S X | o
X → ELSE S | ε
When building the predictive parse table, there will be multiple entries. WHY?
Dangling else problem: ambiguity
Given
S → IF t THEN S X | o
X → ELSE S | ε
construct two parse trees for
IF t THEN IF t THEN o ELSE o
parse tree 1 (children are indented under their parent)
S
  IF t THEN
  S
    IF t THEN
    S → o
    X
      ELSE
      S → o
  X → ε
parse tree 2
S
  IF t THEN
  S
    IF t THEN
    S → o
    X → ε
  X
    ELSE
    S → o
Which is the correct parse tree? (C, Java rules)
Dangling else disambiguation
The correct parse tree is:
S
  IF t THEN
  S
    IF t THEN
    S → o
    X
      ELSE
      S → o
  X → ε
We can get this parse tree by removing the X → ε rule from the multiple-entry slot in the parse table. See written homework 2.
One more time
Balanced parentheses grammar 1:
S → ( S ) | S S | ε
1. Augment the grammar
2. Construct Nullable, First and Follow
3. Build the predictive parse table, what happens?
One more time, but this time with feeling …
Balanced parentheses grammar 2:
S → ( S ) S | ε
1. Augment the grammar
2. Construct Nullable, First and Follow
3. Build the predictive parse table
4. Using the predictive parse table, construct the parse tree for
( ) ( ( ) ) $
and
( ) ( ) ( ) $
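One way to check step 4 is a small table-driven LL(1) parser. The Python sketch below hard-codes a parse table for the augmented grammar S' → S $, S → ( S ) S | ε. The table was worked out by hand for this sketch (nullable(S) is true, FIRST(S) = { ( }, FOLLOW(S) = { ), $ }), so treat it as an assumption to compare against your own answer. The function name `parse` and the list-of-productions result are likewise illustrative choices.

```python
# Hand-built LL(1) table (assumed): (non-terminal, lookahead) -> right-hand side.
TABLE = {
    ("S'", "("): ("S", "$"),
    ("S'", "$"): ("S", "$"),
    ("S", "("): ("(", "S", ")", "S"),
    ("S", ")"): (),  # S → ε
    ("S", "$"): (),  # S → ε
}
NONTERMINALS = {"S'", "S"}


def parse(tokens):
    """Return the productions of the leftmost derivation, in the order used.

    Raises SyntaxError if the token list is not in the language.
    """
    stack = ["S'"]  # symbols still to be matched; top of stack is the list end
    pos = 0
    derivation = []
    while stack:
        top = stack.pop()
        lookahead = tokens[pos] if pos < len(tokens) else None
        if top in NONTERMINALS:
            rhs = TABLE.get((top, lookahead))
            if rhs is None:
                raise SyntaxError(f"no table entry for ({top}, {lookahead})")
            derivation.append((top, rhs))
            stack.extend(reversed(rhs))  # leftmost symbol ends up on top
        elif top == lookahead:
            pos += 1  # terminal matches the input token
        else:
            raise SyntaxError(f"expected {top}, found {lookahead}")
    if pos != len(tokens):
        raise SyntaxError("input continues after end-of-file symbol")
    return derivation


for lhs, rhs in parse("( ) ( ( ) ) $".split()):
    print(lhs, "→", " ".join(rhs) if rhs else "ε")
```

Reading the printed productions top to bottom gives the leftmost derivation, from which the parse tree can be drawn. Every table cell holds at most one production, so no choice ever has to be made.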