Plan for Today

– Ambiguous Grammars
– Disambiguating ambiguous grammars
– Predictive parsing
– FIRST and FOLLOW sets
– Predictive Parsing table

CS453 Lecture Top-Down Predictive Parsers

Ambiguous Grammars

Ambiguous grammar: >1 parse tree for 1 sentence

Expression grammar:

E → E * E
E → E + E
E → E - E
E → ( E )
E → ID
E → NUM

String: 42 + 7 * 6

[Figure: parse tree 1 has E → E + E at the root, with Num (42) on the left and E * E over Num (7) and Num (6) on the right.]
[Figure: parse tree 2 has E → E * E at the root, with E + E over Num (42) and Num (7) on the left and Num (6) on the right.]

What about 42-7-6?

Goal: disambiguate the grammar

Cause: the grammar did not specify the precedence or the associativity of the operators +, -, *

Two options:
– Keep the ambiguous grammar, but add extra directives to the parser, so that only one tree is formed (see PA0.cup for Simple Expression Language).
– Rewrite the grammar, making the precedence and associativity explicit in the grammar.

Unambiguous grammar for simple expressions

Grammar:

E → E + T | E - T | T
T → T * F | F
F → ( E ) | ID | NUM

String: 42+7*6

[Figure: parse tree for 42+7*6. The root expands E → E + T; the left E derives Num (42) through T and F; the right T expands T → T * F, deriving Num (7) and Num (6).]

How is the precedence encoded?
How is the associativity encoded?

Side Note: Augmenting the grammar with End of File

The grammar defines the syntactically valid strings. The parser recognizes them (same as reg. exp. and scanner).

To deal with end-of-file we augment the grammar with an end-of-file symbol ($), and create a new start symbol:

S' → S $

This is implicit in .cup files, but is explicit when we talk about how parsing works.

Predictive Parsing

Predictive parsing, such as recursive descent parsing, creates the parse tree TOP DOWN, starting at the start symbol. For each non-terminal N there is a method recognizing the strings that can be produced by N, with one (case) clause for each production.
This worked great for a slightly changed version of our example from last lecture:

start -> stmts EOF
stmts -> ε | stmt stmts
stmt -> ifStmt | whileStmt | ID = NUM
ifStmt -> IF id { stmts }
whileStmt -> WHILE id { stmts }

because each clause could be uniquely identified by looking ahead one token.

Let's predictively build the parse tree for
if t { while b { x = 6 }} $

When Predictive Parsing works, when it does not

What about our expression grammar:

E → E + T | E - T | T
T → T * F | F
F → ( E ) | ID | NUM

The E method cannot decide, looking one token ahead, whether to predict E+T, E-T, or T. Same problem for T.

Predictive parsing works for grammars where the first terminal symbol of each subexpression provides enough information to decide which production to use.

First

Given a phrase γ of terminals and non-terminals (a rhs of a production), FIRST(γ) is the set of all terminals that can begin a string derived from γ.

FIRST(T*F) = ?
FIRST(F) = ?
FIRST(XYZ) = FIRST(X) ?

NO! X could produce ε, and then FIRST(Y) comes into play, so we must keep track of which non-terminals are NULLABLE.

Follow

It also turns out to be useful to determine which terminals can directly follow a non-terminal X (to decide that parsing X is finished).

A terminal t is in FOLLOW(X) if there is any derivation containing Xt.
This can occur if the derivation contains XYZt and Y and Z are nullable.

FIRST and FOLLOW sets

NULLABLE
– X is a nonterminal
– nullable(X) is true if X can derive the empty string

FIRST
– FIRST(z) = {z}, where z is a terminal
– FIRST(X) = union of all FIRST(rhs_i), where X is a nonterminal and X -> rhs_i is a production
– FIRST(rhs_i) = union of all FIRST(sym) on the rhs up to and including the first non-nullable symbol

FOLLOW(Y), only relevant when Y is a nonterminal
– look for Y in the rhs of rules (lhs -> rhs) and union all FIRST sets for symbols after Y up to and including the first non-nullable symbol
– if all symbols after Y are nullable, then also union in FOLLOW(lhs)

Constructive Definition of nullable, first and follow

for each terminal t: FIRST(t) = {t}

Another transitive closure algorithm: keep doing STEP until nothing changes.

STEP:
for each production X → Y1 Y2 … Yk
  0: if Y1 to Yk nullable (or k = 0)          nullable(X) = true
  for each i from 1 to k, each j from i+1 to k
    1: if Y1 … Yi-1 nullable (or i = 1)       FIRST(X) += FIRST(Yi)      // +: union
    2: if Yi+1 … Yk nullable (or i = k)       FOLLOW(Yi) += FOLLOW(X)
    3: if Yi+1 … Yj-1 nullable (or i+1 = j)   FOLLOW(Yi) += FIRST(Yj)

We can compute nullable, then FIRST, and then FOLLOW.

Class Exercise

Compute nullable, FIRST and FOLLOW for

Z → d | X Y Z
X → a | Y
Y → c | ε

Constructing the Predictive Parser Table

A predictive parse table has a row for each non-terminal X, and a column for each input token t.
Entries table[X,t] contain productions:

for each X -> gamma
  for each t in FIRST(gamma)
    table[X,t] = X->gamma
  if gamma is nullable
    for each t in FOLLOW(X)
      table[X,t] = X->gamma

Compute the predictive parse table for

Z → d | X Y Z
X → a | Y
Y → c | ε

      a            c            d
X     X → a        X → Y        X → Y
      X → Y
Y     Y → ε        Y → ε        Y → ε
                   Y → c
Z     Z → XYZ      Z → XYZ      Z → XYZ
                                Z → d

Multiple entries in the Predictive parse table: Ambiguity

An ambiguous grammar will lead to multiple entries in the parse table. Our grammar IS ambiguous, e.g. Z → d but also Z → XYZ → YZ → Z → d.

For grammars with no multiple entries in the table, we can use the table to produce one parse tree for each valid sentence. We call these grammars LL(1): Left to right parse, Left-most derivation, 1 symbol lookahead.

A recursive descent parser examines the input left to right. It expands the leftmost non-terminal first, and it looks ahead 1 token.

Left recursion and Predictive parsing

What happens to the recursive descent parser if we have a left recursive production rule, e.g. E → E+T | T? E calls E calls E forever.

To eliminate left recursion we rewrite the grammar:

from:
E → E + T | E - T | T
T → T * F | F
F → ( E ) | ID | NUM

to:
E → T E'
E' → + T E' | - T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | ID | NUM

replacing left recursion X → X γ | α (where α does not start with X) by right recursion, as X produces α γ*, which can be produced right recursively.

Now we can augment the grammar (S → E $), compute nullable, FIRST and FOLLOW, and produce an LL(1) predictive parse table, see Tiger Section 3.2.

Left Factoring

Left recursion does not work for predictive parsing.
Neither does a grammar that has a non-terminal with two productions that start with a common phrase, so we left factor the grammar:

S → α β1
S → α β2

left factored becomes

S → α S'
S' → β1 | β2

E.g. the if statement:

S → IF t THEN S ELSE S | IF t THEN S | o

becomes

S → IF t THEN S X | o
X → ELSE S | ε

When building the predictive parse table, there will be multiple entries. WHY?

Dangling else problem: ambiguity

Given

S → IF t THEN S X | o
X → ELSE S | ε

construct two parse trees for

IF t THEN IF t THEN o ELSE o

[Figure: two parse trees. In the first, the outer X derives ε and the inner X derives ELSE S, so the ELSE belongs to the inner IF. In the second, the inner X derives ε and the outer X derives ELSE S, so the ELSE belongs to the outer IF.]

Which is the correct parse tree? (C, Java rules)

Dangling else disambiguation

The correct parse tree is:

[Figure: parse tree in which the outer X derives ε and the inner IF t THEN S X has S → o and X → ELSE S with S → o, so the ELSE belongs to the nearest IF.]

We can get this parse tree by removing the X → ε rule from the multiple-entry slot in the parse table. See written homework 2.

One more time

Balanced parentheses grammar 1: S → ( S ) | SS | ε

1. Augment the grammar
2. Construct Nullable, First and Follow
3. Build the predictive parse table, what happens?

One more time, but this time with feeling …

Balanced parentheses grammar 1: S → ( S )S | ε

1. Augment the grammar
2. Construct Nullable, First and Follow
3. Build the predictive parse table
4. Using the predictive parse table, construct the parse tree for ( ) ( ( ) ) $ and ( ) ( ) ( ) $
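The four steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not the course's reference implementation: the function names (analyze, build_table, accepts) and the tuple encoding of productions are hypothetical, chosen for illustration. It applies the STEP rules from the constructive definition to S → ( S ) S | ε augmented with S' → S $, fills the table from FIRST and FOLLOW, and runs a stack-based recognizer instead of building an explicit parse tree.

```python
# Augmented balanced-parentheses grammar: S' -> S $,  S -> ( S ) S | ε.
# Each production is (lhs, rhs); the empty tuple stands for ε.
GRAMMAR = [
    ("S'", ("S", "$")),
    ("S", ("(", "S", ")", "S")),
    ("S", ()),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def analyze(grammar):
    """Repeat the STEP rules until nullable, FIRST and FOLLOW stop growing."""
    symbols = NONTERMINALS | {s for _, rhs in grammar for s in rhs}
    nullable = set()
    first = {s: (set() if s in NONTERMINALS else {s}) for s in symbols}
    follow = {n: set() for n in NONTERMINALS}

    def size():
        return (len(nullable) + sum(len(v) for v in first.values())
                + sum(len(v) for v in follow.values()))

    while True:
        before = size()
        for lhs, rhs in grammar:
            if all(y in nullable for y in rhs):              # rule 0
                nullable.add(lhs)
            for i, y in enumerate(rhs):
                if all(p in nullable for p in rhs[:i]):      # rule 1
                    first[lhs] |= first[y]
                if y not in NONTERMINALS:
                    continue  # FOLLOW is only tracked for nonterminals
                if all(q in nullable for q in rhs[i + 1:]):  # rule 2
                    follow[y] |= follow[lhs]
                for j in range(i + 1, len(rhs)):             # rule 3
                    if all(q in nullable for q in rhs[i + 1:j]):
                        follow[y] |= first[rhs[j]]
        if size() == before:  # the sets only grow, so equal size means no change
            return nullable, first, follow


def build_table(grammar, nullable, first, follow):
    """Map (nonterminal, token) to the list of productions predicted there."""
    table = {}
    for lhs, rhs in grammar:
        lookaheads, rhs_nullable = set(), True
        for y in rhs:  # FIRST(rhs): up to and including the first non-nullable symbol
            lookaheads |= first[y]
            if y not in nullable:
                rhs_nullable = False
                break
        if rhs_nullable:
            lookaheads |= follow[lhs]
        for t in lookaheads:
            table.setdefault((lhs, t), []).append(rhs)
    return table


def accepts(tokens, table, start="S'"):
    """Table-driven LL(1) recognizer; rejects on empty or multiple entries."""
    stack, pos = [start], 0
    while stack:
        top = stack.pop()
        look = tokens[pos] if pos < len(tokens) else None
        if top in NONTERMINALS:
            entries = table.get((top, look), [])
            if len(entries) != 1:
                return False
            stack.extend(reversed(entries[0]))  # leftmost symbol ends up on top
        elif top == look:
            pos += 1
        else:
            return False
    return pos == len(tokens)


nullable, first, follow = analyze(GRAMMAR)
table = build_table(GRAMMAR, nullable, first, follow)
print(accepts("( ) ( ( ) ) $".split(), table))  # True
print(accepts("( ) ( ) ( ) $".split(), table))  # True
```

Keeping a list in each table cell, rather than a single production, makes multiple entries visible. Swapping in S → ( S ) | SS | ε for the GRAMMAR list lets you inspect the cells for that exercise with the same functions.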
