Building a Parser III CS164 3:30-5:00 TT 10 Evans

Building a Parser III CS164 3:30-5:00 TT 10 Evans

Administrativia • WA1 due on Thu • PA2 in a week Building a Parser III • Slides on the web site – I do my best to have slides ready and posted by CS164 the end of the preceding logical day 3:30-5:00 TT – yesterday, my best was not good enough 10 Evans Prof. Bodik CS 164 Lecture 6 1 2 Prof. Bodik CS 164 Lecture 6 Overview Review: grammar for arithmetic expressions • Finish recursive descent parser • Simple arithmetic expressions: – when it breaks down and how to fix it E d n | id | ( E ) | E + E | E * E – eliminating left recursion – reordering productions • Some elements of this language: • Predictive parsers (aka LL(1) parsers) –id – computing FIRST, FOLLOW –n – table-driven, stack-manipulating version of the –( n ) parser –n + id –id * ( id + id ) 3 4 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 Review: derivation Recursive Descent Parsing Grammar: E d n | id | ( E ) | E + E | E * E • Consider the grammar •a derivation: E → T + E | T E rewrite E with ( E ) T → int | int * T | ( E ) ( E ) rewrite E with n • Token stream is: int * int ( n ) this is the final string of terminals 5 2 • another derivation (written more concisely): • Start with top-level non-terminal E E d ( E ) d ( E * E ) d ( E + E * E ) d ( n + E * E ) d ( n + id * E ) d ( n + id * id ) • this is left-most derivation (remember it) • Try the rules for E in order – always expand the left-most non-terminal – can you guess what’s right-most derivation? 5 6 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 1 Recursive-Descent Parsing When Recursive Descent Does Not Work • Parsing: given a string of tokens t1 t2 ... tn, find • Consider a production S → S a: its parse tree – In the process of parsing S we try the above rule • Recursive-descent parsing: Try all the – What goes wrong? productions exhaustively •A fix? – At a given moment the fringe of the parse tree is: – S must have a non-recursive production, say S → b t1 t2 … tk A … – expand this production before you expand S → S a – Try all the productions for A: if A d BC is a • Problems remain production, the new fringe is t t … t B C … 1 2 k – performance (steps needed to parse “baaaaa”) – Backtrack when the fringe doesn’t match the string – termination (parse the error input “c”) – Stop when there are no more non-terminals 7 8 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 Solutions A Recursive Descent Parser (1) • First, restrict backtracking • Define boolean functions that check the token – backtrack just enough to produce a sufficiently string for a match of powerful r.d. parser – A given token terminal • Second, eliminate left recursion bool term(TOKEN tok) { return in[next++] == tok; } – transformation that produces a different grammar – A given production of S (the nth) – the new grammar generates same strings bool Sn() { … } – but does it give us same parse tree as old grammar? – Any production of S: bool S() { … } • Let’s see the restricted r.d. parser first • These functions advance next 9 10 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 A Recursive Descent Parser (2) A Recursive Descent Parser (3) • For production E → T + E • Functions for non-terminal T bool T () { return term(OPEN) && E() && term(CLOSE); } bool E1() { return T() && term(PLUS) && E(); } 1 • For production E → T bool T2() { return term(INT) && term(TIMES) && T(); } bool T3() { return term(INT); } bool E2() { return T(); } • For all productions of E (with backtracking) bool T() { bool E() { int save = next; int save = next; return (next = save, T1()) return (next = save, E ()) 1 || (next = save, T2()) || (next = save, E ()); } 2 || (next = save, T3()); } 11 12 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 2 Recursive Descent Parsing. Notes. Now back to left-recursive grammars • To start the parser • Does this style of r.d. parser work for our – Initialize next to point to first token left-recursive grammar? – Invoke E() – the grammar: S → S a | b • Notice how this simulates our backtracking example from lecture – what happens when S → S a is expanded first? • Easy to implement by hand – what happens when S → b is expanded first? • Predictive parsing is more efficient 13 14 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 Left-recursive grammars Elimination of Left Recursion • A left-recursive grammar has a non-terminal S • Consider the left-recursive grammar S →+ Sα for some α S → S α | β • Recursive descent does not work in such cases – It goes into an ∞ loop •Sgenerates all strings starting with a β and •Notes: followed by a number of α – α: a shorthand for any string of terminals, non- terminals • Can rewrite using right-recursion →+ –symbol is a shorthand for “can be derived in one S →βS’ or more steps”: •S →+ Sα is same as S → … → Sα S’ →αS’ | ε 15 16 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 Elimination of Left-Recursion. Example More Elimination of Left-Recursion • Consider the grammar • In general S d 1 | S 0 ( β = 1 and α = 0 ) S → S α1 | … | S αn | β1 | … | βm • All strings derived from S start with one of can be rewritten as β1,…,βm and continue with several instances of S d 1 S’ α1,…,αn S’ d 0 S’ | ε •Rewrite as S →β1 S’ | … | βm S’ S’ →α1 S’ | … | αn S’ | ε 17 18 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 3 General Left Recursion Summary of Recursive Descent • The grammar • simple parsing strategy – left-recursion must be eliminated first S → A α | δ – … but that can be done automatically A → S β • unpopular because of backtracking is also left-recursive because – thought to be too inefficient + – in practice, backtracking is (sufficiently) eliminated by S → S βα restricting the grammar • so, it’s good enough for small languages – careful, though: order of productions important even after • This left-recursion can also be eliminated left-recursion eliminated • See [ASU], Section 4.3 for general algorithm – try to reverse the order of E → T + E | T – what goes wrong? 19 20 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 Motivation • Wouldn’t it be nice if – the r.d. parser just knew which production to expand next? –Idea: replace Predictive parsers return (next = save, E1()) || (next = save, E2()); } –with switch ( something ) { case L1: return E1(); case L2: return E2(); otherwise: print “syntax error”; } – what’s “something”, L1, L2? • the parser will do lookahead (look at next token) 22 Prof. Bodik CS 164 Lecture 6 Predictive Parsers LL(1) Languages • Like recursive-descent but parser can • In recursive-descent, for each non-terminal “predict” which production to use and input token there may be a choice of – By looking at the next few tokens production –No backtracking • LL(1) means that for each non-terminal and • Predictive parsers accept LL(k) grammars token there is only one production that could – L means “left-to-right” scan of input lead to success – L means “leftmost derivation” • Can be specified as a 2D table – k means “predict based on k tokens of lookahead” – One dimension for current non-terminal to expand • In practice, LL(1) is used – One dimension for next token – A table entry contains one production 23 24 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 4 Predictive Parsing and Left Factoring • Recall the grammar E → T + E | T T → int | int * T | ( E ) Left factoring • Impossible to predict because – For T two productions start with int – For E it is not clear how to predict • A grammar must be left-factored before use predictive parsing 26 Prof. Bodik CS 164 Lecture 6 Left-Factoring Example • Recall the grammar E → T + E | T T → int | int * T | ( E ) LL(1) parser (details) • Factor out common prefixes of productions E → T X X → + E | ε T → ( E ) | int Y Y → * T | ε 27 Prof. Bodik CS 164 Lecture 6 LL(1) parser LL(1) Parsing Table Example • to simplify things, instead of • Left-factored grammar switch ( something ) { E → T X X → + E | ε case L1: return E1(); → → ε case L2: return E2(); T ( E ) | int Y Y * T | otherwise: print “syntax error”; • The LL(1) parsing table: } int * + ( ) $ • we’ll use a LL(1) table and a parse stack T int Y ( E ) – the LL(1) table will replace the switch E T X T X – the parse stack will replace the call stack X + E ε ε Y * T ε ε ε 29 30 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 5 LL(1) Parsing Table Example (Cont.) LL(1) Parsing Tables. Errors • Consider the [E, int] entry • Blank entries indicate error situations – “When current non-terminal is E and next input is – Consider the [E,*] entry int, use production E → T X – “There is no way to derive a string starting with * – This production can generate an int in the first from non-terminal E” place • Consider the [Y,+] entry – “When current non-terminal is Y and current token is +, get rid of Y” – We’ll see later why this is so 31 32 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 Using Parsing Tables LL(1) Parsing Algorithm • Method similar to recursive descent, except initialize stack = <S $> and next (pointer to tokens) – For each non-terminal S repeat case stack of – We look at the next token a <X, rest> : if T[X,*next] = Y1…Yn – And choose the production shown at [S,a] then stack ← <Y1…Yn rest>; • We use a stack to keep track of pending non- else error (); terminals <t, rest> : if t == *next ++ then stack ← <rest>; • We reject when we encounter an error state else error (); • We accept when we encounter end-of-input until stack == < > 33 34 Prof.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    9 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us