Building a Parser II CS164 3:30-5:00 TT 10 Evans Administrativia • PA2

Building a Parser II CS164 3:30-5:00 TT 10 Evans Administrativia • PA2

Administrativia • PA2 assigned today –due in 12 days Building a Parser II • WA1 assigned today –due in a week – it’s a practice for the exam CS164 3:30-5:00 TT •First midterm 10 Evans –Oct 5 – will contain some project-inspired questions Prof. Bodik CS 164 Lecture 6 1 2 Prof. Bodik CS 164 Lecture 6 Overview Grammars • Grammars • Programming language constructs have recursive structure. • derivations – which is why our hand-written parser had this structure, too • Recursive descent parser • Eliminating left recursion •An expression is either: •number, or •variable, or • expression + expression, or • expression - expression, or • ( expression ), or •… 3 4 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 Context-free grammars (CFG) Example: arithmetic expressions • a natural notation for this recursive structure • Simple arithmetic expressions: E d n | id | ( E ) | E + E | E * E • grammar for our balanced parens expressions: BalancedExpression d a | ( BalancedExpression ) • Some elements of this language: • describes (generates) strings of symbols: –id – a, (a), ((a)), (((a))), … –n • like regular expressions but can refer to –( n ) – other expressions (here, BalancedExpression) –n + id – and do this recursively (giving is “non-finite state”) –id * ( id + id ) 5 6 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 1 Symbols: Terminals and Nonterminals Notational Conventions • grammars use two kinds of symbols • In these lecture notes, let’s adopt a notation: •terminals: – no rules for replacing them – Non-terminals are written upper-case – once generated, terminals are permanent – these are tokens of our language – Terminals are written lower-case •nonterminals: or as symbols, e.g., token LPAR is written as ( – to be replaced (expanded) – in regular expression lingo, these serve as names of – The start symbol is the left-hand side of the first expressions production – start non-terminal: the first symbol to be expanded 7 8 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 Derivations Example: derivation • This is how a grammar generates strings: Grammar: E d n | id | ( E ) | E + E | E * E – think of grammar rules (called productions) as rewrite rules •a derivation: • Derivation: the process of generating a string E rewrite E with ( E ) 1. begin with the start non-terminal ( E ) rewrite E with n ( n ) this is the final string of terminals 2. rewrite the non-terminal with some of its productions • another derivation (written more concisely): 3. select a non-terminal in your current string E d ( E ) d ( E * E ) d ( E + E * E ) d ( n + E * E ) d ( n + id * E ) i. if no non-terminal left, done. d ( n + id * id ) ii. otherwise go to step 2. 9 10 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 So how do derivations help us in parsing? Decaf Example • A program (a string of tokens) has no syntax A fragment of Decaf: error if it can be derived from the grammar. – but so far you only know how to derive some (any) STMT d while ( EXPR ) STMT string, not how to check if a given string is | id ( EXPR ) ; derivable EXPR d EXPR + EXPR • So how to do parsing? | EXPR – EXPR – a naïve solution: derive all possible strings and | EXPR < EXPR check if your program is among them | ( EXPR ) – not as bad as it sounds: there are parsers that do | id this, kind of. Coming soon. 11 12 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 2 Decaf Example (Cont.) CFGs (definition) Some elements of the (fragment of) language: •A CFG consists of – A set of terminal symbols T id ( id ) ; id ( ( ( ( id ) ) ) ) ; – A set of non-terminal symbols N while ( id < id ) id ( id ) ; – A start symbol S (a non-terminal) while ( while ( id ) ) id ( id ) ; – A set of productions: while ( id ) while ( id ) while ( id ) id ( id ) ; produtions are of two forms (X N) Question: One of the STMT d while ( EXPR ) STMT X d ε , or strings is not from | id ( EXPR ) ; the language. EXPR d EXPR + EXPR | EXPR – EXPR X d Y1 Y2 ... Yn where Yi N 4 T Which one? | EXPR < EXPR | ( EXPR ) | id 13 14 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 context-free grammars Now let’s parse a string • what is “context-free”? • recursive descent parser derives all strings – means the grammar is not context-sensitive – until it matches derived string with the input string • context-sensitive gramars – or until it is sure there is a syntax error – can describe more languages than CFGs – because their productions restrict when a non- terminal can be rewritten. An example production: d N d d A B c – meaning: N can be rewritten into ABc only when preceded by d – can be used to encode semantic checks, but parsing is hard 15 16 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 Recursive Descent Parsing Recursive Descent Parsing. Example (Cont.) • Consider the grammar •Try E0 → T1 + E2 E → T + E | T • Then try a rule for T1 → ( E3 ) T → int | int * T | ( E ) – But ( does not match input token int5 • Token stream is: int * int 5 2 •TryT1 → int . Token matches. • Start with top-level non-terminal E – But + after T1 does not match input token * •Try T1 → int * T2 • Try the rules for E in order – This will match but + after T1 will be unmatched • Have exhausted the choices for T1 – Backtrack to choice for E0 17 18 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 3 Recursive Descent Parsing. Example (Cont.) A Recursive Descent Parser (2) •Try E0 → T1 • Define boolean functions that check the token string for a match of • Follow same steps as before for T1 – A given token terminal – And succeed with T1 → int * T2 and T2 → int – With the following parse tree bool term(TOKEN tok) { return in[next++] == tok; } th E0 – A given production of S (the n ) bool Sn() { … } T 1 – Any production of S: bool S() { … } int5 * T2 int2 • These functions advance next 19 20 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 A Recursive Descent Parser (3) A Recursive Descent Parser (4) • For production E → T + E • Functions for non-terminal T bool T () { return term(OPEN) && E() && term(CLOSE); } bool E1() { return T() && term(PLUS) && E(); } 1 • For production E → T bool T2() { return term(INT) && term(TIMES) && T(); } bool T3() { return term(INT); } bool E2() { return T(); } • For all productions of E (with backtracking) bool T() { bool E() { int save = next; int save = next; return (next = save, T1()) return (next = save, E ()) 1 || (next = save, T2()) || (next = save, E ()); } 2 || (next = save, T3()); } 21 22 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 Recursive Descent Parsing. Notes. Recursive Descent Parsing. Notes. • To start the parser • Easy to implement by hand – Initialize next to point to first token – An example implementation is provided as a – Invoke E() supplement “Recursive Descent Parsing” • Notice how this simulates our backtracking example from lecture • Easy to implement by hand • But does not always work … • Predictive parsing is more efficient 23 24 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 4 Recursive-Descent Parsing When Recursive Descent Does Not Work • Parsing: given a string of tokens t1 t2 ... tn, find • Consider a production S → S a: its parse tree – In the process of parsing S we try the above rule • Recursive-descent parsing: Try all the – What goes wrong? productions exhaustively – At a given moment the fringe of the parse tree is: • A left-recursive grammar has a non-terminal S t t … t A … 1 2 k S →+ Sα for some α – Try all the productions for A: if A d BC is a production, the new fringe is t1 t2 … tk B C … – Backtrack when the fringe doesn’t match the string • Recursive descent does not work in such cases – Stop when there are no more non-terminals – It goes into an ∞ loop 25 26 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 Elimination of Left Recursion Elimination of Left-Recursion. Example • Consider the left-recursive grammar • Consider the grammar S → S α | β S d 1 | S 0 ( β = 1 and α = 0 ) •Sgenerates all strings starting with a β and can be rewritten as followed by a number of α S d 1 S’ S’ d 0 S’ | ε • Can rewrite using right-recursion S →βS’ S’ →αS’ | ε 27 28 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 More Elimination of Left-Recursion General Left Recursion • In general • The grammar S → A α | δ S → S α1 | … | S αn | β1 | … | βm • All strings derived from S start with one of A → S β is also left-recursive because β1,…,βm and continue with several instances of →+ βα α1,…,αn S S •Rewrite as S →β1 S’ | … | βm S’ • This left-recursion can also be eliminated S’ →α1 S’ | … | αn S’ | ε • See [ASU], Section 4.3 for general algorithm 29 30 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 5 Summary of Recursive Descent • Simple and general parsing strategy – Left-recursion must be eliminated first – … but that can be done automatically • Unpopular because of backtracking – Thought to be too inefficient • In practice, backtracking is eliminated by restricting the grammar 31 Prof. Bodik CS 164 Lecture 6 6.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    6 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us