Building a Parser III CS164 3:30-5:00 TT 10 Evans

Administrativia • WA1 due on Thu • PA2 in a week Building a Parser III • Slides on the web site – I do my best to have slides ready and posted by CS164 the end of the preceding logical day 3:30-5:00 TT – yesterday, my best was not good enough 10 Evans Prof. Bodik CS 164 Lecture 6 1 2 Prof. Bodik CS 164 Lecture 6 Overview Review: grammar for arithmetic expressions • Finish recursive descent parser • Simple arithmetic expressions: – when it breaks down and how to fix it E d n | id | ( E ) | E + E | E * E – eliminating left recursion – reordering productions • Some elements of this language: • Predictive parsers (aka LL(1) parsers) –id – computing FIRST, FOLLOW –n – table-driven, stack-manipulating version of the –( n ) parser –n + id –id * ( id + id ) 3 4 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 Review: derivation Recursive Descent Parsing Grammar: E d n | id | ( E ) | E + E | E * E • Consider the grammar •a derivation: E → T + E | T E rewrite E with ( E ) T → int | int * T | ( E ) ( E ) rewrite E with n • Token stream is: int * int ( n ) this is the final string of terminals 5 2 • another derivation (written more concisely): • Start with top-level non-terminal E E d ( E ) d ( E * E ) d ( E + E * E ) d ( n + E * E ) d ( n + id * E ) d ( n + id * id ) • this is left-most derivation (remember it) • Try the rules for E in order – always expand the left-most non-terminal – can you guess what’s right-most derivation? 5 6 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 1 Recursive-Descent Parsing When Recursive Descent Does Not Work • Parsing: given a string of tokens t1 t2 ... tn, find • Consider a production S → S a: its parse tree – In the process of parsing S we try the above rule • Recursive-descent parsing: Try all the – What goes wrong? productions exhaustively •A fix? – At a given moment the fringe of the parse tree is: – S must have a non-recursive production, say S → b t1 t2 … tk A … – expand this production before you expand S → S a – Try all the productions for A: if A d BC is a • Problems remain production, the new fringe is t t … t B C … 1 2 k – performance (steps needed to parse “baaaaa”) – Backtrack when the fringe doesn’t match the string – termination (parse the error input “c”) – Stop when there are no more non-terminals 7 8 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 Solutions A Recursive Descent Parser (1) • First, restrict backtracking • Define boolean functions that check the token – backtrack just enough to produce a sufficiently string for a match of powerful r.d. parser – A given token terminal • Second, eliminate left recursion bool term(TOKEN tok) { return in[next++] == tok; } – transformation that produces a different grammar – A given production of S (the nth) – the new grammar generates same strings bool Sn() { … } – but does it give us same parse tree as old grammar? – Any production of S: bool S() { … } • Let’s see the restricted r.d. parser first • These functions advance next 9 10 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 A Recursive Descent Parser (2) A Recursive Descent Parser (3) • For production E → T + E • Functions for non-terminal T bool T () { return term(OPEN) && E() && term(CLOSE); } bool E1() { return T() && term(PLUS) && E(); } 1 • For production E → T bool T2() { return term(INT) && term(TIMES) && T(); } bool T3() { return term(INT); } bool E2() { return T(); } • For all productions of E (with backtracking) bool T() { bool E() { int save = next; int save = next; return (next = save, T1()) return (next = save, E ()) 1 || (next = save, T2()) || (next = save, E ()); } 2 || (next = save, T3()); } 11 12 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 2 Recursive Descent Parsing. Notes. Now back to left-recursive grammars • To start the parser • Does this style of r.d. parser work for our – Initialize next to point to first token left-recursive grammar? – Invoke E() – the grammar: S → S a | b • Notice how this simulates our backtracking example from lecture – what happens when S → S a is expanded first? • Easy to implement by hand – what happens when S → b is expanded first? • Predictive parsing is more efficient 13 14 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 Left-recursive grammars Elimination of Left Recursion • A left-recursive grammar has a non-terminal S • Consider the left-recursive grammar S →+ Sα for some α S → S α | β • Recursive descent does not work in such cases – It goes into an ∞ loop •Sgenerates all strings starting with a β and •Notes: followed by a number of α – α: a shorthand for any string of terminals, non- terminals • Can rewrite using right-recursion →+ –symbol is a shorthand for “can be derived in one S →βS’ or more steps”: •S →+ Sα is same as S → … → Sα S’ →αS’ | ε 15 16 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 Elimination of Left-Recursion. Example More Elimination of Left-Recursion • Consider the grammar • In general S d 1 | S 0 ( β = 1 and α = 0 ) S → S α1 | … | S αn | β1 | … | βm • All strings derived from S start with one of can be rewritten as β1,…,βm and continue with several instances of S d 1 S’ α1,…,αn S’ d 0 S’ | ε •Rewrite as S →β1 S’ | … | βm S’ S’ →α1 S’ | … | αn S’ | ε 17 18 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 3 General Left Recursion Summary of Recursive Descent • The grammar • simple parsing strategy – left-recursion must be eliminated first S → A α | δ – … but that can be done automatically A → S β • unpopular because of backtracking is also left-recursive because – thought to be too inefficient + – in practice, backtracking is (sufficiently) eliminated by S → S βα restricting the grammar • so, it’s good enough for small languages – careful, though: order of productions important even after • This left-recursion can also be eliminated left-recursion eliminated • See [ASU], Section 4.3 for general algorithm – try to reverse the order of E → T + E | T – what goes wrong? 19 20 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 Motivation • Wouldn’t it be nice if – the r.d. parser just knew which production to expand next? –Idea: replace Predictive parsers return (next = save, E1()) || (next = save, E2()); } –with switch ( something ) { case L1: return E1(); case L2: return E2(); otherwise: print “syntax error”; } – what’s “something”, L1, L2? • the parser will do lookahead (look at next token) 22 Prof. Bodik CS 164 Lecture 6 Predictive Parsers LL(1) Languages • Like recursive-descent but parser can • In recursive-descent, for each non-terminal “predict” which production to use and input token there may be a choice of – By looking at the next few tokens production –No backtracking • LL(1) means that for each non-terminal and • Predictive parsers accept LL(k) grammars token there is only one production that could – L means “left-to-right” scan of input lead to success – L means “leftmost derivation” • Can be specified as a 2D table – k means “predict based on k tokens of lookahead” – One dimension for current non-terminal to expand • In practice, LL(1) is used – One dimension for next token – A table entry contains one production 23 24 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 4 Predictive Parsing and Left Factoring • Recall the grammar E → T + E | T T → int | int * T | ( E ) Left factoring • Impossible to predict because – For T two productions start with int – For E it is not clear how to predict • A grammar must be left-factored before use predictive parsing 26 Prof. Bodik CS 164 Lecture 6 Left-Factoring Example • Recall the grammar E → T + E | T T → int | int * T | ( E ) LL(1) parser (details) • Factor out common prefixes of productions E → T X X → + E | ε T → ( E ) | int Y Y → * T | ε 27 Prof. Bodik CS 164 Lecture 6 LL(1) parser LL(1) Parsing Table Example • to simplify things, instead of • Left-factored grammar switch ( something ) { E → T X X → + E | ε case L1: return E1(); → → ε case L2: return E2(); T ( E ) | int Y Y * T | otherwise: print “syntax error”; • The LL(1) parsing table: } int * + ( ) $ • we’ll use a LL(1) table and a parse stack T int Y ( E ) – the LL(1) table will replace the switch E T X T X – the parse stack will replace the call stack X + E ε ε Y * T ε ε ε 29 30 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 5 LL(1) Parsing Table Example (Cont.) LL(1) Parsing Tables. Errors • Consider the [E, int] entry • Blank entries indicate error situations – “When current non-terminal is E and next input is – Consider the [E,*] entry int, use production E → T X – “There is no way to derive a string starting with * – This production can generate an int in the first from non-terminal E” place • Consider the [Y,+] entry – “When current non-terminal is Y and current token is +, get rid of Y” – We’ll see later why this is so 31 32 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6 Using Parsing Tables LL(1) Parsing Algorithm • Method similar to recursive descent, except initialize stack = <S $> and next (pointer to tokens) – For each non-terminal S repeat case stack of – We look at the next token a <X, rest> : if T[X,*next] = Y1…Yn – And choose the production shown at [S,a] then stack ← <Y1…Yn rest>; • We use a stack to keep track of pending non- else error (); terminals <t, rest> : if t == *next ++ then stack ← <rest>; • We reject when we encounter an error state else error (); • We accept when we encounter end-of-input until stack == < > 33 34 Prof.

Building a Parser III CS164 3:30-5:00 TT 10 Evans

Syntax-Directed Translation, Parse Trees, Abstract Syntax Trees

Derivatives of Parsing Expression Grammars

Adaptive LL(*) Parsing: the Power of Dynamic Analysis

Parsing 1. Grammars and Parsing 2. Top-Down and Bottom-Up Parsing 3

Abstract Syntax Trees & Top-Down Parsing

Compiler Construction

Advanced Parsing Techniques

Lecture 3: Recursive Descent Limitations, Precedence Climbing

Top-Down Parsing & Bottom-Up Parsing I

Parse Trees (=Derivation Tree) by Constructing a Derivation

Technical Correspondence Techniques for Automatic Memoization with Applications to Context-Free Parsing

Packrat Parsers Can Support Left Recursion