Administrativia
• WA1 due on Thu • PA2 in a week Building a Parser III • Slides on the web site – I do my best to have slides ready and posted by CS164 the end of the preceding logical day 3:30-5:00 TT – yesterday, my best was not good enough 10 Evans
Prof. Bodik CS 164 Lecture 6 1 2 Prof. Bodik CS 164 Lecture 6
Overview Review: grammar for arithmetic expressions
• Finish recursive descent parser • Simple arithmetic expressions: – when it breaks down and how to fix it E d n | id | ( E ) | E + E | E * E – eliminating left recursion – reordering productions • Some elements of this language: • Predictive parsers (aka LL(1) parsers) –id – computing FIRST, FOLLOW –n – table-driven, stack-manipulating version of the –( n ) parser –n + id –id * ( id + id )
3 4 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6
Review: derivation Recursive Descent Parsing
Grammar: E d n | id | ( E ) | E + E | E * E • Consider the grammar •a derivation: E → T + E | T E rewrite E with ( E ) T → int | int * T | ( E ) ( E ) rewrite E with n • Token stream is: int * int ( n ) this is the final string of terminals 5 2 • another derivation (written more concisely): • Start with top-level non-terminal E E d ( E ) d ( E * E ) d ( E + E * E ) d ( n + E * E ) d ( n + id * E ) d ( n + id * id ) • this is left-most derivation (remember it) • Try the rules for E in order – always expand the left-most non-terminal – can you guess what’s right-most derivation?
5 6 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6
1 Recursive-Descent Parsing When Recursive Descent Does Not Work
• Parsing: given a string of tokens t1 t2 ... tn, find • Consider a production S → S a: its parse tree – In the process of parsing S we try the above rule • Recursive-descent parsing: Try all the – What goes wrong? productions exhaustively •A fix? – At a given moment the fringe of the parse tree is: – S must have a non-recursive production, say S → b t1 t2 … tk A … – expand this production before you expand S → S a – Try all the productions for A: if A d BC is a • Problems remain production, the new fringe is t t … t B C … 1 2 k – performance (steps needed to parse “baaaaa”) – Backtrack when the fringe doesn’t match the string – termination (parse the error input “c”) – Stop when there are no more non-terminals
7 8 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6
Solutions A Recursive Descent Parser (1)
• First, restrict backtracking • Define boolean functions that check the token – backtrack just enough to produce a sufficiently string for a match of powerful r.d. parser – A given token terminal • Second, eliminate left recursion bool term(TOKEN tok) { return in[next++] == tok; } – transformation that produces a different grammar – A given production of S (the nth)
– the new grammar generates same strings bool Sn() { … } – but does it give us same parse tree as old grammar? – Any production of S: bool S() { … } • Let’s see the restricted r.d. parser first • These functions advance next 9 10 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6
A Recursive Descent Parser (2) A Recursive Descent Parser (3)
• For production E → T + E • Functions for non-terminal T bool T () { return term(OPEN) && E() && term(CLOSE); } bool E1() { return T() && term(PLUS) && E(); } 1 • For production E → T bool T2() { return term(INT) && term(TIMES) && T(); } bool T3() { return term(INT); } bool E2() { return T(); }
• For all productions of E (with backtracking) bool T() { bool E() { int save = next; int save = next; return (next = save, T1()) return (next = save, E ()) 1 || (next = save, T2()) || (next = save, E ()); } 2 || (next = save, T3()); } 11 12 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6
2 Recursive Descent Parsing. Notes. Now back to left-recursive grammars
• To start the parser • Does this style of r.d. parser work for our – Initialize next to point to first token left-recursive grammar? – Invoke E() – the grammar: S → S a | b • Notice how this simulates our backtracking example from lecture – what happens when S → S a is expanded first? • Easy to implement by hand – what happens when S → b is expanded first? • Predictive parsing is more efficient
13 14 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6
Left-recursive grammars Elimination of Left Recursion
• A left-recursive grammar has a non-terminal S • Consider the left-recursive grammar S →+ Sα for some α S → S α | β • Recursive descent does not work in such cases – It goes into an ∞ loop •Sgenerates all strings starting with a β and •Notes: followed by a number of α – α: a shorthand for any string of terminals, non- terminals • Can rewrite using right-recursion →+ –symbol is a shorthand for “can be derived in one S →βS’ or more steps”: •S →+ Sα is same as S → … → Sα S’ →αS’ | ε
15 16 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6
Elimination of Left-Recursion. Example More Elimination of Left-Recursion
• Consider the grammar • In general S d 1 | S 0 ( β = 1 and α = 0 ) S → S α1 | … | S αn | β1 | … | βm • All strings derived from S start with one of can be rewritten as β1,…,βm and continue with several instances of S d 1 S’ α1,…,αn S’ d 0 S’ | ε •Rewrite as
S →β1 S’ | … | βm S’
S’ →α1 S’ | … | αn S’ | ε
17 18 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6
3 General Left Recursion Summary of Recursive Descent
• The grammar • simple parsing strategy – left-recursion must be eliminated first S → A α | δ – … but that can be done automatically A → S β • unpopular because of backtracking is also left-recursive because – thought to be too inefficient + – in practice, backtracking is (sufficiently) eliminated by S → S βα restricting the grammar • so, it’s good enough for small languages – careful, though: order of productions important even after • This left-recursion can also be eliminated left-recursion eliminated • See [ASU], Section 4.3 for general algorithm – try to reverse the order of E → T + E | T – what goes wrong?
19 20 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6
Motivation
• Wouldn’t it be nice if – the r.d. parser just knew which production to expand next? –Idea: replace Predictive parsers return (next = save, E1()) || (next = save, E2()); } –with switch ( something ) { case L1: return E1(); case L2: return E2(); otherwise: print “syntax error”; } – what’s “something”, L1, L2? • the parser will do lookahead (look at next token)
22 Prof. Bodik CS 164 Lecture 6
Predictive Parsers LL(1) Languages
• Like recursive-descent but parser can • In recursive-descent, for each non-terminal “predict” which production to use and input token there may be a choice of – By looking at the next few tokens production –No backtracking • LL(1) means that for each non-terminal and • Predictive parsers accept LL(k) grammars token there is only one production that could – L means “left-to-right” scan of input lead to success – L means “leftmost derivation” • Can be specified as a 2D table – k means “predict based on k tokens of lookahead” – One dimension for current non-terminal to expand • In practice, LL(1) is used – One dimension for next token – A table entry contains one production 23 24 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6
4 Predictive Parsing and Left Factoring
• Recall the grammar E → T + E | T T → int | int * T | ( E ) Left factoring • Impossible to predict because – For T two productions start with int – For E it is not clear how to predict
• A grammar must be left-factored before use predictive parsing 26 Prof. Bodik CS 164 Lecture 6
Left-Factoring Example
• Recall the grammar E → T + E | T T → int | int * T | ( E ) LL(1) parser (details) • Factor out common prefixes of productions E → T X X → + E | ε T → ( E ) | int Y Y → * T | ε
27 Prof. Bodik CS 164 Lecture 6
LL(1) parser LL(1) Parsing Table Example
• to simplify things, instead of • Left-factored grammar switch ( something ) { E → T X X → + E | ε case L1: return E1(); → → ε case L2: return E2(); T ( E ) | int Y Y * T | otherwise: print “syntax error”; • The LL(1) parsing table: } int * + ( ) $ • we’ll use a LL(1) table and a parse stack T int Y ( E ) – the LL(1) table will replace the switch E T X T X – the parse stack will replace the call stack X + E ε ε Y * T ε ε ε
29 30 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6
5 LL(1) Parsing Table Example (Cont.) LL(1) Parsing Tables. Errors
• Consider the [E, int] entry • Blank entries indicate error situations – “When current non-terminal is E and next input is – Consider the [E,*] entry int, use production E → T X – “There is no way to derive a string starting with * – This production can generate an int in the first from non-terminal E” place • Consider the [Y,+] entry – “When current non-terminal is Y and current token is +, get rid of Y” – We’ll see later why this is so
31 32 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6
Using Parsing Tables LL(1) Parsing Algorithm
• Method similar to recursive descent, except initialize stack = and next (pointer to tokens) – For each non-terminal S repeat case stack of – We look at the next token a
33 34 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6
LL(1) Parsing Example Constructing Parsing Tables
Stack Input Action • LL(1) languages are those defined by a parsing E $ int * int $ T X table for the LL(1) algorithm T X $ int * int $ int Y • No table entry can be multiply defined int Y X $ int * int $ terminal Y X $ * int $ * T * T X $ * int $ terminal • We want to generate parsing tables from CFG T X $ int $ int Y int Y X $ int $ terminal Y X $ $ ε X $ $ ε $ $ ACCEPT
35 36 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6
6 Constructing Predictive Parsing Tables Constructing Predictive Parsing Tables (Cont.)
• Consider the state S d* βAγ 2. b does not belong to an expansion of A – With b the next token – The expansion of A is empty and b belongs to an – Trying to match βbδ expansion of γ There are two possibilities: – Means that b can appear after A in a derivation of the form S d* βAbω 1. b belongs to an expansion of A –We say that b ∈ Follow(A) in this case •Any A d α can be used if b can start a string derived from α – What productions can we use in this case? In this case we say that b First(α) •Any A d α can be used if α can expand to ε •We say that ε First(A) in this case Or… 37 38 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6
First Sets. Example
• Recall the grammar E → T X X → + E | ε T → ( E ) | int Y Y → * T | ε Computing First, Follow sets • First sets First( ( ) = { ( } First( T ) = {int, ( } First( ) ) = { ) } First( E ) = {int, ( } First( int) = { int } First( X ) = {+, ε } First( + ) = { + } First( Y ) = {*, ε } First( * ) = { * }
40 Prof. Bodik CS 164 Lecture 6
Computing First Sets Follow Sets. Example
Definition First(X) = { b | X →* bα} ∪ {ε | X →* ε} • Recall the grammar 1. First(b) = { b } E → T X X → + E | ε T → ( E ) | int Y Y → * T | ε
2. For all productions X → A1 …An • Follow sets • Add First(A ) – {ε} to First(X). Stop if ε∉First(A ) 1 1 Follow( + ) = { int, ( } Follow( * ) = { int, ( } • Add First(A2) – {ε} to First(X). Stop if ε∉First(A2) • … Follow( ( ) = { int, ( } Follow( E ) = {), $} • Add First(A ) – {ε} to First(X). Stop if ε∉First(A ) n n Follow( X ) = {$, ) } Follow( T ) = {+, ) , $} • Add ε to First(X) Follow( ) ) = {+, ) , $} Follow( Y ) = {+, ) , $} 3. Repeat step 2 until no First set grows Follow( int) = {*, +, ) , $}
41 42 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6
7 Computing Follow Sets
Definition Follow(X) = { b | S →* β X b δ } 1. Compute the First sets for all non-terminals first 2. Add $ to Follow(S) (if S is the start non-terminal)
Constructing the LL(1) parsing table 3. For all productions Y → …X A1 …An • Add First(A1) – {ε} to Follow(X). Stop if ε∉First(A1)
• Add First(A2) – {ε} to Follow(X). Stop if ε∉First(A2) • …
• Add First(An) – {ε} to Follow(X). Stop if ε∉First(An) • Add Follow(Y) to Follow(X)
4. Repeat step 3 until no Follow set grows
43 Prof. Bodik CS 164 Lecture 6
Constructing LL(1) Parsing Tables Constructing LL(1) Tables. Example
• Construct a parsing table T for CFG G • Recall the grammar E → T X X → + E | ε • For each production A →αin G do: T → ( E ) | int Y Y → * T | ε – For each terminal b ∈ First(α) do • Where in the line of Y we put Y → * T ? •T[A, b] = α – In the lines of First( *T) = { * } –If α d* ε, for each b ∈ Follow(A) do •T[A, b] = α • Where in the line of Y we put Y d ε ? –If α d* ε and $ ∈ Follow(A) do – In the lines of Follow(Y) = { $, +, ) } •T[A, $] = α
45 46 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6
Notes on LL(1) Parsing Tables
• If any entry is multiply defined then G is not LL(1) –If G is ambiguous – If G is left recursive Notes and Review – If G is not left-factored – And in other cases as well • Most programming language grammars are not LL(1) • There are tools that build LL(1) tables
47 Prof. Bodik CS 164 Lecture 6
8 Top-Down Parsing. Review Top-Down Parsing. Review
• Top-down parsing expands a parse tree from • Top-down parsing expands a parse tree from the start symbol to the leaves the start symbol to the leaves – Always expand the leftmost non-terminal – Always expand the leftmost non-terminal E E • The leaves at any point T E T E form a string βAγ + + – β contains only terminals T – The input string is βbδ int * –The prefix β matches –The next token is b int * int + int int * int + int 49 50 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6
Top-Down Parsing. Review Top-Down Parsing. Review
• Top-down parsing expands a parse tree from • Top-down parsing expands a parse tree from the start symbol to the leaves the start symbol to the leaves – Always expand the leftmost non-terminal – Always expand the leftmost non-terminal E E • The leaves at any point • The leaves at any point T E form a string βAγ T E form a string βAγ + + – β contains only terminals – β contains only terminals T T – The input string is βbδ T T – The input string is βbδ int * int * –The prefix β matches –The prefix β matches int –The next token is b int int –The next token is b int * int + int int * int + int 51 52 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6
Predictive Parsing. Review.
• A predictive parser is described by a table – For each non-terminal A and for each token b we specify a production A d α – When trying to expand A we use A d α if b follows next
• Once we have the table – The parsing algorithm is simple and fast – No backtracking is necessary
53 Prof. Bodik CS 164 Lecture 6
9