<<

Administrativia

• WA1 due on Thu • PA2 in a week Building a Parser III • Slides on the web site – I do my best to have slides ready and posted by CS164 the end of the preceding logical day 3:30-5:00 TT – yesterday, my best was not good enough 10 Evans

Prof. Bodik CS 164 Lecture 6 1 2 Prof. Bodik CS 164 Lecture 6

Overview Review: for arithmetic expressions

• Finish • Simple arithmetic expressions: – when it breaks down and how to fix it E d n | id | ( E ) | E + E | E * E – eliminating – reordering productions • Some elements of this language: • Predictive parsers (aka LL(1) parsers) –id – computing FIRST, FOLLOW –n – table-driven, stack-manipulating version of the –( n ) parser –n + id –id * ( id + id )

3 4 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6

Review: derivation Recursive Descent

Grammar: E d n | id | ( E ) | E + E | E * E • Consider the grammar •a derivation: E → T + E | T E rewrite E with ( E ) T → int | int * T | ( E ) ( E ) rewrite E with n • Token stream is: int * int ( n ) this is the final string of terminals 5 2 • another derivation (written more concisely): • Start with top-level non-terminal E E d ( E ) d ( E * E ) d ( E + E * E ) d ( n + E * E ) d ( n + id * E ) d ( n + id * id ) • this is left-most derivation (remember it) • Try the rules for E in order – always expand the left-most non-terminal – can you guess what’s right-most derivation?

5 6 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6

1 Recursive-Descent Parsing When Recursive Descent Does Not Work

• Parsing: given a string of tokens t1 t2 ... tn, find • Consider a production S → S a: its – In the process of parsing S we try the above rule • Recursive-descent parsing: Try all the – What goes wrong? productions exhaustively •A fix? – At a given moment the fringe of the parse tree is: – S must have a non-recursive production, say S → b t1 t2 … tk A … – expand this production before you expand S → S a – Try all the productions for A: if A d BC is a • Problems remain production, the new fringe is t t … t B C … 1 2 k – performance (steps needed to parse “baaaaa”) – Backtrack when the fringe doesn’t match the string – termination (parse the error input “c”) – Stop when there are no more non-terminals

7 8 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6

Solutions A Recursive Descent Parser (1)

• First, restrict • Define boolean functions that check the token – backtrack just enough to produce a sufficiently string for a match of powerful r.d. parser – A given token terminal • Second, eliminate left recursion bool term(TOKEN tok) { return in[next++] == tok; } – transformation that produces a different grammar – A given production of S (the nth)

– the new grammar generates same strings bool Sn() { … } – but does it give us same parse tree as old grammar? – Any production of S: bool S() { … } • Let’s see the restricted r.d. parser first • These functions advance next 9 10 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6

A Recursive Descent Parser (2) A Recursive Descent Parser (3)

• For production E → T + E • Functions for non-terminal T bool T () { return term(OPEN) && E() && term(CLOSE); } bool E1() { return T() && term(PLUS) && E(); } 1 • For production E → T bool T2() { return term(INT) && term(TIMES) && T(); } bool T3() { return term(INT); } bool E2() { return T(); }

• For all productions of E (with backtracking) bool T() { bool E() { int save = next; int save = next; return (next = save, T1()) return (next = save, E ()) 1 || (next = save, T2()) || (next = save, E ()); } 2 || (next = save, T3()); } 11 12 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6

2 Recursive Descent Parsing. Notes. Now back to left-recursive

• To start the parser • Does this style of r.d. parser work for our – Initialize next to point to first token left-? – Invoke E() – the grammar: S → S a | b • Notice how this simulates our backtracking example from lecture – what happens when S → S a is expanded first? • Easy to implement by hand – what happens when S → b is expanded first? • Predictive parsing is more efficient

13 14 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6

Left-recursive grammars Elimination of Left Recursion

• A left-recursive grammar has a non-terminal S • Consider the left-recursive grammar S →+ Sα for some α S → S α | β • Recursive descent does not work in such cases – It goes into an ∞ loop •Sgenerates all strings starting with a β and •Notes: followed by a number of α – α: a shorthand for any string of terminals, non- terminals • Can rewrite using right-recursion →+ –symbol is a shorthand for “can be derived in one S →βS’ or more steps”: •S →+ Sα is same as S → … → Sα S’ →αS’ | ε

15 16 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6

Elimination of Left-Recursion. Example More Elimination of Left-Recursion

• Consider the grammar • In general S d 1 | S 0 ( β = 1 and α = 0 ) S → S α1 | … | S αn | β1 | … | βm • All strings derived from S start with one of can be rewritten as β1,…,βm and continue with several instances of S d 1 S’ α1,…,αn S’ d 0 S’ | ε •Rewrite as

S →β1 S’ | … | βm S’

S’ →α1 S’ | … | αn S’ | ε

17 18 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6

3 General Left Recursion Summary of Recursive Descent

• The grammar • simple parsing strategy – left-recursion must be eliminated first S → A α | δ – … but that can be done automatically A → S β • unpopular because of backtracking is also left-recursive because – thought to be too inefficient + – in practice, backtracking is (sufficiently) eliminated by S → S βα restricting the grammar • so, it’s good enough for small languages – careful, though: order of productions important even after • This left-recursion can also be eliminated left-recursion eliminated • See [ASU], Section 4.3 for general algorithm – try to reverse the order of E → T + E | T – what goes wrong?

19 20 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6

Motivation

• Wouldn’t it be nice if – the r.d. parser just knew which production to expand next? –Idea: replace Predictive parsers return (next = save, E1()) || (next = save, E2()); } –with switch ( something ) { case L1: return E1(); case L2: return E2(); otherwise: print “ error”; } – what’s “something”, L1, L2? • the parser will do lookahead (look at next token)

22 Prof. Bodik CS 164 Lecture 6

Predictive Parsers LL(1) Languages

• Like recursive-descent but parser can • In recursive-descent, for each non-terminal “predict” which production to use and input token there may be a choice of – By looking at the next few tokens production –No backtracking • LL(1) means that for each non-terminal and • Predictive parsers accept LL(k) grammars token there is only one production that could – L means “left-to-right” scan of input lead to success – L means “leftmost derivation” • Can be specified as a 2D table – k means “predict based on k tokens of lookahead” – One dimension for current non-terminal to expand • In practice, LL(1) is used – One dimension for next token – A table entry contains one production 23 24 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6

4 Predictive Parsing and Left Factoring

• Recall the grammar E → T + E | T T → int | int * T | ( E ) Left factoring • Impossible to predict because – For T two productions start with int – For E it is not clear how to predict

• A grammar must be left-factored before use predictive parsing 26 Prof. Bodik CS 164 Lecture 6

Left-Factoring Example

• Recall the grammar E → T + E | T T → int | int * T | ( E ) LL(1) parser (details) • Factor out common prefixes of productions E → T X X → + E | ε T → ( E ) | int Y Y → * T | ε

27 Prof. Bodik CS 164 Lecture 6

LL(1) parser LL(1) Parsing Table Example

• to simplify things, instead of • Left-factored grammar switch ( something ) { E → T X X → + E | ε case L1: return E1(); → → ε case L2: return E2(); T ( E ) | int Y Y * T | otherwise: print “syntax error”; • The LL(1) parsing table: } int * + ( ) $ • we’ll use a LL(1) table and a parse stack T int Y ( E ) – the LL(1) table will replace the switch E T X T X – the parse stack will replace the call stack X + E ε ε Y * T ε ε ε

29 30 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6

5 LL(1) Parsing Table Example (Cont.) LL(1) Parsing Tables. Errors

• Consider the [E, int] entry • Blank entries indicate error situations – “When current non-terminal is E and next input is – Consider the [E,*] entry int, use production E → T X – “There is no way to derive a string starting with * – This production can generate an int in the first from non-terminal E” place • Consider the [Y,+] entry – “When current non-terminal is Y and current token is +, get rid of Y” – We’ll see later why this is so

31 32 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6

Using Parsing Tables LL(1) Parsing Algorithm

• Method similar to recursive descent, except initialize stack = and next (pointer to tokens) – For each non-terminal S repeat case stack of – We look at the next token a : if T[X,*next] = Y1…Yn – And choose the production shown at [S,a] then stack ← ; • We use a stack to keep track of pending non- else error (); terminals : if t == *next ++ then stack ← ; • We reject when we encounter an error state else error (); • We accept when we encounter end-of-input until stack == < >

33 34 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6

LL(1) Parsing Example Constructing Parsing Tables

Stack Input Action • LL(1) languages are those defined by a parsing E $ int * int $ T X table for the LL(1) algorithm T X $ int * int $ int Y • No table entry can be multiply defined int Y X $ int * int $ terminal Y X $ * int $ * T * T X $ * int $ terminal • We want to generate parsing tables from CFG T X $ int $ int Y int Y X $ int $ terminal Y X $ $ ε X $ $ ε $ $ ACCEPT

35 36 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6

6 Constructing Predictive Parsing Tables Constructing Predictive Parsing Tables (Cont.)

• Consider the state S d* βAγ 2. b does not belong to an expansion of A – With b the next token – The expansion of A is empty and b belongs to an – Trying to match βbδ expansion of γ There are two possibilities: – Means that b can appear after A in a derivation of the form S d* βAbω 1. b belongs to an expansion of A –We say that b ∈ Follow(A) in this case •Any A d α can be used if b can start a string derived from α – What productions can we use in this case? In this case we say that b  First(α) •Any A d α can be used if α can expand to ε •We say that ε  First(A) in this case Or… 37 38 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6

First Sets. Example

• Recall the grammar E → T X X → + E | ε T → ( E ) | int Y Y → * T | ε Computing First, Follow sets • First sets First( ( ) = { ( } First( T ) = {int, ( } First( ) ) = { ) } First( E ) = {int, ( } First( int) = { int } First( X ) = {+, ε } First( + ) = { + } First( Y ) = {*, ε } First( * ) = { * }

40 Prof. Bodik CS 164 Lecture 6

Computing First Sets Follow Sets. Example

Definition First(X) = { b | X →* bα} ∪ {ε | X →* ε} • Recall the grammar 1. First(b) = { b } E → T X X → + E | ε T → ( E ) | int Y Y → * T | ε

2. For all productions X → A1 …An • Follow sets • Add First(A ) – {ε} to First(X). Stop if ε∉First(A ) 1 1 Follow( + ) = { int, ( } Follow( * ) = { int, ( } • Add First(A2) – {ε} to First(X). Stop if ε∉First(A2) • … Follow( ( ) = { int, ( } Follow( E ) = {), $} • Add First(A ) – {ε} to First(X). Stop if ε∉First(A ) n n Follow( X ) = {$, ) } Follow( T ) = {+, ) , $} • Add ε to First(X) Follow( ) ) = {+, ) , $} Follow( Y ) = {+, ) , $} 3. Repeat step 2 until no First set grows Follow( int) = {*, +, ) , $}

41 42 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6

7 Computing Follow Sets

Definition Follow(X) = { b | S →* β X b δ } 1. Compute the First sets for all non-terminals first 2. Add $ to Follow(S) (if S is the start non-terminal)

Constructing the LL(1) parsing table 3. For all productions Y → …X A1 …An • Add First(A1) – {ε} to Follow(X). Stop if ε∉First(A1)

• Add First(A2) – {ε} to Follow(X). Stop if ε∉First(A2) • …

• Add First(An) – {ε} to Follow(X). Stop if ε∉First(An) • Add Follow(Y) to Follow(X)

4. Repeat step 3 until no Follow set grows

43 Prof. Bodik CS 164 Lecture 6

Constructing LL(1) Parsing Tables Constructing LL(1) Tables. Example

• Construct a parsing table T for CFG G • Recall the grammar E → T X X → + E | ε • For each production A →αin G do: T → ( E ) | int Y Y → * T | ε – For each terminal b ∈ First(α) do • Where in the line of Y we put Y → * T ? •T[A, b] = α – In the lines of First( *T) = { * } –If α d* ε, for each b ∈ Follow(A) do •T[A, b] = α • Where in the line of Y we put Y d ε ? –If α d* ε and $ ∈ Follow(A) do – In the lines of Follow(Y) = { $, +, ) } •T[A, $] = α

45 46 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6

Notes on LL(1) Parsing Tables

• If any entry is multiply defined then G is not LL(1) –If G is ambiguous – If G is left recursive Notes and Review – If G is not left-factored – And in other cases as well • Most grammars are not LL(1) • There are tools that build LL(1) tables

47 Prof. Bodik CS 164 Lecture 6

8 Top-Down Parsing. Review Top-Down Parsing. Review

• Top-down parsing expands a parse tree from • Top-down parsing expands a parse tree from the start symbol to the leaves the start symbol to the leaves – Always expand the leftmost non-terminal – Always expand the leftmost non-terminal E E • The leaves at any point T E T E form a string βAγ + + – β contains only terminals T – The input string is βbδ int * –The prefix β matches –The next token is b int * int + int int * int + int 49 50 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6

Top-Down Parsing. Review Top-Down Parsing. Review

• Top-down parsing expands a parse tree from • Top-down parsing expands a parse tree from the start symbol to the leaves the start symbol to the leaves – Always expand the leftmost non-terminal – Always expand the leftmost non-terminal E E • The leaves at any point • The leaves at any point T E form a string βAγ T E form a string βAγ + + – β contains only terminals – β contains only terminals T T – The input string is βbδ T T – The input string is βbδ int * int * –The prefix β matches –The prefix β matches int –The next token is b int int –The next token is b int * int + int int * int + int 51 52 Prof. Bodik CS 164 Lecture 6 Prof. Bodik CS 164 Lecture 6

Predictive Parsing. Review.

• A predictive parser is described by a table – For each non-terminal A and for each token b we specify a production A d α – When trying to expand A we use A d α if b follows next

• Once we have the table – The parsing algorithm is simple and fast – No backtracking is necessary

53 Prof. Bodik CS 164 Lecture 6

9