Outline
• Review LL parsing Introduction to Bottom-Up Parsing • Shift-reduce parsing
• The LR parsing algorithm
• Constructing LR parsing tables
Compiler Design 1 (2011) 2
Top-Down Parsing: Review Top-Down Parsing: Review
• Top-down parsing expands a parse tree from • Top-down parsing expands a parse tree from the start symbol to the leaves the start symbol to the leaves – Always expand the leftmost non-terminal – Always expand the leftmost non-terminal
E E • The leaves at any point T E T E form a string βAγ + + – β contains only terminals T – The input string is βbδ int * – The prefix β matches – The next token is b int * int + int int * int + int
Compiler Design 1 (2011) 3 Compiler Design 1 (2011) 4 Top-Down Parsing: Review Top-Down Parsing: Review
• Top-down parsing expands a parse tree from • Top-down parsing expands a parse tree from the start symbol to the leaves the start symbol to the leaves – Always expand the leftmost non-terminal – Always expand the leftmost non-terminal
E E • The leaves at any point • The leaves at any point T E form a string βAγ T E form a string βAγ + + – β contains only terminals – β contains only terminals T T T – The input string is βbδ T – The input string is βbδ int * int * – The prefix β matches – The prefix β matches int – The next token is b int int – The next token is b int * int + int int * int + int
Compiler Design 1 (2011) 5 Compiler Design 1 (2011) 6
Predictive Parsing: Review Constructing Predictive Parsing Tables
• A predictive parser is described by a table 1. Consider the state S →* βAγ
– For each non-terminal A and for each token b we – With b the next token specify a production A →α – Trying to match βbδ – When trying to expand A we use A →αif b follows There are two possibilities: next • b belongs to an expansion of A
• Once we have the table • Any A →αcan be used if b can start a string derived from α – The parsing algorithm is simple and fast ∈ α • We say that b First( ) – No backtracking is necessary
Or…
Compiler Design 1 (2011) 7 Compiler Design 1 (2011) 8 Constructing Predictive Parsing Tables (Cont.) Computing First Sets
2. b does not belong to an expansion of A Definition – The expansion of A is empty and b belongs to an First(X) = { b | X →* bα } ∪ { ε | X →* ε } expansion of γ Algorithm sketch – Means that b can appear after A in a derivation of the form S →* βAb ω 1. First(b) = { b }
– We say that b ∈ Follow(A) in this case 2. ε∈First(X) if X →εis a production
3. ε∈First(X) if X → A1 …An – What productions can we use in this case? and ε∈First(Ai) for 1 ≤ i ≤ n
• Any A → α can be used if α can expand to ε 4. First(α) ⊆ First(X) if X → A1 …An α • We say that ε ∈ First(A) in this case and ε∈First(A ) for 1 ≤ i ≤ n i
Compiler Design 1 (2011) 9 Compiler Design 1 (2011) 10
First Sets: Example Computing Follow Sets
• Recall the grammar • Definition E → T X X → + E | ε Follow(X) = { b | S →* β X b δ } T → ( E ) | int Y Y → * T | ε • First sets • Intuition
First( ( ) = { ( } First( T ) = {int, ( } – If X → A B then First(B) ⊆ Follow(A)
First( ) ) = { ) } First( E ) = {int, ( } and Follow(X) ⊆ Follow(B) First( int) = { int } First( X ) = { +, ε } – Also if B →* ε then Follow(X) ⊆ Follow(A)
First( + ) = { + } First( Y ) = { *, ε } – If S is the start symbol then $ ∈ Follow(S)
First( * ) = { * }
Compiler Design 1 (2011) 11 Compiler Design 1 (2011) 12 Computing Follow Sets (Cont.) Follow Sets: Example
Algorithm sketch • Recall the grammar 1. $ ∈ Follow(S) E → T X X → + E | ε
2. First( β) - {ε} ⊆ Follow(X) T → ( E ) | int Y Y → * T | ε – For each production A →αX β • Follow sets
3. Follow(A) ⊆ Follow(X) Follow( + ) = { int, ( } Follow( * ) = { int, ( }
– For each production A →αX β where ε∈First(β) Follow( ( ) = { int, ( } Follow( E ) = { ), $ } Follow( X ) = { $, ) } Follow( T ) = { +, ) , $ } Follow( ) ) = { + , ) , $ } Follow( Y ) = { +, ) , $ } Follow( int) = { *, +, ) , $ }
Compiler Design 1 (2011) 13 Compiler Design 1 (2011) 14
Constructing LL(1) Parsing Tables Constructing LL(1) Tables: Example
• Construct a parsing table T for CFG G • Recall the grammar E → T X X → + E | ε • For each production A →αin G do: T → ( E ) | int Y Y → * T | ε – For each terminal b ∈ First(α) do • T[A, b] = α • Where in the line of Y we put Y →* T ?
– If ε∈First(α), for each b ∈ Follow(A) do – In the lines of First( *T) = { * } • T[A, b] = α
– If ε∈First(α) and $ ∈ Follow(A) do • Where in the line of Y we put Y →ε ? α • T[A, $] = – In the lines of Follow(Y) = { $, +, ) }
Compiler Design 1 (2011) 15 Compiler Design 1 (2011) 16 Notes on LL(1) Parsing Tables
• If any entry is multiply defined then G is not LL(1) Bottom Up Parsing – If G is ambiguous – If G is left recursive – If G is not left-factored – And in other cases as well
• For some grammars there is a simple parsing strategy: Predictive parsing • Most programming language grammars are not LL(1) • Thus, we need more powerful parsing strategies
Compiler Design 1 (2011) 17
Bottom-Up Parsing An Introductory Example
• Bottom-up parsing is more general than top- • LR parsers don’t need left-factored grammars down parsing and can also handle left-recursive grammars – And just as efficient – Builds on ideas in top-down parsing • Consider the following grammar: – Preferred method in practice E → E + ( E ) | int • Also called LR parsing – L means that tokens are read left to right – Why is this not LL(1)?
– R means that it constructs a rightmost derivation ! • Consider the string: int + ( int ) + ( int )
Compiler Design 1 (2011) 19 Compiler Design 1 (2011) 20 The Idea A Bottom-up Parse in Detail (1)
• LR parsing reduces a string to the start symbol by inverting productions: int + (int) + (int)
str w input string of terminals repeat – Identify β in str such that A →βis a production (i.e., str = αβγ) – Replace β by A in str (i.e., str w = α A γ) until str = S (the start symbol) OR all possibilities are exhausted int + ( int ) + ( int )
Compiler Design 1 (2011) 21 Compiler Design 1 (2011) 22
A Bottom-up Parse in Detail (2) A Bottom-up Parse in Detail (3)
int + (int) + (int) int + (int) + (int) E + (int) + (int) E + (int) + (int) E + (E) + (int)
E E E
int + ( int ) + ( int ) int + ( int ) + ( int )
Compiler Design 1 (2011) 23 Compiler Design 1 (2011) 24 A Bottom-up Parse in Detail (4) A Bottom-up Parse in Detail (5)
int + (int) + (int) int + (int) + (int) E + (int) + (int) E + (int) + (int) E + (E) + (int) E + (E) + (int)
E + (int) E E + (int) E E + (E)
E E E E E
int + ( int ) + ( int ) int + ( int ) + ( int )
Compiler Design 1 (2011) 25 Compiler Design 1 (2011) 26
A Bottom-up Parse in Detail (6) Important Fact #1
E int + (int) + (int) Important Fact #1 about bottom-up parsing: E + (int) + (int) E + (E) + (int) An LR parser traces a rightmost derivation in E + (int) E E + (E) reverse E
A rightmost E E E derivation in reverse int + ( int ) + ( int )
Compiler Design 1 (2011) 27 Compiler Design 1 (2011) 28 Where Do Reductions Happen Notation
Important Fact #1 has an interesting • Idea: Split string into two substrings consequence: – Right substring is as yet unexamined by parsing – Let αβγ be a step of a bottom-up parse (a string of terminals) – Assume the next reduction is by using A→β – Left substring has terminals and non-terminals – Then γ is a string of terminals • The dividing point is marked by a I
Why? Because αAγ→αβγis a step in a right- – The I is not part of the string
most derivation • Initially, all input is unexamined: Ix1x2 . . . xn
Compiler Design 1 (2011) 29 Compiler Design 1 (2011) 30
Shift-Reduce Parsing Shift
Bottom-up parsing uses only two kinds of actions: Shift: Move I one place to the right – Shifts a terminal to the left string
Shift E + (I int ) ⇒ E + (int I )
Reduce In general: ABC I xyz ⇒ ABCx I yz
Compiler Design 1 (2011) 31 Compiler Design 1 (2011) 32 Reduce Shift-Reduce Example
Reduce: Apply an inverse production at the right E → E + ( E ) | int I int + (int) + (int)$ shift end of the left string – If E → E + ( E ) is a production, then
E + ( E + ( E ) I ) ⇒ E + ( E I )
In general, given A → xy, then: Cbxy I ijk ⇒ CbA I ijk
int + ( int ) + ( int )
Compiler Design 1 (2011) 33
Shift-Reduce Example Shift-Reduce Example
E → E + ( E ) | int E → E + ( E ) | int I int + (int) + (int)$ shift I int + (int) + (int)$ shift int I + (int) + (int)$ reduce E → int int I + (int) + (int)$ reduce E → int E I + (int) + (int)$ shift 3 times
E
int + ( int ) + ( int ) int + ( int ) + ( int ) Shift-Reduce Example Shift-Reduce Example
E → E + ( E ) | int E → E + ( E ) | int I int + (int) + (int)$ shift I int + (int) + (int)$ shift int I + (int) + (int)$ reduce E → int int I + (int) + (int)$ reduce E → int E I + (int) + (int)$ shift 3 times E I + (int) + (int)$ shift 3 times E + (int I ) + (int)$ reduce E → int E + (int I ) + (int)$ reduce E → int E + (E I ) + (int)$ shift
E E E
int + ( int ) + ( int ) int + ( int ) + ( int )
Shift-Reduce Example Shift-Reduce Example
E → E + ( E ) | int E → E + ( E ) | int I int + (int) + (int)$ shift I int + (int) + (int)$ shift int I + (int) + (int)$ reduce E → int int I + (int) + (int)$ reduce E → int E I + (int) + (int)$ shift 3 times E I + (int) + (int)$ shift 3 times E + (int I ) + (int)$ reduce E → int E + (int I ) + (int)$ reduce E → int E + (E I ) + (int)$ shift E + (E I ) + (int)$ shift E E + (E) I + (int)$ reduce E → E + (E) E + (E) I + (int)$ reduce E → E + (E) E I + (int)$ shift 3 times
E E E E
int + ( int ) + ( int ) int + ( int ) + ( int ) Shift-Reduce Example Shift-Reduce Example
E → E + ( E ) | int E → E + ( E ) | int I int + (int) + (int)$ shift I int + (int) + (int)$ shift int I + (int) + (int)$ reduce E → int int I + (int) + (int)$ reduce E → int E I + (int) + (int)$ shift 3 times E I + (int) + (int)$ shift 3 times E + (int I ) + (int)$ reduce E → int E + (int I ) + (int)$ reduce E → int E + (E I ) + (int)$ shift E E + (E I ) + (int)$ shift E E + (E) I + (int)$ reduce E → E + (E) E + (E) I + (int)$ reduce E → E + (E) E I + (int)$ shift 3 times E I + (int)$ shift 3 times E + (int I )$ reduce E → int E + (int I )$ reduce E → int E + (E I )$ shift E E E E E
int + ( int ) + ( int ) int + ( int ) + ( int )
Shift-Reduce Example Shift-Reduce Example
E → E + ( E ) | int E → E + ( E ) | int I int + (int) + (int)$ shift I int + (int) + (int)$ shift E int I + (int) + (int)$ reduce E → int int I + (int) + (int)$ reduce E → int E I + (int) + (int)$ shift 3 times E I + (int) + (int)$ shift 3 times E + (int I ) + (int)$ reduce E → int E + (int I ) + (int)$ reduce E → int E + (E I ) + (int)$ shift E E + (E I ) + (int)$ shift E E + (E) I + (int)$ reduce E → E + (E) E + (E) I + (int)$ reduce E → E + (E) E I + (int)$ shift 3 times E I + (int)$ shift 3 times E + (int I )$ reduce E → int E + (int I )$ reduce E → int E + (E I )$ shift E + (E I )$ shift E E E E E + (E) I $ reduce E → E + (E) E E + (E) I $ reduce E → E + (E) E E I $ accept int + ( int ) + ( int ) int + ( int ) + ( int ) The Stack Key Question: To Shift or to Reduce?
• Left string can be implemented by a stack Idea: use a finite automaton (DFA) to decide – Top of the stack is the I when to shift or reduce – The input is the stack
• Shift pushes a terminal on the stack – The language consists of terminals and non-terminals
• We run the DFA on the stack and we examine • Reduce pops 0 or more symbols off of the the resulting state X and the token tok after I stack (production RHS) and pushes a non- – If X has a transition labeled tok then shift terminal on the stack (production LHS) – If X is labeled with “A →βon tok”then reduce
Compiler Design 1 (2011) 45 Compiler Design 1 (2011) 46
LR(1) Parsing: An Example Representing the DFA
int 0 1 I int + (int) + (int)$ shift • Parsers represent the DFA as a 2D table E E → int int I + (int) + (int)$ E → int – Recall table-driven lexical analysis ( on $, + E I + (int ) + (int)$ shift(x3) 2 + 3 4 • Lines correspond to DFA states accept E + (int I ) + (int)$ E → int E int • Columns correspond to terminals and non- on $ E + (E I ) + (int)$ shift terminals ) E → int E + (E) I + (int)$ E → E+(E) 7 6 5 on ), + E I + (int)$ shift (x3) • Typically columns are split into: E → E + (E) on $, + + int E + (int I )$ E → int – Those for terminals: action table – Those for non-terminals: goto table ( E + (E I )$ shift 8 9 E + (E) I $ E → E+(E) E + E I $ accept 10 11 E → E + (E) ) on ), + Compiler Design 1 (2011) 48 Representing the DFA: Example The LR Parsing Algorithm
• The table for a fragment of our DFA: • After a shift or reduce action we rerun the DFA on the entire stack – This is wasteful, since most of the work is repeated ( int + ( ) $ E 3 4 … int E 3 s4 • Remember for each stack element on which
4 s5 g6 state it brings the DFA 6 5 5 r → r → E → int E int E int 6 s8 s7 • LR parser maintains a stack ) on ), + 7 〈 〉 〈 〉 rE →E+(E) rE → E+(E) sym1, state1 . . . symn, staten state is the final state of the DFA on sym …sy m 7 … k 1 k E → E + (E) on $, + Compiler Design 1 (2011) 49 Compiler Design 1 (2011) 50
The LR Parsing Algorithm LR Parsers
Let I = w$ be initial input • Can be used to parse more grammars than LL Let j = 0 Let DFA state 0 be the start state Let stack = 〈 dummy, 0 〉 • Most programming languages grammars are LR repeat case action[top_state(stack), I[j]] of • LR Parsers can be described as a simple table shift k: push 〈 I[j++], k 〉 reduce X → A: • There are tools for building the table pop |A| pairs, push 〈X, Goto[top_state(stack), X]〉 accept: halt normally • How is the table constructed? error: halt and report error
Compiler Design 1 (2011) 51 Compiler Design 1 (2011) 52