LR Parsing Lecture 7
January 30, 2018 Midterm 1 Next Week
É in-class exam É One sheet letter paper, two sides É Some review questions online É Things to know É Cool Syntax É Compiler/language processor steps É Regular Languages É NFA/DFAs É Regular Expressions É Parsing É Context-Free Grammars
Compiler Construction 2/41 From Last Week
É Top-down Parsing strategies É Tells us how to get from the start symbol of a grammar to a sequence of terminals following a sequence of productions
Compiler Construction 3/41 Top Down Parsing
S S a B c B → C x B B → ε C → d →| a B c Input string: “adxdxc” a d x d x c
Compiler Construction 4/41 Top Down Parsing
S S a B c B B → C x B B → ε C → d →| a B c Input string: “adxdxc” a d x d x c
Compiler Construction 4/41 Top Down Parsing
S S a B c B B → C x B → B ε B C → d →| a B c C Input string: “adxdxc” a d x d x c
Compiler Construction 4/41 Top Down Parsing
S S a B c B B → C x B → B ε B C → d →| a B c C Input string: “adxdxc” a d x d x c
Compiler Construction 4/41 Top Down Parsing
S S a B c B B → C x B → B ε B C → d →| a B c C Input string: C B “adxdxc” a d x d x c
Compiler Construction 4/41 Top Down Parsing
S S a B c B B → C x B → B ε B C → d →| a B c C Input string: C B “adxdxc” a d x d x c
Compiler Construction 4/41 Top Down Parsing
S S a B c B B → C x B → B ε B C → d →| a B c C Input string: C B “adxdxc” a d x d x ε c
Compiler Construction 4/41 Top Down Parsing (2)
E T + E | T T → (E) | int | int T → ∗ Input: int * int
E
Compiler Construction 5/41 Top Down Parsing (2)
E T + E | T T → (E) | int | int T → ∗ Input: int * int
E
T + E
Compiler Construction 5/41 Top Down Parsing (2)
E T + E | T T → (E) | int | int T → ∗ Input: int * int
E
T + E ( E )
Compiler Construction 5/41 Top Down Parsing (2)
E T + E | T T → (E) | int | int T → ∗ Input: int * int
E
T + E ( E ) int
Compiler Construction 5/41 Top Down Parsing (2)
E T + E | T T → (E) | int | int T → ∗ Input: int * int
E
T + E
Compiler Construction 5/41 Top Down Parsing (2)
E T + E | T T → (E) | int | int T → ∗ Input: int * int
E
T + E int
Compiler Construction 5/41 Top Down Parsing (2)
E T + E | T T → (E) | int | int T → ∗ Input: int * int
E
T + E int *
Compiler Construction 5/41 Top Down Parsing (2)
E T + E | T T → (E) | int | int T → ∗ Input: int * int
E
T + E
Compiler Construction 5/41 Top Down Parsing (2)
E T + E | T T → (E) | int | int T → ∗ Input: int * int
E
T + E int * T
Compiler Construction 5/41 Top Down Parsing (2)
E T + E | T T → (E) | int | int T → ∗ Input: int * int
E
T + E
Compiler Construction 5/41 Top Down Parsing (2)
E T + E | T T → (E) | int | int T → ∗ Input: int * int
E
Compiler Construction 5/41 Top Down Parsing (2)
E T + E | T T → (E) | int | int T → ∗ Input: int * int
E int * T
Compiler Construction 5/41 Top Down Parsing (2)
E T + E | T T → (E) | int | int T → ∗ Input: int * int
E int * T
Compiler Construction 5/41 Top Down Parsing (2)
E T + E | T T → (E) | int | int T → ∗ Input: int * int
E int * T
Compiler Construction 5/41 Top Down Parsing (2)
E T + E | T T → (E) | int | int T → ∗ Input: int * int
E int * T
Compiler Construction 5/41 Top Down Parsing (2)
E T + E | T T → (E) | int | int T → ∗ Input: int * int
E int * T int
Compiler Construction 5/41 Top Down Parsing (2)
E T + E | T T → (E) | int | int T → ∗ Input: int * int
E int * T
int * int
Compiler Construction 5/41 Top-Down Summary
É Exhaust all grammatical rule options at each step É Notice significant backtracing! É Backtracing results from having multiple options for rules when considering tokens!
É If we restrict our grammar slightly, we can use LL(1) parsing
É Key idea: deterministically select grammar rule based on one token to eliminate backtracing
Compiler Construction 6/41 LL(1) Summary
É We want a parsing strategy that lets us parse an input string by considering only one token at a time
É Intuition: If input string x L(G), there must be a parse tree. We want to pick∈ a grammar rule one token at a time so we can quickly find if the string is in the language!
É Implementation: Parse table É Intuition: 2-d table indexed by ‘current’ non-terminal and ‘lookahead’ token. Each cell contains the production rule corresponding to the lookahead.
Compiler Construction 7/41 Parse Table Construction
É Constructing the Parse table requires a PREDICT set of every non-terminal in the grammar. É We need a way to know which rule must be used to successfully construct the next part of the parse tree
S a |→ b Consider! You know exactly which rule to take based on the first thing you see as part of the S rule.
Compiler Construction 8/41 Parse Table Construction S a A a →| b A A c c A →| b b A
É Same idea! We always start with S. You can see that the first non-terminal in the S rules tells you unambiguously which rule is used! É Similarly, after you consume one ‘a’, the next non-terminal you consider is A. You can decide which A rule to choose based on the first terminal in the A rules!
Compiler Construction 9/41 Parse Table Construction
S A a A → B b B → c B →| a A
É A bit trickier, but you can still decide which rule to take with only one terminal
Compiler Construction 10/41 Parse Table Construction
É Sounds like we need to know the first terminal that can appear as a result of expanding a non-terminal!
É This is called the FIRST set FIRST(A) is the set of terminals that can appear first in the expansion of the non-terminal A
É Note this means recursing through other nonterminals!
FIRST(X) = b X ∗ bα ε X ∗ ε É { | → } ∪ { | → }
Compiler Construction 11/41 Parse Table Construction
É That includes ε rules
S a A a A → b A →| ε
Compiler Construction 12/41 Parse Table Construction
É Recall we’re trying to make a parse table É Almost enough to know FIRST set of all non-terminals. The token will tell us which rule to take next.
É However, how do we know when the token should be an ε?
Compiler Construction 13/41 Parse Table Construction
S a A B a A → b B →| ε B c A d →| ε
If my input string is aba, after consuming the ‘a’ token, how do we know that the next ‘b’ token results in B ε? →
Compiler Construction 14/41 Parse Table Construction: FOLLOW
É We also need to know the terminals that could come after the previous non-terminal
FOLLOW(X) = b S ∗ βXbω É { | → } Computing FOLLOW set É Compute FIRST sets for all non-terminals É Add $ to FOLLOW(S) É start symbol always ends with end-of-input For all productions Y ...XA1...An É → É Add FIRST(Ai)- ε to FOLLOW(X). Stop if { } ε FIRST (Ai). 6∈ É Add FOLLOW(Y) to FOLLOW(X)
Compiler Construction 15/41 Example FOLLOW Set
E TX → X + E | ε T → ( E ) | i n t Y Y → T | ε → ∗ FOLLOW(“+”) = { int, ( } FOLLOW(“(”) = { int, ( } FOLLOW(X) = { $, ) } FOLLOW(Y) = { +, ), $ }
Compiler Construction 16/41 Back to Parsing Tables
É Recall: We want to build a LL(1) Parsing Table
For each production A α in G do: → For each terminal b FIRST(α) do É ∈ É T[A][b] = α If α ∗ ε, for each b FOLLOW(A) do É → ∈ É T[A][b] = α
Compiler Construction 17/41 Parsing Table
E TX → X + E | ε T → ( E ) | i n t Y Y → T | ε → ∗ Where do we put Y T ? → ∗ É Well, FIRST(*T) = {*}, thus column * of row Y gets *T
int * + ( ) $ T int Y (E) E TX TX X + E ε ε Y *T ε ε ε
Compiler Construction 18/41 Parsing Table
E TX → X + E | ε T → ( E ) | i n t Y Y → T | ε → ∗ Where do we put Y ε? → É Well, FOLLOW(Y) = {$, +,)}, thus columns $, +, and ) in row Y get Y ε →
int * + ( ) $ T int Y (E) E TX TX X + E ε ε Y *T ε ε ε
Compiler Construction 19/41 Another Parse Table Example
E TXR FR → → ∗ X + T X | ε |→ ε F i n t T F R→ | ( E ) → Production Prediction + * ( ) int $ E TX (, int X→ TX E + + X → ε $, ) X T → FR (, int) T R → FR * R R → ∗ε , $, ) F + F →int int → F (E) ( →
Compiler Construction 20/41 Another Parse Table Example
Production Prediction + * ( ) int $ E TX (, int X→ TX E TX TX + + X → ε $, ) X TX ε ε + T → FR (, int) T FR FR R → FR * R ε FR ε ε R → ∗ε , $, ) F ∗ E int + ( ) F →int int → F (E) ( →
Compiler Construction 21/41 Notes on LL(1) Parsing Tables
É If any entry is multiply defined then G is not LL(1)
É G is ambiguous É G is left-recursive É G is not left-factored
Compiler Construction 22/41 Ambiguity in parse tables
E E + TT F E → TF → i d T → T FF → (E) → ∗ → For the E productions, we need FIRST(T) = {(, id} and FIRST(E) = {(, id}
But now, which rule ( E E + T or E T ) gets put in → → T[E][(] and T[E][id]??
+ * ( ) id $ E ? ? T F
Compiler Construction 23/41 LR Parsing
É LL(1) parsing is simple and fast É However, what if we wanted more expressiveness in the grammar? É Enter LR Parsing, a bottom-up parsing approach É Equally efficient as LL(1) É Not quite as easy by hand É Preferred practical method (see bison/yacc)
Compiler Construction 24/41 LR Parsing
An LR Parser reads tokens from left to right and constructs a bottom-up rightmost derivation. LR parsers shift terminals and reduce input by application of productions in reverse. LR parsing is fast and easy, and uses a finite automaton (a.k.a. a parse table) augmented with a stack. LR works fine with grammars that are left-recursive or not left-factored.
Compiler Construction 25/41 LR Parsing
É LR parsers do not require left-factored grammars and can also handle left-recursive grammars Consider
E E + (E) int → | Can you see why this is not LL(1)?
Compiler Construction 26/41 LR Parsing
É Consider input string x É Loop É Identify β in x such that A β is a production → (i.e., x = αβγ) É Replace β by A in x (i.e., x becomes αAγ) É until x = S
Compiler Construction 27/41 LR Parsing Example
Rightmost derivation:
S a T R e S a T R e → → a T d e T T b c | b → R → d a T b c d e → → a b b c d e →
Compiler Construction 28/41 LR Parsing Example
Rightmost derivation:
S a T R e S a T R e → → a T d e T T b c | b → R → d a T b c d e → → a b b c d e →
Compiler Construction 28/41 LR Parsing Example
Rightmost derivation:
S a T R e S a T R e → → a T d e T T b c | b → R → d a T b c d e → → a b b c d e →
Compiler Construction 28/41 LR Parsing Example
Rightmost derivation:
S a T R e S a T R e → → a T d e T T b c | b → R → d a T b c d e → → a b b c d e →
Compiler Construction 28/41 LR Parsing Example
Recall E E + (E) int → | Input = int + (int) + (int) E E + (E) E + (int) E + (E) + (int) E + (int) + (int) int + (int) + (int) int + ( int ) + ( int )
Compiler Construction 29/41 LR Parsing Example
Recall E E + (E) int → | Input = int + (int) + (int) E E + (E) E + (int) E E + (E) + (int) E + (int) + (int) int + (int) + (int) int + ( int ) + ( int )
Compiler Construction 29/41 LR Parsing Example
Recall E E + (E) int → | Input = int + (int) + (int) E E + (E) E + (int) E E E + (E) + (int) E + (int) + (int) int + (int) + (int) int + ( int ) + ( int )
Compiler Construction 29/41 LR Parsing Example
Recall E E + (E) int → | Input = int + (int) + (int) E E E + (E) E + (int) E E E + (E) + (int) E + (int) + (int) int + (int) + (int) int + ( int ) + ( int )
Compiler Construction 29/41 LR Parsing Example
Recall E E + (E) int → | Input = int + (int) + (int) E E E + (E) E + (int) E E E E + (E) + (int) E + (int) + (int) int + (int) + (int) int + ( int ) + ( int )
Compiler Construction 29/41 LR Parsing Example
Recall E E + (E) int E → | Input = int + (int) + (int) E E E + (E) E + (int) E E E E + (E) + (int) E + (int) + (int) int + (int) + (int) int + ( int ) + ( int )
Compiler Construction 29/41 LR Parsing
Important fact about LR parsing: An LR parser traces a rightmost derivation in reverse.
Compiler Construction 30/41 Notation
Idea: Split the string into two substrings
É Right substring (a string of terminals) is as yet unexamined by the parser É Left substring has terminals and non-terminals
The dividing point is marked by a • Initially, all input is new: x1x2x3...xn •
Compiler Construction 31/41 LR = shift-reduce Parsing
Bottom-up parsing uses only two actions: 1. Shift 2. Reduce
Compiler Construction 32/41 Shift
Shift: Move one place to the right • É Shifts a terminal to the left string
E + ( int ) • ⇒ E + ( int ) •
Compiler Construction 33/41 Reduce
Apply an inverse production at the right end of the left string If T E + (E) is a production, then → E + (E + (E) ) • E + ( T ) Reductions can • only happen here!
Compiler Construction 34/41 Shift-Reduce Example
int + (int) + (int)$ shift •
Compiler Construction 35/41 Shift-Reduce Example
int + (int) + (int)$ shift int• + (int) + (int)$ reduce E int • →
Compiler Construction 35/41 Shift-Reduce Example
int + (int) + (int)$ shift int• + (int) + (int)$ reduce E int E •+ (int) + (int)$ shift (3 times)→ •
Compiler Construction 35/41 Shift-Reduce Example
int + (int) + (int)$ shift int• + (int) + (int)$ reduce E int E •+ (int) + (int)$ shift (3 times)→ E +• (int ) + (int)$ reduce E int • →
Compiler Construction 35/41 Shift-Reduce Example
int + (int) + (int)$ shift int• + (int) + (int)$ reduce E int E •+ (int) + (int)$ shift (3 times)→ E +• (int ) + (int)$ reduce E int E + (E •) + (int)$ shift → •
Compiler Construction 35/41 Shift-Reduce Example
int + (int) + (int)$ shift int• + (int) + (int)$ reduce E int E •+ (int) + (int)$ shift (3 times)→ E +• (int ) + (int)$ reduce E int E + (E •) + (int)$ shift → E + (E)• + (int)$ reduce E E + (E) • →
Compiler Construction 35/41 Shift-Reduce Example
int + (int) + (int)$ shift int• + (int) + (int)$ reduce E int E •+ (int) + (int)$ shift (3 times)→ E +• (int ) + (int)$ reduce E int E + (E •) + (int)$ shift → E + (E)• + (int)$ reduce E E + (E) E + (int)$• shift (3 times)→ •
Compiler Construction 35/41 Shift-Reduce Example
int + (int) + (int)$ shift int• + (int) + (int)$ reduce E int E •+ (int) + (int)$ shift (3 times)→ E +• (int ) + (int)$ reduce E int E + (E •) + (int)$ shift → E + (E)• + (int)$ reduce E E + (E) E + (int)$• shift (3 times)→ E +• (int )$ reduce E int • →
Compiler Construction 35/41 Shift-Reduce Example
int + (int) + (int)$ shift int• + (int) + (int)$ reduce E int E •+ (int) + (int)$ shift (3 times)→ E +• (int ) + (int)$ reduce E int E + (E •) + (int)$ shift → E + (E)• + (int)$ reduce E E + (E) E + (int)$• shift (3 times)→ E +• (int )$ reduce E int E + (E •)$ shift → •
Compiler Construction 35/41 Shift-Reduce Example
int + (int) + (int)$ shift int• + (int) + (int)$ reduce E int E •+ (int) + (int)$ shift (3 times)→ E +• (int ) + (int)$ reduce E int E + (E •) + (int)$ shift → E + (E)• + (int)$ reduce E E + (E) E + (int)$• shift (3 times)→ E +• (int )$ reduce E int E + (E •)$ shift → E + (E)• $ reduce E E + (E) • →
Compiler Construction 35/41 Shift-Reduce Example
int + (int) + (int)$ shift int• + (int) + (int)$ reduce E int E •+ (int) + (int)$ shift (3 times)→ E +• (int ) + (int)$ reduce E int E + (E •) + (int)$ shift → E + (E)• + (int)$ reduce E E + (E) E + (int)$• shift (3 times)→ E +• (int )$ reduce E int E + (E •)$ shift → E + (E)• $ reduce E E + (E) E $ • accept → •
Compiler Construction 35/41 Stack The left string can be implemented with a stack Top of the stack is É • Shift É pushes a terminal onto the stack Reduce É pop 0 or more symbols from the stack (production RHS) É push a non-terminal on the stack (production LHS)
Compiler Construction 36/41 When to Shift/Reduce?
É Decide based on the left string (stack) É Idea: Use a DFA to decide when to shift or reduce
É The DFA input is the stack É DFA language consists of terminals and non-terminals
É We run the DFA on the stack and we examine the resulting state X and the token t after • É If X has a transition labeled t, then shift É If X is labeled with “A β on t”, then reduce →
Compiler Construction 37/41 LR(1) Parsing Example
int 0 1 E int on $, + → E
2 + ( int 3 4 5 accept on $ E int on ), + E →
) int + (int) + (int)$ shift 7 6 • E E + (E) on $, + → + ( 8 9 E +
10 11 E E+(E) on ), + ) →
Compiler Construction 38/41 LR(1) Parsing Example
int 0 1 E int on $, + → E
2 + ( int 3 4 5 accept on $ E int on ), + → E int + (int) + (int)$ shift ) • 7 6 int + (int) + (int)$ reduce E int E E + (E) on $, + • → → + ( 8 9 E +
10 11 E E+(E) on ), + ) →
Compiler Construction 38/41 LR(1) Parsing Example
int 0 1 E int on $, + → E
2 + ( int 3 4 5 accept on $ E int on ), + → int + (int) + (int)$ shift E • ) int + (int) + (int)$ reduce E int 7 6 • E (int) (int)$ → E E + (E) on $, + + + shift (3 times) → + • ( 8 9 E +
10 11 E E+(E) on ), + ) →
Compiler Construction 38/41 LR(1) Parsing Example
int 0 1 E int on $, + → E
2 + ( int 3 4 5 accept on $ int + (int) + (int)$ shift E int on ), + → • E int + (int) + (int)$ reduce E int ) • → 7 6 E + (int) + (int)$ shift (3 times) E E + (E) on $, + • → + E + (int ) + (int)$ reduce E int ( 8 9 • → E +
10 11 E E+(E) on ), + ) →
Compiler Construction 38/41 LR(1) Parsing Example
int 0 1 E int on $, + → E
2 + ( int 3 4 5 int + (int) + (int)$ shift accept on $ • E int on ), + → int + (int) + (int)$ reduce E int E • → ) E + (int) + (int)$ shift (3 times) 7 6 • E (int ) (int)$ E E + (E) on $, + + + reduce E int → + • → ( E + (E ) + (int)$ shift 8 9 • E +
10 11 E E+(E) on ), + ) →
Compiler Construction 38/41 LR(1) Parsing Example
int 0 1 E int on $, + → E
2 + int (int) (int)$ ( int + + shift 3 4 5 • accept on $ int + (int) + (int)$ reduce E int E int on ), + → • → E E + (int) + (int)$ shift (3 times) ) • 7 6 E + (int ) + (int)$ reduce E int E E + (E) on $, + • → → + E + (E ) + (int)$ shift ( • 8 9 E + (E) + (int)$ reduce E E + (E) E + • →
10 11 E E+(E) on ), + ) →
Compiler Construction 38/41 LR(1) Parsing Example
int 0 1 E int on $, + → E int + (int) + (int)$ shift 2 + ( int • 3 4 5 int + (int) + (int)$ reduce E int accept on $ • E int on ), + → → E + (int) + (int)$ shift (3 times) E • ) E + (int ) + (int)$ reduce E int 7 6 • E (E ) (int)$ → E E + (E) on $, + + + shift → + • ( E + (E) + (int)$ reduce E E + (E) 8 9 • → E + E + (int)$ shift (3 times) • 10 11 E E+(E) on ), + ) →
Compiler Construction 38/41 LR(1) Parsing Example
int 0 1 E int on $, + → E int + (int) + (int)$ shift • 2 + int (int) (int)$ ( int + + reduce E int 3 4 5 • → accept on $ E + (int) + (int)$ shift (3 times) E int on ), + → • E E + (int ) + (int)$ reduce E int ) • → 7 6 E + (E ) + (int)$ shift E E + (E) on $, + • → + E + (E) + (int)$ reduce E E + (E) ( • → 8 9 E + (int)$ shift (3 times) E + • E + (int )$ reduce E int 10 11 E E+(E) on ), + • ) → →
Compiler Construction 38/41 LR(1) Parsing Example
int 0 1 E int on $, + → int + (int) + (int)$ shift E • int + (int) + (int)$ reduce E int 2 + ( int • → 3 4 5 E + (int) + (int)$ shift (3 times) accept on $ • E int on ), + → E + (int ) + (int)$ reduce E int E • → ) E + (E ) + (int)$ shift 7 6 • E (E) (int)$ E E + (E) on $, + + + reduce E E + (E) → + • → ( E + (int)$ shift (3 times) 8 9 • E + E + (int )$ reduce E int • → 10 11 E E+(E) on ), + E + (E )$ shift ) → •
Compiler Construction 38/41 LR(1) Parsing Example
int 0 1 E int on $, + int + (int) + (int)$ shift → • E int + (int) + (int)$ reduce E int • → 2 + E (int) (int)$ shift (3 times) ( int + + 3 4 5 • accept on $ E + (int ) + (int)$ reduce E int E int on ), + → • → E E + (E ) + (int)$ shift ) • 7 6 E + (E) + (int)$ reduce E E + (E) E E + (E) on $, + • → → + E + (int)$ shift (3 times) ( • 8 9 E + (int )$ reduce E int E + • → E + (E )$ shift 10 11 E E+(E) on ), + • ) → E + (E) $ reduce E E + (E) • →
Compiler Construction 38/41 LR(1) Parsing Example
int (int) (int)$ int + + shift 0 1 E int on $, + → • E int + (int) + (int)$ reduce E int • → E + (int) + (int)$ shift (3 times) 2 + ( int • 3 4 5 E + (int ) + (int)$ reduce E int accept on $ • E int on ), + → → E + (E ) + (int)$ shift E • ) E + (E) + (int)$ reduce E E + (E) 7 6 • → E E + (E) on $, + E + (int)$ shift (3 times) → + • ( E + (int )$ reduce E int 8 9 • E → + E + (E )$ shift • 10 11 E E+(E) on ), + E + (E) $ reduce E E + (E) ) → • → E $ accept •
Compiler Construction 38/41 Parsing Tables
That’s right! Represent that DFA as a table
É Parsers represent the DFA as a 2D table É Rows correspond to DFA states É Columns correspond to terminals/non-terminals É Cells contain actions É Terminals shift or reduce É Non-temrinals goto subsequent DFA states
É Critically Restart DFA with stack as input every time you reduce
Compiler Construction 39/41 Parsing Tables
int + ( ) $ E
0 s1 g2 1 rE int rE int 2 s3→ Accept→ 3 s4 4 s5 g6 5 r r E int E int int 6 s8→ s7 → 0 1 E int on $, + 7 r r → E E+(E) E E+(E) E ... → → 2 + ( int 3 4 5 accept on $ E int on ), + E →
) 7 6
E E + (E) on $, + → + ( 8 9 E +
10 11 E E+(E) on ), + ) →
Compiler Construction 40/41 Parsing Tables
int + ( ) $ E
0 s1 g2 1 rE int rE int 2 s3→ Accept→ 3 s4 4 s5 g6 5 r r E int E int int 6 s8→ s7 → 0 1 E int on $, + 7 r r → E E+(E) E E+(E) E ... → → 2 + ( int 3 4 5 accept on $ E int on ), + E →
) 7 6
E E + (E) on $, + → + ( 8 9 E +
10 11 E E+(E) on ), + ) →
Compiler Construction 40/41 Next time
É Parser generators construct the parsing DFA/table for you É For PA3, you use a yacc/bison-based parser generator to implement the COOL grammar!
É What about the abstract syntax tree
Compiler Construction 41/41