LR Lecture 7

January 30, 2018 Midterm 1 Next Week

É in-class exam É One sheet letter paper, two sides É Some review questions online É Things to know É Cool Syntax É Compiler/language processor steps É Regular Languages É NFA/DFAs É Regular Expressions É Parsing É Context-Free Grammars

Compiler Construction 2/41 From Last Week

É Top-down Parsing strategies É Tells us how to get from the start symbol of a grammar to a sequence of terminals following a sequence of productions

Compiler Construction 3/41 Top Down Parsing

S S a B c B → C x B B → ε C → d →| a B c Input string: “adxdxc” a d x d x c

Compiler Construction 4/41 Top Down Parsing

S S a B c B B → C x B B → ε C → d →| a B c Input string: “adxdxc” a d x d x c

Compiler Construction 4/41 Top Down Parsing

S S a B c B B → C x B → B ε B C → d →| a B c C Input string: “adxdxc” a d x d x c

Compiler Construction 4/41 Top Down Parsing

S S a B c B B → C x B → B ε B C → d →| a B c C Input string: “adxdxc” a d x d x c

Compiler Construction 4/41 Top Down Parsing

S S a B c B B → C x B → B ε B C → d →| a B c C Input string: C B “adxdxc” a d x d x c

Compiler Construction 4/41 Top Down Parsing

S S a B c B B → C x B → B ε B C → d →| a B c C Input string: C B “adxdxc” a d x d x c

Compiler Construction 4/41 Top Down Parsing

S S a B c B B → C x B → B ε B C → d →| a B c C Input string: C B “adxdxc” a d x d x ε c

Compiler Construction 4/41 Top Down Parsing (2)

E T + E | T T → (E) | int | int T → ∗ Input: int * int

E

Compiler Construction 5/41 Top Down Parsing (2)

E T + E | T T → (E) | int | int T → ∗ Input: int * int

E

T + E

Compiler Construction 5/41 Top Down Parsing (2)

E T + E | T T → (E) | int | int T → ∗ Input: int * int

E

T + E ( E )

Compiler Construction 5/41 Top Down Parsing (2)

E T + E | T T → (E) | int | int T → ∗ Input: int * int

E

T + E ( E ) int

Compiler Construction 5/41 Top Down Parsing (2)

E T + E | T T → (E) | int | int T → ∗ Input: int * int

E

T + E

Compiler Construction 5/41 Top Down Parsing (2)

E T + E | T T → (E) | int | int T → ∗ Input: int * int

E

T + E int

Compiler Construction 5/41 Top Down Parsing (2)

E T + E | T T → (E) | int | int T → ∗ Input: int * int

E

T + E int *

Compiler Construction 5/41 Top Down Parsing (2)

E T + E | T T → (E) | int | int T → ∗ Input: int * int

E

T + E

Compiler Construction 5/41 Top Down Parsing (2)

E T + E | T T → (E) | int | int T → ∗ Input: int * int

E

T + E int * T

Compiler Construction 5/41 Top Down Parsing (2)

E T + E | T T → (E) | int | int T → ∗ Input: int * int

E

T + E

Compiler Construction 5/41 Top Down Parsing (2)

E T + E | T T → (E) | int | int T → ∗ Input: int * int

E

Compiler Construction 5/41 Top Down Parsing (2)

E T + E | T T → (E) | int | int T → ∗ Input: int * int

E int * T

Compiler Construction 5/41 Top Down Parsing (2)

E T + E | T T → (E) | int | int T → ∗ Input: int * int

E int * T

Compiler Construction 5/41 Top Down Parsing (2)

E T + E | T T → (E) | int | int T → ∗ Input: int * int

E int * T

Compiler Construction 5/41 Top Down Parsing (2)

E T + E | T T → (E) | int | int T → ∗ Input: int * int

E int * T

Compiler Construction 5/41 Top Down Parsing (2)

E T + E | T T → (E) | int | int T → ∗ Input: int * int

E int * T int

Compiler Construction 5/41 Top Down Parsing (2)

E T + E | T T → (E) | int | int T → ∗ Input: int * int

E int * T

int * int

Compiler Construction 5/41 Top-Down Summary

É Exhaust all grammatical rule options at each step É Notice significant backtracing! É Backtracing results from having multiple options for rules when considering tokens!

É If we restrict our grammar slightly, we can use LL(1) parsing

É Key idea: deterministically select grammar rule based on one token to eliminate backtracing

Compiler Construction 6/41 LL(1) Summary

É We want a parsing strategy that lets us parse an input string by considering only one token at a time

É Intuition: If input string x L(G), there must be a . We want to pick∈ a grammar rule one token at a time so we can quickly find if the string is in the language!

É Implementation: Parse table É Intuition: 2-d table indexed by ‘current’ non-terminal and ‘lookahead’ token. Each cell contains the production rule corresponding to the lookahead.

Compiler Construction 7/41 Parse Table Construction

É Constructing the Parse table requires a PREDICT set of every non-terminal in the grammar. É We need a way to know which rule must be used to successfully construct the next part of the parse tree

S a |→ b Consider! You know exactly which rule to take based on the first thing you see as part of the S rule.

Compiler Construction 8/41 Parse Table Construction S a A a →| b A A c c A →| b b A

É Same idea! We always start with S. You can see that the first non-terminal in the S rules tells you unambiguously which rule is used! É Similarly, after you consume one ‘a’, the next non-terminal you consider is A. You can decide which A rule to choose based on the first terminal in the A rules!

Compiler Construction 9/41 Parse Table Construction

S A a A → B b B → c B →| a A

É A bit trickier, but you can still decide which rule to take with only one terminal

Compiler Construction 10/41 Parse Table Construction

É Sounds like we need to know the first terminal that can appear as a result of expanding a non-terminal!

É This is called the FIRST set FIRST(A) is the set of terminals that can appear first in the expansion of the non-terminal A

É Note this means recursing through other nonterminals!

FIRST(X) = b X ∗ bα ε X ∗ ε É { | → } ∪ { | → }

Compiler Construction 11/41 Parse Table Construction

É That includes ε rules

S a A a A → b A →| ε

Compiler Construction 12/41 Parse Table Construction

É Recall we’re trying to make a parse table É Almost enough to know FIRST set of all non-terminals. The token will tell us which rule to take next.

É However, how do we know when the token should be an ε?

Compiler Construction 13/41 Parse Table Construction

S a A B a A → b B →| ε B c A d →| ε

If my input string is aba, after consuming the ‘a’ token, how do we know that the next ‘b’ token results in B ε? →

Compiler Construction 14/41 Parse Table Construction: FOLLOW

É We also need to know the terminals that could come after the previous non-terminal

FOLLOW(X) = b S ∗ βXbω É { | → } Computing FOLLOW set É Compute FIRST sets for all non-terminals É Add $ to FOLLOW(S) É start symbol always ends with end-of-input For all productions Y ...XA1...An É → É Add FIRST(Ai)- ε to FOLLOW(X). Stop if { } ε FIRST (Ai). 6∈ É Add FOLLOW(Y) to FOLLOW(X)

Compiler Construction 15/41 Example FOLLOW Set

E TX → X + E | ε T → ( E ) | i n t Y Y → T | ε → ∗ FOLLOW(“+”) = { int, ( } FOLLOW(“(”) = { int, ( } FOLLOW(X) = { $, ) } FOLLOW(Y) = { +, ), $ }

Compiler Construction 16/41 Back to Parsing Tables

É Recall: We want to build a LL(1) Parsing Table

For each production A α in G do: → For each terminal b FIRST(α) do É ∈ É T[A][b] = α If α ∗ ε, for each b FOLLOW(A) do É → ∈ É T[A][b] = α

Compiler Construction 17/41 Parsing Table

E TX → X + E | ε T → ( E ) | i n t Y Y → T | ε → ∗ Where do we put Y T ? → ∗ É Well, FIRST(*T) = {*}, thus column * of row Y gets *T

int * + ( ) $ T int Y (E) E TX TX X + E ε ε Y *T ε ε ε

Compiler Construction 18/41 Parsing Table

E TX → X + E | ε T → ( E ) | i n t Y Y → T | ε → ∗ Where do we put Y ε? → É Well, FOLLOW(Y) = {$, +,)}, thus columns $, +, and ) in row Y get Y ε →

int * + ( ) $ T int Y (E) E TX TX X + E ε ε Y *T ε ε ε

Compiler Construction 19/41 Another Parse Table Example

E TXR FR → → ∗ X + T X | ε |→ ε F i n t T F R→ | ( E ) → Production Prediction + * ( ) int $ E TX (, int X→ TX E + + X → ε $, ) X T → FR (, int) T R → FR * R R → ∗ε , $, ) F + F →int int → F (E) ( →

Compiler Construction 20/41 Another Parse Table Example

Production Prediction + * ( ) int $ E TX (, int X→ TX E TX TX + + X → ε $, ) X TX ε ε + T → FR (, int) T FR FR R → FR * R ε FR ε ε R → ∗ε , $, ) F ∗ E int + ( ) F →int int → F (E) ( →

Compiler Construction 21/41 Notes on LL(1) Parsing Tables

É If any entry is multiply defined then G is not LL(1)

É G is ambiguous É G is left-recursive É G is not left-factored

Compiler Construction 22/41 Ambiguity in parse tables

E E + TT F E → TF → i d T → T FF → (E) → ∗ → For the E productions, we need FIRST(T) = {(, id} and FIRST(E) = {(, id}

But now, which rule ( E E + T or E T ) gets put in → → T[E][(] and T[E][id]??

+ * ( ) id $ E ? ? T F

Compiler Construction 23/41 LR Parsing

É LL(1) parsing is simple and fast É However, what if we wanted more expressiveness in the grammar? É Enter LR Parsing, a bottom-up parsing approach É Equally efficient as LL(1) É Not quite as easy by hand É Preferred practical method (see bison/)

Compiler Construction 24/41 LR Parsing

An LR Parser reads tokens from left to right and constructs a bottom-up rightmost derivation. LR parsers shift terminals and reduce input by application of productions in reverse. LR parsing is fast and easy, and uses a finite automaton (a.k.a. a parse table) augmented with a stack. LR works fine with grammars that are left-recursive or not left-factored.

Compiler Construction 25/41 LR Parsing

É LR parsers do not require left-factored grammars and can also handle left-recursive grammars Consider

E E + (E) int → | Can you see why this is not LL(1)?

Compiler Construction 26/41 LR Parsing

É Consider input string x É Loop É Identify β in x such that A β is a production → (i.e., x = αβγ) É Replace β by A in x (i.e., x becomes αAγ) É until x = S

Compiler Construction 27/41 LR Parsing Example

Rightmost derivation:

S a T R e S a T R e → → a T d e T T b c | b → R → d a T b c d e → → a b b c d e →

Compiler Construction 28/41 LR Parsing Example

Rightmost derivation:

S a T R e S a T R e → → a T d e T T b c | b → R → d a T b c d e → → a b b c d e →

Compiler Construction 28/41 LR Parsing Example

Rightmost derivation:

S a T R e S a T R e → → a T d e T T b c | b → R → d a T b c d e → → a b b c d e →

Compiler Construction 28/41 LR Parsing Example

Rightmost derivation:

S a T R e S a T R e → → a T d e T T b c | b → R → d a T b c d e → → a b b c d e →

Compiler Construction 28/41 LR Parsing Example

Recall E E + (E) int → | Input = int + (int) + (int) E E + (E) E + (int) E + (E) + (int) E + (int) + (int) int + (int) + (int) int + ( int ) + ( int )

Compiler Construction 29/41 LR Parsing Example

Recall E E + (E) int → | Input = int + (int) + (int) E E + (E) E + (int) E E + (E) + (int) E + (int) + (int) int + (int) + (int) int + ( int ) + ( int )

Compiler Construction 29/41 LR Parsing Example

Recall E E + (E) int → | Input = int + (int) + (int) E E + (E) E + (int) E E E + (E) + (int) E + (int) + (int) int + (int) + (int) int + ( int ) + ( int )

Compiler Construction 29/41 LR Parsing Example

Recall E E + (E) int → | Input = int + (int) + (int) E E E + (E) E + (int) E E E + (E) + (int) E + (int) + (int) int + (int) + (int) int + ( int ) + ( int )

Compiler Construction 29/41 LR Parsing Example

Recall E E + (E) int → | Input = int + (int) + (int) E E E + (E) E + (int) E E E E + (E) + (int) E + (int) + (int) int + (int) + (int) int + ( int ) + ( int )

Compiler Construction 29/41 LR Parsing Example

Recall E E + (E) int E → | Input = int + (int) + (int) E E E + (E) E + (int) E E E E + (E) + (int) E + (int) + (int) int + (int) + (int) int + ( int ) + ( int )

Compiler Construction 29/41 LR Parsing

Important fact about LR parsing: An LR parser traces a rightmost derivation in reverse.

Compiler Construction 30/41 Notation

Idea: Split the string into two substrings

É Right substring (a string of terminals) is as yet unexamined by the parser É Left substring has terminals and non-terminals

The dividing point is marked by a • Initially, all input is new: x1x2x3...xn •

Compiler Construction 31/41 LR = shift-reduce Parsing

Bottom-up parsing uses only two actions: 1. Shift 2. Reduce

Compiler Construction 32/41 Shift

Shift: Move one place to the right • É Shifts a terminal to the left string

E + ( int ) • ⇒ E + ( int ) •

Compiler Construction 33/41 Reduce

Apply an inverse production at the right end of the left string If T E + (E) is a production, then → E + (E + (E) ) • E + ( T ) Reductions can • only happen here!

Compiler Construction 34/41 Shift-Reduce Example

int + (int) + (int)$ shift •

Compiler Construction 35/41 Shift-Reduce Example

int + (int) + (int)$ shift int• + (int) + (int)$ reduce E int • →

Compiler Construction 35/41 Shift-Reduce Example

int + (int) + (int)$ shift int• + (int) + (int)$ reduce E int E •+ (int) + (int)$ shift (3 times)→ •

Compiler Construction 35/41 Shift-Reduce Example

int + (int) + (int)$ shift int• + (int) + (int)$ reduce E int E •+ (int) + (int)$ shift (3 times)→ E +• (int ) + (int)$ reduce E int • →

Compiler Construction 35/41 Shift-Reduce Example

int + (int) + (int)$ shift int• + (int) + (int)$ reduce E int E •+ (int) + (int)$ shift (3 times)→ E +• (int ) + (int)$ reduce E int E + (E •) + (int)$ shift → •

Compiler Construction 35/41 Shift-Reduce Example

int + (int) + (int)$ shift int• + (int) + (int)$ reduce E int E •+ (int) + (int)$ shift (3 times)→ E +• (int ) + (int)$ reduce E int E + (E •) + (int)$ shift → E + (E)• + (int)$ reduce E E + (E) • →

Compiler Construction 35/41 Shift-Reduce Example

int + (int) + (int)$ shift int• + (int) + (int)$ reduce E int E •+ (int) + (int)$ shift (3 times)→ E +• (int ) + (int)$ reduce E int E + (E •) + (int)$ shift → E + (E)• + (int)$ reduce E E + (E) E + (int)$• shift (3 times)→ •

Compiler Construction 35/41 Shift-Reduce Example

int + (int) + (int)$ shift int• + (int) + (int)$ reduce E int E •+ (int) + (int)$ shift (3 times)→ E +• (int ) + (int)$ reduce E int E + (E •) + (int)$ shift → E + (E)• + (int)$ reduce E E + (E) E + (int)$• shift (3 times)→ E +• (int )$ reduce E int • →

Compiler Construction 35/41 Shift-Reduce Example

int + (int) + (int)$ shift int• + (int) + (int)$ reduce E int E •+ (int) + (int)$ shift (3 times)→ E +• (int ) + (int)$ reduce E int E + (E •) + (int)$ shift → E + (E)• + (int)$ reduce E E + (E) E + (int)$• shift (3 times)→ E +• (int )$ reduce E int E + (E •)$ shift → •

Compiler Construction 35/41 Shift-Reduce Example

int + (int) + (int)$ shift int• + (int) + (int)$ reduce E int E •+ (int) + (int)$ shift (3 times)→ E +• (int ) + (int)$ reduce E int E + (E •) + (int)$ shift → E + (E)• + (int)$ reduce E E + (E) E + (int)$• shift (3 times)→ E +• (int )$ reduce E int E + (E •)$ shift → E + (E)• $ reduce E E + (E) • →

Compiler Construction 35/41 Shift-Reduce Example

int + (int) + (int)$ shift int• + (int) + (int)$ reduce E int E •+ (int) + (int)$ shift (3 times)→ E +• (int ) + (int)$ reduce E int E + (E •) + (int)$ shift → E + (E)• + (int)$ reduce E E + (E) E + (int)$• shift (3 times)→ E +• (int )$ reduce E int E + (E •)$ shift → E + (E)• $ reduce E E + (E) E $ • accept → •

Compiler Construction 35/41 Stack The left string can be implemented with a stack Top of the stack is É • Shift É pushes a terminal onto the stack Reduce É pop 0 or more symbols from the stack (production RHS) É push a non-terminal on the stack (production LHS)

Compiler Construction 36/41 When to Shift/Reduce?

É Decide based on the left string (stack) É Idea: Use a DFA to decide when to shift or reduce

É The DFA input is the stack É DFA language consists of terminals and non-terminals

É We run the DFA on the stack and we examine the resulting state X and the token t after • É If X has a transition labeled t, then shift É If X is labeled with “A β on t”, then reduce →

Compiler Construction 37/41 LR(1) Parsing Example

int 0 1 E int on $, + → E

2 + ( int 3 4 5 accept on $ E int on ), + E →

) int + (int) + (int)$ shift 7 6 • E E + (E) on $, + → + ( 8 9 E +

10 11 E E+(E) on ), + ) →

Compiler Construction 38/41 LR(1) Parsing Example

int 0 1 E int on $, + → E

2 + ( int 3 4 5 accept on $ E int on ), + → E int + (int) + (int)$ shift ) • 7 6 int + (int) + (int)$ reduce E int E E + (E) on $, + • → → + ( 8 9 E +

10 11 E E+(E) on ), + ) →

Compiler Construction 38/41 LR(1) Parsing Example

int 0 1 E int on $, + → E

2 + ( int 3 4 5 accept on $ E int on ), + → int + (int) + (int)$ shift E • ) int + (int) + (int)$ reduce E int 7 6 • E (int) (int)$ → E E + (E) on $, + + + shift (3 times) → + • ( 8 9 E +

10 11 E E+(E) on ), + ) →

Compiler Construction 38/41 LR(1) Parsing Example

int 0 1 E int on $, + → E

2 + ( int 3 4 5 accept on $ int + (int) + (int)$ shift E int on ), + → • E int + (int) + (int)$ reduce E int ) • → 7 6 E + (int) + (int)$ shift (3 times) E E + (E) on $, + • → + E + (int ) + (int)$ reduce E int ( 8 9 • → E +

10 11 E E+(E) on ), + ) →

Compiler Construction 38/41 LR(1) Parsing Example

int 0 1 E int on $, + → E

2 + ( int 3 4 5 int + (int) + (int)$ shift accept on $ • E int on ), + → int + (int) + (int)$ reduce E int E • → ) E + (int) + (int)$ shift (3 times) 7 6 • E (int ) (int)$ E E + (E) on $, + + + reduce E int → + • → ( E + (E ) + (int)$ shift 8 9 • E +

10 11 E E+(E) on ), + ) →

Compiler Construction 38/41 LR(1) Parsing Example

int 0 1 E int on $, + → E

2 + int (int) (int)$ ( int + + shift 3 4 5 • accept on $ int + (int) + (int)$ reduce E int E int on ), + → • → E E + (int) + (int)$ shift (3 times) ) • 7 6 E + (int ) + (int)$ reduce E int E E + (E) on $, + • → → + E + (E ) + (int)$ shift ( • 8 9 E + (E) + (int)$ reduce E E + (E) E + • →

10 11 E E+(E) on ), + ) →

Compiler Construction 38/41 LR(1) Parsing Example

int 0 1 E int on $, + → E int + (int) + (int)$ shift 2 + ( int • 3 4 5 int + (int) + (int)$ reduce E int accept on $ • E int on ), + → → E + (int) + (int)$ shift (3 times) E • ) E + (int ) + (int)$ reduce E int 7 6 • E (E ) (int)$ → E E + (E) on $, + + + shift → + • ( E + (E) + (int)$ reduce E E + (E) 8 9 • → E + E + (int)$ shift (3 times) • 10 11 E E+(E) on ), + ) →

Compiler Construction 38/41 LR(1) Parsing Example

int 0 1 E int on $, + → E int + (int) + (int)$ shift • 2 + int (int) (int)$ ( int + + reduce E int 3 4 5 • → accept on $ E + (int) + (int)$ shift (3 times) E int on ), + → • E E + (int ) + (int)$ reduce E int ) • → 7 6 E + (E ) + (int)$ shift E E + (E) on $, + • → + E + (E) + (int)$ reduce E E + (E) ( • → 8 9 E + (int)$ shift (3 times) E + • E + (int )$ reduce E int 10 11 E E+(E) on ), + • ) → →

Compiler Construction 38/41 LR(1) Parsing Example

int 0 1 E int on $, + → int + (int) + (int)$ shift E • int + (int) + (int)$ reduce E int 2 + ( int • → 3 4 5 E + (int) + (int)$ shift (3 times) accept on $ • E int on ), + → E + (int ) + (int)$ reduce E int E • → ) E + (E ) + (int)$ shift 7 6 • E (E) (int)$ E E + (E) on $, + + + reduce E E + (E) → + • → ( E + (int)$ shift (3 times) 8 9 • E + E + (int )$ reduce E int • → 10 11 E E+(E) on ), + E + (E )$ shift ) → •

Compiler Construction 38/41 LR(1) Parsing Example

int 0 1 E int on $, + int + (int) + (int)$ shift → • E int + (int) + (int)$ reduce E int • → 2 + E (int) (int)$ shift (3 times) ( int + + 3 4 5 • accept on $ E + (int ) + (int)$ reduce E int E int on ), + → • → E E + (E ) + (int)$ shift ) • 7 6 E + (E) + (int)$ reduce E E + (E) E E + (E) on $, + • → → + E + (int)$ shift (3 times) ( • 8 9 E + (int )$ reduce E int E + • → E + (E )$ shift 10 11 E E+(E) on ), + • ) → E + (E) $ reduce E E + (E) • →

Compiler Construction 38/41 LR(1) Parsing Example

int (int) (int)$ int + + shift 0 1 E int on $, + → • E int + (int) + (int)$ reduce E int • → E + (int) + (int)$ shift (3 times) 2 + ( int • 3 4 5 E + (int ) + (int)$ reduce E int accept on $ • E int on ), + → → E + (E ) + (int)$ shift E • ) E + (E) + (int)$ reduce E E + (E) 7 6 • → E E + (E) on $, + E + (int)$ shift (3 times) → + • ( E + (int )$ reduce E int 8 9 • E → + E + (E )$ shift • 10 11 E E+(E) on ), + E + (E) $ reduce E E + (E) ) → • → E $ accept •

Compiler Construction 38/41 Parsing Tables

That’s right! Represent that DFA as a table

É Parsers represent the DFA as a 2D table É Rows correspond to DFA states É Columns correspond to terminals/non-terminals É Cells contain actions É Terminals shift or reduce É Non-temrinals goto subsequent DFA states

É Critically Restart DFA with stack as input every time you reduce

Compiler Construction 39/41 Parsing Tables

int + ( ) $ E

0 s1 g2 1 rE int rE int 2 s3→ Accept→ 3 s4 4 s5 g6 5 r r E int E int int 6 s8→ s7 → 0 1 E int on $, + 7 r r → E E+(E) E E+(E) E ... → → 2 + ( int 3 4 5 accept on $ E int on ), + E →

) 7 6

E E + (E) on $, + → + ( 8 9 E +

10 11 E E+(E) on ), + ) →

Compiler Construction 40/41 Parsing Tables

int + ( ) $ E

0 s1 g2 1 rE int rE int 2 s3→ Accept→ 3 s4 4 s5 g6 5 r r E int E int int 6 s8→ s7 → 0 1 E int on $, + 7 r r → E E+(E) E E+(E) E ... → → 2 + ( int 3 4 5 accept on $ E int on ), + E →

) 7 6

E E + (E) on $, + → + ( 8 9 E +

10 11 E E+(E) on ), + ) →

Compiler Construction 40/41 Next time

É Parser generators construct the parsing DFA/table for you É For PA3, you use a yacc/bison-based parser generator to implement the COOL grammar!

É What about the

Compiler Construction 41/41